
PLDA+: Parallel Latent Dirichlet Allocation with Data Placement and Pipeline Processing

Publisher
Association for Computing Machinery
Copyright
Copyright © 2011 ACM, Inc.
ISSN
2157-6904
DOI
10.1145/1961189.1961198

Abstract

ZHIYUAN LIU, YUZHOU ZHANG, and EDWARD Y. CHANG, Google Inc.
MAOSONG SUN, Tsinghua University

Previous methods of distributed Gibbs sampling for LDA run into either memory or communication bottlenecks. To improve scalability, we propose four strategies: data placement, pipeline processing, word bundling, and priority-based scheduling. Experiments show that our strategies significantly reduce the unparallelizable communication bottleneck and achieve good load balancing, and hence improve the scalability of LDA.

Categories and Subject Descriptors: G.3 [Mathematics of Computing]: Probability and Statistics – Probabilistic algorithms; H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval – Clustering; I.2.7 [Artificial Intelligence]: Natural Language Processing – Text analysis

General Terms: Algorithms

Additional Key Words and Phrases: Topic models, Gibbs sampling, latent Dirichlet allocation, distributed parallel computations

ACM Reference Format: Liu, Z., Zhang, Y., Chang, E. Y., and Sun, M. 2011. PLDA+: Parallel latent Dirichlet allocation with data placement and pipeline processing. ACM Trans. Intell. Syst. Technol. 2, 3, Article 26 (April 2011), 18 pages. DOI = 10.1145/1961189.1961198 http://doi.acm.org/10.1145/1961189.1961198

1. INTRODUCTION

Latent Dirichlet Allocation (LDA) was first proposed by Blei et al. [2003] to model documents. Each document is modeled as a mixture of K latent topics.
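The excerpt stops at the definition of LDA, but the per-token workload it implies is what PLDA+ distributes: a collapsed Gibbs sampler that repeatedly resamples each token's topic from counts shared across documents and words. Below is a minimal single-machine sketch of that baseline sampler, not the paper's PLDA+ implementation; the hyperparameters, iteration count, and toy corpus are assumptions made for illustration.

```python
# Minimal collapsed Gibbs sampler for LDA (illustrative sketch only).
# alpha, beta, iters, and the toy corpus are assumed values, not from the paper.
import random

def gibbs_lda(docs, V, K, alpha=0.1, beta=0.01, iters=200, seed=0):
    rng = random.Random(seed)
    n_dk = [[0] * K for _ in docs]      # document-topic counts
    n_kw = [[0] * V for _ in range(K)]  # topic-word counts (shared state)
    n_k = [0] * K                       # tokens per topic
    z = []                              # topic assignment for every token

    # Initialize each token with a random topic and record the counts.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(K)
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1
        z.append(z_d)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from all counts.
                k = z[d][i]
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # Full conditional for the token's topic:
                # p(k) ∝ (n_dk + alpha) * (n_kw + beta) / (n_k + V * beta)
                weights = [
                    (n_dk[d][j] + alpha) * (n_kw[j][w] + beta) / (n_k[j] + V * beta)
                    for j in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                # Record the new assignment.
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1
    return z, n_dk, n_kw

# Toy usage: 4 documents over a 6-word vocabulary, K=2 topics.
docs = [[0, 1, 2, 0], [0, 2, 1], [3, 4, 5, 4], [4, 5, 3]]
z, n_dk, n_kw = gibbs_lda(docs, V=6, K=2)
print(n_dk)  # per-document topic counts after sampling
```

In distributed variants, the topic-word counts n_kw are the state every machine must see; keeping them synchronized is the unparallelizable communication bottleneck the abstract refers to, which the paper's data placement, pipeline processing, word bundling, and priority-based scheduling aim to reduce.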

Journal

ACM Transactions on Intelligent Systems and Technology (TIST)

Published: Apr 1, 2011
