
BEAGLE: An Application Programming Interface and High-Performance Computing Library for Statistical Phylogenetics

Syst. Biol. 61(1):170–173, 2012
© The Author(s) 2011. Published by Oxford University Press on behalf of Society of Systematic Biologists. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
DOI:10.1093/sysbio/syr100
Advance Access publication on October 1, 2011

Daniel L. Ayres (1,∗), Aaron Darling (2), Derrick J. Zwickl (3), Peter Beerli (4), Mark T. Holder (3), Paul O. Lewis (5), John P. Huelsenbeck (6), Fredrik Ronquist (7), David L. Swofford (8), Michael P. Cummings (1), Andrew Rambaut (9,10), and Marc A. Suchard (11,12,13)

(1) Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD 20742, USA
(2) Genome Center, University of California, Davis, CA 95616, USA
(3) Department of Ecology and Evolutionary Biology, University of Kansas, Lawrence, KS 66045, USA
(4) Department of Scientific Computing, Florida State University, Tallahassee, FL 32306, USA
(5) Department of Ecology and Evolutionary Biology, University of Connecticut, Storrs, CT 06269, USA
(6) Department of Integrative Biology, University of California, Berkeley, CA 94720, USA
(7) Swedish Museum of Natural History, 114 18 Stockholm, Sweden
(8) Center for Evolutionary Genomics, Institute for Genome Sciences & Policy, Duke University, Durham, NC 27708, USA
(9) Institute of Evolutionary Biology, University of Edinburgh, Edinburgh EH9 3JT, UK; E-mail: [email protected]
(10) Fogarty International Center, National Institutes of Health, Bethesda, MD 20892, USA
(11) Department of Biomathematics, (12) Department of Biostatistics, and (13) Department of Human Genetics, University of California, Los Angeles, CA 90095, USA; E-mail: [email protected]
∗Correspondence to be sent to: Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD 20742, USA; E-mail: [email protected].

Received 14 July 2011; reviews returned 6 September 2011; accepted 26 September 2011
Associate Editor: David Posada

Abstract.—Phylogenetic inference is fundamental to our understanding of most aspects of the origin and evolution of life, and in recent years, there has been a concentration of interest in statistical approaches such as Bayesian inference and maximum likelihood estimation. Yet, for large data sets and realistic or interesting models of evolution, these approaches remain computationally demanding. High-throughput sequencing can yield data for thousands of taxa, but scaling to such problems using serial computing often necessitates the use of nonstatistical or approximate approaches. The recent emergence of graphics processing units (GPUs) provides an opportunity to leverage their excellent floating-point computational performance to accelerate statistical phylogenetic inference. A specialized library for phylogenetic calculation would allow existing software packages to make more effective use of available computer hardware, including GPUs. Adoption of a common library would also make it easier for other emerging computing architectures, such as field programmable gate arrays, to be used in the future. We present BEAGLE, an application programming interface (API) and library for high-performance statistical phylogenetic inference. The API provides a uniform interface for performing phylogenetic likelihood calculations on a variety of compute hardware platforms.
The library includes a set of efficient implementations and can currently exploit hardware including GPUs using NVIDIA CUDA, central processing units (CPUs) with Streaming SIMD Extensions and related processor supplementary instruction sets, and multicore CPUs via OpenMP. To demonstrate the advantages of a common API, we have incorporated the library into several popular phylogenetic software packages. The BEAGLE library is free open source software licensed under the Lesser GPL and available from http://beagle-lib.googlecode.com. An example client program is available as public domain software. [Bayesian phylogenetics; GPU; maximum likelihood; parallel computing.]

Most modern approaches to statistical phylogenetic inference involve computing the probability of observed character data for a set of taxa given a phylogenetic model—often a tree and continuous-time Markov chain model of character state evolution. Felsenstein (1981) demonstrated an efficient algorithm to calculate this probability, which is often referred to as the likelihood of the model. His algorithm recursively computes partial likelihoods via simple sums and products. These partial likelihoods track the probability of the observed data descended from an internal node conditional on a particular state at that internal node. A library that implements the calculations required by Felsenstein's algorithm is appealing because this procedure accounts for the majority of computing time in most likelihood-based phylogenetic operations. Furthermore, the algorithm offers opportunities for parallelization.

In typical phylogenetic models, likelihood calculation operations assume independence at several levels. These independencies provide the opportunity to perform operations in parallel. For example, models often assume that sites in a sequence alignment evolve independently, so that one can compute the likelihood for each site separately. The product of site likelihoods yields the likelihood for the alignment. In models that include among-site rate variation via a finite mixture, it is often possible to calculate conditional likelihoods given each rate category in parallel. Several other opportunities for parallelism exist at a finer scale.

We have developed the software library BEAGLE: Broad-platform Evolutionary Analysis General Likelihood Evaluator. BEAGLE provides a uniform interface for calculating phylogenetic likelihoods under a variety of different phylogenetic models. The library implements parallelism in the likelihood calculation on important emerging computer hardware technology, including graphics processing units (GPUs) and multicore central processing units (CPUs). We intend for users to install the library as a shared resource to be used by any phylogenetic software that supports the library. This approach allows developers of phylogenetic software to share any optimizations of the core calculations, and any package that uses BEAGLE will automatically benefit from the improvements to the library. For researchers, this centralization provides a single installation to take advantage of new hardware and parallelization techniques. We now describe the interface to the library and some details regarding its implementation.

APPLICATION PROGRAMMING INTERFACE

Key Concepts
The key to BEAGLE performance lies in delivering fine-scale parallelization while minimizing data transfer and memory copy overhead. To accomplish this, the library lacks the concept or data structure for a tree, in spite of the intended use for phylogenetic analysis. Instead, BEAGLE acts directly on flexibly indexed data storage (called buffers) for observed character states and partial likelihoods. The client program can set the input buffers to reflect the data and can calculate the likelihood of a particular phylogeny by invoking likelihood calculations on the appropriate input and output buffers in the correct order. Because of this design simplicity, the library can support many different tree inference algorithms and likelihood calculation on a variety of models. Arbitrary numbers of states can be used, as can nonreversible substitution matrices via complex eigen decompositions, and mixture models with multiple rate categories and/or multiple eigen decompositions. Finally, BEAGLE application programming interface (API) calls can be asynchronous, allowing the calling program to implement other coarse-scale parallelization schemes such as evaluating independent genes or running concurrent Markov chains.

Usage
To use the library, a client program first creates an instance of BEAGLE by calling beagleCreateInstance (further API method names can be found in the documentation distributed with the library); multiple instances per client are possible and encouraged. All additional functions are called with a reference to this instance. The client program can optionally request that an instance run on certain hardware (e.g., a GPU) or have particular features (e.g., double-precision math). Next, the client program must specify the data dimensions and specify key aspects of the phylogenetic model. Character state data are then loaded and can be in the form of discrete observed states or partial likelihoods for ambiguous characters. The observed data are usually unchanging and loaded only once at the start to minimize memory copy overhead. The character data can be compressed into unique "site patterns" and associated weights for each. The parameters of the substitution process can then be specified, including the equilibrium state frequencies, the rates for one or more substitution rate categories and their weights, and finally, the eigen decomposition for the substitution process.

In order to calculate the likelihood of a particular tree, the client program then specifies a series of integration operations that correspond to steps in Felsenstein's algorithm. Finite-time transition probabilities for each edge are loaded directly if considering a nondiagonalizable model or calculated in parallel from the eigen decomposition and edge lengths specified. This is performed within BEAGLE's memory space to minimize data transfers. A single function call will then request one or more integration operations to calculate partial likelihoods over some or all nodes. The operations are performed in the order they are provided, typically dictated by a postorder traversal of the tree topology. The client needs only specify nodes for which the partial likelihoods need updating, but it is up to the calling software to keep track of these dependencies. The final step in evaluating the phylogenetic model is done using an API call that yields a single log likelihood for the model given the data.

Aspects of the BEAGLE API design support both maximum likelihood (ML) and Bayesian phylogenetic tree inference. For ML inference, API calls can calculate first and second derivatives of the likelihood with respect to the lengths of edges (branches). In both cases, BEAGLE provides the ability to cache and reuse previously computed partial likelihood results, which can yield a tremendous speedup over recomputing the entire likelihood every time a new phylogenetic model is evaluated.

MATERIALS AND METHODS
The core BEAGLE library is implemented in C++ with C and Java JNI interfaces. BEAGLE uses a runtime module loading system to load hardware-specific plugins (shared libraries) when suitable hardware is available. Current plugins implement BEAGLE on GPUs using CUDA and OpenCL (in development), CPUs with vector instructions using Streaming SIMD Extensions (SSE), and multicore systems via OpenMP. BEAGLE is available for Linux, Mac, and Windows operating systems and is packaged with conventional installer methods for each.

GPU Implementation
The GPU implementation of BEAGLE supports both single- and double-precision arithmetic. Single precision requires more frequent use of a rescaling scheme to avoid underflow but allows BEAGLE to run on a greater variety of graphics processors since initial generations of such hardware did not include support for double-precision math. The GPU does fine-scale parallelization of the likelihood calculation, primarily by parallelizing across alignment sites, rate categories, and state values. Models such as amino acid (20 states) or codon models (64 states), therefore, permit a greater degree of parallelization than nucleotide models (4 states) and also yield the most notable speedups on GPU hardware (Suchard and Rambaut 2009). The CUDA kernels load using the CUDA driver API, which enables them to be compiled at runtime and utilize features specific to the particular hardware and CUDA version installed. Multiple GPUs can be seamlessly utilized simultaneously via multiple BEAGLE instances.

CPU-based Implementations
In addition to a standard serial CPU implementation, BEAGLE includes two other CPU-based implementations that exploit parallelism in different ways. An SSE implementation in double precision uses vector processing extensions present in many CPUs to parallelize computation across character state values. Single-precision SSE vectorization has not been a BEAGLE priority as other phylogenetic tools already provide this feature (Ronquist and Huelsenbeck 2003; Swofford 2003) and, so, is not yet available in BEAGLE. The OpenMP implementation uses multiple threads to parallelize computation across rate categories.
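As a rough illustration of why rate categories split cleanly across threads, the Python sketch below evaluates a site-pattern-compressed log likelihood with Felsenstein's pruning algorithm and hands each rate category to a thread pool. It is a minimal sketch under stated assumptions, not BEAGLE code: every function name is hypothetical and unrelated to the library's C/C++ API, and the Jukes-Cantor model, the fixed three-taxon tree and its branch lengths, the two equal-weight rate categories, and the pattern counts are all invented for the example. It also omits the partial-likelihood rescaling a real implementation needs to avoid underflow, and because of CPython's global interpreter lock the pool shows the structure of the decomposition rather than a real speedup.

```python
"""Hypothetical sketch (not the BEAGLE API): Felsenstein pruning on a fixed
three-taxon tree ((A:0.1,B:0.2):0.05,C:0.3) under an assumed Jukes-Cantor
model, with site-pattern weights and rate categories evaluated in parallel."""
import math
from concurrent.futures import ThreadPoolExecutor

STATES = 4  # nucleotides A, C, G, T coded 0..3


def jc_transition(t, rate):
    """4x4 Jukes-Cantor transition matrix for branch length t scaled by rate."""
    e = math.exp(-4.0 * rate * t / 3.0)
    same, diff = 0.25 + 0.75 * e, 0.25 - 0.25 * e
    return [[same if i == j else diff for j in range(STATES)] for i in range(STATES)]


def tip_partials(state):
    """An observed tip state becomes an indicator vector of partial likelihoods."""
    return [1.0 if s == state else 0.0 for s in range(STATES)]


def combine(left, p_left, right, p_right):
    """One pruning step: parent partials from two children and their edge matrices."""
    return [
        sum(p_left[i][j] * left[j] for j in range(STATES))
        * sum(p_right[i][j] * right[j] for j in range(STATES))
        for i in range(STATES)
    ]


def site_likelihood(pattern, rate):
    """Likelihood of one site pattern conditional on one rate category."""
    a, b, c = (tip_partials(s) for s in pattern)
    # Postorder: the (A,B) cherry first, then the root joining (AB) and C.
    ab = combine(a, jc_transition(0.1, rate), b, jc_transition(0.2, rate))
    root = combine(ab, jc_transition(0.05, rate), c, jc_transition(0.3, rate))
    return sum(0.25 * x for x in root)  # uniform equilibrium frequencies under JC


def log_likelihood(patterns, weights, rates, rate_weights):
    """Sum over unique patterns of weight * log(mixture over rate categories)."""

    def per_category(rate):
        # Categories do not depend on one another, so each can run on its own thread.
        return [site_likelihood(p, rate) for p in patterns]

    with ThreadPoolExecutor() as pool:
        by_category = list(pool.map(per_category, rates))

    total = 0.0
    for k, count in enumerate(weights):
        mixed = sum(w * cat[k] for w, cat in zip(rate_weights, by_category))
        total += count * math.log(mixed)
    return total


if __name__ == "__main__":
    patterns = [(0, 0, 0), (0, 1, 0), (2, 2, 3)]  # hypothetical unique site patterns
    weights = [10, 3, 1]  # hypothetical number of columns showing each pattern
    print(log_likelihood(patterns, weights, rates=[0.5, 1.5], rate_weights=[0.5, 0.5]))
```

The categories share nothing until the final weighted mixture, which is what makes this the natural unit of work for threads; the per-state sums inside `combine` are the finer-grained work that maps better onto many GPU cores.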
Although finer-scale parallelization, equivalent to that achieved for GPU devices, could be attempted, it is unlikely to yield significant speedups due to the thread synchronization overhead in the OpenMP model.

EXAMPLE

Program Speedups
Currently, three popular phylogenetic software packages interface with BEAGLE: MrBayes (Ronquist and Huelsenbeck 2003) and BEAST (Drummond and Rambaut 2007), which use Bayesian inference, and GARLI (Zwickl 2006), which uses an ML approach. We benchmarked each of these programs to compare the speed of their native likelihood calculators to the BEAGLE implementations. In order to better exploit the parallelism offered by the GPU implementation, we used a data set with a large number of alignment sites and ran it under both nucleotide and codon models. More specifically, the data set used had 15 taxa and 18,792 nucleotide columns, 8558 of which were unique; for the codon model, 6080 of the 6264 site patterns were unique. This data set was a subset of a larger arthropod data set (Regier et al. 2010). We performed these benchmarks on a standard desktop PC with a 2.9 GHz Intel Core i7-930 CPU and 6 GB of 1.6 GHz DDR3 RAM. The PC was equipped with an NVIDIA GTX 580 GPU, with 1.5 GB of RAM and 512 processing cores running at 1.5 GHz. Figure 1 shows runtime speedups for each program when using BEAGLE CPU, SSE, and GPU implementations under nucleotide and codon models. For the GPU implementation, we also benchmarked in single-precision mode. Reported speedups are relative to the runtime when using the native sequential CPU implementation of each program. We note that the GARLI interface with BEAGLE is not fully optimized. Although we expect that further integration work will produce positive results, in our tests, only the GPU implementation achieved effective speedups. We have thus omitted the results from the CPU-based implementations.

FIGURE 1. Performance using the BEAGLE library relative to the native sequential CPU implementations of phylogenetic analysis programs GARLI, MrBayes, and BEAST. Speedup factors are on a log scale.

For the BEAGLE GPU implementation, we observe significant speedups across all programs. The speedups are largest under the codon models, as they allow for better utilization of the GPU cores. We also observe the higher performance cost of double-precision calculation on the GPU relative to single precision. Overall, the highest speedup is 71-fold, for the BEAGLE GPU single-precision implementation when compared with the BEAST native implementation, under the codon model.

We note that not every analysis run on a GPU will achieve the same speedups we report, and, in some circumstances, using the BEAGLE GPU implementation may result in a slower overall runtime than using a CPU implementation. Several factors affect the relative performance. Beyond state-space size and numerical precision, the number of unique alignment columns and the hardware specifications of the GPU, especially numbers of cores and memory bandwidth, are important factors. We recommend that users first assess the relative performance of the GPU implementation with their setup by performing short comparative runs, which specify a smaller chain length or fewer generations.

CONCLUSION
BEAGLE is an API and library for high-performance evaluation of phylogenetic likelihoods. The API provides a uniform interface for performing calculations on an expanding variety of computer hardware platforms including GPUs, multicore CPUs, and SSE vectorization. On GPUs, the library provides novel algorithms and methods for evaluating likelihoods under arbitrary molecular evolutionary models, harnessing the large number of processing cores to efficiently parallelize calculations. Current results show speedups of up to 71-fold on a single GPU over CPU-based likelihood calculators. BEAGLE is currently integrated with three state-of-the-art phylogenetic software packages: MrBayes, BEAST, and GARLI, and compatible with many more. Forthcoming extensions include OpenCL support, single-precision SSE vectorization, improved performance for highly partitioned data sets, and additional high-level language wrappers, such as Python.

BEAGLE is freely available from http://beagle-lib.googlecode.com under the GNU Lesser General Public License and new collaborators are welcome.

FUNDING
This work was supported by the National Science Foundation [grant numbers DBI-0755048, DEB-0732920, DEB-1036448, DMS-0931642, EF-0331495, EF-0905606, EF-0949453]; the National Institutes of Health [grant numbers R01-HG006139, R01-GM037841, R01-GM078985, R01-GM086887, R01-NS063897]; the Biotechnology and Biological Sciences Research Council [grant number BB/H011285/1]; the Wellcome Trust [grant number WT092807MA]; and Google Summer of Code.

ACKNOWLEDGMENTS
Some of the development of BEAGLE occurred at meetings at the University of Maryland, the National Evolutionary Synthesis Center through a working group (Software for Bayesian Evolutionary Analysis), the Mathematical Biosciences Institute at The Ohio State University, and several instances of the Workshop on Molecular Evolution (Europe and North America). We thank Jerome Regier for providing the data set used in evaluating the programs.

REFERENCES
Drummond A.J., Rambaut A. 2007. BEAST: Bayesian evolutionary analysis by sampling trees. BMC Evol. Biol. 7:214.
Felsenstein J. 1981. Evolutionary trees from DNA sequences: a maximum likelihood approach. J. Mol. Evol. 17:368–376.
Regier J., Shultz J., Zwick A., Hussey A., Ball B., Wetzer R., Martin J., Cunningham C. 2010. Arthropod relationships revealed by phylogenomic analysis of nuclear protein-coding sequences. Nature. 463:1079–1083.
Ronquist F., Huelsenbeck J.P. 2003. MrBayes 3: Bayesian phylogenetic inference under mixed models. Bioinformatics. 19:1572–1574.
Suchard M.A., Rambaut A. 2009. Many-core algorithms for statistical phylogenetics. Bioinformatics. 25:1370–1376.
Swofford D.L. 2003. PAUP*: phylogenetic analysis using parsimony (* and other methods). Version 4. Sunderland (MA): Sinauer Associates.
Zwickl D.J. 2006. Genetic algorithm approaches for the phylogenetic analysis of large biological sequence datasets under the maximum likelihood criterion [PhD dissertation]. Austin (TX): University of Texas. p. 1–115.


Publisher: Pubmed Central
Copyright: © The Author(s) 2011. Published by Oxford University Press on behalf of the Society of Systematic Biologists.
ISSN: 1063-5157
eISSN: 1076-836X
DOI: 10.1093/sysbio/syr100
Publisher site
See Article on Publisher Site

Abstract

Syst. Biol. 61(1):170–173, 2012 c The Author(s) 2011. Published by Oxford University Press on behalf of Society of Systematic Biologists. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited. DOI:10.1093/sysbio/syr100 Advance Access publication on October 1, 2011 BEAGLE: An Application Programming Interface and High-Performance Computing Library for Statistical Phylogenetics 1,∗ 2 3 4 D ANIEL L. AYRES , A ARON D ARLING , D ERRICK J. Z WICKL , P ETER B EERLI , 3 5 6 7 M ARK T. H OLDER , PAUL O. L EWIS , J OHN P. H UELSENBECK , F REDRIK R ONQUIST , 8 1 9,10 11,12,13 D AVID L. S WOFFORD , M ICHAEL P. C UMMINGS , A NDREW R AMBAUT , AND M ARC A. S UCHARD 1 2 Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD 20742, USA; Genome Center, University of 3 4 California, Davis, CA 95616, USA; Department of Ecology and Evolutionary Biology, University of Kansas, Lawrence, KS 66045, USA; Department of Scientific Computing, Florida State University, Tallahassee, FL 32306, USA; Department of Ecology and Evolutionary Biology, University of 6 7 Connecticut, Storrs, CT 06269, USA; Department of Integrative Biology, University of California, Berkeley, CA 94720, USA; Swedish Museum of Natural History, 114 18 Stockholm, Sweden; Center for Evolutionary Genomics, Institute for Genome Sciences & Policy, Duke University, Durham, NC 9 10 27708, USA; Institute of Evolutionary Biology, University of Edinburgh, Edinburgh EH9 3JT, UK; E-mail: [email protected]; Fogarty International 11 12 13 Center, National Institutes of Health, Bethesda, MD 20892, USA; Department of Biomathematics; Department of Biostatistics; and Department of Human Genetics, University of California, Los Angeles, CA 90095, USA; E-mail: 
[email protected]; Correspondence to be sent to Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD 20742, USA; E-mail: [email protected]. Received 14 July 2011; reviews returned 6 September 2011; accepted 26 September 2011 Associate Editor: David Posada Abstract.—Phylogenetic inference is fundamental to our understanding of most aspects of the origin and evolution of life, and in recent years, there has been a concentration of interest in statistical approaches such as Bayesian inference and maximum likelihood estimation. Yet, for large data sets and realistic or interesting models of evolution, these approaches remain computationally demanding. High-throughput sequencing can yield data for thousands of taxa, but scaling to such problems using serial computing often necessitates the use of nonstatistical or approximate approaches. The recent emer- gence of graphics processing units (GPUs) provides an opportunity to leverage their excellent floating-point computational performance to accelerate statistical phylogenetic inference. A specialized library for phylogenetic calculation would allow existing software packages to make more effective use of available computer hardware, including GPUs. Adoption of a com- mon library would also make it easier for other emerging computing architectures, such as field programmable gate arrays, to be used in the future. We present BEAGLE, an application programming interface (API) and library for high-performance statistical phylogenetic inference. The API provides a uniform interface for performing phylogenetic likelihood calculations on a variety of compute hardware platforms. The library includes a set of efficient implementations and can currently ex- ploit hardware including GPUs using NVIDIA CUDA, central processing units (CPUs) with Streaming SIMD Extensions and related processor supplementary instruction sets, and multicore CPUs via OpenMP. 
To demonstrate the advantages of a common API, we have incorporated the library into several popular phylogenetic software packages. The BEAGLE library is free open source software licensed under the Lesser GPL and available from http://beagle-lib.googlecode.com. An example client program is available as public domain software. [Bayesian phylogenetics; GPU; maximum likelihood; parallel computing.] Most modern approaches to statistical phylogenetic for each site separately. The product of site likelihoods inference involve computing the probability of observed yields the likelihood for the alignment. In models that character data for a set of taxa given a phylogenetic include among-site rate variation via a finite mixture, model—often a tree and continuous-time Markov chain it is often possible to calculate conditional likelihoods model of character state evolution. Felsenstein (1981) given each rate category in parallel. Several other op- demonstrated an efficient algorithm to calculate this portunities for parallelism exist at a finer scale. probability, which is often referred to as the likelihood We have developed the software library BEAGLE: of the model. His algorithm recursively computes par- Broad-platform Evolutionary Analysis General Likeli- tial likelihoods via simple sums and products. These hood Evaluator. BEAGLE provides a uniform interface partial likelihoods track the probability of the observed for calculating phylogenetic likelihoods under a vari- data descended from an internal node conditional on ety of different phylogenetic models. The library im- a particular state at that internal node. 
A library that plements parallelism in the likelihood calculation on implements the calculations required by Felsenstein’s important emerging computer hardware technology, in- algorithm is appealing because this procedure accounts cluding graphics processing units (GPUs) and multicore for the majority of computing time in most likelihood- central processing units (CPUs). We intend for users to based phylogenetic operations. Furthermore, the algo- install the library as a shared resource to be used by any rithm offers opportunities for parallelization. phylogenetic software that supports the library. This In typical phylogenetic models, likelihood calcula- approach allows developers of phylogenetic software to tion operations assume independence at several levels. share any optimizations of the core calculations and any These independencies provide the opportunity to per- package that uses BEAGLE will automatically benefit form operations in parallel. For example, models often from the improvements to the library. For researchers, assume that sites in a sequence alignment evolve in- this centralization provides a single installation to dependently, so that one can compute the likelihood take advantage of new hardware and parallelization 170 2012 SOFTWARE FOR SYSTEMATICS AND EVOLUTION 171 techniques. We now describe the interface to the library are loaded directly if considering a nondiagonalizable and some details regarding its implementation. model or calculated in parallel from the eigen decompo- sition and edge lengths specified. This is performed within BEAGLE’s memory space to minimize data transfers. A single function call will then request one A PPLICATION P ROGRAMMING I NTERFACE or more integration operations to calculate partial like- Key Concepts lihoods over some or all nodes. The operations are performed in the order they are provided, typically dic- The key to BEAGLE performance lies in delivering tated by a postorder traversal of the tree topology. 
The fine-scale parallelization while minimizing data trans- client needs only specify nodes for which the partial fer and memory copy overhead. To accomplish this, the likelihoods need updating, but it is up to the calling library lacks the concept or data structure for a tree, software to keep track of these dependencies. The final in spite of the intended use for phylogenetic analysis. step in evaluating the phylogenetic model is done using Instead, BEAGLE acts directly on flexibly indexed data an API call that yields a single log likelihood for the storage (called buffers) for observed character states and model given the data. partial likelihoods. The client program can set the input Aspects of the BEAGLE API design support both buffers to reflect the data and can calculate the likeli- maximum likelihood (ML) and Bayesian phylogenetic hood of a particular phylogeny by invoking likelihood tree inference. For ML inference, API calls can calculate calculations on the appropriate input and output buffers first and second derivatives of the likelihood with re- in the correct order. Because of this design simplicity, spect to the lengths of edges (branches). In both cases, the library can support many different tree inference BEAGLE provides the ability to cache and reuse pre- algorithms and likelihood calculation on a variety of viously computed partial likelihood results, which can models. Arbitrary numbers of states can be used, as can yield a tremendous speedup over recomputing the en- nonreversible substitution matrices via complex eigen tire likelihood every time a new phylogenetic model is decompositions, and mixture models with multiple evaluated. rate categories and/or multiple eigen decompositions. 
Finally, BEAGLE application programming interface (API) calls can be asynchronous, allowing the calling program to implement other coarse-scale paralleliza- M ATERIALS AND M ETHODS tion schemes such as evaluating independent genes or The core BEAGLE library is implemented in C++ with running concurrent Markov chains. C and Java JNI interfaces. BEAGLE uses a runtime mod- ule loading system to load hardware-specific plugins (shared libraries) when suitable hardware is available. Current plugins implement BEAGLE on GPUs using Usage CUDA and OpenCL (in development), CPUs with vec- To use the library, a client program first creates an tor instructions using Streaming SIMD Extensions (SSE), instance of BEAGLE by calling beagleCreateInstance and multicore systems via OpenMP. BEAGLE is avail- (further API method names can be found in the able for Linux, Mac, and Windows operating systems documentation distributed with the library); multiple and is packaged with conventional installer methods instances per client are possible and encouraged. All for each. additional functions are called with a reference to this instance. The client program can optionally request that an instance run on certain hardware (e.g., a GPU) or GPU Implementation have particular features (e.g., double-precision math). Next, the client program must specify the data dimen- The GPU implementation of BEAGLE supports both sions and specify key aspects of the phylogenetic model. single- and double-precision arithmetic. Single preci- Character state data are then loaded and can be in the sion requires more frequent use of a rescaling scheme form of discrete observed states or partial likelihoods to avoid underflow but allows BEAGLE to run on a for ambiguous characters. The observed data are usually greater variety of graphics processors since initial gen- unchanging and loaded only once at the start to mini- erations of such hardware did not include support for mize memory copy overhead. 
The character data can be double-precision math. The GPU does fine-scale par- compressed into unique “site patterns” and associated allelization of the likelihood calculation, primarily by weights for each. The parameters of the substitution parallelizing across alignment sites, rate categories, and process can then be specified, including the equilibrium state values. Models such as amino acid (20 states) or state frequencies, the rates for one or more substitution codon models (64 states), therefore, permit a greater de- rate categories and their weights, and finally, the eigen gree of parallelization than nucleotide models (4 states) decomposition for the substitution process. and also yield the most notable speedups on GPU hard- In order to calculate the likelihood of a particular tree, ware (Suchard and Rambaut 2009). The CUDA kernels the client program then specifies a series of integration load using the CUDA driver API, which enables them operations that correspond to steps in Felsenstein’s algo- to be compiled at runtime and utilize features specific rithm. Finite-time transition probabilities for each edge to the particular hardware and CUDA version installed. 172 SYSTEMATIC BIOLOGY VOL. 61 Multiple GPUs can be seamlessly utilized simultane- ously via multiple BEAGLE instances. CPU-based Implementations In addition to a standard serial CPU implementation, BEAGLE includes two other CPU-based implemen- tations that exploit parallelism in different ways. An SSE implementation in double precision uses vector processing extensions present in many CPUs to paral- lelize computation across character state values. Single- precision SSE vectorization has not been a BEAGLE priority as other phylogenetic tools already provide this feature (Ronquist and Huelsenbeck 2003; Swof- ford 2003) and, so, is not yet available in BEAGLE. The OpenMP implementation uses multiple threads to par- allelize computation across rate categories. 
Although finer-scale parallelization, equivalent to that achieved for GPU devices, could be attempted, it is unlikely to yield significant speedups because of the thread synchronization overhead in the OpenMP model.

EXAMPLE

Program Speedups

Currently, three popular phylogenetic software packages interface with BEAGLE. MrBayes (Ronquist and Huelsenbeck 2003) and BEAST (Drummond and Rambaut 2007) use Bayesian inference, and GARLI (Zwickl 2006) uses a maximum likelihood (ML) approach. We benchmarked each of these programs to compare the speed of its native likelihood calculator with the BEAGLE implementations. To better exploit the parallelism offered by the GPU implementation, we used a data set with a large number of alignment sites and analyzed it under both nucleotide and codon models. Specifically, the data set had 15 taxa and 18,792 nucleotide columns, 8558 of which were unique. For the codon model, 6080 of the 6264 site patterns were unique. This data set was a subset of a larger arthropod data set (Regier et al. 2010). We performed these benchmarks on a standard desktop PC with a 2.9 GHz Intel Core i7-930 CPU and 6 GB of 1.6 GHz DDR3 RAM. The PC was equipped with an NVIDIA GTX 580 GPU with 1.5 GB of RAM and 512 processing cores running at 1.5 GHz.

Figure 1 shows runtime speedups for each program when using the BEAGLE CPU, SSE, and GPU implementations under nucleotide and codon models. For the GPU implementation, we also benchmarked in single-precision mode. Reported speedups are relative to the runtime of each program’s native sequential CPU implementation. We note that the GARLI interface with BEAGLE is not fully optimized. We expect further integration work to produce positive results. In our tests, however, only the GPU implementation achieved effective speedups for GARLI, so we have omitted its results from the CPU-based implementations.

FIGURE 1. Performance using the BEAGLE library relative to the native sequential CPU implementations of the phylogenetic analysis programs GARLI, MrBayes, and BEAST. Speedup factors are on a log scale.

For the BEAGLE GPU implementation, we observe significant speedups across all programs. The speedups are largest under the codon models, because these allow better utilization of the GPU cores. We also observe the higher performance cost of double-precision calculation on the GPU relative to single precision. Overall, the highest speedup is 71-fold, for the BEAGLE GPU single-precision implementation compared with the BEAST native implementation under the codon model.

Not every analysis run on a GPU will achieve the speedups we report. In some circumstances, using the BEAGLE GPU implementation may result in a slower overall runtime than using a CPU implementation. Several factors affect relative performance. Beyond state-space size and numerical precision, these include the number of unique alignment columns and the hardware specifications of the GPU, especially its number of cores and memory bandwidth. We recommend that users first assess the relative performance of the GPU implementation on their own setup. They can do so by performing short comparative runs that specify a smaller chain length or fewer generations.

CONCLUSION

BEAGLE is an API and library for high-performance evaluation of phylogenetic likelihoods. The API provides a uniform interface for performing calculations on an expanding variety of computer hardware platforms, including GPUs, multicore CPUs, and CPUs with SSE vectorization. On GPUs, the library provides novel algorithms and methods for evaluating likelihoods under arbitrary molecular evolutionary models. These harness the large number of processing cores to parallelize calculations efficiently. Current results show speedups of up to 71-fold on a single GPU over CPU-based likelihood calculators. BEAGLE is currently integrated with three state-of-the-art phylogenetic software packages (MrBayes, BEAST, and GARLI) and is compatible with many more. Forthcoming extensions include OpenCL support, single-precision SSE vectorization, improved performance for highly partitioned data sets, and additional high-level language wrappers, such as Python. BEAGLE is freely available from http://beagle-lib.googlecode.com under the GNU Lesser General Public License, and new collaborators are welcome.

FUNDING

This work was supported by the National Science Foundation [grant numbers DBI-0755048, DEB-0732920, DEB-1036448, DMS-0931642, EF-0331495, EF-0905606, EF-0949453]; the National Institutes of Health [grant numbers R01-HG006139, R01-GM037841, R01-GM078985, R01-GM086887, R01-NS063897]; the Biotechnology and Biological Sciences Research Council [grant number BB/H011285/1]; the Wellcome Trust [grant number WT092807MA]; and Google Summer of Code.

ACKNOWLEDGMENTS

Some of the development of BEAGLE occurred at meetings at the following venues:
the University of Maryland;
the National Evolutionary Synthesis Center, through a working group (Software for Bayesian Evolutionary Analysis);
the Mathematical Biosciences Institute at The Ohio State University;
and several instances of the Workshop on Molecular Evolution (Europe and North America).
We thank Jerome Regier for providing the data set used in evaluating the programs.

REFERENCES

Drummond A.J., Rambaut A. 2007. BEAST: Bayesian evolutionary analysis by sampling trees. BMC Evol. Biol. 7:214.
Felsenstein J. 1981. Evolutionary trees from DNA sequences: a maximum likelihood approach. J. Mol. Evol. 17:368–376.
Regier J., Shultz J., Zwick A., Hussey A., Ball B., Wetzer R., Martin J., Cunningham C. 2010. Arthropod relationships revealed by phylogenomic analysis of nuclear protein-coding sequences. Nature. 463:1079–1083.
Ronquist F., Huelsenbeck J.P. 2003. MrBayes 3: Bayesian phylogenetic inference under mixed models. Bioinformatics. 19:1572–1574.
Suchard M.A., Rambaut A. 2009. Many-core algorithms for statistical phylogenetics. Bioinformatics. 25:1370–1376.
Swofford D.L. 2003. PAUP*: phylogenetic analysis using parsimony (* and other methods). Version 4. Sunderland (MA): Sinauer Associates.
Zwickl D.J. 2006. Genetic algorithm approaches for the phylogenetic analysis of large biological sequence datasets under the maximum likelihood criterion [PhD dissertation]. Austin (TX): University of Texas. p. 1–115.

Journal: Systematic Biology (PubMed Central)

Published: Oct 1, 2011
