The design and synthesis of novel genes and deoxyribonucleic acid (DNA) sequences is a central technique in synthetic biology. Current methods of high-throughput gene synthesis use pooled oligonucleotides obtained from custom-designed DNA microarray chips, and rely on orthogonal (non-interacting) polymerase chain reaction primers to specifically de-multiplex, by amplification, the precise subset of oligonucleotides necessary to assemble a full-length gene. The availability of a large validated set of mutually orthogonal primers is therefore a crucial reagent for high-throughput gene synthesis. Here, we present a set of 166 20-nucleotide primers that are experimentally verified to be non-interacting, capable of specifying 13 695 unique genes. These primers represent a valuable resource to the synthetic biology community for specifying genetic components that can be assembled through a scalable and modular architecture.

Key words: orthogonal primers; DNA assembly; modular genetic engineering; genetic circuit

Submitted: 23 June 2017; Received (in revised form): 17 November 2017; Accepted: 17 November 2017

© The Author(s) 2018. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact firstname.lastname@example.org

S. K. Subramanian et al., Synthetic Biology, 2018, Vol. 3, No. 1

Downloaded from https://academic.oup.com/synbio/article-abstract/3/1/ysx008/4817474 by Ed 'DeepDyve' Gillespie user on 16 March 2018

1. Introduction

Advances in high-throughput gene synthesis (1) and multiplex automated genome engineering (MAGE) (2, 3) technologies have made it possible to assemble gene libraries and genomes using custom oligonucleotide pools as starting reagents. A typical gene assembly workflow begins with gene-specific primer pairs that are used to selectively amplify the precise subset of oligonucleotides constituting a single gene, and ends with their assembly to build the full-length gene (1) (Figure 1A). This workflow is implemented in parallel for the simultaneous assembly of thousands of genes, with each gene specified by a unique combination of gene-specific primers. The set of gene-specific primers is thus crucial for deoxyribonucleic acid (DNA) assembly.

The gene-specific primers have the following design constraints: (i) their total number should be small in order to minimize synthesis costs, while still permitting specific amplification of thousands of genes, (ii) they should be 'well-behaved' polymerase chain reaction (PCR) primers (that is, they should not form dimers and hairpins) and (iii) primer pairs must be mutually orthogonal or non-interacting, such that a given primer pair does not amplify fragments specified by another pair, which could potentially interfere with the subsequent assembly steps. Here, we report a library of 166 experimentally validated primers that satisfy these criteria, with the capacity to uniquely specify 13 695 genes.

Figure 1. The need for a verified orthogonal primer set, illustrated for gene construction. (A) Parts required to construct each of the N genes are specified by a unique pair of primers from the set of mutually orthogonal primers. All components of gene 2 are specifically amplified by the purple-orange primer pair, followed by (for instance) removal of the priming region using a type IIS restriction enzyme, and subsequent PCR overlap assembly to generate the full-length gene. (B) Schematic of the primer interaction matrix. The interaction between primer j and the priming sequence for primer k is represented in pixel (j, k). The schematic contains 0.2% of the off-diagonal space for 100 primers marked randomly as interactions. (C) Representative interactions in B depicted in tree form, with branches connecting primers with similar interaction profiles. Orthogonal primers score above a dissimilarity threshold of 0.95. (D) The fraction of genetic components specifiable by the set of primers, as a function of inter-primer cross-talk. Contours connect points with equal coding capacity (denoted above the contour). With 185 primers, the coding capacity is 17 000 components with no cross-talk. Capacity drops to 10 000 with 0.16% primer cross-talk.

2. Materials and methods

2.1 Algorithm parameters to design orthogonal primers

Filter 1 retains sequences that end in G/C, have a composition of A <45%, GC between 40 and 60% and C between 20 and 30%, have an annealing temperature for primers in PCR (T) of 58 °C, and do not form hairpins/dimers involving 5 or more nucleotides (4). Sequences containing restriction enzyme recognition motifs (BamHI, EcoRI, HindIII and BsrDI) were removed. Other parameters for Filter 1 and Filter 2 (network elimination) were kept unchanged from the original algorithm (4).

2.2 Validating primer interactions

The 185 PCRs, one reaction per gene-specific primer, were performed with 30 rounds of amplification, 58 °C annealing temperature and 10 s extension time using the Q5 Hot Start High-Fidelity DNA polymerase system. Buffer and primer concentrations (0.5 µM each) were as recommended in the manual. All reactions had the same template (the oligonucleotide pools at 7 pg/µl) and reverse primer (CTTCTCCTTTACTAGTGAATTC). The amplicons were then independently taken through a final PCR to add Illumina TruSeq adaptors (Figure 2B, binding at gray boxes) and sequenced on two sequencing runs with the 300 cycle MiSeq Reagent kit.

2.3 Calculating the cross interaction matrix

The number of occurrences of each of the 185 12-mer barcodes in each of the amplicons was calculated using custom shell scripts and normalized such that the relative count of the jth barcode for the jth amplicon (self-interaction) is 1000. The earlier network elimination algorithm was used to identify the orthogonal primer set.

3. Results

A relatively small number of primers can specify a large number of genes if used combinatorially. For example, n primers can encode $\binom{n}{2}$ pairwise combinations (the number of ways to choose 2 components from a total of n components; mathematically, $\binom{n}{2} = 0.5\,n(n-1)$), rather than only n/2 pairs if each primer is used only once. The risk of the combinatorial approach, however, is that the effects of cross-talk between primers are exaggerated, since each primer is used with every other in combination. Unintended cross-talk would reduce the coding capacity, that is, the number of genes that can be uniquely specified by a primer set. While existing primer design algorithms attempt to minimize cross-talk, they have not been experimentally validated for orthogonality in PCR.

We performed simulations to illustrate the impact of primer cross-talk on the coding capacity. The simulation randomly assigns cross-talk between primers, graphically represented in a symmetric primer interaction matrix (Figure 1B). Element (j, k) in the interaction matrix denotes the extent of amplification of primer binding site k when amplified by primer j. Row j thus denotes the amplicon profile of primer j, that is, the relative frequencies of amplifying all primer binding sites for primer j. Orthogonal primer profiles can be identified by first constructing a similarity tree (Figure 1C) in which primers with similar amplicon profiles are clustered together, and using a high dissimilarity score (0.95) as a threshold.

Note that the interaction matrix can be asymmetric. Although the binding profiles of primers are expected to be symmetric (if primer j can anneal to primer k, primer k anneals to primer j with identical energetics), primer extension by DNA polymerase occurs only if the 3′ end is double stranded. Therefore, the 3′ end of primer j annealing to the middle of primer k will show a j → k interaction but not a k → j interaction.

The simulations reveal two main findings. First, coding capacity depends sensitively on the percentage of cross-talk in the primer set; for example, in a 100 primer interaction matrix, just ten significant off-diagonal pixels (0.2% cross-talk) exclude five primers from the orthogonal set, decreasing coding capacity by 10%. Second, for a given number of specifiable genes (contours, Figure 1D), the allowable cross-talk reaches a maximum as a function of the number of primers used. Based on these observations, we decided to design and validate a set of 185 primers. Even with maximum allowable cross-talk, this primer set should have a coding capacity of at least 10 000 genes, a value sufficient for the current capacity of high-throughput gene construction projects (Custom Array - Oligo Pools, http://www.customarrayinc.com/oligos_main.htm).

3.1 Primer design

Our algorithm for primer design is based on the DeLOB algorithm (4) that has been used to design orthogonal oligonucleotides for microarray hybridization. We modified the algorithm to generate 20-nucleotide primers, each with an annealing temperature of 58 °C (Figure 2A). The algorithm starts by generating 10 million random 20-mers, followed by two sequential filters to remove suboptimal primers. The first filter selects sequences with 'good primer characteristics' (see Section 2), including favorable annealing temperatures and low propensities to form dimers and hairpins. The second filter (network elimination) uses BLAST (5) based pairwise sequence similarity scores to exclude similar sequences. We experimentally tested the top 185 dissimilar sequences (Supplementary Table S1) for orthogonality.

3.2 Experimental validation

We designed and purchased a pool of template oligonucleotides (Custom Array - Oligo Pools, http://www.customarrayinc.com/oligos_main.htm), each containing a primer binding site for one of the 185 gene-specific primers, an associated unique 12-nucleotide barcode, a common reverse priming site, and flanking adaptors for Illumina sequencing. Using the oligonucleotide pool as template, we performed individual PCR reactions with each of the 185 gene-specific primers and sequenced the amplicons (28 000 reads each) by high-throughput sequencing to identify which of the 185 unique barcodes are amplified by each gene-specific primer. A binarized matrix of the normalized interaction profile for each primer is shown in Figure 2C (for unnormalized counts, see Supplementary Table S2) and the distribution of cross-talk pixels in Figure 2D. The interaction space (off-diagonal elements of the interaction matrix) is sparse, with only a few primers amplifying templates corresponding to other primers.

To identify orthogonal primers, we calculated a dissimilarity score and generated the corresponding interaction tree (Figure 2F, methods in Supplementary Information). A dissimilarity threshold is used to identify orthogonal primers: primers above the threshold are directly assigned to the set of orthogonal primers. Among the primers that were observed to interact (below the dissimilarity threshold), we randomly chose one from each interacting clique to append to the orthogonal primer set. The total number of orthogonal primers depends on the dissimilarity threshold (Figure 2E). Using a threshold of 0.95 results in a set of 166 orthogonal primers (Supplementary Table S1); modifying the dissimilarity threshold (MATLAB scripts in Supplementary Information) will change the number of orthogonal primers.

Figure 2. Design and validation of orthogonal primers. (A) Flowchart to computationally design non-interacting primers. (B) Template oligonucleotide schematic. A unique barcode is associated with each orthogonal primer. The barcode is amplified by PCR with the orthogonal primer and a universal reverse primer (purple boxes). In a subsequent PCR, sequencing primers (which anneal to gray boxes) prepare the barcodes for sequencing. (C) Experimentally determined interaction matrix. Self-priming of primers (principal diagonal) is set to 1. Significant (>0.05) off-diagonal elements denoting interactions are highlighted as orange pixels. (D) Distribution of cross-talk, off-diagonal elements in C. (E) Number of primers in the orthogonal set, as a function of dissimilarity cut-off. We have chosen a cut-off value = 0.95 for discussion. (F) Representation of the raw data in C, used to identify the set of orthogonal primers.

4. Discussion

Here, we report a framework to design and validate a relatively small orthogonal PCR primer library to specify a large number of genetic components. Our goal was to design a primer set to encode a component library size of at least 10 000. We designed and tested 185 primers (theoretical capacity of 17 020 components) for cross-talk and observed cross-talk in 0.11% of possible combinations. From this, we identified 166 mutually orthogonal primers, with a coding capacity of 13 695 components (Supplementary Tables S1 and S2). This is the first report of a validated primer set with a coding capacity >100 genes, a resource that should be broadly useful in high-throughput gene synthesis and multiplexed screening of DNA libraries (6).

Synthetic biology is the design and construction of biological systems to perform novel and useful functions. The synthetic biology community has contributed a large library of genetic parts (7), including standardized libraries like BioBricks (8) and YeastFab (9), encompassing a range of applications including biosensors (10) and programmable genetic circuits (11, 12). The next step in engineering biological systems is to use existing parts to compose complex and novel functions. The orthogonal primers reported here will enable experiments to specify gene or DNA fragments for high-throughput gene construction and to uniquely address genetic components (like biosensors and genetic circuits) in a genetic 'breadboard' background (13). The use of these primers can also be extended to orthogonal and modular CRISPR-mediated gene regulation (14).

Supplementary data

Supplementary Data are available at SYNBIO Online.

Acknowledgments

We thank members of the Ranganathan Laboratory for discussions and critical reading of the manuscript.

Funding

Robert A. Welch Foundation [I-1366], the Lyda Hill Endowment for Systems Biology, the Green Center for Systems Biology and the National Institutes of Health through the NIH Director's Transformative Research Award [RO1-GM123456 to R.R.].

Conflict of interest statement. None declared.

References

1. Kosuri,S., Eroshenko,N., Leproust,E.M., Super,M., Way,J., Li,J.B. and Church,G.M. (2010) Scalable gene synthesis by selective amplification of DNA pools from high-fidelity microchips. Nat. Biotechnol., 28, 1295–1299.
2. Wang,H.H., Isaacs,F.J., Carr,P.A., Sun,Z.Z., Xu,G., Forest,C.R. and Church,G.M. (2009) Programming cells by multiplex genome engineering and accelerated evolution. Nature, 460, 894–898.
3. and Church,G.M. (2011) Multiplexed genome engineering and genotyping methods: applications for synthetic biology and metabolic engineering. Methods Enzymol., 498, 409–426.
4. Xu,Q., Schlabach,M.R., Hannon,G.J. and Elledge,S.J. (2009) Design of 240,000 orthogonal 25mer DNA barcode probes. Proc. Natl. Acad. Sci. U S A, 106, 2289–2294.
5. Altschul,S.F., Gish,W., Miller,W.T., Myers,E.W. and Lipman,D.J. (1990) Basic local alignment search tool. J. Mol. Biol., 215, 403–410.
6. Stiffler,M.A., Subramanian,S.K., Salinas,V.H. and Ranganathan,R. (2016) A protocol for functional assessment of whole-protein saturation mutagenesis libraries utilizing high-throughput sequencing. J. Vis. Exp. doi:10.3791/54119.
7. Voigt,C.A. (2006) Genetic parts to program bacteria. Curr. Opin. Biotechnol., 17, 548–557.
8. Smolke,C.D. (2009) Building outside of the box: iGEM and the BioBricks Foundation. Nat. Biotechnol., 27, 1099–1102.
9. Guo,Y., Dong,J., Zhou,T., Auxillos,J., Li,T., Zhang,W., Wang,L., Shen,Y., Luo,Y., Zheng,Y. et al. (2015) YeastFab: the design and construction of standard biological parts for metabolic engineering in Saccharomyces cerevisiae. Nucleic Acids Res., 43, e88.
10. Prindle,A., Samayoa,P., Razinkov,I., Danino,T., Tsimring,L.S. and Hasty,J. (2012) A sensing array of radically coupled genetic "biopixels". Nature, 481, 39–44.
11. Padirac,A., Fujii,T. and Rondelez,Y. (2012) PNAS Plus: bottom-up construction of in vitro switchable memories. Proc. Natl. Acad. Sci. U S A, 109, 4–6.
12. Gupta,S., Bram,E.E. and Weiss,R. (2013) Genetically programmable pathogen sense and destroy. ACS Synth. Biol., 2, 715–723.
13. Wei,X., Syed,A., Mao,P., Han,J. and Song,Y-A. (2016) Creating sub-50 nm nanofluidic junctions in PDMS microfluidic chip via self-assembly process of colloidal particles. J. Vis. Exp. doi:10.3791/54145.
14. Didovyk,A., Borek,B., Hasty,J. and Tsimring,L. (2016) Orthogonal modular gene repression in Escherichia coli using engineered CRISPR/Cas9. ACS Synth. Biol., 5, 81–88.
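Illustrative note on coding capacity. The capacities quoted in the text (17 020 components for 185 primers, 13 695 for 166 primers) follow from counting unordered primer pairs, n(n − 1)/2. The short Python sketch below reproduces that arithmetic, and also the example of losing five primers from a set of 100. The function names are hypothetical and are not taken from the authors' scripts.

```python
from math import comb


def coding_capacity(n_primers: int) -> int:
    """Number of unordered primer pairs, n(n - 1) / 2."""
    return comb(n_primers, 2)


def capacity_loss(n_total: int, n_excluded: int) -> float:
    """Fractional drop in coding capacity when n_excluded primers are removed."""
    full = coding_capacity(n_total)
    reduced = coding_capacity(n_total - n_excluded)
    return 1 - reduced / full


if __name__ == "__main__":
    for n in (185, 166):
        print(n, "primers ->", coding_capacity(n), "pairs")
    # Excluding 5 of 100 primers removes roughly a tenth of the pairs.
    print(round(capacity_loss(100, 5), 3))
```

Because capacity grows quadratically with the number of primers, removing a small fraction of primers removes about twice that fraction of the pairs, which is why modest cross-talk has a large effect.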
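Illustrative note on Filter 1. The composition and restriction-site criteria given in Section 2.1 can be sketched as a simple sequence check. This is a minimal sketch under stated assumptions, not the authors' implementation: the bounds are treated as inclusive, the recognition sequences are the standard ones for the named enzymes (they are not listed in the text), and the annealing-temperature and hairpin/dimer checks are deliberately omitted because their parameters are defined in the original algorithm (4). All function and variable names are hypothetical.

```python
# Standard recognition sequences (assumed; the article names only the enzymes).
RESTRICTION_SITES = {
    "BamHI": "GGATCC",
    "EcoRI": "GAATTC",
    "HindIII": "AAGCTT",
    "BsrDI": "GCAATG",  # non-palindromic, so both strands are checked below
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def has_restriction_site(seq: str) -> bool:
    """True if any listed motif occurs on either strand."""
    rc = reverse_complement(seq)
    return any(site in seq or site in rc for site in RESTRICTION_SITES.values())


def passes_composition(seq: str) -> bool:
    """Ends in G/C, A < 45%, 40% <= GC <= 60%, 20% <= C <= 30%."""
    n = len(seq)
    if n == 0 or set(seq) - set("ACGT"):
        return False
    a, c, g = seq.count("A"), seq.count("C"), seq.count("G")
    return (
        seq[-1] in "GC"
        and a / n < 0.45
        and 0.40 <= (g + c) / n <= 0.60
        and 0.20 <= c / n <= 0.30
    )


def passes_filter1_subset(seq: str) -> bool:
    """Composition and restriction-site checks only (no Tm or hairpin check)."""
    seq = seq.upper()
    return passes_composition(seq) and not has_restriction_site(seq)
```

A candidate 20-mer would still need the melting-temperature and secondary-structure checks before being passed to the network elimination step.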
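Illustrative note on the interaction matrix. Sections 2.3 and 3.2 describe normalizing barcode counts by each primer's self-interaction and treating off-diagonal elements above 0.05 as significant. The sketch below shows that step on a plain list-of-lists count matrix, with row j holding the barcode counts from the PCR with primer j. It normalizes the diagonal to 1 (as in Figure 2C) and reports the fraction of off-diagonal elements flagged as cross-talk. It does not reproduce the authors' dissimilarity tree or clique selection, and all names are hypothetical.

```python
def normalize_rows(counts):
    """Divide each row j by its diagonal element so self-interaction equals 1."""
    normalized = []
    for j, row in enumerate(counts):
        self_count = row[j]
        if self_count <= 0:
            raise ValueError(f"primer {j} has no self-amplification reads")
        normalized.append([value / self_count for value in row])
    return normalized


def crosstalk_pairs(normalized, threshold=0.05):
    """Ordered pairs (j, k), j != k, where primer j amplifies site k above threshold."""
    return [
        (j, k)
        for j, row in enumerate(normalized)
        for k, value in enumerate(row)
        if j != k and value > threshold
    ]


def crosstalk_fraction(normalized, threshold=0.05):
    """Fraction of off-diagonal elements flagged as cross-talk."""
    n = len(normalized)
    if n < 2:
        return 0.0
    return len(crosstalk_pairs(normalized, threshold)) / (n * (n - 1))


if __name__ == "__main__":
    # Toy 3-primer example with made-up counts, for illustration only.
    toy_counts = [
        [1000, 0, 80],
        [10, 500, 0],
        [0, 0, 200],
    ]
    norm = normalize_rows(toy_counts)
    print(crosstalk_pairs(norm))
    print(crosstalk_fraction(norm))
```

Pairs are kept ordered because, as noted in the Results, a j → k interaction does not imply a k → j interaction.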
Synthetic Biology – Oxford University Press
Published: Jan 1, 2018