An integrated framework for discovery and genotyping of genomic variants from high-throughput sequencing experiments
Duitama, Jorge; Quintero, Juan Camilo; Cruz, Daniel Felipe; Quintero, Constanza; Hubmann, Georg; Foulquié-Moreno, Maria R.; Verstrepen, Kevin J.; Thevelein, Johan M.; Tohme, Joe
doi: 10.1093/nar/gkt1381 pmid: 24413664
Recent advances in high-throughput sequencing (HTS) technologies and computing capacity have produced unprecedented amounts of genomic data that have unraveled the genetics of phenotypic variability in several species. However, operating and integrating current software tools for data analysis still require a substantial investment in highly skilled personnel. Developing accurate, efficient and user-friendly software packages for HTS data analysis will lead to a more rapid discovery of genomic elements relevant to medical, agricultural and industrial applications. We therefore developed Next-Generation Sequencing Eclipse Plug-in (NGSEP), a new software tool for integrated, efficient and user-friendly detection of single nucleotide variants (SNVs), indels and copy number variants (CNVs). NGSEP includes modules for read alignment, sorting, merging, functional annotation of variants, filtering and quality statistics. Analysis of sequencing experiments in yeast, rice and human samples shows that NGSEP is more accurate and efficient than currently available packages for variant detection. We also show that only a comprehensive and accurate identification of repeat regions and CNVs allows researchers to properly separate SNVs from differences between copies of repeat elements. We expect that NGSEP will become a strong support tool to empower the analysis of sequencing data in a wide range of research projects on different species.
Bisulfighter: accurate detection of methylated cytosines and differentially methylated regions
Saito, Yutaka; Tsuji, Junko; Mituyama, Toutai
doi: 10.1093/nar/gkt1373 pmid: 24423865
Analysis of bisulfite sequencing data usually requires two tasks: to call methylated cytosines (mCs) in a sample, and to detect differentially methylated regions (DMRs) between paired samples. Although numerous tools have been proposed for mC calling, methods for DMR detection have been largely limited. Here, we present Bisulfighter, a new software package for detecting mCs and DMRs from bisulfite sequencing data. Bisulfighter combines the LAST alignment tool for mC calling, and a novel framework for DMR detection based on hidden Markov models (HMMs). Unlike previous attempts that depend on empirical parameters, Bisulfighter can use the expectation-maximization algorithm for HMMs to adjust parameters for each data set. We conduct extensive experiments in which accuracy of mC calling and DMR detection is evaluated on simulated data with various mC contexts, read qualities, sequencing depths and DMR lengths, as well as on real data from a wide range of biological processes. We demonstrate that Bisulfighter consistently achieves better accuracy than other published tools, providing greater sensitivity for mCs with fewer false positives, more precise estimates of mC levels, more exact locations of DMRs and better agreement of DMRs with gene expression and DNase I hypersensitivity. The source code is available at http://epigenome.cbrc.jp/bisulfighter.
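As a rough illustration of how an HMM can segment cytosines into DMR and non-DMR stretches, the sketch below runs Viterbi decoding over a two-state model. It is not Bisulfighter's model: the binary observations (0 = similar methylation level in both samples, 1 = different), the fixed probabilities and the function names are all assumptions made for this example, and the parameters are set by hand rather than fitted with expectation-maximization.

```python
import math

# Hypothetical two-state model: state 0 = background, state 1 = DMR.
# Observations per cytosine: 0 = similar level in both samples, 1 = different.
START = [0.5, 0.5]
TRANS = [[0.9, 0.1], [0.1, 0.9]]   # TRANS[prev][next]
EMIT = [[0.9, 0.1], [0.2, 0.8]]    # EMIT[state][observation]


def viterbi(obs, start=START, trans=TRANS, emit=EMIT):
    """Return the most probable state path for a sequence of observations."""
    if not obs:
        return []
    n_states = len(start)
    # Log-probabilities avoid numerical underflow on long sequences.
    score = [math.log(start[s]) + math.log(emit[s][obs[0]])
             for s in range(n_states)]
    back = []
    for o in obs[1:]:
        new_score, pointers = [], []
        for s in range(n_states):
            best_prev = max(range(n_states),
                            key=lambda p: score[p] + math.log(trans[p][s]))
            pointers.append(best_prev)
            new_score.append(score[best_prev]
                             + math.log(trans[best_prev][s])
                             + math.log(emit[s][o]))
        score = new_score
        back.append(pointers)
    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


def dmr_intervals(path):
    """Collapse a state path into half-open (start, end) intervals of state 1."""
    intervals, start = [], None
    for i, state in enumerate(path):
        if state == 1 and start is None:
            start = i
        elif state != 1 and start is not None:
            intervals.append((start, i))
            start = None
    if start is not None:
        intervals.append((start, len(path)))
    return intervals


if __name__ == "__main__":
    observations = [0, 0, 0, 1, 1, 1, 1, 0, 0, 0]
    print(dmr_intervals(viterbi(observations)))
```

With these hand-picked parameters, a run of consecutive differing positions is reported as one interval, whereas a single isolated difference is absorbed into the background state. That smoothing is the main reason to prefer an HMM over per-site thresholds.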
Purification, characterization and crystallization of the human 80S ribosome
Khatter, Heena; Myasnikov, Alexander G.; Mastio, Leslie; Billas, Isabelle M. L.; Birck, Catherine; Stella, Stefano; Klaholz, Bruno P.
doi: 10.1093/nar/gkt1404 pmid: 24452798
Ribosomes are key macromolecular protein synthesis machineries in the cell. Human ribosomes have so far not been studied to atomic resolution because of their particularly complex structure as compared with other eukaryotic or prokaryotic ribosomes, and they are difficult to prepare to high homogeneity, which is a key requisite for high-resolution structural work. We established a purification protocol for human 80S ribosomes isolated from HeLa cells that yields large quantities of homogeneous samples, as characterized by analytical ultracentrifugation and multiangle laser light scattering. Samples prepared under different conditions were characterized by direct single particle imaging using cryo electron microscopy, which helped optimize the preparation protocol. From a small data set, a 3D reconstruction at subnanometric resolution was obtained showing all prominent structural features of the human ribosome, and revealing a salt concentration dependence of the presence of the exit site tRNA, which we show is critical for obtaining crystals. With these well-characterized samples, the first human 80S ribosome crystals were obtained from several crystallization conditions in capillaries and sitting drops. These crystals diffract to 26 Å resolution at cryo temperatures and their crystallographic parameters were determined, paving the way for future high-resolution work.
Synthetic biology tools for programming gene expression without nutritional perturbations in Saccharomyces cerevisiae
McIsaac, R. Scott; Gibney, Patrick A.; Chandran, Sunil S.; Benjamin, Kirsten R.; Botstein, David
doi: 10.1093/nar/gkt1402 pmid: 24445804
A conditional gene expression system that is fast-acting, is tunable and achieves single-gene specificity was recently developed for yeast. A gene placed directly downstream of a modified GAL1 promoter containing six Zif268 binding sequences (with single nucleotide spacing) was shown to be selectively inducible in the presence of β-estradiol, so long as cells express the artificial transcription factor, Z3EV (a fusion of the Zif268 DNA binding domain, the ligand binding domain of the human estrogen receptor and viral protein 16). We show the strength of Z3EV-responsive promoters can be modified using straightforward design principles. By moving Zif268 binding sites toward the transcription start site, expression output can be nearly doubled. Despite the reported requirement of estrogen receptor dimerization for hormone-dependent activation, a single binding site suffices for target gene activation. Target gene expression levels correlate with promoter binding site copy number and we engineer a set of inducible promoter chassis with different input–output characteristics. Finally, the coupling between inducer identity and gene activation is flexible: the ligand specificity of Z3EV can be re-programmed to respond to a non-hormone small molecule with only five amino acid substitutions in the human estrogen receptor domain, which may prove useful for industrial applications.
LORD-Q: a long-run real-time PCR-based DNA-damage quantification method for nuclear and mitochondrial genome analysis
Lehle, Simon; Hildebrand, Dominic G.; Merz, Britta; Malak, Peter N.; Becker, Michael S.; Schmezer, Peter; Essmann, Frank; Schulze-Osthoff, Klaus; Rothfuss, Oliver
doi: 10.1093/nar/gkt1349 pmid: 24371283
DNA damage is tightly associated with various biological and pathological processes, such as aging and tumorigenesis. Although detection of DNA damage is attracting increasing attention, only a limited number of methods are available to quantify DNA lesions, and these techniques are tedious or only detect global DNA damage. In this study, we present a high-sensitivity long-run real-time PCR technique for DNA-damage quantification (LORD-Q) in both the mitochondrial and nuclear genome. While most conventional methods have low sensitivity or are restricted to abundant mitochondrial DNA samples, we established a protocol that enables the accurate sequence-specific quantification of DNA damage in >3-kb probes for any mitochondrial or nuclear DNA sequence. To validate the sensitivity of this method, we compared LORD-Q with a previously published qPCR-based method and the standard single-cell gel electrophoresis assay, demonstrating the superior performance of LORD-Q. As an example, we monitored induction of DNA damage and repair processes in human induced pluripotent stem cells and isogenic fibroblasts. Our results suggest that LORD-Q provides a sequence-specific and precise method to quantify DNA damage, thereby allowing the high-throughput assessment of DNA repair, genotoxicity screening and various other processes for a wide range of life science applications.
SAPTA: a new design tool for improving TALE nuclease activity
Lin, Yanni; Fine, Eli J.; Zheng, Zhilan; Antico, Christopher J.; Voit, Richard A.; Porteus, Matthew H.; Cradick, Thomas J.; Bao, Gang
doi: 10.1093/nar/gkt1363 pmid: 24442582
Transcription activator-like effector nucleases (TALENs) have become a powerful tool for genome editing due to the simple code linking the amino acid sequences of their DNA-binding domains to TALEN nucleotide targets. While the initial TALEN-design guidelines are very useful, user-friendly tools defining optimal TALEN designs for robust genome editing need to be developed. Here we evaluated existing guidelines and developed new design guidelines for TALENs based on 205 TALENs tested, and established the scoring algorithm for predicting TALEN activity (SAPTA) as a new online design tool. For any input gene of interest, SAPTA gives a ranked list of potential TALEN target sites, facilitating the selection of optimal TALEN pairs based on predicted activity. SAPTA-based TALEN designs increased the average intracellular TALEN monomer activity by >3-fold, and resulted in an average endogenous gene-modification frequency of 39% for TALENs containing the repeat variable di-residue NK that favors specificity rather than activity. It is expected that SAPTA will become a useful and flexible tool for designing highly active TALENs for genome-editing applications. SAPTA can be accessed via the website at http://baolab.bme.gatech.edu/Research/BioinformaticTools/TAL_targeter.html.
Genome-wide quantification of homeolog expression ratio revealed nonstochastic gene regulation in synthetic allopolyploid Arabidopsis
Akama, Satoru; Shimizu-Inatsugi, Rie; Shimizu, Kentaro K.; Sese, Jun
doi: 10.1093/nar/gkt1376 pmid: 24423873
Genome duplication with hybridization, or allopolyploidization, occurs commonly in plants, and is considered to be a strong force for generating new species. However, genome-wide quantification of homeolog expression ratios has been technically hindered by the high homology between homeologous gene pairs. To quantify the homeolog expression ratio using RNA-seq obtained from polyploids, a new method named HomeoRoq was developed, in which the genomic origin of sequencing reads was estimated using mismatches between the read and each parental genome. To verify this method, we first assembled the two diploid parental genomes of Arabidopsis halleri subsp. gemmifera and Arabidopsis lyrata subsp. petraea (Arabidopsis petraea subsp. umbrosa), then generated a synthetic allotetraploid, mimicking the natural allopolyploid Arabidopsis kamchatica. The quantified ratios corresponded well to those obtained by pyrosequencing. We found that the ratios of homeologs before and after cold stress treatment were highly correlated (r = 0.870). This highlights the presence of nonstochastic polyploid gene regulation despite previous research identifying stochastic variation in expression. Moreover, our new statistical test incorporating overdispersion identified 226 homeologs (1.11% of 20 369 expressed homeologs) with significant ratio changes, many of which were related to stress responses. HomeoRoq should contribute to the study of the genes responsible for polyploid-specific environmental responses.
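The idea of estimating a read's genomic origin from its mismatches against each parental genome can be sketched as follows. This is a simplified illustration, not the HomeoRoq implementation: it assumes each read has already been aligned without gaps to equal-length reference segments from both parents, and the function names and the "A"/"B"/"common" labels are hypothetical.

```python
from collections import Counter


def count_mismatches(read, reference):
    """Count positions where an ungapped read differs from a reference segment."""
    if len(read) != len(reference):
        raise ValueError("read and reference segment must have equal length")
    return sum(r != g for r, g in zip(read, reference))


def assign_origin(read, parent_a, parent_b):
    """Label a read by the parental segment it matches with fewer mismatches."""
    mismatches_a = count_mismatches(read, parent_a)
    mismatches_b = count_mismatches(read, parent_b)
    if mismatches_a < mismatches_b:
        return "A"
    if mismatches_b < mismatches_a:
        return "B"
    return "common"  # equally close to both parents, so uninformative


def homeolog_ratio(reads, parent_a, parent_b):
    """Fraction of informative reads assigned to parent A, or None if there are none."""
    counts = Counter(assign_origin(read, parent_a, parent_b) for read in reads)
    informative = counts["A"] + counts["B"]
    if informative == 0:
        return None
    return counts["A"] / informative


if __name__ == "__main__":
    parent_a = "ACGTACGT"
    parent_b = "ACGAACGA"
    reads = ["ACGTACGT", "ACGAACGA", "ACGTACGA", "ACGTACGT"]
    print(homeolog_ratio(reads, parent_a, parent_b))
```

Reads that are equally close to both parents are excluded from the ratio here. A real pipeline would also need to handle gapped alignments, base qualities and reads that map to only one genome.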