A comprehensive comparison of general RNA–RNA interaction prediction methods
Lai, Daniel; Meyer, Irmtraud M.
doi: 10.1093/nar/gkv1477
pmid: 26673718
Abstract
RNA–RNA interactions are fast emerging as a major functional component in many newly discovered non-coding RNAs. Basepairing is believed to be a major contributor to the stability of these intermolecular interactions, much like intramolecular basepairs formed in RNA secondary structure. As such, using algorithms similar to those for predicting RNA secondary structure, computational methods have recently been developed for the prediction of RNA–RNA interactions. We provide the first comprehensive comparison comprising 14 methods that predict general intermolecular basepairs. To evaluate these, we compile an extensive data set of 54 experimentally confirmed fungal snoRNA–rRNA interactions and 102 bacterial sRNA–mRNA interactions. We test the accuracy of all methods, evaluating the effects of tool settings, sequence length, and multiple sequence alignment usage and quality. Our results show that, unlike for RNA secondary structure prediction, the overall best performing tools on this data set are non-comparative energy-based tools that use accessibility information and predict short interactions. Furthermore, we find that maintaining high accuracy across biologically different data sets and increasing input lengths remains a major challenge, with implications for de novo transcriptome-wide searches. Finally, we make our interaction data set publicly available for future development and benchmarking efforts.
SimRNA: a coarse-grained method for RNA folding simulations and 3D structure prediction
Boniecki, Michal J.; Lach, Grzegorz; Dawson, Wayne K.; Tomala, Konrad; Lukasz, Pawel; Soltysinski, Tomasz; Rother, Kristian M.; Bujnicki, Janusz M.
doi: 10.1093/nar/gkv1479
pmid: 26687716
Abstract
RNA molecules play fundamental roles in cellular processes. Their function and interactions with other biomolecules are dependent on the ability to form complex three-dimensional (3D) structures. However, experimental determination of RNA 3D structures is laborious and challenging, and therefore, the majority of known RNAs remain structurally uncharacterized. Here, we present SimRNA: a new method for computational RNA 3D structure prediction, which uses a coarse-grained representation, relies on the Monte Carlo method for sampling the conformational space, and employs a statistical potential to approximate the energy and identify conformations that correspond to biologically relevant structures. SimRNA can fold RNA molecules using only sequence information, and, on established test sequences, it recapitulates secondary structure with high accuracy, including correct prediction of pseudoknots. For modeling of complex 3D structures, it can use additional restraints, derived from experimental or computational analyses, including information about secondary structure and/or long-range contacts. SimRNA also can be used to analyze conformational landscapes and identify potential alternative structures.
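The sampling strategy named in the abstract (Monte Carlo moves scored by an energy function) can be illustrated with a generic Metropolis sampler. This is a minimal sketch, not SimRNA's implementation: the one-dimensional state, the quadratic `toy_energy`, the `random_step` move, and all parameter values are hypothetical stand-ins for a coarse-grained conformation, a statistical potential, and conformational moves.

```python
import math
import random


def toy_energy(x):
    # Hypothetical stand-in for a statistical potential: one minimum at x = 2.
    return (x - 2.0) ** 2


def random_step(x, rng):
    # Hypothetical move set: small random perturbation of the current state.
    return x + rng.uniform(-0.5, 0.5)


def metropolis(energy, propose, state, n_steps, temperature, rng):
    """Run Metropolis Monte Carlo and return the lowest-energy state seen."""
    current_e = energy(state)
    best_state, best_e = state, current_e
    for _ in range(n_steps):
        candidate = propose(state, rng)
        candidate_e = energy(candidate)
        delta = candidate_e - current_e
        # Downhill moves are always accepted; uphill moves are accepted
        # with Boltzmann probability exp(-delta / temperature).
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            state, current_e = candidate, candidate_e
            if current_e < best_e:
                best_state, best_e = state, current_e
    return best_state, best_e


if __name__ == "__main__":
    rng = random.Random(0)
    best, best_e = metropolis(toy_energy, random_step, 10.0, 5000, 0.1, rng)
    print(best, best_e)
```

Accepting some uphill moves is what lets the sampler escape local minima; in a real folding simulation the temperature schedule and move set would matter far more than they do in this toy case.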
Enhanced sequencing coverage with digital droplet multiple displacement amplification
Sidore, Angus M.; Lan, Freeman; Lim, Shaun W.; Abate, Adam R.
doi: 10.1093/nar/gkv1493
pmid: 26704978
Abstract
Sequencing small quantities of DNA is important for applications ranging from the assembly of uncultivable microbial genomes to the identification of cancer-associated mutations. To obtain sufficient quantities of DNA for sequencing, the small amount of starting material must be amplified significantly. However, existing methods often yield errors or non-uniform coverage, reducing sequencing data quality. Here, we describe digital droplet multiple displacement amplification, a method that enables massive amplification of low-input material while maintaining sequence accuracy and uniformity. The low-input material is compartmentalized as single molecules in millions of picoliter droplets. Because the molecules are isolated in compartments, they amplify to saturation without competing for resources; this yields uniform representation of all sequences in the final product and, in turn, enhances the quality of the sequence data. We demonstrate the ability to uniformly amplify the genomes of single Escherichia coli cells, comprising just 4.7 fg of starting DNA, and obtain sequencing coverage distributions that rival those of unamplified material. Digital droplet multiple displacement amplification provides a simple and effective method for amplifying minute amounts of DNA for accurate and uniform sequencing.
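The argument that isolated molecules "amplify to saturation without competing for resources" can be made concrete with a toy simulation. This is a sketch under stated assumptions, not the authors' model: per-template growth rates, the exponential growth law, the amplification time, and the per-droplet cap are all hypothetical values chosen only to show the effect on uniformity.

```python
import math
import random
import statistics


def coefficient_of_variation(values):
    # Spread of the final representation relative to its mean (0 = uniform).
    return statistics.pstdev(values) / statistics.mean(values)


def simulate(n_templates=1000, duration=20.0, droplet_cap=1e4, seed=0):
    """Return (bulk CV, droplet CV) of per-template yield fractions."""
    rng = random.Random(seed)
    # Assumption: each template amplifies exponentially at its own rate.
    rates = [rng.uniform(0.8, 1.2) for _ in range(n_templates)]
    unconstrained = [math.exp(r * duration) for r in rates]

    # Bulk reaction: templates share one pool, so final fractions follow
    # the unconstrained growth and fast templates dominate.
    total = sum(unconstrained)
    bulk = [u / total for u in unconstrained]

    # Droplets: each template grows alone until its droplet saturates.
    capped = [min(u, droplet_cap) for u in unconstrained]
    capped_total = sum(capped)
    droplets = [c / capped_total for c in capped]

    return coefficient_of_variation(bulk), coefficient_of_variation(droplets)


if __name__ == "__main__":
    cv_bulk, cv_droplet = simulate()
    print(cv_bulk, cv_droplet)
```

In this toy setting every droplet reaches its cap, so the droplet yields are identical, while the bulk yields stay skewed toward the fastest-amplifying templates.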
Integrating gene synthesis and microfluidic protein analysis for rapid protein engineering
Blackburn, Matthew C.; Petrova, Ekaterina; Correia, Bruno E.; Maerkl, Sebastian J.
doi: 10.1093/nar/gkv1497
pmid: 26704969
Abstract
The capability to rapidly design proteins with novel functions will have a significant impact on medicine, biotechnology and synthetic biology. Synthetic genes are becoming a commodity, but integrated approaches have yet to be developed that take full advantage of gene synthesis. We developed a solid-phase gene synthesis method based on asymmetric primer extension (APE) and coupled this process directly to high-throughput, on-chip protein expression, purification and characterization (via mechanically induced trapping of molecular interactions, MITOMI). By completely circumventing molecular cloning and cell-based steps, APE-MITOMI reduces the time between protein design and quantitative characterization to 3–4 days. With APE-MITOMI we synthesized and characterized over 400 zinc-finger (ZF) transcription factors (TF), showing that although ZF TFs can be readily engineered to recognize a particular DNA sequence, engineering the precise binding energy landscape remains challenging. We also found that it is possible to engineer ZF–DNA affinity precisely and independently of sequence specificity and that in silico modeling can explain some of the observed affinity differences. APE-MITOMI is a generic approach that should facilitate fundamental studies in protein biophysics, and protein design/engineering.
ChIP-BIT: Bayesian inference of target genes using a novel joint probabilistic model of ChIP-seq profiles
Chen, Xi; Jung, Jin-Gyoung; Shajahan-Haq, Ayesha N.; Clarke, Robert; Shih, Ie-Ming; Wang, Yue; Magnani, Luca; Wang, Tian-Li; Xuan, Jianhua
doi: 10.1093/nar/gkv1491
pmid: 26704972
Abstract
Chromatin immunoprecipitation with massively parallel DNA sequencing (ChIP-seq) has greatly improved the reliability with which transcription factor binding sites (TFBSs) can be identified from genome-wide profiling studies. Many computational tools have been developed to detect binding events or peaks; however, the robust detection of weak binding events remains a challenge for current peak calling tools. We have developed a novel Bayesian approach (ChIP-BIT) to reliably detect TFBSs and their target genes by jointly modeling binding signal intensities and binding locations of TFBSs. Specifically, a Gaussian mixture model is used to capture both binding and background signals in sample data. As a unique feature of ChIP-BIT, background signals are modeled by a local Gaussian distribution that is accurately estimated from the input data. Extensive simulation studies showed a significantly improved performance of ChIP-BIT in target gene prediction, particularly for detecting weak binding signals at gene promoter regions. We applied ChIP-BIT to find target genes from NOTCH3 and PBX1 ChIP-seq data acquired from MCF-7 breast cancer cells. TF knockdown experiments have initially validated about 30% of co-regulated target genes identified by ChIP-BIT as being differentially expressed in MCF-7 cells. Functional analysis of these genes further revealed the existence of crosstalk between Notch and Wnt signaling pathways.
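The role of a two-component Gaussian mixture in separating binding from background signal can be sketched as a posterior calculation. This is an illustration only, not ChIP-BIT's joint model (which also models binding locations): the function names, the prior, and the component means and standard deviations below are hypothetical, and in practice the background component would be estimated from the input data as the abstract describes.

```python
import math


def normal_pdf(x, mean, sd):
    # Density of a Gaussian with the given mean and standard deviation.
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def binding_posterior(signal, prior_bound, bound, background):
    """Posterior probability that a signal comes from the binding component.

    `bound` and `background` are (mean, sd) tuples for the two Gaussians.
    """
    p_bound = prior_bound * normal_pdf(signal, *bound)
    p_background = (1.0 - prior_bound) * normal_pdf(signal, *background)
    return p_bound / (p_bound + p_background)


if __name__ == "__main__":
    # Hypothetical parameters on a log-intensity scale.
    bound = (3.0, 1.0)
    background = (1.0, 0.5)
    for signal in (1.0, 2.0, 3.0):
        print(signal, binding_posterior(signal, 0.2, bound, background))
```

A narrow, locally estimated background component is what allows moderately elevated signals to receive a non-negligible binding probability, which is the weak-binding case the abstract emphasizes.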
Standardizing chromatin research: a simple and universal method for ChIP-seq
Arrigoni, Laura; Richter, Andreas S.; Betancourt, Emily; Bruder, Kerstin; Diehl, Sarah; Manke, Thomas; Bönisch, Ulrike
doi: 10.1093/nar/gkv1495
pmid: 26704968
Abstract
Chromatin immunoprecipitation followed by next generation sequencing (ChIP-seq) is a key technique in chromatin research. Although heavily applied, existing ChIP-seq protocols are often highly fine-tuned workflows, optimized for specific experimental requirements. The initial steps of ChIP-seq, particularly chromatin shearing, are deemed to be exceedingly cell-type-specific, thus impeding any protocol standardization efforts. Here we demonstrate that harmonization of ChIP-seq workflows across cell types and conditions is possible when obtaining chromatin from properly isolated nuclei. We established an ultrasound-based nuclei extraction method (NEXSON: Nuclei EXtraction by SONication) that is highly effective across various organisms, cell types and cell numbers. The described method has the potential to replace complex cell-type-specific, but largely ineffective, nuclei isolation protocols. By including NEXSON in ChIP-seq workflows, we completely eliminate the need for extensive optimization and sample-dependent adjustments. Apart from this significant simplification, our approach also provides the basis for a fully standardized ChIP-seq and yields highly reproducible transcription factor and histone modification maps for a wide range of different cell types. Even small cell numbers (∼10 000 cells per ChIP) can be easily processed without application of modified chromatin or library preparation protocols.
TopDom: an efficient and deterministic method for identifying topological domains in genomes
Shin, Hanjun; Shi, Yi; Dai, Chao; Tjong, Harianto; Gong, Ke; Alber, Frank; Zhou, Xianghong Jasmine
doi: 10.1093/nar/gkv1505
pmid: 26704975
Abstract
Genome-wide proximity ligation assays allow the identification of chromatin contacts at unprecedented resolution. Several studies reveal that mammalian chromosomes are composed of topological domains (TDs) at sub-megabase resolution, which appear to be conserved across cell types and to some extent even between organisms. Identifying topological domains is now an important step toward understanding the structure and functions of spatial genome organization. However, current methods for TD identification demand extensive computational resources, require careful tuning and/or encounter inconsistencies in results. In this work, we propose an efficient and deterministic method, TopDom, to identify TDs, along with a set of statistical methods for evaluating their quality. TopDom is much more efficient than existing methods and depends on just one intuitive parameter, a window size, for which we provide easy-to-implement optimization guidelines. TopDom also identifies more and higher quality TDs than the popular directional index algorithm. The TDs identified by TopDom provide strong support for the cross-tissue TD conservation. Finally, our analysis reveals that the locations of housekeeping genes are closely associated with cross-tissue conserved TDs. The software package and source code of TopDom are available at http://zhoulab.usc.edu/TopDom/.
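How a single window-size parameter can drive domain boundary detection may be easier to see in a small example. The abstract does not spell out the algorithm, so the following is an assumption-laden sketch rather than TopDom itself: for each bin it averages the contacts between the `window` bins upstream and the `window` bins downstream, and treats the minimum of that signal as a candidate boundary. The function name `bin_signal` and the toy contact matrix are hypothetical.

```python
def bin_signal(matrix, window):
    """Mean contact frequency across each bin boundary.

    signal[i] averages matrix[a][b] over upstream bins a in (i-window, i]
    and downstream bins b in [i+1, i+window], clipped at the matrix edges.
    """
    n = len(matrix)
    signal = []
    for i in range(n - 1):
        upstream = range(max(0, i - window + 1), i + 1)
        downstream = range(i + 1, min(n, i + 1 + window))
        values = [matrix[a][b] for a in upstream for b in downstream]
        signal.append(sum(values) / len(values))
    return signal


def toy_contact_matrix(n_bins=10, split=5, inside=10.0, between=1.0):
    # Hypothetical data: two domains (bins 0-4 and 5-9) with strong
    # within-domain contacts and weak between-domain contacts.
    return [
        [inside if (a < split) == (b < split) else between for b in range(n_bins)]
        for a in range(n_bins)
    ]


if __name__ == "__main__":
    signal = bin_signal(toy_contact_matrix(), window=3)
    # The lowest cross-boundary signal marks the candidate domain boundary.
    print(signal.index(min(signal)), signal)
```

A larger window smooths the signal and favors larger domains, while a smaller window is more sensitive to local dips, which is why a single window size can act as the main tuning knob in this kind of approach.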