Reinharz, Vladimir; Ponty, Yann; Waldispühl, Jérôme
doi: 10.1093/nar/gkw217 pmid: 27095200
Abstract: Systematic structure probing experiments (e.g. SHAPE) of RNA mutants such as the mutate-and-map (MaM) protocol give us direct access to the genetic robustness of ncRNA structures. Comparative studies of homologous sequences provide a distinct, yet complementary, approach to analyze structural and functional properties of non-coding RNAs. In this paper, we introduce a formal framework to combine the biochemical signal collected from MaM experiments with the evolutionary information available in multiple sequence alignments. We apply neutral theory principles to detect complex long-range dependencies between nucleotides of a single-stranded RNA, and implement these ideas in software called aRNhAck. We illustrate the biological significance of this signal and show that the nucleotide networks calculated with aRNhAck are correlated with nucleotides located in RNA–RNA, RNA–protein, RNA–DNA and RNA–ligand interfaces. aRNhAck is freely available at http://csb.cs.mcgill.ca/arnhack.
Dumat, Blaise; Larsen, Anders Foller; Wilhelmsson, L. Marcus
doi: 10.1093/nar/gkw114 pmid: 26896804
Abstract: Herein, we report on the use of a tricyclic cytosine FRET pair, incorporated into DNA with different base pair separations, to study Z-DNA and B-Z DNA junctions. With its position inside the DNA structure, the FRET pair responds to a B- to Z-DNA transition with a distinct change in FRET efficiency for each donor/acceptor configuration allowing reliable structural probing. Moreover, we show how fluorescence spectroscopy and our cytosine analogues can be used to determine rate constants for the B- to Z-DNA transition mechanism. The modified cytosines have little influence on the transition and the FRET pair is thus an easily implemented and virtually non-perturbing fluorescence tool to study Z-DNA. This nucleobase analogue FRET pair represents a valuable addition to the limited number of fluorescence methods available to study Z-DNA and we suggest it will facilitate, for example, deciphering the B- to Z-DNA transition mechanism and investigating the interaction of DNA with Z-DNA binding proteins.
Ståhlberg, Anders; Krzyzanowski, Paul M.; Jackson, Jennifer B.; Egyud, Matthew; Stein, Lincoln; Godfrey, Tony E.
doi: 10.1093/nar/gkw224 pmid: 27060140
Abstract: Detection of cell-free DNA in liquid biopsies offers great potential for use in non-invasive prenatal testing and as a cancer biomarker. Fetal and tumor DNA fractions, however, can be extremely low in these samples and ultra-sensitive methods are required for their detection. Here, we report an extremely simple and fast method for introduction of barcodes into DNA libraries made from 5 ng of DNA. Barcoded adapter primers are designed with an oligonucleotide hairpin structure to protect the molecular barcodes during the first rounds of polymerase chain reaction (PCR) and prevent them from participating in mis-priming events. Our approach enables high-level multiplexing and next-generation sequencing library construction with flexible library content. We show that uniform libraries of 1-, 5-, 13- and 31-plex can be generated. Utilizing the barcodes to generate consensus reads for each original DNA molecule reduces background sequencing noise and allows detection of variant alleles below 0.1% frequency in clonal cell line DNA and in cell-free plasma DNA. Thus, our approach bridges the gap between the highly sensitive but specific capabilities of digital PCR, which only allows a limited number of variants to be analyzed, and the broad target capability of next-generation sequencing, which traditionally lacks the sensitivity to detect rare variants.
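The consensus-read idea mentioned in this abstract can be illustrated with a minimal sketch. It assumes reads have already been assigned a barcode and trimmed to equal length, and it uses a simple per-position majority vote; the function name `consensus_reads`, the `min_family_size` parameter and the toy reads are illustrative assumptions, not the authors' pipeline.

```python
from collections import Counter, defaultdict


def consensus_reads(reads, min_family_size=3):
    """Collapse reads sharing a barcode into one consensus sequence.

    reads: iterable of (barcode, sequence) pairs; sequences within a
    barcode family are assumed to have equal length (an assumption of
    this sketch, not a property stated in the abstract).
    """
    families = defaultdict(list)
    for barcode, seq in reads:
        families[barcode].append(seq)

    consensus = {}
    for barcode, seqs in families.items():
        # Small families cannot out-vote a sequencing error, so skip them.
        if len(seqs) < min_family_size:
            continue
        # Majority vote at each position across the family.
        consensus[barcode] = "".join(
            Counter(column).most_common(1)[0][0] for column in zip(*seqs)
        )
    return consensus


reads = [
    ("AAGT", "ACGT"),
    ("AAGT", "ACGT"),
    ("AAGT", "ACTT"),  # one read carries an error at the third base
    ("CCTA", "ACGT"),  # singleton family, dropped
]
print(consensus_reads(reads))
```

The isolated error in the third read is out-voted by the other two members of its barcode family, which is the intuition behind the reduced background noise.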
Lai, Zhongwu; Markovets, Aleksandra; Ahdesmaki, Miika; Chapman, Brad; Hofmann, Oliver; McEwen, Robert; Johnson, Justin; Dougherty, Brian; Barrett, J. Carl; Dry, Jonathan R.
doi: 10.1093/nar/gkw227 pmid: 27060149
Abstract: Accurate variant calling in next-generation sequencing (NGS) is critical to understand cancer genomes better. Here we present VarDict, a novel and versatile variant caller for both DNA- and RNA-sequencing data. VarDict simultaneously calls SNV, MNV, InDels, complex and structural variants, expanding the detected genetic driver landscape of tumors. It performs local realignments on the fly for more accurate allele frequency estimation. VarDict performance scales linearly with sequencing depth, enabling ultra-deep sequencing used to explore tumor evolution or detect tumor DNA circulating in blood. In addition, VarDict performs amplicon-aware variant calling for polymerase chain reaction (PCR)-based targeted sequencing often used in diagnostic settings, and is able to detect PCR artifacts. Finally, VarDict also detects differences in somatic and loss of heterozygosity variants between paired samples. VarDict reprocessing of The Cancer Genome Atlas (TCGA) Lung Adenocarcinoma dataset called known driver mutations in KRAS, EGFR, BRAF, PIK3CA and MET in 16% more patients than previously published variant calls. We believe VarDict will greatly facilitate application of NGS in clinical cancer research.
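As a rough illustration of what allele frequency estimation means at a single position, the sketch below counts bases in a pileup column and reports the fraction of reads supporting each non-reference base. This is not VarDict's algorithm (its local realignment step is not shown); the function name `allele_frequencies` and the toy pileup string are assumptions made for the example.

```python
from collections import Counter


def allele_frequencies(pileup_bases, ref_base):
    """Return {alt_base: fraction_of_reads} for one genomic position.

    pileup_bases: string of base calls covering the position, one per read.
    ref_base: the reference base at that position.
    """
    counts = Counter(base.upper() for base in pileup_bases)
    depth = sum(counts.values())
    if depth == 0:
        return {}
    # Allele frequency = reads supporting the allele / total depth.
    return {
        base: n / depth
        for base, n in counts.items()
        if base != ref_base.upper()
    }


# Ten reads cover the position; one supports a G instead of the reference A.
print(allele_frequencies("AAAAAAAAAG", "A"))
```

Misaligned reads near indels would distort these counts, which is why realignment before counting can matter for the estimate.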
Rykunov, Dmitry; Beckmann, Noam D.; Li, Hui; Uzilov, Andrew; Schadt, Eric E.; Reva, Boris
doi: 10.1093/nar/gkw269 pmid: 27098033
Abstract: Assigning cancer patients to the most effective treatments requires an understanding of the molecular basis of their disease. While DNA-based molecular profiling approaches have flourished over the past several years to transform our understanding of driver pathways across a broad range of tumors, a systematic characterization of key driver pathways based on RNA data has not been undertaken. Here we introduce a new approach for predicting the status of driver cancer pathways based on signature functions derived from RNA sequencing data. To identify the driver cancer pathways of interest, we mined DNA variant data from TCGA and nominated driver alterations in seven major cancer pathways in breast, ovarian and colon cancer tumors. The activation status of these driver pathways was then characterized using RNA sequencing data by constructing classification signature functions in training datasets and then testing the accuracy of the signatures in test datasets. The signature functions differentiate tumors with nominated pathway activation from tumors with no signs of activation well: the average AUC is 0.83. Our results confirm that driver genomic alterations are distinctively displayed at the transcriptional level and that the transcriptional signatures can generally provide an alternative to DNA sequencing methods in detecting specific driver pathways.
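For readers unfamiliar with the AUC figure quoted above, the following sketch computes it directly from its probabilistic definition: the chance that a randomly chosen activated tumor receives a higher signature score than a randomly chosen non-activated one, with ties counted as one half. The scores and labels are made-up toy values, not data from the study.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    scores: signature-function outputs, higher meaning "more activated".
    labels: 1 for tumors with nominated pathway activation, 0 otherwise.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    # Count positive/negative pairs ranked correctly; ties count one half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


scores = [0.9, 0.8, 0.4, 0.3]
labels = [1, 0, 1, 0]
print(auc(scores, labels))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, so the reported average of 0.83 sits well above chance.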
Niekamp, Stefan; Blumer, Katy; Nafisi, Parsa M.; Tsui, Kathy; Garbutt, John; Douglas, Shawn M.
doi: 10.1093/nar/gkw208 pmid: 27036861
Abstract: Scalable production of DNA nanostructures remains a substantial obstacle to realizing new applications of DNA nanotechnology. Typical DNA nanostructures comprise hundreds of DNA oligonucleotide strands, where each unique strand requires a separate synthesis step. New design methods that reduce the strand count for a given shape while maintaining overall size and complexity would be highly beneficial for efficiently producing DNA nanostructures. Here, we report a method for folding a custom template strand by binding individual staple sequences to multiple locations on the template. We built several nanostructures for well-controlled testing of various design rules, and demonstrate folding of a 6-kb template by as few as 10 unique strand sequences binding to 10 ± 2 locations on the template strand.
doi: 10.1093/nar/gkw226 pmid: 27084946
Abstract: Modeling the properties and functions of DNA sequences is an important but challenging task in the broad field of genomics. This task is particularly difficult for non-coding DNA, the vast majority of which is still poorly understood in terms of function. A powerful predictive model for the function of non-coding DNA can have enormous benefit for both basic science and translational research because over 98% of the human genome is non-coding and 93% of disease-associated variants lie in these regions. To address this need, we propose DanQ, a novel hybrid convolutional and bi-directional long short-term memory recurrent neural network framework for predicting non-coding function de novo from sequence. In the DanQ model, the convolution layer captures regulatory motifs, while the recurrent layer captures long-term dependencies between the motifs in order to learn a regulatory ‘grammar’ to improve predictions. DanQ improves considerably upon other models across several metrics. For some regulatory markers, DanQ can achieve over a 50% relative improvement in the area under the precision-recall curve metric compared to related models. We have made the source code available at the GitHub repository http://github.com/uci-cbcl/DanQ.
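To make concrete how a convolution layer can act as a motif detector, the sketch below one-hot encodes a DNA sequence and slides a single filter over it, followed by a ReLU. In a trained model the filter weights are learned; here the filter is hand-built to reward the motif "TATA", which is purely an assumption for illustration. The recurrent layer that models dependencies between motifs is not shown, and none of this code comes from the DanQ repository.

```python
BASES = "ACGT"


def one_hot(seq):
    """Encode a DNA string as a list of 4-element rows (A, C, G, T)."""
    return [[1.0 if b == base else 0.0 for base in BASES] for b in seq.upper()]


def conv1d_relu(encoded, kernel):
    """Slide one filter along the sequence and apply ReLU to each score."""
    width = len(kernel)
    activations = []
    for start in range(len(encoded) - width + 1):
        score = sum(
            encoded[start + i][j] * kernel[i][j]
            for i in range(width)
            for j in range(len(BASES))
        )
        activations.append(max(0.0, score))
    return activations


# Hypothetical filter: +1 where the base matches "TATA", -1 otherwise.
kernel = [[1.0 if base == m else -1.0 for base in BASES] for m in "TATA"]

activations = conv1d_relu(one_hot("GCTATAGC"), kernel)
print(activations)
```

The activation peaks at the window that contains the motif, and a downstream layer can then reason about where such peaks occur relative to each other.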
Chen, Yong; Wang, Yunfei; Xuan, Zhenyu; Chen, Min; Zhang, Michael Q.
doi: 10.1093/nar/gkw225 pmid: 27060148
Abstract: Defining chromatin interaction frequencies and topological domains is a great challenge for the annotation of genome structures. Although chromosome conformation capture (3C) and its derivative methods have been developed for exploring the global interactome, they are limited by high experimental complexity and costs. Here we describe a novel computational method, called CITD, for de novo prediction of the chromatin interaction map by integrating histone modification data. We used public epigenomic data from human fibroblast IMR90 cells and embryonic stem cells (H1) to develop and test CITD, which can not only successfully reconstruct the chromatin interaction frequencies discovered by the Hi-C technology, but also provide additional novel details of chromosomal organizations. We predicted the chromatin interaction frequencies, topological domains and their states (e.g. active or repressive) for 98 additional cell types from the Roadmap Epigenomics and ENCODE projects. A total of 131 protein-coding genes located near 78 preserved boundaries among 100 cell types are found to be significantly enriched in functional categories of nucleosome organization and chromatin assembly. CITD and its predicted results can be used for complementing the topological domains derived from limited Hi-C data and facilitating the understanding of spatial principles underlying the chromosomal organization.
Sood, Sanjana; Szkop, Krzysztof J.; Nakhuda, Asif; Gallagher, Iain J.; Murie, Carl; Brogan, Robert J.; Kaprio, Jaakko; Kainulainen, Heikki; Atherton, Philip J.; Kujala, Urho M.; Gustafsson, Thomas; Larsson, Ola; Timmons, James A.
doi: 10.1093/nar/gkw263 pmid: 27095197
Abstract: DNA microarrays and RNAseq are complementary methods for studying RNA molecules. Current computational methods to determine alternative exon usage (AEU) using such data require impractical visual inspection and still yield high false-positive rates. Integrated Gene and Exon Model of Splicing (iGEMS) adapts a gene-level residuals model with a gene-size-adjusted false discovery rate and exon-level analysis to circumvent these limitations. iGEMS was applied to two new DNA microarray datasets, including the high coverage Human Transcriptome Arrays 2.0, and performance was validated using RT-qPCR. First, AEU was studied in adipocytes treated with (n = 9) or without (n = 8) the anti-diabetes drug rosiglitazone. iGEMS identified 555 genes with AEU, with robust verification by RT-qPCR (∼90%). Second, in a three-way human tissue comparison (muscle, adipose and blood, n = 41), iGEMS identified 4421 genes with at least one AEU event, with excellent RT-qPCR verification (95%, n = 22). Importantly, iGEMS identified a variety of AEU events, including 3′UTR extension, as well as exon inclusion/exclusion impacting on protein kinase and extracellular matrix domains. In conclusion, iGEMS is a robust method for identification of AEU, while the variety of exon usage between human tissues is 5–10 times more prevalent than reported by the Genotype-Tissue Expression consortium using RNA sequencing.