GenomeView: a next-generation genome browser
Abeel, Thomas; Van Parys, Thomas; Saeys, Yvan; Galagan, James; Van de Peer, Yves
doi: 10.1093/nar/gkr995
pmid: 22102585
Due to ongoing advances in sequencing technologies, billions of nucleotide sequences are now produced on a daily basis. A major challenge is to visualize these data for further downstream analysis. To this end, we present GenomeView, a stand-alone genome browser specifically designed to visualize and manipulate a multitude of genomics data. GenomeView enables users to browse high volumes of aligned short-read data, with dynamic navigation and semantic zooming, from the whole genome level to the single nucleotide. At the same time, the tool enables visualization of whole genome alignments of dozens of genomes relative to a reference sequence. GenomeView is unique in its capability to interactively handle huge data sets consisting of tens of aligned genomes, thousands of annotation features and millions of mapped short reads, both as viewer and editor. GenomeView is freely available as an open source software package.
Quantitative model of R-loop forming structures reveals a novel level of RNA–DNA interactome complexity
Wongsurawat, Thidathip; Jenjaroenpun, Piroon; Kwoh, Chee Keong; Kuznetsov, Vladimir
doi: 10.1093/nar/gkr1075
pmid: 22121227
An R-loop is a structure formed co-transcriptionally between the nascent RNA transcript and the DNA template, leaving the non-transcribed DNA strand unpaired. This structure can be involved in hyper-mutation and dsDNA breaks in mammalian immunoglobulin (Ig) genes, oncogenes and neurodegenerative disease-related genes. R-loops have not yet been studied at the genome scale. To identify R-loops, we developed a computational algorithm and mapped R-loop forming sequences (RLFS) onto 66803 sequences defined by UCSC as known genes. We found that 59% of these transcribed sequences contain at least one RLFS. We created R-loopDB (http://rloop.bii.a-star.edu.sg/), a database that collects all RLFS identified within over half of the human genes and links to the UCSC Genome Browser for information integration and visualisation across a variety of bioinformatics sources. We found that many oncogenes and tumour suppressors (e.g. Tp53, BRCA1, BRCA2, Kras and Ptprd) and neurodegenerative disease-related genes (e.g. ATM, Park2, Ptprd and GLDC) could be prone to significant R-loop formation. Our findings suggest that R-loops provide a novel level of RNA–DNA interactome complexity, playing key roles in gene expression control, mutagenesis, recombination, chromosomal rearrangement, alternative splicing, DNA editing and epigenetic modification. RLFSs could be used as a novel source of prospective therapeutic targets.
MetaQC: objective quality control and inclusion/exclusion criteria for genomic meta-analysis
Kang, Dongwan D.; Sibille, Etienne; Kaminski, Naftali; Tseng, George C.
doi: 10.1093/nar/gkr1071
pmid: 22116060
Genomic meta-analysis to combine relevant and homogeneous studies has been widely applied, but quality control (QC) and objective inclusion/exclusion criteria have been largely overlooked. Currently, the inclusion/exclusion criteria mostly depend on ad hoc expert opinion or a naïve threshold on sample size or platform. There is a pressing need to develop a systematic QC methodology, as the decision on study inclusion greatly impacts the final meta-analysis outcome. In this article, we propose six quantitative quality control measures, covering internal homogeneity of coexpression structure among studies, external consistency of coexpression pattern with pathway databases, and accuracy and consistency of differentially expressed gene detection or enriched pathway identification. Each quality control index is defined as the minus log-transformed P value from formal hypothesis testing. Principal component analysis biplots and a standardized mean rank are applied to assist visualization and decision. We applied the proposed method to four large-scale examples, combining 7 brain cancer, 9 prostate cancer, 8 idiopathic pulmonary fibrosis and 17 major depressive disorder studies, respectively. The identified problematic studies were further scrutinized for potential technical or biological causes of their lower quality to determine their exclusion from meta-analysis. The application and simulation results support a systematic quality assessment framework for genomic meta-analysis.
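The two scoring ideas named in the abstract, a minus log-transformed P value per QC measure and a mean rank across measures, can be illustrated with a small sketch. This is not the MetaQC implementation: the log base (10), the tie handling, the function names and the example P values are all assumptions made for illustration, and the abstract does not say how the mean rank is standardized, so that step is left out.

```python
import math

def qc_index(p_value, floor=1e-300):
    """Minus log10 of a P value (base 10 is an assumption; the abstract
    does not state the base). The floor avoids log(0)."""
    return -math.log10(max(p_value, floor))

def mean_ranks(indices_by_study):
    """Rank studies on each QC measure (rank 1 = largest index) and
    average the ranks per study. Ties are broken by input order, which
    is a simplification for this sketch."""
    studies = list(indices_by_study)
    n_measures = len(indices_by_study[studies[0]])
    totals = {s: 0.0 for s in studies}
    for m in range(n_measures):
        ordered = sorted(studies, key=lambda s: indices_by_study[s][m], reverse=True)
        for rank, s in enumerate(ordered, start=1):
            totals[s] += rank
    return {s: totals[s] / n_measures for s in studies}

# Hypothetical P values for three studies on two QC measures.
p_values = {"A": [1e-5, 1e-3], "B": [0.2, 0.5], "C": [1e-2, 1e-4]}
indices = {s: [qc_index(p) for p in ps] for s, ps in p_values.items()}
ranks = mean_ranks(indices)
print(ranks)
```

In this toy input, study B ranks last on both measures, so it would be the candidate to examine before deciding on inclusion.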
Enhanced analysis of real-time PCR data by using a variable efficiency model: FPK-PCR
Lievens, Antoon; Van Aelst, S.; Van den Bulcke, M.; Goetghebeur, E.
doi: 10.1093/nar/gkr775
pmid: 22102586
Current methodology in real-time polymerase chain reaction (PCR) analysis performs well provided PCR efficiency remains constant over reactions. Yet, small changes in efficiency can lead to large quantification errors. Particularly in biological samples, the possible presence of inhibitors poses a challenge. We present a new approach to single reaction efficiency calculation, called Full Process Kinetics-PCR (FPK-PCR). It combines a kinetically more realistic model with flexible adaptation to the full range of data. By reconstructing the entire chain of cycle efficiencies, rather than restricting the focus to a window of application, one extracts additional information and loses a level of arbitrariness. The maximal efficiency estimates returned by the model are comparable in accuracy and precision to both the gold standard of serial dilution and other single reaction efficiency methods. The cycle-to-cycle changes in efficiency, as described by the FPK-PCR procedure, stay considerably closer to the data than those from other S-shaped models. The assessment of individual cycle efficiencies returns more information than other single efficiency methods. It allows in-depth interpretation of real-time PCR data and reconstruction of the fluorescence data, providing quality control. Finally, by implementing a global efficiency model, reproducibility is improved as the selection of a window of application is avoided.
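The claim that small efficiency changes produce large quantification errors can be checked with a short calculation. The sketch below uses the classical constant-efficiency model N_c = N_0 (1 + E)^c, which is an assumption here and is not the FPK-PCR model; the function name and the example values (true efficiency 0.95, assumed efficiency 1.0, 25 cycles) are hypothetical.

```python
def fold_error(true_eff, assumed_eff, cycles):
    """Ratio of estimated to true starting quantity when a reaction that
    amplifies with true_eff per cycle is back-calculated with assumed_eff.
    Follows from N_c = N_0 * (1 + E)**c under constant efficiency."""
    return ((1.0 + true_eff) / (1.0 + assumed_eff)) ** cycles

# A 5-point drop in efficiency that goes unnoticed over 25 cycles.
ratio = fold_error(true_eff=0.95, assumed_eff=1.0, cycles=25)
print(f"estimated / true starting quantity: {ratio:.2f}")
```

Under these assumed values the starting quantity is underestimated by roughly half, because the per-cycle error compounds exponentially with the cycle count.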
i-ADHoRe 3.0: fast and sensitive detection of genomic homology in extremely large data sets
Proost, Sebastian; Fostier, Jan; De Witte, Dieter; Dhoedt, Bart; Demeester, Piet; Van de Peer, Yves; Vandepoele, Klaas
doi: 10.1093/nar/gkr955
pmid: 22102584
Comparative genomics is a powerful means to gain insight into the evolutionary processes that shape the genomes of related species. As the number of sequenced genomes increases, the development of software to perform accurate cross-species analyses becomes indispensable. However, many implementations that can compare multiple genomes have unfavorable computational and memory requirements, limiting the number of genomes that can be analyzed in one run. Here, we present a software package to unveil genomic homology based on the identification of conservation of gene content and gene order (collinearity), i-ADHoRe 3.0, and its application to eukaryotic genomes. The use of efficient algorithms and support for parallel computing enable the analysis of large-scale data sets. Unlike other tools, i-ADHoRe can process the Ensembl data set, containing 49 species, in 1 h. Furthermore, the profile search is more sensitive in detecting degenerate genomic homology than chaining pairwise collinearity information based on transitive homology. From ultra-conserved collinear regions between mammals and birds, by integrating coexpression information and protein–protein interactions, we identified more than 400 regions in the human genome showing significant functional coherence. The different algorithmic improvements ensure that i-ADHoRe 3.0 will remain a powerful tool to study genome evolution.
Novel reporter systems for facile evaluation of I-SceI-mediated genome editing
Muñoz, Nina M.; Beard, Brian C.; Ryu, Byoung Y.; Luche, Ralf M.; Trobridge, Grant D.; Rawlings, David J.; Scharenberg, Andrew M.; Kiem, Hans-Peter
doi: 10.1093/nar/gkr897
pmid: 22110042
Two major limitations to achieving efficient homing endonuclease-stimulated gene correction using retroviral vectors are the low frequency of gene targeting and random integration of the targeting vectors. To overcome these issues, we developed a reporter system for quick and facile testing of novel strategies to promote the selection of cells that undergo targeted gene repair and to minimize the persistence of random integrations and non-homologous end-joining events. In this system, the gene target has an I-SceI site upstream of an EGFP reporter, and the repair template includes a non-functional EGFP gene, the positive selection transgene MGMTP140K tagged with mCherry, and the inducible Caspase-9 suicide gene. Using this dual fluorescent reporter system, it is possible to detect properly targeted integration. Furthermore, this reporter system provides an efficient approach to enrich for gene correction events and to deplete events produced by random integration. We have also developed a second reporter system containing MGMTP140K in the integrated target locus, which allows for selection of primary cells with the integrated gene target after transplantation. This system is particularly useful for testing repair strategies in primary hematopoietic stem cells. Thus, our reporter systems should allow for more efficient gene correction with fewer unwanted off-target effects.
In vitro quantification of specific microRNA using molecular beacons
Baker, Meredith B.; Bao, Gang; Searles, Charles D.
doi: 10.1093/nar/gkr1016
pmid: 22110035
MicroRNAs (miRNAs), a class of non-coding RNAs, have become a major focus of molecular biology research because of their diverse genomic origin and ability to regulate an array of cellular processes. Although the biological functions of miRNA are yet to be fully understood, tissue levels of specific miRNAs have been shown to correlate with pathological development of disease. Here, we demonstrate that molecular beacons can readily distinguish mature and pre-miRNAs, and reliably quantify miRNA expression. We found that molecular beacons with DNA, RNA and combined locked nucleic acid (LNA)–DNA backbones can all detect miRNAs at low (<1 nM) concentrations in vitro, with RNA beacons having the highest detection sensitivity. Furthermore, we found that molecular beacons have the potential to distinguish miRNAs that have slight variations in their nucleotide sequence. These results suggest that the molecular beacon-based approach to assess miRNA expression and distinguish mature and precursor miRNA species is quite robust, and holds promise for assessing miRNA levels in biological samples.