Adler, Priit; Peterson, Hedi; Agius, Phaedra; Reimand, Jüri; Vilo, Jaak
doi: 10.1111/j.1749-6632.2008.03747.x
pmid: 19348627
Cellular processes are often carried out by intricate systems of interacting genes and proteins. Some of these systems are rather well studied and described in pathway databases, while the roles and functions of the majority of genes are poorly understood. A large compendium of public microarray data is available that covers a variety of conditions, samples, and tissues and provides a rich source for genome‐scale information. We focus our study on the analysis of 35 curated biological pathways in the context of gene co‐expression over a large variety of biological conditions. By defining a global co‐expression similarity rank for each gene and pathway, we perform exhaustive leave‐one‐out computations to describe existing pathway memberships using other members of the corresponding pathway as reference. We demonstrate that while successful in recovering biological base processes such as metabolism and translation, the global correlation measure fails to detect gene memberships in signaling pathways where co‐expression is less evident. Our results also show that pathway membership detection is more effective when using only a subset of corresponding pathway members as reference, supporting the existence of more tightly co‐expressed subsets of genes within pathways. Our study assesses the predictive power of global gene expression correlation measures in reconstructing biological systems of various functions and specificity. The developed computational network has immediate applications in detecting dubious pathway members and predicting novel member candidates.
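The leave-one-out idea in this abstract can be illustrated with a minimal sketch. It assumes, purely for illustration, that similarity to a pathway is the mean Pearson correlation with the remaining members; the abstract does not specify the actual rank definition, and the names `pearson` and `leave_one_out_ranks` are hypothetical.

```python
import math
from typing import Dict, List, Set


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def leave_one_out_ranks(expr: Dict[str, List[float]], pathway: Set[str]) -> Dict[str, int]:
    """For each pathway member, rank it among all non-reference genes.

    Assumption (not from the abstract): the score of a candidate gene is its
    mean Pearson correlation with the remaining pathway members.
    Rank 1 means the held-out member is the best-scoring candidate.
    """
    ranks: Dict[str, int] = {}
    for held_out in sorted(pathway):
        reference = [g for g in sorted(pathway) if g != held_out]
        scores: Dict[str, float] = {}
        for gene, profile in expr.items():
            if gene in reference:
                continue  # reference genes are not candidates
            scores[gene] = sum(pearson(profile, expr[r]) for r in reference) / len(reference)
        ordered = sorted(scores, key=lambda g: scores[g], reverse=True)
        ranks[held_out] = ordered.index(held_out) + 1
    return ranks


# Toy data: p1..p3 rise together, x1 falls, x2 oscillates.
toy_expr = {
    "p1": [1.0, 2.0, 3.0, 4.0],
    "p2": [2.0, 4.0, 6.0, 9.0],
    "p3": [0.0, 1.0, 3.0, 4.0],
    "x1": [4.0, 3.0, 2.0, 1.0],
    "x2": [1.0, 0.0, 1.0, 0.0],
}
toy_ranks = leave_one_out_ranks(toy_expr, {"p1", "p2", "p3"})
```

In this toy setting every held-out member ranks first, which corresponds to the tightly co-expressed case the abstract reports for metabolism and translation. A weakly co-expressed pathway would yield poor ranks under the same procedure.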
Krallinger, Martin; Rojas, Ana María; Valencia, Alfonso
doi: 10.1111/j.1749-6632.2008.03750.x
pmid: 19348628
High‐throughput experimental techniques are generating large data collections with the aim of identifying novel entities involved in fundamental cellular processes as well as drawing a systematic picture of the relationships between individual components. Determining the accuracy of the resulting data and the selection of a subset of targets for more careful characterizations often requires relying on information provided by manually annotated data repositories. These repositories are incomplete and cover only a small fraction of the knowledge contained in the literature. We propose in this paper the use of text‐mining technologies to extract, organize, and present information relevant for a particular biological topic. The aims of the resulting approach are (1) to enable topic‐centric biological literature navigation, (2) to assist in the construction of manually revised data repositories, (3) to provide prioritization of biological entities for experimental studies, and (4) to enable human interpretation of large‐scale experiments by providing direct links of bio‐entities to relevant descriptions in the literature.
Lemmens, Karen; De Bie, Tijl; Dhollander, Thomas; Monsieurs, Pieter; De Moor, Bart; Collado‐Vides, Julio; Engelen, Kristof; Marchal, Kathleen
doi: 10.1111/j.1749-6632.2008.03746.x
pmid: 19348629
Thanks to the availability of high‐throughput omics data, bioinformatics approaches are able to hypothesize thus‐far undocumented genetic interactions. However, due to the amount of noise in these data, inferences based on a single data source are often unreliable. A popular approach to overcome this problem is to integrate different data sources. In this study, we describe DISTILLER, a novel framework for data integration that simultaneously analyzes microarray and motif information to find modules that consist of genes that are co‐expressed in a subset of conditions, and their corresponding regulators. By applying our method on publicly available data, we evaluated the condition‐specific transcriptional network of Escherichia coli. DISTILLER confirmed 62% of 736 interactions described in RegulonDB, and 278 novel interactions were predicted.
Michoel, Tom; De Smet, Riet; Joshi, Anagha; Marchal, Kathleen; Van de Peer, Yves
doi: 10.1111/j.1749-6632.2008.03943.x
pmid: 19348630
“Module networks” are a framework to learn gene regulatory networks from expression data using a probabilistic model in which coregulated genes share the same parameters and conditional distributions. We present a method to infer ensembles of such networks and an averaging procedure to extract the statistically most significant modules and their regulators. We show that the inferred probabilistic models extend beyond the dataset used to learn the models.
Lipshtat, Azi; Neves, Susana R.; Iyengar, Ravi
doi: 10.1111/j.1749-6632.2008.03748.x
pmid: 19348631
Graph theory provides a useful and powerful tool for the analysis of cellular signaling networks. Intracellular components such as cytoplasmic signaling proteins, transcription factors, and genes are connected by links, representing various types of chemical interactions that result in functional consequences. However, these graphs lack important information regarding the spatial distribution of cellular components. The ability of two cellular components to interact depends not only on their mutual chemical affinity but also on colocalization to the same subcellular region. Localization of components is often used as a regulatory mechanism to achieve specific effects in response to different receptor signals. Here we describe an approach for incorporating spatial distribution into graphs and for the development of mixed graphs where links are specified by mutual chemical affinity as well as colocalization. We suggest that such mixed graphs will provide more accurate descriptions of functional cellular networks and their regulatory capabilities and aid in the development of large‐scale predictive models of cellular behavior.
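As a rough illustration of the mixed-graph idea, the sketch below keeps a link only when two components have chemical affinity and also share at least one subcellular compartment. The data layout, the function name `mixed_graph_edges`, and the component names are hypothetical and not taken from the paper.

```python
from typing import Dict, FrozenSet, Iterable, Set, Tuple


def mixed_graph_edges(
    affinity_pairs: Iterable[Tuple[str, str]],
    localization: Dict[str, Set[str]],
) -> Dict[Tuple[str, str], FrozenSet[str]]:
    """Return links that satisfy both affinity and colocalization.

    Each kept link maps to the compartments in which the interaction
    could take place. Components with unknown localization get no links.
    """
    edges: Dict[Tuple[str, str], FrozenSet[str]] = {}
    for a, b in affinity_pairs:
        shared = localization.get(a, set()) & localization.get(b, set())
        if shared:
            edges[(a, b)] = frozenset(shared)
    return edges


# Hypothetical components and compartments, for illustration only.
affinity = [("Receptor", "KinaseA"), ("KinaseA", "TF"), ("Receptor", "TF")]
where = {
    "Receptor": {"membrane"},
    "KinaseA": {"membrane", "cytoplasm"},
    "TF": {"cytoplasm", "nucleus"},
}
mixed = mixed_graph_edges(affinity, where)
```

Here the Receptor and TF pair is dropped despite its assumed affinity because the two never occupy the same compartment, which is the kind of constraint a purely affinity-based graph would miss.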
Hoffmann, Sabrina; Holzhütter, Hermann‐Georg
doi: 10.1111/j.1749-6632.2008.03753.x
pmid: 19348632
Expression profiling and proteomic techniques reveal significant variations in the levels of thousands of mRNAs and proteins in response to environmental changes such as substrate depletion, oxidative stress, and hormonal stimulation. However, in most cases the functional implications of these variations remain elusive. One crucial problem complicating the functional interpretation of high‐throughput data is that changes of protein levels do not simply translate into equivalent changes in the rate of the associated chemical processes due to various modes of enzyme regulation and the instantaneous effect of changed metabolite concentrations on adjacent flux rates. Here, we outline a theoretical concept to exploit information on (relative) changes in the level of metabolic enzymes for the prediction of (relative) flux changes in the underlying metabolic network. Our approach rests on the assumption that size and direction of fluxes (flux distribution) in the network are determined by an optimization principle in that the production of the physiologically relevant output metabolites is accomplished with minimal total flux. The prediction method comprises two main steps. First, we approximate (unknown) flux changes by a linear combination of so‐called minimal flux modes, each representing a specific flux distribution minimally required to accomplish the production of only one of the numerous functionally relevant output metabolites. Second, the unknown coefficients of this decomposition are chosen such that a maximal correlation with observed differential expression data is obtained. Based on simulated enzyme expression scenarios in a metabolic model of the human red blood cell, we demonstrate the predictive capacity of our method.
Gowda, Tejaswi; Vrudhula, Sarma; Kim, Seungchan
doi: 10.1111/j.1749-6632.2008.03754.x
pmid: 19348633
Gene regulation modeling is one of the most active research topics in systems biology. The aim of modeling gene regulation is to understand how individual genes function and interact with each other to create complex biological phenomena. In this paper we propose a novel gene regulatory model based on threshold logic. The approach is developed by a combination of threshold logic properties and perceptron learning techniques. This work does not focus on determination of the pair‐wise interactions among genes. Instead, the objective of this work is to generate a model that will describe and predict phenomena associated with a biological system. The utility of the approach is demonstrated by modeling a cellular system of 50 genes. The model could effectively replicate both the steady state and the transient behavior of genes.
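A threshold-logic gene with perceptron learning can be sketched as follows. This is a generic textbook perceptron on a hypothetical two-input rule (target gene on when an activator is on and a repressor is off), not the authors' model or their 50-gene system; the names `threshold_gate` and `train_perceptron` are assumptions.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[Tuple[int, ...], int]


def threshold_gate(weights: Sequence[float], threshold: float, inputs: Sequence[int]) -> int:
    """Gene is ON (1) when the weighted sum of regulator states reaches the threshold."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total >= threshold else 0


def train_perceptron(
    samples: List[Sample], n_inputs: int, epochs: int = 100, lr: float = 1.0
) -> Tuple[List[float], float]:
    """Classic perceptron rule; stops early once every sample is reproduced."""
    weights = [0.0] * n_inputs
    threshold = 0.0
    for _ in range(epochs):
        errors = 0
        for inputs, target in samples:
            delta = target - threshold_gate(weights, threshold, inputs)
            if delta != 0:
                errors += 1
                for i in range(n_inputs):
                    weights[i] += lr * delta * inputs[i]
                threshold -= lr * delta  # raising the output means lowering the threshold
        if errors == 0:
            break
    return weights, threshold


# Hypothetical rule: inputs are (activator, repressor); target = activator AND NOT repressor.
rule = [((0, 0), 0), ((0, 1), 0), ((1, 0), 1), ((1, 1), 0)]
learned_w, learned_t = train_perceptron(rule, n_inputs=2)
predictions = [threshold_gate(learned_w, learned_t, x) for x, _ in rule]
```

The learned weights come out positive for the activator and negative for the repressor, so the sign of each weight reads directly as activation or repression. Rules that are not linearly separable, such as XOR, cannot be captured by a single gate of this kind.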
doi: 10.1111/j.1749-6632.2008.03752.x
pmid: 19348634
We are interested in the relationships among network topology, robustness, and identifiability, and their implications in improving network reconstruction. We used three different types of artificial gene networks (AGNs) with distinct topologies: random (RND), scale‐free (SF), and small‐world (SW), to investigate their robustness and identifiability. The robustness of a network is represented by structural reachability (existence of pathways between two nodes) and dynamic reachability (response on one node upon perturbation on another node). The identifiability of the network edges is assessed in silico with an established reverse‐engineering algorithm. We found that (1) structural reachability does not always lead to dynamic reachability; (2) network robustness is high and identifiability is low in all surveyed AGNs; (3) robustness is more sensitive to network topologies than is identifiability. We also devised a method for network dissection in which three subnets (sets of alternative pathways or feedbacks, referred to as pathnets) are related to each node pair. This method allows us to identify the fine structural features underlying the distinct behaviors of the networks. For example, the pathnet of the edge tail contributes negatively to the edge identifiability, and it is likely that extra perturbation at this pathnet would improve edge identifiability. We provide a case study with a T helper cell–differentiation network model to show that double perturbations decrease the edge robustness and increase structural identifiability.
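Structural reachability, as defined in this abstract (existence of a pathway between two nodes), can be checked with a breadth-first search over a directed edge list. The sketch below is a generic illustration with hypothetical gene names; it does not cover dynamic reachability, which depends on the network's kinetics.

```python
from collections import deque
from typing import Dict, Hashable, Iterable, List, Tuple


def structurally_reachable(
    edges: Iterable[Tuple[Hashable, Hashable]], source: Hashable, target: Hashable
) -> bool:
    """True if a directed path leads from source to target (a node reaches itself)."""
    adjacency: Dict[Hashable, List[Hashable]] = {}
    for u, v in edges:
        adjacency.setdefault(u, []).append(v)

    seen = {source}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            return True
        for neighbor in adjacency.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return False


# Hypothetical regulatory edges: g1 -> g2 -> g3, and g4 -> g3.
toy_edges = [("g1", "g2"), ("g2", "g3"), ("g4", "g3")]
forward = structurally_reachable(toy_edges, "g1", "g3")
backward = structurally_reachable(toy_edges, "g3", "g1")
```

Because the edges are directed, g1 reaches g3 but not the reverse. A structurally reachable pair may still show no measurable response to perturbation, which is the gap between structural and dynamic reachability that the abstract reports.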
doi: 10.1111/j.1749-6632.2008.03749.x
pmid: 19348635
Using microarray experiments, we can model causal relationships of genes measured through mRNA expression levels. To this end, it is desirable to compare experiments of the system under complete interventions of some genes, such as by knock out of some genes, with experiments of the system under no interventions. However, it is expensive and difficult to conduct wet lab experiments of complete interventions of genes in a biological system. Thus, it will be helpful if we can discover promising causal relationships among genes with no interventions or incomplete interventions, such as by applying a treatment that has unknown effects to modeled genes, in order to identify promising genes to perturb in the system that can later be verified in wet laboratories. In this paper we use causal Bayesian networks to implement a causal discovery algorithm—the equivalence local implicit latent variable scoring method (EquLIM)—that identifies promising causal relationships even with a small dataset generated from no or incomplete interventions. We then apply EquLIM to analyze the five‐gene‐network data and compare EquLIM's predictions with true causal pairwise relationships between the genes.