MS-specific noise model reveals the potential of iTRAQ in quantitative proteomics
Hundertmark, C.; Fischer, R.; Reinl, T.; May, S.; Klawonn, F.; Jänsch, L.
doi: 10.1093/bioinformatics/btn551; pmid: 18952628
Motivation: Like many other analytical methods, mass spectrometry (MS) produces data that are impaired by noise. Proteomics therefore requires statistical approaches to determine the reliability of regulatory information when protein quantification is based on ion intensities observed in MS.
Results: We suggest a procedure to model the instrument- and workflow-specific noise behaviour of iTRAQ™ reporter ions, which can provide regulatory information during automated peptide sequencing by LC-MS/MS. The established mathematical model representatively predicts possible variations of iTRAQ™ reporter ions in an MS data-dependent manner. The model can be used to calculate the robustness of regulatory information systematically at the peptide level in so-called bottom-up proteome approaches. It makes it possible to determine the best-fitting regulation factor and, in addition, to calculate the probability of alternative regulations. The result can be visualized as likelihood curves that summarize both the quantity and the quality of regulatory information. In principle, likelihood curves can be calculated from all peptides belonging to different regions of a protein, provided they are detected in LC-MS/MS experiments. This approach therefore offers excellent opportunities to detect and statistically validate dynamic post-translational modifications, which usually affect only particular regions of the whole protein. The detection of known phosphorylation events at protein kinases served as a first proof of concept in this study and underscores the potential of noise models in quantitative proteomics.
Contact: [email protected]; [email protected]
Supplementary information: Supplementary data are available at Bioinformatics online.
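The idea of a likelihood curve over candidate regulation factors can be illustrated with a minimal sketch. The abstract does not specify the noise model, so everything below is an assumption made for illustration: the function `noise_sd`, the Gaussian error on log2 reporter ratios, the intensity values and the factor grid are all hypothetical and are not the authors' fitted model.

```python
import math
from typing import List, Sequence, Tuple


def noise_sd(intensity: float) -> float:
    """Hypothetical noise model: the spread of the log2 ratio shrinks
    as the reporter ion intensity grows (not the published model)."""
    return 0.05 + 2.0 / math.sqrt(intensity)


def likelihood_curve(
    peptides: Sequence[Tuple[float, float]],
    factors: Sequence[float],
) -> List[float]:
    """Return a normalized likelihood for each candidate regulation factor.

    peptides: (reference_intensity, treated_intensity) reporter ion pairs.
    Assumes independent Gaussian noise on each peptide's log2 ratio.
    """
    log_lik = []
    for factor in factors:
        total = 0.0
        for reference, treated in peptides:
            sd = noise_sd(min(reference, treated))  # weaker channel dominates
            z = (math.log2(treated / reference) - math.log2(factor)) / sd
            total += -0.5 * z * z - math.log(sd)
        log_lik.append(total)
    peak = max(log_lik)  # subtract the peak before exponentiating
    weights = [math.exp(value - peak) for value in log_lik]
    norm = sum(weights)
    return [w / norm for w in weights]


# Toy reporter ion pairs for three peptides of one protein (made-up values).
peptides = [(1000.0, 2100.0), (400.0, 760.0), (5000.0, 9800.0)]
factors = [round(0.5 + 0.05 * i, 2) for i in range(71)]  # grid 0.5 .. 4.0
curve = likelihood_curve(peptides, factors)
best_factor = factors[curve.index(max(curve))]
print(best_factor)
```

In this sketch the position of the peak gives the best-fitting regulation factor, while the width of the curve reflects how strongly alternative regulations are supported; low-intensity peptides contribute broader, less informative terms.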
Predicting the binding preference of transcription factors to individual DNA k-mers
Alleyne, Trevis M.; Peña-Castillo, Lourdes; Badis, Gwenael; Talukder, Shaheynoor; Berger, Michael F.; Gehrke, Andrew R.; Philippakis, Anthony A.; Bulyk, Martha L.; Morris, Quaid D.; Hughes, Timothy R.
doi: 10.1093/bioinformatics/btn645; pmid: 19088121
Motivation: Recognition of specific DNA sequences is a central mechanism by which transcription factors (TFs) control gene expression. Many TF-binding preferences, however, are unknown or poorly characterized, in part because their specificity is difficult to determine experimentally and because the mechanisms governing sequence specificity are incompletely understood. New techniques that estimate the affinity of TFs for all possible k-mers provide an opportunity to study DNA–protein interaction mechanisms, and may facilitate inference of binding preferences for members of a given TF family when such information is available for other family members.
Results: We used a new dataset consisting of the relative preferences of mouse homeodomains for all eight-base DNA sequences to ask how well the binding profiles of homeodomains can be predicted when only their protein sequences are given. We evaluated a panel of standard statistical inference techniques, as well as variations of the protein features considered. Nearest neighbour among functionally important residues emerged as one of the most effective methods. Our results underscore the complexity of TF–DNA recognition, and suggest a rational approach for future analyses of TF families.
Contact: [email protected]
Supplementary information: Supplementary data are available at Bioinformatics online.
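A nearest-neighbour prediction restricted to selected residues can be sketched as follows. This is only an illustration under stated assumptions: the protein names, the aligned sequence fragments, the chosen residue positions and the 8-mer scores are all hypothetical toy values, and mismatch counting is one simple choice of distance, not necessarily the one used in the study.

```python
from typing import Dict, Sequence, Tuple


def residue_distance(a: str, b: str, positions: Sequence[int]) -> int:
    """Count mismatches between two aligned sequences at the given positions."""
    return sum(1 for p in positions if a[p] != b[p])


def predict_profile(
    query: str,
    known_seqs: Dict[str, str],
    known_profiles: Dict[str, Dict[str, float]],
    positions: Sequence[int],
) -> Tuple[str, Dict[str, float]]:
    """Return the name and k-mer profile of the characterized protein that is
    closest to the query at the selected residue positions."""
    nearest = min(
        known_seqs,
        key=lambda name: residue_distance(query, known_seqs[name], positions),
    )
    return nearest, known_profiles[nearest]


# Hypothetical aligned fragments and 8-mer preference scores (toy data).
known_seqs = {"HdA": "RKRTAYTR", "HdB": "RQRTHFTS"}
known_profiles = {
    "HdA": {"TAATTAAT": 0.48, "TGATTGAT": 0.12},
    "HdB": {"TAATTAAT": 0.10, "TGATTGAT": 0.45},
}
important_positions = [1, 4, 5, 7]  # assumed DNA-contacting residues

nearest, predicted = predict_profile(
    "RKRSAYSR", known_seqs, known_profiles, important_positions
)
print(nearest, predicted)
```

Restricting the comparison to a few positions means that differences elsewhere in the sequence (here positions 3 and 6 of the query) do not affect which characterized protein is chosen.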
Bayesian robust analysis for genetic architecture of quantitative traits
Yang, Runqing; Wang, Xin; Li, Jian; Deng, Hongwen
doi: 10.1093/bioinformatics/btn558; pmid: 18974168
Motivation: In most quantitative trait locus (QTL) mapping studies, phenotypes are assumed to follow normal distributions. Deviations from this assumption may affect the accuracy of QTL detection and lead to the detection of spurious QTLs. To improve the robustness of QTL mapping methods, we replaced the normal distribution for residuals in multiple interacting QTL models with normal/independent distributions, a class of symmetric, long-tailed distributions that can accommodate residual outliers. We then developed a Bayesian robust analysis strategy for dissecting the genetic architecture of quantitative traits and for mapping genome-wide interacting QTLs in line crosses.
Results: Through computer simulations, we showed that our strategy had power for QTL detection similar to that of traditional methods when traits were normally distributed, and substantially greater power for non-normal phenotypes. When the strategy was applied to a group of traits associated with physical/chemical characteristics and quality in rice, more main and epistatic QTLs were detected than with traditional Bayesian model analyses under the normal assumption.
Contact: [email protected]; [email protected]
Supplementary information: Supplementary data are available at Bioinformatics online.
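Why a long-tailed residual distribution accommodates outliers can be seen in a small numerical sketch. Student's t is used here as one well-known member of the normal/independent class (a normal distribution whose precision is scaled by a gamma-distributed weight); it is chosen for illustration and is not necessarily the distribution the authors used. The residual values, scale and degrees of freedom are arbitrary.

```python
import math


def normal_logpdf(residual: float, sigma: float) -> float:
    """Log density of a zero-mean normal residual."""
    return -0.5 * math.log(2.0 * math.pi * sigma ** 2) - residual ** 2 / (2.0 * sigma ** 2)


def student_t_logpdf(residual: float, sigma: float, nu: float) -> float:
    """Log density of a zero-mean, scaled Student's t residual with nu degrees of freedom."""
    z = residual / sigma
    return (
        math.lgamma((nu + 1.0) / 2.0)
        - math.lgamma(nu / 2.0)
        - 0.5 * math.log(nu * math.pi)
        - math.log(sigma)
        - (nu + 1.0) / 2.0 * math.log1p(z * z / nu)
    )


def outlier_weight(residual: float, sigma: float, nu: float) -> float:
    """Expected mixing weight given the residual under the t scale mixture:
    values below 1 mean the observation is down-weighted."""
    z = residual / sigma
    return (nu + 1.0) / (nu + z * z)


for r in (0.5, 2.0, 6.0):
    print(
        r,
        round(normal_logpdf(r, 1.0), 3),
        round(student_t_logpdf(r, 1.0, 4.0), 3),
        round(outlier_weight(r, 1.0, 4.0), 3),
    )
```

For a residual six scale units from zero, the normal log density is far lower than the t log density, and the mixing weight drops well below 1, so a single outlying phenotype has much less influence on the fitted QTL effects.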
Stochastic modelling of genotypic drug-resistance for human immunodeficiency virus towards long-term combination therapy optimization
Prosperi, Mattia C. F.; D'Autilia, Roberto; Incardona, Francesca; De Luca, Andrea; Zazzi, Maurizio; Ulivi, Giovanni
doi: 10.1093/bioinformatics/btn568; pmid: 18977781
Motivation: Several mathematical models have been investigated for describing viral dynamics in the human body. HIV-1 infection is a particular and interesting scenario, because the virus attacks cells of the immune system that play a role in antibody production, and its high mutation rate allows it to escape both the immune response and, in some cases, drug pressure. Viral genetic evolution is intrinsically a stochastic process that can be driven by drug pressure and depends on the drug combinations and concentrations. This article focuses on the onset of viral genotypic drug resistance. The theoretical basis is the modelling of HIV-1 population dynamics as a predator–prey system of differential equations with a time-dependent therapy efficacy term, while the mutational evolution of the viral genome follows a Poisson distribution. The instantaneous probabilities of drug resistance are estimated by means of functions trained on in vitro phenotypes, with a roulette-wheel-based mechanism for selecting resistant variants. Simulations were designed for treatments consisting of one and two drugs as well as for combination antiretroviral therapies. The effect of limited adherence to therapy was also analyzed. Sequential treatment change episodes were also exploited with the aim of evaluating optimal synoptic treatment scenarios.
Results: The stochastic predator–prey modelling usefully predicted long-term virologic outcomes of evolved HIV-1 strains for selected antiretroviral therapy combinations. For a set of widely used combination therapies, results were consistent with findings reported in the literature and with estimates from an analysis of a large retrospective database (EuResist).
Contact: [email protected]
Supplementary information: Supplementary data are available at Bioinformatics online.
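The combination of deterministic population dynamics, a time-dependent efficacy term and Poisson-distributed mutation events can be sketched roughly as below. This is not the authors' model: it assumes a standard target-cell, infected-cell and free-virus system integrated with a simple Euler scheme, and every parameter value, the function names `sample_poisson` and `simulate`, and the adherence pattern are hypothetical choices for illustration.

```python
import math
import random
from typing import Callable, Tuple


def sample_poisson(rng: random.Random, mean: float) -> int:
    """Draw a Poisson variate by Knuth's multiplication method
    (adequate only for the small means used in this sketch)."""
    threshold = math.exp(-mean)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate(
    efficacy: Callable[[float], float],
    days: float = 60.0,
    dt: float = 0.01,
    seed: int = 0,
) -> Tuple[float, float, float, int]:
    """Euler integration of target cells, infected cells and free virus.

    efficacy(t) in [0, 1] scales down new infections at time t (in days).
    Returns final (target, infected, virus, cumulative mutation events).
    """
    rng = random.Random(seed)
    # Hypothetical parameter values, chosen only to give plausible dynamics.
    production, target_death = 1e4, 0.01
    infectivity, infected_death = 1.5e-7, 1.0
    burst_rate, clearance = 100.0, 3.0
    mutation_prob = 1e-5  # assumed chance that a new infection carries a relevant mutation

    target, infected, virus = production / target_death, 0.0, 1e3
    mutations = 0
    for step in range(int(round(days / dt))):
        eps = efficacy(step * dt)
        new_infections = (1.0 - eps) * infectivity * target * virus
        mutations += sample_poisson(rng, mutation_prob * new_infections * dt)
        d_target = production - target_death * target - new_infections
        d_infected = new_infections - infected_death * infected
        d_virus = burst_rate * infected - clearance * virus
        target += dt * d_target
        infected += dt * d_infected
        virus += dt * d_virus
    return target, infected, virus, mutations


untreated = simulate(lambda t: 0.0)
treated = simulate(lambda t: 0.9)
# Limited adherence: the dose is assumed to be missed every third day.
partial = simulate(lambda t: 0.9 if int(t) % 3 else 0.0)
print(untreated[2], treated[2], partial[2])
```

Passing the efficacy as a function of time keeps the integration loop unchanged when the therapy schedule changes, which is convenient for comparing full adherence, limited adherence and sequential treatment switches.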
Graphical methods for quantifying macromolecules through bright field imaging
Chang, Hang; DeFilippis, Rosa Anna; Tlsty, Thea D.; Parvin, Bahram
doi: 10.1093/bioinformatics/btn426; pmid: 18703588
Bright field imaging of biological samples stained with antibodies and/or special stains provides a rapid protocol for visualizing various macromolecules. However, this method of sample staining and imaging is rarely employed for direct quantitative analysis because of variations in sample fixation, ambiguities introduced by color composition and the limited dynamic range of imaging instruments. We demonstrate that, through the decomposition of color signals, staining can be scored on a cell-by-cell basis. We have applied our method to fibroblasts grown from histologically normal breast tissue biopsies obtained from two distinct populations. Initially, nuclear regions are segmented by converting color images into grayscale and detecting dark elliptic features. Subsequently, the strength of staining is quantified by a color decomposition model that is optimized by a graph cut algorithm. In rare cases where the nuclear signal is significantly altered as a result of sample preparation, nuclear segmentation can be validated and corrected. Finally, segmented stained patterns are associated with each nuclear region following region-based tessellation. Compared with classical non-negative matrix factorization, the proposed method (i) improves color decomposition, (ii) has better noise immunity, (iii) is more invariant to initial conditions and (iv) has superior computing performance.
Contact: [email protected]