The HapMap Resource is Providing New Insights into Ourselves and its Application to Pharmacogenomics: Zhang, Wei; Ratain, Mark J.; Dolan, M. Eileen
doi: 10.4137/bbi.s455; pmid: 18392109
The exploration of quantitative variation in complex traits such as gene expression and drug response in human populations has become one of the major priorities for medical genetics. The International HapMap Project provides a key resource of genotypic data on human lymphoblastoid cell lines derived from four major world populations of European, African, Chinese and Japanese ancestry for researchers to associate with various phenotypic data to find genes affecting health, disease and response to drugs. Recent progress in dissecting genetic contribution to natural variation in gene expression within and among human populations and variation in drug response are two examples in which researchers have utilized the HapMap resource. The HapMap Project provides new insights into the human genome and has applicability to pharmacogenomics studies leading to personalized medicine.
A New Test Statistic Based on Shrunken Sample Variance for Identifying Differentially Expressed Genes in Small Microarray Experiments: Hirakawa, Akihiro; Sato, Yasunori; Hamada, Chikuma; Yoshimura, Isao
doi: 10.4137/bbi.s473; pmid: 19812772
Choosing an appropriate statistic and precisely evaluating the false discovery rate (FDR) are both essential for devising an effective method for identifying differentially expressed genes in microarray data. The t-type score proposed by Pan et al. (2003) succeeded in suppressing false positives by controlling the underestimation of variance but left the overestimation uncontrolled. For controlling the overestimation, we devised a new test statistic (variance stabilized t-type score) by placing shrunken sample variances of the James-Stein type in the denominator of the t-type score. Since the relative superiority of the mean and median FDRs was unclear in the widely adopted Significance Analysis of Microarrays (SAM), we conducted simulation studies to examine the performance of the variance stabilized t-type score and the characteristics of the two FDRs. The variance stabilized t-type score was generally better than or at least as good as the t-type score, irrespective of the sample size and proportion of differentially expressed genes. In terms of accuracy, the median FDR was superior to the mean FDR when the proportion of differentially expressed genes was large. The variance stabilized t-type score with the median FDR was applied to actual colorectal cancer data and yielded a reasonable result.
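The idea of placing shrunken variances in the denominator of a t-type score can be sketched as follows. This is a minimal illustration, not the authors' statistic: the fixed `weight` is a hypothetical stand-in for the data-derived James-Stein-type shrinkage factor, the shrinkage target (the mean variance across genes) is an assumption, and the function names are invented for this example.

```python
from statistics import mean, variance


def shrunken_variances(variances, weight):
    """Pull each per-gene variance toward the mean variance across genes.

    weight = 0 leaves the variances unchanged; weight = 1 replaces every
    variance with the common mean. A James-Stein-type estimator would
    derive the weight from the data; here it is a fixed, assumed value.
    """
    target = mean(variances)
    return [weight * target + (1 - weight) * v for v in variances]


def t_type_scores(group_a, group_b, weight=0.5):
    """Compute a t-type score per gene with shrunken variances.

    group_a and group_b are lists with one entry per gene; each entry is
    a list of replicate expression values (at least two per group).
    """
    diffs = []
    raw = []
    for a, b in zip(group_a, group_b):
        diffs.append(mean(a) - mean(b))
        # Squared standard error of the difference in means for this gene.
        raw.append(variance(a) / len(a) + variance(b) / len(b))
    shrunk = shrunken_variances(raw, weight)
    return [d / s ** 0.5 for d, s in zip(diffs, shrunk)]
```

Shrinking toward a common value raises unusually small variances and lowers unusually large ones, which is the sense in which both underestimation and overestimation are damped.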
A Unified Discussion on the Concept of Score Functions Used in the Context of Nonparametric Linkage Analysis: Ängquist, Lars
doi: 10.1177/117793220800200001; pmid: N/A
In this article we discuss nonparametric linkage (NPL) score functions within a broad and quite general framework. The main focus of the paper is the structure, derivation principles and interpretations of the score function entity itself. We define and discuss several families of one-locus score function definitions, i.e. the implicit, explicit and optimal ones. Some generalizations and comments to the two-locus, unconditional and conditional, cases are included as well. Although this article mainly aims at serving as an overview, where the concept of score functions is put into an encompassing context, we generalize the noncentrality parameter (NCP) optimal score functions in Ängquist et al. (2007) to allow, through weighting, the incorporation of several plausible distinct genetic models. Since the genetic model itself is most often unknown to some extent, this permits weaker prior assumptions with respect to plausible true disease models without losing the property of NCP-optimality. Moreover, we discuss general assumptions and properties of score functions in the above sense. For instance, the concepts of identical by descent (IBD) sharing structures and score function equivalence are discussed in some detail.
Distribution of Polymorphic and Non-Polymorphic Microsatellite Repeats in Xenopus tropicalis: Xu, Zhenkang; Gutierrez, Laura; Hitchens, Matthew; Scherer, Steve; Sater, Amy K.; Wells, Dan E.
doi: 10.4137/bbi.s561; pmid: 19812773
Our bioinformatics analysis found over 91,000 di-, tri-, and tetranucleotide microsatellites in a survey of 25% of the X. tropicalis genome, suggesting there may be over 360,000 within the entire genome. Within the X. tropicalis genome, dinucleotide (78.7%) microsatellites vastly outnumbered tri- and tetranucleotide microsatellites. Similarly, AT-rich repeats are overwhelmingly dominant. The four AT-only motifs (AT, AAT, AAAT, and AATT) account for 51,858 out of 91,304 microsatellites found. Individually, AT microsatellites were the most common repeat found, representing over half of all di-, tri-, and tetranucleotide microsatellites. This contrasts with data from other studies, which show that AC is the most frequent microsatellite in vertebrate genomes (Toth et al. 2000). In addition, we have determined the rate of polymorphism for 5,128 non-redundant microsatellites embedded in unique sequences. Interestingly, this subgroup of microsatellites was determined to have significantly longer repeats than genomic microsatellites as a whole. In addition, microsatellite loci with tandem repeat lengths greater than 30 bp exhibited a significantly higher degree of polymorphism than other loci. Pairwise comparisons show that tetranucleotide microsatellites have the highest polymorphic rates. In addition, AAT and ATC showed significantly higher polymorphism than other trinucleotide microsatellites, while AGAT and AAAG were significantly more polymorphic than other tetranucleotide microsatellites.
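A survey of di-, tri-, and tetranucleotide microsatellites of this kind can be sketched with a regular expression. This is an illustrative scan only, not the pipeline used in the study: the function name, the `min_repeats` threshold, and the restriction to uppercase A/C/G/T are all assumptions made for the example.

```python
import re


def find_microsatellites(seq, min_repeats=5):
    """Return (motif, repeat_count, start) for perfect tandem repeats.

    Motifs of 2 to 4 bases are considered. The lazy quantifier tries the
    shortest motif first, so ATATAT is reported with motif AT rather
    than ATAT. Homopolymer runs such as AAAAAA are skipped because they
    are mononucleotide repeats, not di-, tri-, or tetranucleotide ones.
    """
    pattern = re.compile(r"([ACGT]{2,4}?)\1{%d,}" % (min_repeats - 1))
    hits = []
    for match in pattern.finditer(seq):
        motif = match.group(1)
        if len(set(motif)) == 1:
            continue  # homopolymer run, not counted here
        count = len(match.group(0)) // len(motif)
        hits.append((motif, count, match.start()))
    return hits


# Toy sequence with one dinucleotide and one trinucleotide repeat.
example = "GG" + "AT" * 6 + "CCG" + "AAT" * 5 + "C"
print(find_microsatellites(example))
```

Note that this sketch reports motifs as they appear on the scanned strand; grouping equivalent motifs (for example AT with TA, or a motif with its reverse complement) would be a separate normalization step.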
Using a Seed-Network to Query Multiple Large-Scale Gene Expression Datasets from the Developing Retina in Order to Identify and Prioritize Experimental Targets: Hecker, Laura A.; Alcon, Timothy C.; Honavar, Vasant G.; Greenlee, M. Heather West
doi: 10.4137/bbi.s417; pmid: 19812791
Understanding the gene networks that orchestrate the differentiation of retinal progenitors into photoreceptors in the developing retina is important not only because of its therapeutic applications in treating retinal degeneration but also because the developing retina provides an excellent model for studying CNS development. Although several studies have profiled changes in gene expression during normal retinal development, these studies offer at best only a starting point for functional studies focused on a smaller subset of genes. The large number of genes profiled at comparatively few time points makes it extremely difficult to reliably infer gene networks from a gene expression dataset. We describe a novel approach to identify and prioritize, from multiple gene expression datasets, a small subset of genes that are likely to be good candidates for further experimental investigation. The approach queries multiple large-scale expression datasets using a ‘seed network’ consisting of a small set of genes that are implicated by published studies in rod photoreceptor differentiation. We use the seed network to identify and sort a list of genes whose expression levels are highly correlated with those of multiple seed network genes in at least two of the five gene expression datasets. The fact that several of the genes in this list have been demonstrated, through experimental studies reported in the literature, to be important in rod photoreceptor function provides support for the utility of this approach in prioritizing targets for further experimental investigation. Based on Gene Ontology and KEGG pathway annotations for the list of genes obtained, in the context of other information available in the literature, we identified seven genes or groups of genes for possible inclusion in the gene network involved in differentiation of retinal progenitor cells into rod photoreceptors.
Our approach to querying multiple gene expression datasets using a seed network constructed from known interactions between specific genes of interest provides a promising strategy for focusing hypothesis-driven experiments using large-scale ‘omics’ data.
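The seed-network query described above can be sketched as a correlation filter applied per dataset and then tallied across datasets. This is a minimal sketch under stated assumptions: Pearson correlation, the threshold `r_min`, the use of positive correlation only, and the names `rank_candidates`, `min_seeds`, and `min_datasets` are all hypothetical choices for illustration, not details taken from the study.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if constant)."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def rank_candidates(datasets, seed_genes, r_min=0.9, min_seeds=2, min_datasets=2):
    """Rank non-seed genes by the number of datasets supporting them.

    datasets is a list of dicts mapping gene name to an expression
    profile. A dataset supports a gene when its profile correlates at
    r_min or above with at least min_seeds seed genes in that dataset.
    """
    support = {}
    for data in datasets:
        seeds = [g for g in seed_genes if g in data]
        for gene, profile in data.items():
            if gene in seed_genes:
                continue
            n_hits = sum(1 for s in seeds if pearson(profile, data[s]) >= r_min)
            if n_hits >= min_seeds:
                support[gene] = support.get(gene, 0) + 1
    kept = [g for g, n in support.items() if n >= min_datasets]
    # Most-supported genes first; ties broken alphabetically.
    return sorted(kept, key=lambda g: (-support[g], g))
```

Requiring agreement across datasets is what makes the filter conservative: a gene that tracks the seeds in only one experiment is dropped.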
Structural Re-Alignment in an Immunogenic Surface Region of Ricin A Chain: Zemla, Adam T.; Zhou, Carol L. Ecale
doi: 10.4137/bbi.s437; pmid: 19812763
We compared structure alignments generated by several protein structure comparison programs to determine whether existing methods would satisfactorily align residues at a highly conserved position within an immunogenic loop in ribosome inactivating proteins (RIPs). Using default settings, structure alignments generated by several programs (CE, DaliLite, FATCAT, LGA, MAMMOTH, MATRAS, SHEBA, SSM) failed to align the respective conserved residues, although LGA reported correct residue-residue (R-R) correspondences when the beta-carbon (Cb) position was used as the point of reference in the alignment calculations. Further tests using variable points of reference indicated that points distal from the beta carbon along a vector connecting the alpha and beta carbons yielded rigid structural alignments in which residues known to be highly conserved in RIPs were reported as corresponding residues in structural comparisons between ricin A chain, abrin-A, and other RIPs. Results suggest that approaches to structure alignment employing alternate point representations corresponding to side chain position may yield structure alignments that are more consistent with observed conservation of functional surface residues than do standard alignment programs, which apply uniform criteria for alignment (i.e. alpha carbon (Ca) as point of reference) along the entirety of the peptide chain. We present the results of tests that suggest the utility of allowing user-specified points of reference in generating alternate structural alignments, and we present a web server for automatically generating such alignments: http://as2ts.llnl.gov/AS2TS/LGA/lga_pdblist_plots.html.
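The notion of a point of reference placed along the vector connecting the alpha and beta carbons can be sketched with a few lines of coordinate arithmetic. This is an illustration only and does not reproduce LGA's implementation: the function name, the `scale` parameter, and the fallback to the alpha carbon for residues without a beta carbon (such as glycine) are assumptions made for this example.

```python
def reference_point(ca, cb, scale):
    """Return a point on the line from the alpha carbon through the beta carbon.

    ca and cb are (x, y, z) coordinates. scale = 0 gives the alpha
    carbon, scale = 1 gives the beta carbon, and scale > 1 gives a point
    distal to the beta carbon, further out toward the side chain.
    When cb is None (no beta carbon), the alpha carbon is returned.
    """
    if cb is None:
        return tuple(float(a) for a in ca)
    return tuple(a + scale * (b - a) for a, b in zip(ca, cb))


# Hypothetical coordinates, chosen only to make the arithmetic easy to follow.
ca = (0.0, 0.0, 0.0)
cb = (1.0, 2.0, 2.0)
print(reference_point(ca, cb, 2.0))
```

Replacing each residue's alpha-carbon coordinate with such a point before superposition makes the alignment sensitive to side-chain orientation, which is the property the abstract associates with better agreement on conserved surface residues.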