I. Fokkema, P. Taschner, Gerard Schaafsma, J. Celli, J. Laros, J. Dunnen (2011)
LOVD v.2.0: the next generation in gene variant databases. Human Mutation, 32
M. Speir, A. Zweig, K. Rosenbloom, B. Raney, B. Paten, Parisa Nejad, Brian Lee, K. Learned, D. Karolchik, A. Hinrichs, S. Heitner, R. Harte, M. Haeussler, L. Guruvadoo, P. Fujita, Christopher Eisenhart, M. Diekhans, H. Clawson, J. Casper, G. Barber, D. Haussler, R. Kuhn, W. Kent (2007)
The UCSC Genome Browser Database: 2008 update. Nucleic Acids Research, 36
T. Lowe, S. Eddy (1997)
tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Research, 25(5)
work was supported by the National Human Genome Research Institute
K. Rosenbloom, C. Sloan, V. Malladi, T. Dreszer, K. Learned, V. Kirkup, M. Wong, M. Maddren, Ruihua Fang, S. Heitner, Brian Lee, G. Barber, R. Harte, M. Diekhans, J. Long, S. Wilder, A. Zweig, D. Karolchik, R. Kuhn, D. Haussler, W. Kent (2012)
ENCODE Data in the UCSC Genome Browser: year 5 update. Nucleic Acids Research, 41
P. Stenson, M. Mort, E. Ball, K. Howells, A. Phillips, N. Thomas, D. Cooper (2009)
The Human Gene Mutation Database: 2008 update. Genome Medicine, 1
J. Harrow, A. Frankish, J. Gonzalez, Electra Tapanari, M. Diekhans, F. Kokocinski, Bronwen Aken, D. Barrell, A. Zadissa, S. Searle, If Barnes, A. Bignell, Veronika Boychenko, T. Hunt, M. Kay, Gaurab Mukherjee, J. Rajan, Gloria Despacio-Reyes, Gary Saunders, C. Steward, R. Harte, Michael Lin, C. Howald, Andrea Tanzer, T. Derrien, Jacqueline Chrast, Nathalie Walters, S. Balasubramanian, Baikang Pei, M. Tress, J. Rodríguez, Iakes Ezkurdia, Jeltje Baren, M. Brent, D. Haussler, Manolis Kellis, A. Valencia, A. Reymond, M. Gerstein, R. Guigó, T. Hubbard (2012)
GENCODE: The reference human genome annotation for The ENCODE Project. Genome Research, 22
K. Lindblad-Toh, Manuel Garber, O. Zuk, Michael Lin, B. Parker, Stefan Washietl, P. Kheradpour, J. Ernst, Gregory Jordan, E. Mauceli, L. Ward, C. Lowe, A. Holloway, M. Clamp, S. Gnerre, J. Alfoldi, Kathryn Beal, Jean Chang, H. Clawson, James Cuff, F. Palma, Stephen Fitzgerald, P. Flicek, M. Guttman, M. Hubisz, D. Jaffe, Irwin Jungreis, W. Kent, Dennis Kostka, M. Lara, André Martins, Tim Massingham, Ida Moltke, B. Raney, Matthew Rasmussen, Jim Robinson, A. Stark, Albert Vilella, Jiayu Wen, Xiaohui Xie, M. Zody, K. Worley, C. Kovar, D. Muzny, R. Gibbs, W. Warren, E. Mardis, G. Weinstock, R. Wilson, E. Birney, E. Margulies, Javier Herrero, E. Green, D. Haussler, A. Siepel, N. Goldman, K. Pollard, J. Pedersen, E. Lander, Manolis Kellis (2011)
A high-resolution map of human evolutionary constraint using 29 mammals. Nature, 478
The ENCODE Project Consortium (2012)
An Integrated Encyclopedia of DNA Elements in the Human Genome. Nature, 489
The 1000 Genomes Project Consortium (2012)
An integrated map of genetic variation from 1,092 human genomes. Nature, 491
David Wheeler, D. Church, Ron Edgar, S. Federhen, W. Helmberg, Thomas Madden, J. Pontius, G. Schuler, L. Schriml, Edwin Sequeira, Tugba Suzek, T. Tatusova, L. Wagner (2004)
Database resources of the National Center for Biotechnology Information: update. Nucleic Acids Research, 32
D. Cooper, E. Ball, M. Krawczak (1998)
The human gene mutation database. Nucleic Acids Research, 26(1)
W. McLaren, Bethan Pritchard, Daniel Rios, Yuan Chen, P. Flicek, Fiona Cunningham (2010)
Deriving the consequences of genomic variants with the Ensembl API and SNP Effect Predictor. Bioinformatics, 26
The UniProt Consortium (2012)
Update on activities at the Universal Protein Resource (UniProt) in 2013. Nucleic Acids Research, 41
Heng Li, R. Handsaker, Alec Wysoker, T. Fennell, Jue Ruan, Nils Homer, Gabor Marth, G. Abecasis, R. Durbin (2009)
The Sequence Alignment/Map format and SAMtools. Bioinformatics, 25
B. Paten, Dent Earl, N. Nguyen, M. Diekhans, D. Zerbino, D. Haussler (2011)
Cactus: Algorithms for genome multiple sequence alignment. Genome Research, 21(9)
W. Kent, C. Sugnet, T. Furey, K. Roskin, Tom Pringle, A. Zahler, D. Haussler (2002)
The human genome browser at UCSC. Genome Research, 12(6)
D. Karolchik, R. Baertsch, M. Diekhans, T. Furey, A. Hinrichs, Y. Lu, K. Roskin, M. Schwartz, C. Sugnet, D. Thomas, R. Weber, D. Haussler, W. Kent (2003)
The UCSC Genome Browser Database. Nucleic Acids Research, 31(1)
G. Hickey, B. Paten, Dent Earl, D. Zerbino, D. Haussler (2013)
HAL: a hierarchical format for storing and analyzing multiple genome alignments. Bioinformatics, 29
M. Haeussler, Martin Gerner, C. Bergman (2011)
Annotating genes and genomes with DNA sequences extracted from biomedical articles. Bioinformatics, 27
P. Stenson, E. Ball, M. Mort, A. Phillips, Jacqueline Shiel, S. Abeysinghe, M. Krawczak, D. Cooper (2003)
Human Gene Mutation Database (HGMD
T. Barrett, S. Wilhite, Pierre Ledoux, Carlos Evangelista, Irene Kim, Maxim Tomashevsky, K. Marshall, K. Phillippy, Patti Sherman, Michelle Holko, A. Yefanov, H. Lee, Naigong Zhang, C. Robertson, N. Serova, S. Davis, Alexandra Soboleva (2012)
NCBI GEO: archive for functional genomics data sets - update. Nucleic Acids Research, 41
L. Meyer, A. Zweig, A. Hinrichs, D. Karolchik, R. Kuhn, M. Wong, C. Sloan, K. Rosenbloom, Greg Roe, B. Rhead, B. Raney, A. Pohl, V. Malladi, Chin Li, Brian Lee, K. Learned, V. Kirkup, F. Hsu, S. Heitner, R. Harte, M. Haeussler, L. Guruvadoo, M. Goldman, B. Giardine, P. Fujita, T. Dreszer, M. Diekhans, M. Cline, H. Clawson, G. Barber, D. Haussler, W. Kent (2012)
The UCSC Genome Browser database: extensions and updates 2013. Nucleic Acids Research, 41
M. Meyer, Martin Kircher, Marie-Theres Gansauge, Heng Li, F. Racimo, Swapan Mallick, J. Schraiber, F. Jay, Kay Prüfer, Cesare Filippo, Peter Sudmant, C. Alkan, Qiaomei Fu, R. Do, N. Rohland, Arti Tandon, Michael Siebauer, R. Green, K. Bryc, Adrian Briggs, U. Stenzel, Jesse Dabney, J. Shendure, J. Kitzman, M. Hammer, M. Shunkov, A. Derevianko, N. Patterson, A. Andrés, E. Eichler, M. Slatkin, D. Reich, J. Kelso, S. Pääbo (2012)
A High-Coverage Genome Sequence from an Archaic Denisovan Individual. Science, 338
P. Flicek, Ikhlak Ahmed, M. Amode, D. Barrell, Kathryn Beal, Simon Brent, D. Carvalho-Silva, P. Clapham, Guy Coates, S. Fairley, Stephen Fitzgerald, Laurent Gil, C. García-Girón, Leo Gordon, Thibaut Hourlier, S. Hunt, Thomas Juettemann, Andreas Kähäri, S. Keenan, M. Komorowska, Eugene Kulesha, Ian Longden, Thomas Maurel, W. McLaren, Matthieu Muffato, Rishi Nag, B. Overduin, M. Pignatelli, Bethan Pritchard, Emily Pritchard, H. Riat, G. Ritchie, Magali Ruffier, Michael Schuster, Daniel Sheppard, D. Sobral, K. Taylor, A. Thormann, S. Trevanion, S. White, S. Wilder, Bronwen Aken, E. Birney, Fiona Cunningham, I. Dunham, J. Harrow, Javier Herrero, T. Hubbard, Nathan Johnson, R. Kinsella, Anne Parker, Giulietta Spudich, A. Yates, A. Zadissa, S. Searle (2012)
Ensembl 2013. Nucleic Acids Research, 41
(2011)
The variant call format and VCFtools. Bioinformatics, 27
D. Cooper, M. Krawczak (1996)
Human Gene Mutation Database. Human Genetics, 98
J. Stamatoyannopoulos, M. Snyder, R. Hardison, B. Ren, T. Gingeras, D. Gilbert, M. Groudine, M. Bender, R. Kaul, T. Canfield, Erica Giste, Audra Johnson, Mia Zhang, Gayathri Balasundaram, Rachel Byron, Vaughan Roach, P. Sabo, R. Sandstrom, A. Stehling, R. Thurman, S. Weissman, Philip Cayting, M. Hariharan, Jin Lian, Yong Cheng, S. Landt, Zhihai Ma, B. Wold, J. Dekker, G. Crawford, C. Keller, Weisheng Wu, Christopher Morrissey, Swathi Kumar, Tejaswini Mishra, D. Jain, M. Byrska-Bishop, Daniel Blankenberg, Bryan Lajoie1, Gaurav Jain, Amartya Sanyal, Kaun-Bei Chen, Olgert Denas, James Taylor, G. Blobel, M. Weiss, M. Pimkin, Wulan Deng, G. Marinov, B. Williams, Katherine Fisher-Aylor, Gilberto Desalvo, Anthony Kiralusha, Diane Trout, Henry Amrhein, A. Mortazavi, Lee Edsall, David McCleary, Samantha Kuan, Yin Shen, Feng Yue, Z. Ye, Carrie Davis, C. Zaleski, Sonali Jha, Chenghai Xue, A. Dobin, Wei Lin, Meagan Fastuca, Huaien Wang, R. Guigó, S. Djebali, Julien Lagarde, T. Ryba, Takayo Sasaki, V. Malladi, M. Cline, V. Kirkup, K. Learned, K. Rosenbloom, W. Kent, E. Feingold, P. Good, M. Pazin, R. Lowdon, Leslie Adams (2012)
An encyclopedia of mouse DNA elements (Mouse ENCODE). Genome Biology, 13
(2013)
Database resources of the National Center for Biotechnology Information. Nucleic Acids Research, 41
D. Adams, L. Altucci, S. Antonarakis, J. Ballesteros, S. Beck, A. Bird, C. Bock, B. Boehm, E. Campo, A. Caricasole, Fredrik Dahl, E. Dermitzakis, T. Enver, M. Esteller, X. Estivill, A. Ferguson-Smith, J. Fitzgibbon, P. Flicek, C. Giehl, T. Graf, F. Grosveld, R. Guigó, I. Gut, K. Helin, Jonas Jarvius, R. Küppers, H. Lehrach, Thomas Lengauer, Å. Lernmark, David Leslie, M. Loeffler, E. Macintyre, A. Mai, J. Martens, S. Minucci, W. Ouwehand, P. Pelicci, Hélène Pendeville, B. Porse, V. Rakyan, W. Reik, M. Schrappe, D. Schübeler, M. Seifert, R. Siebert, David Simmons, N. Soranzo, S. Spicuglia, M. Stratton, H. Stunnenberg, A. Tanay, D. Torrents, A. Valencia, E. Vellenga, M. Vingron, J. Walter, S. Willcocks (2012)
BLUEPRINT to decode the epigenetic signature written in blood. Nature Biotechnology, 30
K. Pruitt, J. Harrow, R. Harte, Craig Wallin, M. Diekhans, D. Maglott, S. Searle, C. Farrell, J. Loveland, B. Ruef, E. Hart, M. Suner, M. Landrum, Bronwen Aken, S. Ayling, R. Baertsch, J. Fernandez-Banet, J. Cherry, V. Curwen, Michael DiCuccio, Manolis Kellis, Jennifer Lee, Michael Lin, Michael Schuster, Andrew Shkeda, C. Amid, Garth Brown, Oksana Dukhanina, A. Frankish, Jennifer Hart, B. Maidak, Jonathan Mudge, Michael Murphy, Terence Murphy, J. Rajan, B. Rajput, Lillian Riddick, Catherine Snow, C. Steward, David Webb, Janet Weber, L. Wilming, Wenyu Wu, E. Birney, D. Haussler, T. Hubbard, J. Ostell, R. Durbin, D. Lipman (2009)
The consensus coding sequence (CCDS) project: Identifying a common protein-coding gene set for the human and mouse genomes. Genome Research, 19(7)
R. Apweiler, A. Bateman, M. Martin, C. O’Donovan, M. Magrane, Y. Alam-Faruque, E. Alpi, R. Antunes, J. Arganiska, E. Casanova, B. Bely, M. Bingley, C. Bonilla, R. Britto, Borisas Bursteinas, W. Chan, G. Chavali, Elena Cibrián-Uhalte, A. Silva, M. Giorgi, Tunca Dogan, F. Fazzini, P. Gane, Lg Castro, Penelope Garmiri, E. Hatton-Ellis, R. Hieta, R. Huntley, D. Legge, W. Liu, J. Luo, Alistair MacDougall, P. Mutowo, Andrew Nightingale, S. Orchard, K. Pichler, D. Poggioli, S. Pundir, L. Pureza, G. Qi, S. Rosanoff, Rabie Saidi, T. Sawford, A. Shypitsyna, E. Turner, Volynkin, T. Wardell, X. Watkins, H. Zellner, M. Corbett, M. Donnelly, P. Rensburg, Mickael Goujon, Hamish McWilliam, R. Lopez, I. Xenarios, L. Bougueleret, A. Bridge, S. Poux, Nicole Redaschi, L. Aimo, A. Auchincloss, K. Axelsen, Parit Bansal, Delphine Baratin, P. Binz, M. Blatter, B. Boeckmann, Jerven Bolleman, E. Boutet, L. Breuza, C. Casal-Casas, E. Castro, L. Cerutti, E. Coudert, Béatrice Cuche, M. Doche, D. Dornevil, S. Duvaud, A. Estreicher, L. Famiglietti, M. Feuermann, E. Gasteiger, S. Gehant, Gerritsen, A. Gos, N. Gruaz-Gumowski, U. Hinz, C. Hulo, J. James, F. Jungo, G. Keller, Lara, P. Lemercier, J. Lew, D. Lieberherr, T. Lombardot, X. Martin, P. Masson, A. Morgat, T. Neto, S. Paesano, I. Pedruzzi, S. Pilbout, Monica Pozzato, Manuela Pruess, C. Rivoire, B. Roechert, Michel Schneider, Christian Sigrist, K. Sonesson, S. Staehli, A. Stutz, S. Sundaram, M. Tognolli, L. Verbregue, A. Veuthey, Cathy Wu, C. Arighi, L. Arminski, Chuming Chen, Youhai Chen, J. Garavelli, Hongzhan Huang, K. Laiho, P. McGarvey, D. Natale, Baris Suzek, C. Vinayaka, Q. Wang, Y. Wang, L. Yeh, Yerramalla, J. Zhang (2013)
Activities at the Universal Protein Resource (UniProt). Nucleic Acids Research, 42
Xiaoming Liu, X. Jian, E. Boerwinkle (2013)
dbNSFP v2.0: A Database of Human Non-synonymous SNVs and Their Functional Predictions and Annotations. Human Mutation, 34
Sarah Burge, J. Daub, R. Eberhardt, J. Tate, Lars Barquist, Eric Nawrocki, S. Eddy, P. Gardner, A. Bateman (2012)
Rfam 11.0: 10 years of RNA families. Nucleic Acids Research, 41
B. Bernstein, J. Stamatoyannopoulos, J. Costello, B. Ren, A. Milosavljevic, A. Meissner, Manolis Kellis, M. Marra, A. Beaudet, J. Ecker, P. Farnham, M. Hirst, E. Lander, T. Mikkelsen, J. Thomson (2010)
The NIH Roadmap Epigenomics Mapping Consortium. Nature Biotechnology, 28
M. Blanchette, Abdoulaye Diallo, E. Green, W. Miller, D. Haussler (2008)
Computational reconstruction of ancestral DNA sequences. Methods in Molecular Biology, 422
(2006)
The UCSC Known Genes
National Cancer Institute [1U41HG007234 subcontract 2186-03 to European Molecular Biology Organization Long- Term Fellowship (in part) [ALTF 292-2011 to M
B. Raney, T. Dreszer, G. Barber, H. Clawson, P. Fujita, Ting Wang, N. Nguyen, B. Paten, A. Zweig, D. Karolchik, W. Kent (2013)
Track data hubs enable visualization of user-defined genome-wide annotations on the UCSC Genome Browser. Bioinformatics, 30
W. Kent, A. Zweig, G. Barber, A. Hinrichs, D. Karolchik (2010)
BigWig and BigBed: enabling browsing of large distributed datasets. Bioinformatics, 26
Nucleic Acids Research, 2014, Vol. 42, Database issue, D764–D770
Published online 21 November 2013
doi:10.1093/nar/gkt1168

Donna Karolchik [1,*], Galt P. Barber [1], Jonathan Casper [1], Hiram Clawson [1], Melissa S. Cline [1], Mark Diekhans [1], Timothy R. Dreszer [1], Pauline A. Fujita [1], Luvina Guruvadoo [1], Maximilian Haeussler [1], Rachel A. Harte [1], Steve Heitner [1], Angie S. Hinrichs [1], Katrina Learned [1], Brian T. Lee [1], Chin H. Li [1], Brian J. Raney [1], Brooke Rhead [2], Kate R. Rosenbloom [1], Cricket A. Sloan [3], Matthew L. Speir [1], Ann S. Zweig [1], David Haussler [1,4], Robert M. Kuhn [1] and W. James Kent [1]

[1] Center for Biomolecular Science and Engineering, School of Engineering, University of California Santa Cruz (UCSC), 1156 High Street, Santa Cruz, CA 95064, USA
[2] Computational Biology Graduate Group, University of California Berkeley, Berkeley, CA 94720, USA
[3] Department of Genetics, Stanford University School of Medicine, 3165 Porter Drive, Stanford, CA 94305, USA
[4] Howard Hughes Medical Institute, Center for Biomolecular Science and Engineering, UCSC, 1156 High Street, Santa Cruz, CA 95064, USA

Received September 18, 2013; Revised and Accepted October 30, 2013

*To whom correspondence should be addressed. Tel: +1 831 459 1571; Fax: +1 831 459 1809; Email: [email protected]

© The Author(s) 2013. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

ABSTRACT

The University of California Santa Cruz (UCSC) Genome Browser (http://genome.ucsc.edu) offers online public access to a growing database of genomic sequence and annotations for a large collection of organisms, primarily vertebrates, with an emphasis on the human and mouse genomes. The Browser’s web-based tools provide an integrated environment for visualizing, comparing, analysing and sharing both publicly available and user-generated genomic data sets. As of September 2013, the database contained genomic sequence and a basic set of annotation ‘tracks’ for 90 organisms. Significant new annotations include a 60-species multiple alignment conservation track on the mouse, updated UCSC Genes tracks for human and mouse, and several new sets of variation and ENCODE data. New software tools include a Variant Annotation Integrator that returns predicted functional effects of a set of variants uploaded as a custom track, an extension to UCSC Genes that displays haplotype alleles for protein-coding genes and an expansion of data hubs that includes the capability to display remotely hosted user-provided assembly sequence in addition to annotation data. To improve European access, we have added a Genome Browser mirror (http://genome-euro.ucsc.edu) hosted at Bielefeld University in Germany.

INTRODUCTION

The University of California Santa Cruz (UCSC) Genome Browser (1,2) at http://genome.ucsc.edu is a web-based resource for the scientific, medical and academic research communities that provides timely, convenient access to a database of high-quality genome sequence and annotations. The Browser tools facilitate the visualization, comparison and analysis of both hosted and user-generated data sets ranging from a genome-wide perspective down to the base level.

The Genome Browser database contains genome sequence from GenBank (3) for a wide selection of organisms, many with multiple assembly versions. In September 2013 our database included 13 primates, 33 additional mammals, 17 non-mammalian vertebrates, 13 insects, 6 worms and 5 other invertebrates. Annotation data for each genome assembly are displayed graphically as ‘tracks’ aligned to the genomic sequence and grouped according to shared characteristics, such as gene predictions or comparative genomics. The level of annotation varies among organisms. At a minimum, most assemblies offer mapping and sequence annotation tracks describing assembly, gap and GC content, and alignments of mRNA, EST and RefSeq (3) genes (available on approximately one-half of the assemblies) from GenBank. Some assemblies provide additional gene annotation tracks, such as Ensembl Genes (4) and Human Proteins, as well as multiple sequence alignments (multiz) (5) and pairwise genomic alignments between assemblies to facilitate comparative and evolutionary investigations. The heavily annotated human genome offers extensive conservation and evolutionary comparisons, a large collection of gene models including the locally generated UCSC Genes track (6,7), regulation, expression, epigenetics and tissue differentiation, variation, phenotype and disease association data, and data that have been text-mined from publications. Much of our annotation data is obtained through external collaboration. When available, links are provided to the complementary annotations in the Ensembl and NCBI browsers, and to supplementary information on other websites.

The Genome Browser serves as the repository for human and mouse genome data that were contributed through September 2012 by the Encyclopedia of DNA Elements (ENCODE) Consortium (8,9). During the transition of the ENCODE Data Coordination Center role to a joint collaboration with Stanford University, the Genome Browser team has continued to add significant new content to the ENCODE data portal (http://encodeproject.org) and publish newly reprocessed ENCODE data sets (10).

In addition to the native data sets local to the UCSC servers, the Genome Browser offers several options to users for viewing their own sequence and annotations: track and assembly data hubs, custom tracks and sessions. Alternatively, the Genome Browser database and tools may be installed on a local server for customized use (see http://genome.ucsc.edu/license/ for more information). Instructions for downloading the data, software and source code may be found at http://hgdownload.soe.ucsc.edu/downloads.html.

The following sections highlight the genome assembly and annotation data sets added to the Genome Browser since the last update in this journal and describe the significant new features and capabilities of our data access tools.

GENOME BROWSER DATA SETS

New genome assemblies

During the past year the UCSC team added 35 vertebrate assemblies to the Genome Browser (Table 1), including the premier releases of 20 species. In line with our focus on primates and other vertebrates, the group of newly introduced species features 4 primates (baboon, mouse lemur, squirrel monkey and tarsier), 12 additional mammals (alpaca, dolphin, ferret, hedgehog, kangaroo rat, manatee, megabat, rock hyrax, shrew, sloth, southern white rhinoceros and tree shrew) and 4 additional vertebrates (American alligator, Atlantic cod, budgerigar and coelacanth). Several of the new assemblies were added to support the generation of the 60-species Conservation track released in 2012 on the GRCm38/mm10 mouse assembly, and many of these were originally sequenced and assembled for the Mammalian Genome Project (11). We plan to release a preliminary Browser with a minimal annotation set on the new GRCh38/hg38 human assembly in late 2013 or early 2014. Beginning with this new release, the numeric portion of the UCSC human assembly version name will match the Genome Reference Consortium version number to reduce confusion.

As the number of vertebrate assemblies deposited into GenBank increases, we continue to explore options for providing timely, maximum coverage of genome assemblies in the Genome Browser. Assembly data hubs (described below) offer a potential solution for streamlining our process for hosting genome assemblies, as well as providing our users with an easy way to visualize and share their own genome sequences in the Genome Browser.

Table 1. New and updated genome assemblies added to the Genome Browser since September 2012

Common name | Scientific name | Sequencing center | UCSC ID | Seq. ctr ID
Primates
Baboon | Papio hamadryas | Baylor College of Medicine HGSC | papHam1 | Pham_1.0
Baboon | Papio anubis | Baylor College of Medicine HGSC | papAnu2 | Panu_2.0
Bushbaby | Otolemur garnettii | Broad Institute | otoGar3 | OtoGar3
Chimpanzee | Pan troglodytes | Chimpanzee Sequencing and Analysis Consortium | panTro4 | Build 2.1.4
Gibbon | Nomascus leucogenys | Gibbon Genome Sequencing Consortium | nomLeu2 | Nleu1.1
Gibbon | Nomascus leucogenys | Gibbon Genome Sequencing Consortium | nomLeu3 | Nleu3.0
Mouse lemur | Microcebus murinus | Broad Institute | micMur1 | MicMur1.0
Rhesus macaque | Macaca mulatta | Beijing Genomics Institute | rheMac3 | CR_1.0
Squirrel monkey | Saimiri boliviensis | Broad Institute | saiBol1 | SaiBol1.0
Tarsier | Tarsius syrichta | Broad Institute | tarSyr1 | Tarsyr1.0
Other mammals
Alpaca | Vicugna pacos | Broad Institute | vicPac1 | VicPac1.0
Alpaca | Vicugna pacos | Broad Institute | vicPac2 | VicPac2.0
Armadillo | Dasypus novemcinctus | Baylor College of Medicine HGSC | dasNov3 | DasNov3
Cat | Felis catus | International Cat Genome Sequencing Consortium | felCat5 | Felis_catus-6.2
Dolphin | Tursiops truncatus | Baylor College of Medicine HGSC | turTru2 | Ttru_1.4
Ferret | Mustela putorius furo | Ferret Genome Sequencing Consortium | musFur1 | MusPutFur1.0
Hedgehog | Erinaceus europaeus | Broad Institute | eriEur1 | Draft_v1
Kangaroo rat | Dipodomys ordii | Baylor College of Medicine HGSC, Broad Institute | dipOrd1 | DipOrd1.0
Manatee | Trichechus manatus latirostris | Broad Institute | triMan1 | TriManLat1.0
Megabat | Pteropus vampyrus | Broad Institute | pteVam1 | PteVap1.0
Naked mole rat | Heterocephalus glaber | Broad Institute | hetGla2 | HetGla_female_1.0
Pig | Sus scrofa | Swine Genome Sequencing Consortium | susScr3 | Sscrofa10.2
Pika | Ochotona princeps | Broad Institute | ochPri2 | OchPri2
Rock hyrax | Procavia capensis | Baylor College of Medicine HGSC | proCap1 | Procap1.0
Shrew | Sorex araneus | Broad Institute | sorAra1 | SorAra1.0
Sloth | Choloepus hoffmanni | Broad Institute | choHof1 | ChoHof1.0
Southern white rhinoceros | Ceratotherium simum simum | Broad Institute | cerSim1 | cerSimSim1.0
Squirrel | Spermophilus tridecemlineatus | Broad Institute | speTri2 | SpeTri2.0
Tree shrew | Tupaia belangeri | Broad Institute | tupBel1 | Tupbel1.0
Other vertebrates
American alligator | Alligator mississippiensis | Int’l Crocodilian Genomes Working Group | allMis1 | allMis0.2
Atlantic cod | Gadus morhua | Genofisk | gadMor1 | GadMor_May2010
Budgerigar | Melopsittacus undulatus | Genome Institute at Wash. Univ. St. Louis | melUnd1 | v6.3
Coelacanth | Latimeria chalumnae | Broad Institute | latCha1 | LatCha1
Lamprey | Petromyzon marinus | Genome Institute at Wash. Univ. St. Louis | petMar2 | WUGSC 7.0
Nile tilapia | Oreochromis niloticus | Broad Institute | oreNil2 | OreNil1.1

The ‘UCSC ID’ column shows the Genome Browser database designation for the genome assembly.

New and updated annotations

We added many new annotation data sets to the Genome Browser in the past year, and several existing data sets underwent major revisions. Our human and mouse assemblies, which receive the bulk of attention from our user community, are the most richly annotated. This section highlights some of the new annotation tracks released this year. See Table 2 for a complete list of recent releases.

Table 2. New and updated annotation data sets added to the Genome Browser between September 2012 and September 2013

Annotation track | Assembly
Human genome
1000 Genomes Phase 1 Integrated Variant Calls | hg19
1000 Genomes Phase 1 Paired-end Accessible Regions | hg19
Affymetrix CytoScan HD Array | hg19
Coriell Cell Line Copy Number Variants | hg19
Denisova: Modern Human Derived, Sequence Reads, Variant Calls, Variant Calls from 11 Modern Human Genome Sequences | hg19
DGV: Structural Variation | hg18-19
DNaseI Hypersensitivity Uniform Peaks - ENCODE/Analysis | hg19
ENCODE Regulation: DNaseI HS Clusters, Transcription Factor ChIP-seq Clusters | hg19
GENCODE Genes v14, v17 | hg19
GeneReviews | hg18-19
GRCh37 Patch 10 | hg19
GWAS Catalog of Published Genome-Wide Association Studies | hg18-19
Human Gene Mutation Database (HGMD) | hg19
Leiden Open Variation Database (LOVD) | hg19
Pfam domains in UCSC Genes | hg19
Proteogenomics and GENCODE Mapping - ENCODE | hg19
qPCR Primers | hg19
Reactome v41 | hg17-19
Retroposed Genes | hg19
SNPs (Build 137): All SNPs, Common SNPs, Flagged SNPs, Mult. SNPs | hg19
SNPs (Build 138): All SNPs, Common SNPs, Flagged SNPs, Mult. SNPs | hg19
Transcription Factor ChIP-seq Uniform Peaks - ENCODE/Analysis | hg19
UCSC Genes | hg19
UniProt Mutations | hg19
Mouse genome
60-species Conservation | mm10
GRC Incident Database | mm10
GRCm38 Patch Release 1 | mm10
Mouse strain variants | mm10
qPCR Primers | mm10
Reactome v41 | mm8-9
SNPs (Build 137) | mm10
UCSC Genes | mm10
Cow genome
NumtS Nuclear Mitochondrial Sequences | bosTau6
Pig genome
NumtS Nuclear Mitochondrial Sequences | susScr2
Multiple genomes
Ensembl Genes | Many
Human proteins | Many
Publications track | Many

Gene annotations

The UCSC Genes track, which includes protein-coding genes and non-coding RNA genes from RefSeq, GenBank, CCDS (12), Rfam (13) and the tRNA Genes track (14), was updated on both the GRCh37/hg19 human and GRCm38/mm10 mouse assemblies. The human UCSC Genes set increased by 2038 transcripts to a total of 82 960 transcripts, 92% of which did not change between versions. The number of genes, defined as clusters of transcripts with overlapping exons on the same strand, increased by 621 genes to 31 848. In the mouse UCSC Genes set, the number of transcripts grew by 3702 transcripts to 59 121, with 88% remaining the same between versions. The number of genes increased by 2566 genes to 31 227. For more information on the latest methods used to generate the UCSC Genes data, refer to the description pages that accompany the tracks. We also updated the GENCODE Genes (15) track on the latest human assembly to version 17.

Variation data

We update our SNP annotations for the human and mouse (and occasionally for other species) whenever a new version is released by dbSNP. The latest human and mouse assemblies were updated to dbSNP Build 137 in 2012–13, and the human assembly SNP tracks were updated to dbSNP Build 138 in October 2013. The annotation includes an ‘All SNPs’ track that contains all mappings of reference SNPs to the human assembly, as well as three SNP subsets: Common SNPs (those with at least 1% minor allele frequency), Flagged SNPs (annotated by dbSNP as ‘clinical’) and Mult. SNPs (those that map to multiple genomic loci, and therefore should be viewed with suspicion). The updated tracks contain additional annotation data not included in previous dbSNP tracks, and offer coloring and filtering options for configuring the Genome Browser display.

This year we released three new tracks that describe human disease-associated genetic variation based on curated public data in the Leiden Open Variation Database (LOVD) (16), the Human Gene Mutation Database (HGMD) (17) and amino acid mutations in the UniProt database (18). We also added two annotation sets based on Phase 1 sequencing data from the 1000 Genomes Project (19). The integrated variant calls track, 1000G Ph1 Vars, shows single nucleotide variants (SNVs), indels and structural variants (SVs) that have been phased into independent haplotypes, which the Genome Browser clusters by local similarity for display. The paired-end accessible regions track, 1000G Ph1 Accsbl, shows which genome regions are more or less accessible to next-generation sequencing methods that use short, paired-end reads.

Comparative alignments

The 60-species multiple alignment and conservation track released in 2012 for the GRCm38/mm10 mouse assembly was the largest comparative alignment track generated by UCSC to date. In 2013, we undertook an ambitious project to produce a 100-species conservation track on the GRCh37/hg19 human assembly, released to the public in Nov. 2013.

bands and gene symbols that have been text-mined from biomedical articles in Elsevier, PubMed Central and other databases (23). In the past year we have doubled the number of research articles to more than 5 million, and now classify them into different categories (disease related, protein structure, cis-regulatory, etc.) depending on their keyword content. The categories are differentiated by color in the display, which can be filtered by categories or publishers.

Denisova data

In February 2013, we released a set of Denisova annotations tracks in conjunction with the publication of a paper by Meyer et al. (24). The sequence data were derived by
As part of this undertaking we have been evaluating software alternatives, such as applying a novel single-stranded DNA library preparation Cactus (20), to extend the scalability of our multiple align- method to DNA previously extracted from 40 mg of a ment pipeline, which has been challenged by the increasing phalanx bone excavated from Denisova Cave in the number of species. Altai Mountains of southern Siberia. The Genome Browser tracks show mappings to the human reference ENCODE data sequence of high-coverage Denisova sequence reads, In the past year UCSC has focused on improving the acces- variant calls from sequence reads of 11 modern individuals sibility and usability of the ENCODE data hosted in the and an archaic Denisovan individual, and mutations in the Genome Browser. The ENCODE Analysis Working modern human lineage that rose to fixation or near Group (AWG) reprocessed the transcription factor ChIP- fixation since the split from the last common ancestor seq and DNaseI HS peak call data sets released through with Denisovans, along with predicted functional effects March 2012 using the uniform processing pipeline from the Ensembl Variant Effect Predictor (25). developed for the ENCODE Integrative Analysis effort. This reprocessing factored out many of the cross-lab differ- ences, allowing the different data sets to be used more ef- GENOME BROWSER SOFTWARE UPDATES fectively in the same analyses. These reprocessed data sets Track and assembly data hubs were released on the Genome Browser as the Transcription In 2011 we introduced track data hubs (26), a means for Factor ChIP-seq Uniform Peaks track (within the ENC TF users to import collections of their own locally hosted Binding super-track) and the DNaseI Hypersensitivity genome annotations into the Genome Browser where Uniform Peaks track (within the ENC DNase/FAIRE they may be organized, configured and viewed alongside super-track). The new data sets that met a specific native tracks. 
Track hubs now support four compressed integrated quality metric defined by the AWG (http:// binary indexed file formats: BigBed and BigWig (27), genome.ucsc.edu/ENCODE/qualityMetrics.html) were both developed at UCSC, BAM (28) and VCF/tabix (29). then used to update the individual Transcription Factor As genome sequencing becomes more accessible and cost- ChIP-seq Clusters and Digital DNaseI Hypersensitivity effective, we have faced a growing demand from researchers Clusters tracks within the ENCODE Integrated who wish to use the Genome Browser tools to browse and Regulation super-track set, providing summary clustered annotate genome sequences for which we do not host a views. ENCODE data hosted in the Browser has now database. In response to this need, we have extended the been fully accessioned through the Gene Expressions functionality of track data hubs to encompass entire Omnibus (GEO) repository (21) and cross-linked back to assemblies that are not hosted natively on the Genome UCSC. We also added the ENCODE Integrative Analysis Browser. These ‘assembly data hubs’ enable researchers Data Hub to the Genome Browser public hubs page (http:// to import both the underlying reference sequence as well genome.ucsc.edu/cgi-bin/hgHubConnect) to provide easy, as data tracks annotating that sequence into the Genome integrated access to AWG data. Together with the Browser for display and analysis. The genome sequence is Roadmap Epigenomic data track hub (22), the ENCODE stored in the UCSC .2bit format and made available on the data provide a comprehensive look at DNA landmarks user’s remote web server, along with optional annotation across a large number of tissues. data files stored in the same compressed binary formats The Genome Browser currently hosts a large amount of supported by track data hubs. Track and assembly data ChIP-seq data on transcription factors, many of which bind hubs can be shared with others by providing the URL of to specific DNA motifs. 
In late 2013, we plan to release an the hub.txt file needed to load the hub. Hubs of general extension to this data type that displays the location of interest to the research community can be registered at motifs within the peak and shows the sequence logo and UCSC for sharing on the Genome Browser website. We matching score on the track details page for the peak. offer a growing collection of publicly shared track and Publications data assembly data hubs on the ‘Public Hubs’ tab on the In 2012, we introduced a Publications track that shows Genome Browser Track Data Hubs web page (http:// mapped DNA and protein sequences, SNPs, cytogenetic genome.ucsc.edu/cgi-bin/hgHubConnect), including data D768 Nucleic Acids Research, 2014, Vol. 42, Database issue sets from the ENCODE AWG, the Roadmap Epigenomics effect (e.g., synonymous, missense, frameshift, intronic) Project (22) and the Blueprint Epigenome Project (30). for each variant. The VAI can also provide several other For more information about creating and using assembly types of relevant information, such as the dbSNP identifier data hubs, refer to http://genomewiki.ucsc.edu/index.php/ if the variant is found in dbSNP, protein damage scores Assembly_Hubs and http://genome.ucsc.edu/goldenPath/ for missense variants from the Database of Non-syn- help/hgTrackHubHelp.html. onymous Functional Predictions (dbNSFP) (31) and con- servation scores computed from multiple-species Variant annotation integrator alignments. Filters are available to focus results on the variants of greatest interest. The VAI can be accessed To assist researchers in annotating and prioritizing thou- from the Genome Browser ‘Tools’ menu or through the sands of variant calls from sequencing projects, we have VAI button on the ‘Manage Custom Tracks’ page that developed a new software tool, the Variant Annotation Integrator (VAI). Given a set of variants uploaded as a displays after a custom track is loaded into the Browser. 
custom track in either Personal Genome SNP (pgSnp) or For more information about the VAI, see http://genome. VCF format, the VAI returns the predicted functional ucsc.edu/cgi-bin/hgVai. Figure 1. The haplotype alleles display for the ABO gene, which encodes proteins related to the ABO blood group system. A large portion of the ‘Predicted full sequence’ section is truncated in the upper image for display purposes, and is shown in greater detail in the lower image. The leftmost columns of the top image indicate the frequency of each allele haplotype within the 1000 Genomes sample and the occurrence of homozygosity for each allele. In this instance the haplotype alleles display has been expanded to show the distribution of the haplotypes across the major 1000 Genomes population groups. The ‘Variant Sites’ columns summarize the non-synonymous variant sites that occur in at least 1% of the subject chromosomes, with the value from the reference genome (in this case GRCh37/hg19) indicated at the top of each variant column. In all but one case, the ‘O’ phenotype results from a common insertion (indicated by ‘-’ in the reference) causing a frameshift (indicated by ‘[]’) that results in a downstream premature stop codon, thus truncating the protein. Note that although certain haplotyes are more frequently found within one popu- lation, the insertion that gives rise to the majority of ‘O’ phenotypes is found across all populations, which may indicate that the insertion predates the most recent migration out of Africa. On the other hand, the haplotype in which the SNP variant introduces a stop codon at the variant site may have arisen in the Americas. The zoomed-in view of the ‘Predicted full sequence’ section in the bottom image shows the reference sequence (top row) and sequences incorporating the common non-synonymous variants. 
The residues corresponding to the variant sites are highlighted by green vertical bars, the site corresponding to the frameshift-causing insertion is highlighted by a blue bar and changes to the reference amino acid sequence are shown in red. Nucleic Acids Research, 2014, Vol. 42, Database issue D769 Gene haplotype alleles We plan to release a preliminary Browser with a minimal annotation set on the new GRCh38/hg38 human assem- We have extended the protein-coding genes detail pages in bly in late 2013 or early 2014. New annotation data the UCSC Genes track on the GRCh37/hg19 human display types and features will be added as required by assembly to include a section that displays and compares new data sets. We plan to extend track hubs to support ‘gene haplotype alleles’ generated from phased chromo- new file formats, such as the HAL hierarchical multiple somal data from Phase 1 of the 1000 Genomes Project (19) alignment format (32), and to allow searching for tracks (Figure 1). Each haplotype allele is a distinct set of variants within a hub. The VAI will be expanded to include more found on at least one of the 1000 Genomes subject chromo- input/upload options, output formats and annotation somes. By default the common non-synonymous variants options. (those of at least 1% frequency) are displayed, although rare haplotypes are optionally available. The Browser shows the frequency of each haplotype in the 1000 Genomes popula- CONTACTING US tions and indicates the frequency with which it occurs To stay on top of the latest Genome Browser announce- homozygously. Unexpected frequencies of occurrence ments, genome assembly releases, new software features, may be used to identify alleles that merit further study. updates and training seminars, subscribe to the genome- Predicted protein sequence for common haplotypes can [email protected] mailing list or follow also be displayed, allowing differences among alleles to be @GenomeBrowser on Twitter. 
We have two public, used to identify differences at the amino acid level. To access moderated mailing lists for interactive user support: gen- the gene haplotype alleles information, go to the details page [email protected] for general questions about the for any protein-coding gene in the UCSC Genes track Genome Browser and [email protected] for (GRCh37/hg19 assembly) and click the ‘Gene Alleles’ link questions specific to the setup and maintenance of in the ‘Page Index’ matrix. For more information, see http:// Genome Browser mirrors. Messages sent to these lists genome.ucsc.edu/goldenPath/help/haplotypes.html. are archived on public, searchable Google Groups Updates to Browser display and navigation forums. You may also reach us privately at genome- [email protected], the preferred address for inquiring During the past year, we have made several improvements about mirror site licenses, reporting server errors or con- to the Genome Browser web interface, many in response tacting us about confidential issues. You will find to requests from our users. We have updated the naviga- complete contact information, links to the browser’s tion menus for much of the website and simplified the Google Groups forums and access to our user suggestion background on the Genome Browser tracks page. The box at http://genome.ucsc.edu/contacts.html. Browser now offers chromosome ideograms for genome assemblies that do not have a microscopically derived cytology. The drag-reorder feature in the Browser image ACKNOWLEDGEMENTS now supports the vertical dragging of subtracks to any The authors would like to thank the many data contribu- location in the image. We have also made display tors and collaborators whose work makes the Genome improvements to overlay wiggle tracks, and have Browser possible, our Scientific Advisory Board for improved the display speed of bigDataUrl custom tracks. 
guiding our efforts, our users for their consistent User training and mirror support support and valuable feedback, and our outstanding team of system administrators: Jorge Garcia, Erich We continually update and expand our documentation and Weiler and Gary Moro. training materials, which offer extensive information on using the Genome Browser tools to explore UCSC-hosted data sets as well as custom sequence and annotation data FUNDING hosted at user sites. We are broadening our onsite training program to include several additional geographical regions. This work was supported by the National Human To better support our Genome Browser mirror sites and Genome Research Institute [5U4 HG002371 to G.P.B., source code users, we have rearchitected the software H.C., J.C., T.R.D., P.A.F., L.G., S.H., A.S.H., M.H., makefile system for our utilities and command-line tools D.K., W.J.K., R.M.K., K.L., B.T.L., C.H.L, B.J.R., to allow the compilation of specific tools independent of a B.R., M.L.S. and A.S.Z.; 1U41HG006992 subcontract full Browser installation. We have also adopted UDR 60141508-106846-A to D.K., W.J.K., K.L., M.S.C., (https://github.com/LabAdvComp/UDR), a new package T.R.D., B.J.R., K.R.R., C.A.S. and A.S.Z]; National that integrates rsync with the high-performance network Institute of Dental and Craniofacial Research protocol UDT, allowing quicker transfers of our large [5U01DE020057 subcontract 1000736806 to G.P.B. and data sets to remote mirror sites. R.M.K.]; National Cancer Institute [1U41HG007234 sub- contract 2186-03 to M.D. and R.H.; 5U24CA143858 to M.C.]. European Molecular Biology Organization Long- FUTURE PLANS Term Fellowship (in part) [ALTF 292-2011 to M.H.]; During the upcoming year we will continue to add new Howard Hughes Medical Institute fellow (to and updated genome assemblies for vertebrate organisms D.H.). Funding for open access charge: National as they become available in NCBI’s GenBank repository. Human Genome Research Institute. 
Conflict of interest statement. G.P.B., H.C., M.D., T.R.D., P.A.F., L.G., D.H., R.A.H., S.H., A.S.H., D.K., W.J.K., R.M.K., K.L., C.H.L., B.J.R., B.R., K.R.R., C.A.S. and A.S.Z. receive royalties from the sale of UCSC Genome Browser source code licenses to commercial entities. W.J.K. works for Kent Informatics.

REFERENCES

1. Kent,W.J., Sugnet,C.W., Furey,T.S., Roskin,K.M., Pringle,T.H., Zahler,A.M. and Haussler,D. (2002) The human genome browser at UCSC. Genome Res., 12, 996–1006.
2. Meyer,L.R., Zweig,A.S., Hinrichs,A.S., Karolchik,D., Kuhn,R.M., Wong,M., Sloan,C.A., Rosenbloom,K.R., Roe,G., Rhead,B. et al. (2013) The UCSC Genome Browser database: extensions and updates 2013. Nucleic Acids Res., 41, D64–D69.
3. NCBI Resource Coordinators. (2013) Database resources of the National Center for Biotechnology Information. Nucleic Acids Res., 41, D8–D20.
4. Flicek,P., Ahmed,I., Amode,M.R., Barrell,D., Beal,K., Brent,S., Carvalho-Silva,D., Clapham,P., Coates,G., Fairley,S. et al. (2013) Ensembl 2013. Nucleic Acids Res., 41, D48–D55.
5. Blanchette,M., Diallo,A.B., Green,E.D., Miller,W. and Haussler,D. (2007) Computational reconstruction of ancestral DNA sequences. In: Murphy,W.J. (ed.), Methods in Molecular Biology: Phylogenomics. Springer, New York, pp. 171–184.
6. Hsu,F., Kent,W.J., Clawson,H., Kuhn,R.M., Diekhans,M. and Haussler,D. (2006) The UCSC Known Genes. Bioinformatics, 22, 1036–1046.
7. Karolchik,D., Kuhn,R.M., Baertsch,R., Barber,G.P., Clawson,H., Diekhans,M., Giardine,B., Harte,R.A., Hinrichs,A.S., Hsu,F. et al. (2008) The UCSC Genome Browser Database: 2008 update. Nucleic Acids Res., 36, D773–D779.
8. ENCODE Project Consortium, Dunham,I., Kundaje,A., Aldred,S.F., Collins,P.J., Davis,C.A., Doyle,F., Epstein,C.B., Frietze,S., Harrow,J. et al. (2012) An integrated encyclopedia of DNA elements in the human genome. Nature, 489, 57–74.
9. Mouse ENCODE Consortium, Stamatoyannopoulos,J.A., Snyder,M., Hardison,R., Ren,B., Gingeras,T., Gilbert,D.M., Groudine,M., Bender,M., Kaul,R. et al. (2012) An encyclopedia of mouse DNA elements (Mouse ENCODE). Genome Biol., 13, 418.
10. Rosenbloom,K.R., Sloan,C.A., Malladi,V.S., Dreszer,T.R., Learned,K., Kirkup,V.M., Wong,M.C., Maddren,M., Fang,R., Heitner,S.G. et al. (2013) ENCODE data in the UCSC Genome Browser: year 5 update. Nucleic Acids Res., 41, D56–D63.
11. Lindblad-Toh,K., Garber,M., Zuk,O., Lin,M.F., Parker,B.J., Washietl,S., Kheradpour,P., Ernst,J., Jordan,G., Mauceli,E. et al. (2011) A high-resolution map of human evolutionary constraint using 29 mammals. Nature, 478, 476–482.
12. Pruitt,K.D., Harrow,J., Harte,R.A., Wallin,C., Diekhans,M., Maglott,D.R., Searle,S., Farrell,C.M., Loveland,J.E., Ruef,B.J. et al. (2009) The consensus coding sequence (CCDS) project: identifying a common protein-coding gene set for the human and mouse genomes. Genome Res., 19, 1506.
13. Burge,S.W., Daub,J., Eberhardt,R., Tate,J., Barquist,L., Nawrocki,E.P., Eddy,S.R., Gardner,P.P. and Bateman,A. (2013) Rfam 11.0: 10 years of RNA families. Nucleic Acids Res., 41, D226–D232.
14. Lowe,T.M. and Eddy,S.R. (1997) tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res., 25, 955–964.
15. Harrow,J., Frankish,A., Gonzalez,J.M., Tapanari,E., Diekhans,M., Kokocinski,F., Aken,B.L., Barrell,D., Zadissa,A., Searle,S. et al. (2012) GENCODE: the reference human genome annotation for The ENCODE Project. Genome Res., 22, 1760–1774.
16. Fokkema,I.F., Taschner,P.E., Schaafsma,G.C., Celli,J., Laros,J.F. and den Dunnen,J.T. (2011) LOVD v.2.0: the next generation in gene variant databases. Hum. Mutat., 32, 557–563.
17. Stenson,P.D., Mort,M., Ball,E.V., Howells,K., Phillips,A.D., Thomas,N.S. and Cooper,D.N. (2009) The Human Gene Mutation Database (HGMD): 2008 Update. Genome Med., 1, 13.
18. The UniProt Consortium. (2013) Update on activities at the Universal Protein Resource (UniProt) in 2013. Nucleic Acids Res., 41, D43–D47.
19. 1000 Genomes Project Consortium, Abecasis,G.R., Auton,A., Brooks,L.D., DePristo,M.A., Durbin,R.M., Handsaker,R.E., Kang,H.M., Marth,G.T. and McVean,G.A. (2012) An integrated map of genetic variation from 1,092 human genomes. Nature, 491, 56–65.
20. Paten,B., Earl,D., Nguyen,N., Diekhans,M., Zerbino,D. and Haussler,D. (2011) Cactus: algorithms for genome multiple sequence alignment. Genome Res., 21, 1512–1528.
21. Barrett,T., Wilhite,S.E., Ledoux,P., Evangelista,C., Kim,I.F., Tomashevsky,M., Marshall,K.A., Phillippy,K.H., Sherman,P.M., Holko,M. et al. (2013) NCBI GEO: archive for functional genomics data sets–update. Nucleic Acids Res., 41, D991–D995.
22. Bernstein,B.E., Stamatoyannopoulos,J.A., Costello,J.F., Ren,B., Milosavljevic,A., Meissner,A., Kellis,M., Marra,M.A., Beaudet,A.L., Ecker,J.R. et al. (2010) The NIH Roadmap Epigenomics Mapping Consortium. Nat. Biotechnol., 28, 1045–1048.
23. Haeussler,M., Gerner,M. and Bergman,C.M. (2011) Annotating genes and genomes with DNA sequences extracted from biomedical articles. Bioinformatics, 27, 980–986.
24. Meyer,M., Kircher,M., Gansauge,M.T., Li,H., Racimo,F., Mallick,S., Schraiber,J.G., Jay,F., Prufer,K., de Filippo,C. et al. (2012) A high-coverage genome sequence from an archaic Denisovan individual. Science, 338, 222–226.
25. McLaren,W., Pritchard,B., Rios,D., Chen,Y., Flicek,P. and Cunningham,F. (2010) Deriving the consequences of genomic variants with the Ensembl API and SNP Effect Predictor. BMC Bioinformatics, 26, 2069–2070.
26. Raney,B.J., Dreszer,T.R., Barber,G.P., Clawson,H., Fujita,P.A., Wang,T., Karolchik,D. and Kent,W.J. (2013) Track Data Hubs enable visualization of user-defined genome-wide annotations on the UCSC Genome Browser. Bioinformatics, doi: 10.1093/bioinformatics/btt637.
27. Kent,W.J., Zweig,A.S., Barber,G., Hinrichs,A.S. and Karolchik,D. (2010) BigWig and BigBed: enabling browsing of large distributed data sets. Bioinformatics, 26, 2204–2207.
28. Li,H., Handsaker,B., Wysoker,A., Fennell,T., Ruan,J., Homer,N., Marth,G., Abecasis,G., Durbin,R. and 1000 Genome Project Data Processing Subgroup. (2009) The Sequence Alignment/Map format and SAMtools. Bioinformatics, 25, 2078–2079.
29. Danecek,P., Auton,A., Abecasis,G., Albers,C.A., Banks,E., DePristo,M.A., Handsaker,R.E., Lunter,G., Marth,G.T., Sherry,S.T. et al. (2011) The variant call format and VCFtools. Bioinformatics, 27, 2156–2158.
30. Adams,D., Altucci,L., Antonarakis,S.E., Ballesteros,J., Beck,S., Bird,A., Bock,C., Boehm,B., Campo,E., Caricasole,A. et al. (2012) BLUEPRINT to decode the epigenetic signature written in blood. Nat. Biotechnol., 30, 224–226.
31. Liu,X., Jian,X. and Boerwinkle,E. (2013) dbNSFP v2.0: a database of human non-synonymous SNVs and their functional predictions and annotations. Hum. Mutat., 34, E2393–E2402.
32. Hickey,G., Paten,B., Earl,D., Zerbino,D. and Haussler,D. (2013) HAL: a hierarchical format for storing and analyzing multiple genome alignments. Bioinformatics, 29, 1341–1342.
Nucleic Acids Research – Oxford University Press
Published: Jan 21, 2014