
Algorithms for Genome Analysis


Summary: David Haussler is developing new statistical and algorithmic methods to explore the molecular evolution of the human genome, integrating cross-species comparative and high-throughput genomics data to study gene structure, function, and regulation.

My genome informatics team has participated in the public consortium efforts to produce, assemble, and annotate the first mammalian genomes. As collaborators in the Human Genome Project, we built the program that assembled the first working draft of the human genome sequence from information produced by sequencing centers worldwide, and we participated in the informatics associated with the finishing effort. We provide an interactive genome browser for the human, mouse, rat, and other genomes that is used by thousands of biomedical researchers every day (genome.ucsc.edu). By integrating multiple sets of high-throughput genomics data, computational predictions, and curated genomic feature sets from dozens of laboratories, the browser provides a new kind of computational microscope for exploring genomes.

Our work developing and annotating genomes for the browser provides a foundation for our scientific efforts. These are directed at the large-scale discovery and characterization of the functional elements in mammalian genomes through comparative sequence analysis, the study of mammalian molecular evolution, and the integration of an increasing variety of high-throughput data sets provided by functional genomics efforts.

Throughout the approximately 75 million years since the lineage leading to humans diverged from its common ancestor with the rat and mouse, the three genomes have independently accumulated many changes, leading to the three different species we see today. Reconstructing these changes by computational analysis has given us a new understanding of mammalian genome evolution. In comparisons of the human, mouse, and rat genomes, we have found that the rate of neutral substitution varies regionally along the chromosomes; the mechanism behind this variation is not yet known. We determined that a core of about 40 percent of the human, rat, and mouse genome sequences derives from a common ancestor, and we produced base-level alignments between the three genomes in these regions. Combining this alignment with the characterization of neutral substitution rates led to the estimate that at least 5 percent of the human genome is under negative selection: changes to the bases in these regions reduce fitness and hence seldom become established in the population.
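The "at least 5 percent" figure is a lower bound on a mixture proportion. One simple way such a bound can arise is sketched below under assumptions the text does not spell out: a per-window conservation score is computed over the whole aligned genome and, separately, over sequence assumed to evolve neutrally. If genome-wide scores are a mixture of a neutral fraction and a selected fraction p, then at any threshold the excess of high-scoring windows over the neutral expectation bounds p from below. The scores, the neutral reference, and the function name `min_fraction_under_selection` are hypothetical; this is an illustration, not the lab's actual procedure.

```python
from typing import Sequence


def min_fraction_under_selection(genome_scores: Sequence[float],
                                 neutral_scores: Sequence[float]) -> float:
    """Lower bound on the fraction p of windows under selection.

    Assumed mixture model (illustrative only):
        genome = (1 - p) * neutral + p * selected
    For any threshold t, P_selected(S > t) <= 1, so
        P_genome(S > t) <= (1 - p) * P_neutral(S > t) + p
    which rearranges to
        p >= (P_genome(S > t) - P_neutral(S > t)) / (1 - P_neutral(S > t)).
    The tightest bound is the maximum over thresholds.
    """
    n_g, n_n = len(genome_scores), len(neutral_scores)
    best = 0.0
    for t in sorted(set(genome_scores)):
        # Empirical tail probabilities above threshold t.
        p_genome = sum(s > t for s in genome_scores) / n_g
        p_neutral = sum(s > t for s in neutral_scores) / n_n
        if p_neutral < 1.0:
            best = max(best, (p_genome - p_neutral) / (1.0 - p_neutral))
    return best


# Toy data: neutral scores 0..9; the "genome" is 90 neutral-like windows
# plus 10 windows with a high score of 20 (true selected fraction 0.1).
neutral = list(range(10))
genome = [s for s in range(10) for _ in range(9)] + [20] * 10
print(round(min_fraction_under_selection(genome, neutral), 3))  # 0.1
```

Taking the maximum over thresholds matters: a low threshold mixes many neutral windows into the tail, while a high one isolates the windows that neutral evolution rarely produces.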

We suspect that these conserved regions contain the most functionally important elements of the genome and point to areas where intensified study will lead to a better understanding of how the genome works. Because only about 1.5 percent of the genome is protein-coding, if this rough estimate holds up, at least an additional 3.5 percent of the genome would be functionally important noncoding DNA. Some of these noncoding regions are "ultraconserved," showing almost no change for hundreds of millions of years. We have confirmed that negative selection is three times stronger in these regions than it is for nonsynonymous changes in coding regions. It remains a mystery what molecular mechanisms could place virtually every base in a segment up to 1 kilobase long under this level of negative selection. Our goal over the next several years is to characterize these regions computationally and, in many cases, functionally through wet-lab experiments.
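The 3.5 percent figure is the paragraph's arithmetic made explicit: at least 5 percent under selection minus about 1.5 percent coding leaves at least 3.5 percent noncoding. Finding candidate ultraconserved segments can be illustrated with a scan of a multiple alignment for long runs of columns in which every species carries the same base. This is a minimal sketch, assuming gap-free identity across all aligned sequences as the criterion; the 200-column default minimum length and the function name `perfectly_conserved_runs` are assumptions for illustration.

```python
def perfectly_conserved_runs(aligned: list[str], min_len: int = 200) -> list[tuple[int, int]]:
    """Return half-open [start, end) column ranges of at least min_len columns
    in which every aligned sequence has the same unambiguous base (A, C, G, T).
    The default min_len is an assumed, illustrative cutoff."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all sequences must have the same aligned length")
    runs: list[tuple[int, int]] = []
    start = None  # start column of the current identical run, if any
    for i in range(length):
        column = {seq[i].upper() for seq in aligned}
        identical = len(column) == 1 and column <= set("ACGT")
        if identical and start is None:
            start = i
        elif not identical and start is not None:
            if i - start >= min_len:
                runs.append((start, i))
            start = None
    # Close a run that extends to the end of the alignment.
    if start is not None and length - start >= min_len:
        runs.append((start, length))
    return runs


# Toy three-species alignment; a gap at column 8 breaks the run.
toy = ["AAAACCCC-GGG", "AAAACCCCTGGG", "AAAACCCCTGGG"]
print(perfectly_conserved_runs(toy, min_len=4))  # [(0, 8)]
```

A gap or ambiguous base in any species ends a run, which mirrors the "almost no change" idea in its strictest form; a looser definition would tolerate a small number of mismatches.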

In an attempt to build realistic and information-rich mathematical models of molecular evolution, we have undertaken larger, multispecies comparisons. Some of these models are tailored to specific kinds of functional elements, such as coding exons and transcription factor–binding sites (in conjunction with the National Human Genome Research Institute ENCODE project). These models should identify elements under negative selection with higher sensitivity and specificity than was possible with two-species comparisons. Ultimately we hope to explore the full spectrum of events in mammalian molecular evolution, including insertions, deletions, duplications, inversions, and rearrangements. As the number of genomes grows, our goal is to produce increasingly accurate analyses of the evolutionary history of each base in the human genome as a basis for genome-wide functional analysis.
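The text does not specify which models the lab uses. As a generic illustration of what a mathematical model of molecular evolution computes in a multispecies comparison, the sketch below scores one alignment column on a phylogenetic tree using the Jukes-Cantor substitution model and Felsenstein's pruning algorithm, both standard textbook choices swapped in here for illustration. The tree shape, branch lengths, and species names are hypothetical.

```python
import math
from itertools import product
from typing import Union

BASES = "ACGT"

# A tree is either a leaf name or a tuple of (subtree, branch_length) pairs.
Tree = Union[str, tuple]


def jc69(same: bool, t: float) -> float:
    """Jukes-Cantor probability of ending in a particular base after branch
    length t (expected substitutions per site), given a same or different start."""
    decay = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * decay if same else 0.25 - 0.25 * decay


def partials(node: Tree, column: dict[str, str]) -> dict[str, float]:
    """P(observed leaf bases below node | base at node), for each base."""
    if isinstance(node, str):
        observed = column[node]
        return {b: 1.0 if b == observed else 0.0 for b in BASES}
    result = {b: 1.0 for b in BASES}
    for child, t in node:
        child_partials = partials(child, column)
        for b in BASES:
            # Sum over the unknown base c at the child, weighted by branch change.
            result[b] *= sum(jc69(b == c, t) * child_partials[c] for c in BASES)
    return result


def column_likelihood(tree: Tree, column: dict[str, str]) -> float:
    """Likelihood of one alignment column, with uniform base frequencies at the root."""
    root = partials(tree, column)
    return sum(0.25 * root[b] for b in BASES)


# Hypothetical tree: ((human:0.01, chimp:0.01):0.1, mouse:0.3)
primates = (("human", 0.01), ("chimp", 0.01))
tree = ((primates, 0.1), ("mouse", 0.3))
print(column_likelihood(tree, {"human": "A", "chimp": "A", "mouse": "A"}))
print(column_likelihood(tree, {"human": "A", "chimp": "G", "mouse": "A"}))

# Sanity check: likelihoods of all 64 possible columns sum to 1.
total = sum(column_likelihood(tree, dict(zip(("human", "chimp", "mouse"), bases)))
            for bases in product(BASES, repeat=3))
print(round(total, 6))  # 1.0
```

Models tailored to coding exons or binding sites would replace the single uniform substitution process here with richer, element-specific ones, and comparing likelihoods under a "conserved" versus a "neutral" model is one common way such models flag elements under selection.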

Our work has revealed some unexpected origins for some ultraconserved elements. Multiple close copies of one of these critical DNA sequences in our genome can be traced to our common ancestor with the coelacanth, a descendant of the ancient marine organism that gave rise to the terrestrial vertebrates more than 360 million years ago. These sequences appear to derive from DNA elements known as retroposons, which are evolutionarily derived from retroviruses. In the coelacanth, the segments were produced by a retroposon known as a short interspersed repetitive element, or SINE, which is a piece of DNA that can make copies of itself and insert those copies elsewhere in an organism's genome. Wet-lab tests have confirmed that one of these segments regulates a nearby neurodevelopmental gene. Thus, the movement of retroposons can generate evolutionary experiments by adding new regulatory modules to genes, and for as yet unknown reasons, these can occasionally become ultraconserved.

Our other work has confirmed that this process of regulatory network expansion by retroposon movement is widespread. For example, we estimate that one-third of the binding sites for the tumor-suppressor protein p53 in our genome are specific to primates and were put in place through expansion of a particular family of endogenous retroviruses (a type of retroposon) about 40 million years ago. This significantly expanded the regulatory network of p53 in primates.
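The text does not describe how the one-third estimate was made. One ingredient such an estimate would plausibly need is the fraction of predicted binding sites that fall inside copies of a given repeat family. The sketch below computes that fraction from interval lists, assuming half-open (chromosome, start, end) coordinates; the input data and the function name `fraction_overlapping` are hypothetical, and showing that the sites are primate-specific would additionally require comparison with other genomes.

```python
from bisect import bisect_left
from itertools import accumulate

Interval = tuple[str, int, int]  # (chromosome, start, end), half-open


def fraction_overlapping(sites: list[Interval], repeats: list[Interval]) -> float:
    """Fraction of site intervals that overlap at least one repeat interval."""
    if not sites:
        return 0.0
    # Group repeats by chromosome; sorting makes starts ascending per chromosome.
    grouped: dict[str, tuple[list[int], list[int]]] = {}
    for chrom, start, end in sorted(repeats):
        starts, ends = grouped.setdefault(chrom, ([], []))
        starts.append(start)
        ends.append(end)
    # A running maximum of ends answers, in one lookup, "does any repeat that
    # starts before the site ends also end after the site starts?"
    index = {c: (s, list(accumulate(e, max))) for c, (s, e) in grouped.items()}
    hits = 0
    for chrom, start, end in sites:
        if chrom not in index:
            continue
        starts, max_ends = index[chrom]
        i = bisect_left(starts, end)  # repeats[0:i] start before the site ends
        if i > 0 and max_ends[i - 1] > start:
            hits += 1
    return hits / len(sites)


# Hypothetical toy data.
repeats = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 0, 50)]
sites = [("chr1", 150, 160), ("chr1", 300, 310), ("chr2", 40, 60),
         ("chr3", 0, 10), ("chr1", 590, 700), ("chr1", 600, 610)]
print(fraction_overlapping(sites, repeats))  # 0.5
```

The last site touches the end of a repeat but does not overlap it, because coordinates are half-open.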

We have also begun to explore sudden changes in noncoding regions of the genome that were previously highly conserved by negative selection. Comparing our genome to that of our closest relative, the chimpanzee, we found the most dramatic example of evolutionary acceleration in a novel RNA gene that is expressed specifically in neurons in the developing human neocortex during a critical period for cortical neuron specification and migration. This and other regions of accelerated change in the human genome provide exciting new candidates in the search for uniquely human biology.
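Detecting acceleration means finding more change on the human lineage than a previously conserved region would be expected to show. The text does not give the test used; the following is a deliberately simple stand-in, assuming equal-length aligned sequences for human and several other species. It counts columns where all non-human species agree but human differs, then asks how surprising that count is under a Poisson model whose expected count would come from the region's conserved rate. The names, data, the expected count of 0.05, and the Poisson choice are all assumptions for illustration.

```python
import math


def human_specific_changes(alignment: dict[str, str], human: str = "human") -> list[int]:
    """Columns where all non-human sequences share one base (A/C/G/T)
    and the human sequence has a different base."""
    human_seq = alignment[human].upper()
    others = [seq.upper() for name, seq in alignment.items() if name != human]
    if any(len(seq) != len(human_seq) for seq in others):
        raise ValueError("sequences must be aligned to equal length")
    positions = []
    for i, h in enumerate(human_seq):
        other_bases = {seq[i] for seq in others}
        if len(other_bases) == 1 and h in "ACGT":
            (shared,) = other_bases
            if shared in "ACGT" and shared != h:
                positions.append(i)
    return positions


def poisson_upper_tail(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam)."""
    return 1.0 - sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k))


# Hypothetical alignment; only column 3 changed on the human lineage.
aln = {"human": "ACGTAC", "chimp": "ACGAAC", "macaque": "ACGAAC", "mouse": "TCGAAC"}
changes = human_specific_changes(aln)
print(changes)  # [3]
# Hypothetical expected count of 0.05 human-lineage changes under conservation.
print(round(poisson_upper_tail(len(changes), 0.05), 3))  # 0.049
```

Requiring every non-human species to agree is a crude way to place a change on the human branch; a likelihood-based comparison of branch-specific rates on a phylogeny would be the more principled version.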

This work is funded in part by grants from the National Human Genome Research Institute, the National Cancer Institute, the National Institute on Drug Abuse, and the California Institute for Quantitative Biomedical Research (QB3).

Last updated: August 21, 2008

HHMI INVESTIGATOR

David Haussler
 

Related Links

AT HHMI

Researchers Identify Human DNA on the Fast Track (08.16.06)
Mobile DNA Part of Evolution’s Toolbox (05.03.06)
Evolution Is Our Laboratory
Bringing to Life the Genome of an Ancient Mammal (11.30.04)
Critical Stretches of Human Genome
Genes Seen

ON THE WEB

The Haussler Lab (ucsc.edu)
The UCSC Center for Biomolecular Science & Engineering (ucsc.edu)
The Human Genome Project at UCSC (ucsc.edu)

© 2009 Howard Hughes Medical Institute. A philanthropy serving society through biomedical research and science education.
4000 Jones Bridge Road, Chevy Chase, MD 20815-6789 | (301) 215-8500 | email: webmaster@hhmi.org