Genomics research is staggeringly complex. Massive datasets and extensive sources of experimental uncertainty demand versatile computational tools that realize the potential of DNA sequencing technologies and unlock new insights. We create foundational algorithms and software to enable scientific discovery. Our computational innovations are grounded in the biology we study, setting the stage for tools that embrace modern genomics’ complexity and drive discovery.

Causes and consequences of germline mutation

By studying the genomes of many multi-generational families, we revealed striking variability in the accumulation of mutations in sperm as men age. Our ongoing research studies the genetic and environmental causes of variability in germline mutation rates and spectra. We are also interested in the patterns and consequences of germline mutation as they relate to rare human disease, genome evolution, aging, and fertility. We study germline mutation in the genomes of families and populations, and through direct measurement of mutation dynamics in sperm and spermatogonial stem cells. Pursuing this research requires new molecular techniques and the development of computational tools to accurately detect mutations. These advances allow us to conduct large-scale studies of germline and testis biology and of the effects of genetic modifiers and age on sperm mutation dynamics.

Genetic constraint

We are interested in mutations we can observe, as well as mutations we cannot see in “healthy” individuals. We infer that genomic regions that are depleted of genetic variation in healthy individuals must be intolerant of mutation during development, a property known as genetic “constraint”. In 2019, we developed a map of Constrained Coding Regions (CCRs). Not surprisingly, we found that CCRs were abundant in genes that cause developmental disorders. However, we were surprised to find hundreds of regions under extreme constraint within genes that lacked prior disease association. We hypothesized that these regions could reveal new disease genes that are under strong negative selection and cause severe developmental phenotypes or embryonic lethality when mutated. Our ongoing research expands genetic constraint to noncoding genomic regions and explores coding constraint with new techniques and ever-larger databases of human genetic variation.
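The core intuition behind constraint, that stretches of sequence with no observed variation in healthy individuals are candidates for intolerance to mutation, can be illustrated with a toy sketch. This is a deliberate simplification and not the published CCR method: the function names, the region coordinates, and the variant positions are all hypothetical, and a real analysis would also need to account for factors such as sequencing coverage and local mutation rate.

```python
from typing import List, Tuple


def variant_free_stretches(
    region_start: int, region_end: int, variant_positions: List[int]
) -> List[Tuple[int, int]]:
    """Return half-open [start, end) stretches of the region with no observed variant.

    Assumption: positions are integer coordinates on a single chromosome.
    """
    stretches = []
    cursor = region_start
    for pos in sorted(set(variant_positions)):
        if pos < region_start or pos >= region_end:
            continue  # ignore variants outside the region
        if pos > cursor:
            stretches.append((cursor, pos))
        cursor = pos + 1  # resume just past the variant
    if cursor < region_end:
        stretches.append((cursor, region_end))
    return stretches


def rank_by_length(stretches: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Order stretches from longest to shortest as a crude proxy for constraint."""
    return sorted(stretches, key=lambda s: s[1] - s[0], reverse=True)


if __name__ == "__main__":
    # Hypothetical coding region and variant positions, for illustration only.
    found = variant_free_stretches(100, 200, [110, 111, 150, 199])
    for start, end in rank_by_length(found):
        print(start, end, end - start)
```

Ranking by raw length is only a stand-in for the idea of "depletion"; it treats every base as equally likely to mutate and equally well sequenced, which is not true of real data.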

Software for genomic discovery

We craft open-source software that simplifies complex analyses and enables hypothesis testing in genomics. A transformational example is BEDTOOLS, software that is essential to genomics research and allows researchers worldwide to quickly conduct complex analyses on massive genomics datasets. This advance launched our research program seeking to create innovative tools that broadly impact genomics research.
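As a flavor of the kind of question such software answers, one common genomic analysis asks where two sets of intervals overlap. The sketch below is a toy, pure-Python illustration of that idea only; it is not BEDTOOLS code, does not use its interface, and makes no claim about its algorithm. It assumes half-open (start, end) intervals on a single chromosome, with no overlaps within either input list, and the function name is hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]


def intersect(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Return the overlapping segments between two interval lists.

    Assumptions: intervals are half-open (start, end), all on one chromosome,
    and neither list contains intervals that overlap each other.
    """
    a, b = sorted(a), sorted(b)
    i = j = 0
    overlaps = []
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:  # a non-empty overlap
            overlaps.append((start, end))
        # Advance whichever interval ends first; it cannot overlap anything later.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return overlaps


if __name__ == "__main__":
    # Hypothetical intervals, for illustration only.
    print(intersect([(0, 10), (20, 30)], [(5, 25)]))
```

Walking both sorted lists in a single pass keeps the work proportional to the number of intervals, which is the kind of property that matters once datasets become massive.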

Somatic mosaicism

Although we understand the extent and types of inherited genetic variation in the human population, our limited knowledge of somatic variation comes largely from cancer studies. We are exploring normal somatic mosaicism in the human body by measuring mutation within and between individuals, cell types, and organs. Our software, statistical approaches, and tissue access enable ongoing studies of how age, sex, environmental exposures, and developmental lineage impact the rate and composition of somatic mutation.

Structural variation

Human chromosomes harbor hundreds of structural differences including deletions, insertions, duplications, inversions, and translocations. Collectively, these differences are known as “structural variation” (or “SV”). Any two humans differ by thousands of structural variants, which vary greatly in size and phenotypic consequence. However, we are just beginning to understand the contribution of SV to evolution, development, and complex disease. Our laboratory continues to develop new methods such as LUMPY, SMOOVE, COVVIZ, and STRLING for detecting, scrutinizing, and understanding structural variation using modern DNA sequencing techniques.

Rare disease genetics

We develop and apply new software for identifying causal genetic variants in studies of rare familial disease. The University of Utah has a long history of expertise in this area, and we work closely with many clinical collaborators to solve rare disease cases. We have developed numerous tools (e.g., GEMINI, SLIVAR, CYVCF, PEDDY, SOMALIER) that have driven the discovery of causal variants and genes underlying rare human disease in our lab and in labs worldwide.