Our laboratory develops free, open-source software for genomics research. We strive to develop intuitive and well-documented tools and are always open to feedback and user requests. If you find these tools useful in your research, we ask that you cite them and report any issues that you uncover.
Collectively, the bedtools utilities are a Swiss Army knife of tools for a wide range of genomics analysis tasks. The most widely used tools enable genome arithmetic: that is, set theory on the genome. For example, bedtools allows one to intersect, merge, count, complement, and shuffle genomic intervals from multiple files in widely used genomic file formats such as BAM, BED, GFF/GTF, and VCF. While each individual tool is designed to do a relatively simple task (e.g., intersect two interval files), quite sophisticated analyses can be conducted by combining multiple bedtools operations on the UNIX command line.
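To make "set theory on the genome" concrete, here is a minimal Python sketch of two of the operations named above, merge and intersect, chained together. It is an illustration only and not how bedtools is implemented: the `Interval` type, the function names, and the sample coordinates are all hypothetical, intervals are assumed to be BED-style half-open (start included, end excluded), and the naive pairwise loop would be far too slow for real data.

```python
from collections import namedtuple

# Hypothetical in-memory interval record; coordinates are assumed half-open.
Interval = namedtuple("Interval", ["chrom", "start", "end"])


def merge(intervals):
    """Collapse overlapping or directly adjacent intervals on each chromosome."""
    merged = []
    for iv in sorted(intervals):
        last = merged[-1] if merged else None
        if last and last.chrom == iv.chrom and iv.start <= last.end:
            # Extend the previous interval instead of starting a new one.
            merged[-1] = Interval(last.chrom, last.start, max(last.end, iv.end))
        else:
            merged.append(iv)
    return merged


def intersect(a, b):
    """Return the shared segment of every overlapping pair drawn from a and b."""
    shared = []
    for x in a:
        for y in b:
            if x.chrom != y.chrom:
                continue
            start, end = max(x.start, y.start), min(x.end, y.end)
            if start < end:  # a non-empty overlap
                shared.append(Interval(x.chrom, start, end))
    return sorted(shared)


# Made-up example data: two "gene" intervals overlap on chr1.
genes = [
    Interval("chr1", 100, 200),
    Interval("chr1", 150, 300),
    Interval("chr2", 50, 80),
]
peaks = [Interval("chr1", 180, 250), Interval("chr2", 70, 120)]

# Chain the two simple operations, in the spirit of piping tools together.
merged_genes = merge(genes)
overlaps = intersect(merged_genes, peaks)
```

Each function does one small job, and the combined result comes from feeding the output of one into the other, which mirrors the idea of combining simple operations into a larger analysis.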
GEMINI is a powerful framework for exploring genetic variation in the context of the wealth of existing genome annotations that are available for the human genome. By integrating diverse annotations with genetic variation in the now-standard VCF format, researchers have a single system for prioritizing variants in studies of human disease.
Genome Query Tools (GQT) is a command-line tool and a C API for storing and querying large-scale genotype data sets like those produced by 1000 Genomes, the UK100K, and forthcoming datasets involving millions of genomes. GQT represents genotypes as compressed bitmap indices, which reduce the storage and computational burden by orders of magnitude. This index can significantly expand the capabilities of population-scale analyses by providing interactive-speed queries to data sets with millions of individuals.
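The idea behind a bitmap index can be sketched in a few lines of Python. This is a simplified, uncompressed illustration and not GQT's actual data layout or API: the function names, the sample names, and the genotype coding (0 = homozygous reference, 1 = heterozygous, 2 = homozygous alternate) are assumptions made for the example. Each (sample, genotype) pair gets one bitmap, stored here as a Python integer, in which bit i is set when that sample carries that genotype at variant i. A question such as "which variants are heterozygous in all of these samples?" then reduces to a bitwise AND.

```python
def build_index(genotypes):
    """Build one bitmap per (sample, genotype code).

    genotypes maps a sample name to a list of genotype codes, one per variant.
    """
    index = {}
    for sample, calls in genotypes.items():
        for i, code in enumerate(calls):
            key = (sample, code)
            # Set bit i in the bitmap for this sample and genotype code.
            index[key] = index.get(key, 0) | (1 << i)
    return index


def variants_where_all(index, samples, code, n_variants):
    """Return variant positions where every listed sample has the given code."""
    result = (1 << n_variants) - 1  # start with all variants selected
    for sample in samples:
        result &= index.get((sample, code), 0)
    return [i for i in range(n_variants) if (result >> i) & 1]


# Made-up genotypes for three samples at four variants.
genotypes = {
    "s1": [0, 1, 2, 1],
    "s2": [1, 1, 0, 1],
    "s3": [0, 1, 1, 2],
}
index = build_index(genotypes)

het_in_s1_s2 = variants_where_all(index, ["s1", "s2"], 1, 4)
het_in_all = variants_where_all(index, ["s1", "s2", "s3"], 1, 4)
```

Because the query touches only one bitmap per sample and combines them with word-level bit operations, the work per query stays small even as the number of variants grows; compressing the bitmaps, as the paragraph above describes, reduces it further.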
Lumpy is a new probabilistic framework that we have developed to integrate multiple structural variation (SV) signals such as discordant paired-end alignments and split-read alignments. While it is clear that integrating all SV signals is important for sensitive discovery, most existing tools (including our own Hydra) exploit only one signal. Lumpy integrates multiple signals in order to improve sensitivity and breakpoint resolution. This is especially important for cancer genome analysis, where tumor heterogeneity causes potentially important rearrangements to occur with fewer supporting alignments in the sampled DNA.
Pybedtools is a Python wrapper, programming interface (and much, much more) for bedtools and extends these "genome algebra" programs by offering feature-level manipulations from within Python. Pybedtools is maintained by Ryan Dale.
The MinION (TM) from Oxford Nanopore Technologies (ONT) is the first nanopore sequencer to be commercialised and is now available to early-access users. The MinION (TM) is a USB-connected, portable nanopore sequencer which permits real-time analysis of streaming event data. Currently, the research community lacks a standardized toolkit for the analysis of nanopore datasets. We have therefore developed poretools, a flexible toolkit for exploring datasets generated by nanopore sequencing devices from MinION for the purposes of quality control and downstream analysis. Poretools operates directly on the native FAST5 (a variant of the HDF5 standard) file format produced by ONT and provides a wealth of format conversion utilities and data exploration and visualization tools.
Hydra-Multi detects all classes of structural variation using paired-end sequence alignments from multiple (hundreds of) samples. Unlike other existing tools, Hydra-Multi detects SVs arising from repetitive DNA and can make improved SV predictions by incorporating sequence alignments from many (hundreds or more) samples.