Intro to Bedder

Bedder is the successor to bedtools. It is designed to be more flexible so that users can accomplish custom functionality with small python functions.

While there have been many new alternatives to BedTools such as bedtk, bedrs, and coitrees with various advantages (often speed), BedTools persists as a mainstay in genomics with more than 25,000 citations because of the range of functionality that it provides.

Motivation

With bedder, we aimed to address several limitations with BedTools:

1. Flexibility / Development time

After 15 years, BedTools has accumulated many useful tools; each requires maintenance, documentation, and development. In addition, any custom requirement from a user would necessitate adding command-line options or new subtools and then documenting, testing and maintaining. We expected that after the core functionality of intersecting and then splicing intervals is implemented, it would be quite useful for a user to be able to modify/filter the result of those intersections before they are written.

In bedder this is done with simple python functions. We’ll explain more in the tutorial section, but as a quick example, we can get the total bases of overlap for each query interval with all database intervals as:

def bedder_total_b_overlap(fragment) -> int:
    """Reports the total bases of b-overlap for the a fragment"""
    return sum(b.stop - b.start for b in fragment.b)

A fragment is a piece of the a-interval. With this setup, we are even able to update VCF INFO fields, something that’s not possible with BedTools!! In this case, the INFO field would be updated with something like total_b_overlap=42; for a given VCF query record. And the VCF header would be updated appropriately as well.

2. Index Usage

BedTools allows a streaming intersection on sorted data, but does not support using the index. This means, that if, for example, we have a single query interval on chr22 and a large database VCF of, say, gnomAD then BedTools would have to iterate over every record in the VCF that comes before chr22. With bedder we look for an index and use it where possible/appropriate. This can speed many quick commands for exploratory research. Also note that we can get this speed without loading the entire file into an interval tree (though bedder is agnostic to how the intervals are accessed so it can support interval trees).

3. Output Management

In BedTools intersect, we have options like -wa, -wao, -u, -wb, -F, -f, etc. We have tried to aggregate these into their basic concepts in bedder. These appear as:

  -m, --a-mode <INTERSECTION_MODE>
          intersection mode for a-file. this determines how the overlap requirements are accumulated. [default: default] [possible values: default, not, piece]
  -M, --b-mode <B_MODE>
          intersection mode for b-file. this determines how the overlap requirements are accumulated. [default: default] [possible values: default, not, piece]
  -p, --a-piece <A_PIECE>
          the piece of the a intervals to report [default: whole] [possible values: none, piece, whole, inverse]
  -P, --b-piece <B_PIECE>
          the piece of the b intervals to report [default: whole] [possible values: none, piece, whole, inverse]
  -r, --a-requirements <A_REQUIREMENTS>
          a-requirements for overlap. A float value < 1 or a number ending with % will be the fraction (or %) of the interval. An integer will be the number of bases. Default is 1 unless n-closest is set.
  -R, --b-requirements <B_REQUIREMENTS>
          b-requirements for overlap. A float value < 1 or a number ending with % will be the fraction (or %) of the interval. An integer will be the number of bases. Default is 1 unless n-closest is set.

Where, for example --a-piece replaces much of the -wa, -wao, etc. and the --a-requirements replaces -f and -r and it’s more straightforward to understand the a and b options.

While this may take some getting used-to, we provide examples for the most common use-cases.

4. File Formats

Bedder can seemlessly support any interval type that can satisfy this trait:

/// A Positioned has a position in the genome. It is a bed-like (half-open) interval.
pub trait Positioned: Debug + Sync + Send {
    fn chrom(&self) -> &str;

    /// 0-based start position.
    fn start(&self) -> u64;

    /// non-inclusive end.
    fn stop(&self) -> u64;

    /// set the start position.
    fn set_start(&mut self, start: u64);

    /// set the stop position.
    fn set_stop(&mut self, start: u64);
}

Therefore, bedder can support any genomic file format. There is some extra work to handle the parsing, and optionally to support index-skipping, but each new file format will be isolated and require extremely minimal changes to bedder itself. Currently, BED and VCF (BCF) are supported and we will add BAM/CRAM shortly.

Tutorial

Grab a bedder binary from https://github.com/quinlan-lab/bedder-rs/releases/latest, rename it to bedder add it to your PATH. Make sure to have python 3.13 or higher installed along with the headers either from here or using uv as: uv python install 3.13.

As an example, let’s use the following interval files:

a.bed

chr1    10      20
chr1    15      25
chr1    30      40

b.bed

chr1    0       11
chr1    18      22

and a genome (often just .fai) file g.fai:

chr1    10000

To intersect two BED files, we use a command like:

$ bedder intersect -a a.bed -b b.bed -g g.fai
chr1    10      20      chr1    0       11      chr1    18      22
chr1    15      25      chr1    18      22

but we may want only the a interval pieces that overlap without the b pieces. (This would match the output of bedtools intersect -a a.bed -b b.bed):

$ bedder intersect -a a.bed -b b.bed -g g.fai --a-piece piece --b-piece none
chr1    10      11
chr1    18      20
chr1    18      22

As with BedTools, we can also:

require a certain number of bases (or fraction) of overlap.
count the number of overlapping intervals.

$ bedder intersect -a a.bed -b b.bed -g g.fai --a-piece piece  --a-requirements 4 -c count
chr1    18      22      chr1    18      22      1

We can perform the equivalent of bedtools subtract by changing the -piece option:

$ bedder intersect -a a.bed -b b.bed -g g.fai --a-piece inverse --b-piece none
chr1    11      18
chr1    15      18
chr1    22      25

Python

Finally we can use python functions to output custom columns. First we must write a python function that starts with def bedder_ and returns an integer, float, string, or boolean.

This example will sum the total bases in a if the end of any overlapping b interval is odd:

def bedder_oddb(fragment) -> int:
    "return len(a) if the end of any overlapping b interval is odd"
    if all(b.stop % 2 == 0 for b in fragment.b): return 0
    return fragment.a.stop - fragment.a.start

Then we can use it to output a custom column:

$ bedder intersect -a a.bed -b b.bed -g g.fai --python col.py -c py:oddb 
chr1    10      20      chr1    0       11      chr1    18      22      10
chr1    15      25      chr1    18      22      0

bedder will call bedder_oddb for each fragment in the output. A fragment is determined by --a-piece which can be, for example, piece where it’s split to the pieces of of a that overlap a b interval, or whole where the entire a interval becomes the fragment.

Note that, for this example, the first row returns 10 because the end of the first b interval is odd. And second row returns 0 because the only overlapping b interval has an even end. The python wrapper allows us to access INFO and other fields from a VCF and other columns from a BED file.

VCF

Let’s add an example test.vcf with a single variant (and appropriate VCF header):

chr1    14      rs1     G       A       60      PASS    DP=321;AF=0.5

As an example of how to use/access VCFs in python, we add the following to col.py:

def bedder_info_dp(fragment) -> int:
    """INFO.DP"""
    for b in fragment.b:
        v = b.vcf()
        if v.filters[0] != "PASS": return -1
        return v.info("DP")
    return 0

This will extract the DP field from the first overlapping variant and return it as an int. We use bedder to intersect with a.bed from above:

bedder intersect -a a.bed -b test.vcf -g g.fai --python col.py -c "py:info_dp"

with the output:

chr1    10      20      chr1    13      14      321

where the final column has the DP value of 321.

VCF query

Using the same test.vcf, we show how a VCF column is added when the query is a VCF.

The command: bedder intersect -a test.vcf -b a.bed -g g.fai --python col.py -c "py:oddb" which uses oddb from above, generates the output:

##fileformat=VCFv4.2
##FILTER=<ID=PASS,Description="All filters passed">
##contig=<ID=chr1,length=248956422>
##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Depth">
##INFO=<ID=AF,Number=A,Type=Float,Description="Allele Frequency">
##INFO=<ID=oddb,Number=1,Type=Integer,Description="oddb">
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO
chr1    14      rs1     G       A       60      PASS    DP=321;AF=0.5;oddb=0

where oddb has been added to the header and to the INFO field of the variant.

Conclusions

Bedder is approaching beta quality; there are likely lots of rough edges, but they should be easily fixed. We are interested in feedback.

Grab a binary from https://github.com/quinlan-lab/bedder-rs/releases/latest, rename it to bedder add it to your PATH. Make sure to have python 3.13 or higher installed along with the headers either from here or using uv as: uv python install 3.13. You can help us by thinking about the following and letting us know:

What else is needed in the python functions?
How can we tune the index strategy?
How might we support custom formats without change to the source? (perhaps by allowing a python interface that can meet the Positioned trait.)
Any bugs!