Intro to Bedder
Bedder is the successor to bedtools. It is designed to be more flexible so that users can accomplish custom functionality with small python functions.
While there have been many new alternatives to BedTools such as bedtk, bedrs, and coitrees with various advantages (often speed), BedTools persists as a mainstay in genomics with more than 25,000 citations because of the range of functionality that it provides.
Motivation
With bedder, we aimed to address several limitations with BedTools:
1. Flexibility / Development time
After 15 years, BedTools has accumulated many useful tools; each requires maintenance, documentation, and development. In addition, any custom requirement from a user would necessitate adding command-line options or new subtools and then documenting, testing and maintaining. We expected that after the core functionality of intersecting and then splicing intervals is implemented, it would be quite useful for a user to be able to modify/filter the result of those intersections before they are written.
In bedder this is done with simple python functions. We’ll explain more in the tutorial section, but as a quick example, we can get the total bases of overlap for each query interval with all database intervals as:
def bedder_total_b_overlap(fragment) -> int:
"""Reports the total bases of b-overlap for the a fragment"""
return sum(b.stop - b.start for b in fragment.b)A fragment is a piece of the a-interval. With this setup, we are even able to update VCF INFO fields, something that’s not possible with BedTools!! In this case, the INFO field would be updated with something like total_b_overlap=42; for a given VCF query record. And the VCF header would be updated appropriately as well.
2. Index Usage
BedTools allows a streaming intersection on sorted data, but does not support using the index. This means, that if, for example, we have a single query interval on chr22 and a large database VCF of, say, gnomAD then BedTools would have to iterate over every record in the VCF that comes before chr22. With bedder we look for an index and use it where possible/appropriate. This can speed many quick commands for exploratory research. Also note that we can get this speed without loading the entire file into an interval tree (though bedder is agnostic to how the intervals are accessed so it can support interval trees).
3. Output Management
In BedTools intersect, we have options like -wa, -wao, -u, -wb, -F, -f, etc. We have tried to aggregate these into their basic concepts in bedder. These appear as:
-m, --a-mode <INTERSECTION_MODE>
intersection mode for a-file. this determines how the overlap requirements are accumulated. [default: default] [possible values: default, not, piece]
-M, --b-mode <B_MODE>
intersection mode for b-file. this determines how the overlap requirements are accumulated. [default: default] [possible values: default, not, piece]
-p, --a-piece <A_PIECE>
the piece of the a intervals to report [default: whole] [possible values: none, piece, whole, inverse]
-P, --b-piece <B_PIECE>
the piece of the b intervals to report [default: whole] [possible values: none, piece, whole, inverse]
-r, --a-requirements <A_REQUIREMENTS>
a-requirements for overlap. A float value < 1 or a number ending with % will be the fraction (or %) of the interval. An integer will be the number of bases. Default is 1 unless n-closest is set.
-R, --b-requirements <B_REQUIREMENTS>
b-requirements for overlap. A float value < 1 or a number ending with % will be the fraction (or %) of the interval. An integer will be the number of bases. Default is 1 unless n-closest is set.
Where, for example --a-piece replaces much of the -wa, -wao, etc. and the --a-requirements replaces -f and -r and it’s more straightforward to understand the a and b options.
While this may take some getting used-to, we provide examples for the most common use-cases.
4. File Formats
Bedder can seemlessly support any interval type that can satisfy this trait:
/// A Positioned has a position in the genome. It is a bed-like (half-open) interval.
pub trait Positioned: Debug + Sync + Send {
fn chrom(&self) -> &str;
/// 0-based start position.
fn start(&self) -> u64;
/// non-inclusive end.
fn stop(&self) -> u64;
/// set the start position.
fn set_start(&mut self, start: u64);
/// set the stop position.
fn set_stop(&mut self, start: u64);
}Therefore, bedder can support any genomic file format. There is some extra work to handle the parsing, and optionally to support index-skipping, but each new file format will be isolated and require extremely minimal changes to bedder itself. Currently, BED and VCF (BCF) are supported and we will add BAM/CRAM shortly.
Tutorial
Grab a bedder binary from https://github.com/quinlan-lab/bedder-rs/releases/latest, rename it to bedder add it to your PATH. Make sure to have python 3.13 or higher installed along with the headers either from here or using uv as: uv python install 3.13.
As an example, let’s use the following interval files:
a.bed
chr1 10 20
chr1 15 25
chr1 30 40
b.bed
chr1 0 11
chr1 18 22
and a genome (often just .fai) file g.fai:
chr1 10000
To intersect two BED files, we use a command like:
$ bedder intersect -a a.bed -b b.bed -g g.fai
chr1 10 20 chr1 0 11 chr1 18 22
chr1 15 25 chr1 18 22
but we may want only the a interval pieces that overlap without the b pieces. (This would match the output of bedtools intersect -a a.bed -b b.bed):
$ bedder intersect -a a.bed -b b.bed -g g.fai --a-piece piece --b-piece none
chr1 10 11
chr1 18 20
chr1 18 22
As with BedTools, we can also:
- require a certain number of bases (or fraction) of overlap.
- count the number of overlapping intervals.
$ bedder intersect -a a.bed -b b.bed -g g.fai --a-piece piece --a-requirements 4 -c count
chr1 18 22 chr1 18 22 1
We can perform the equivalent of bedtools subtract by changing the -piece option:
$ bedder intersect -a a.bed -b b.bed -g g.fai --a-piece inverse --b-piece none
chr1 11 18
chr1 15 18
chr1 22 25
Python
Finally we can use python functions to output custom columns. First we must write a python function that starts with def bedder_ and returns an integer, float, string, or boolean.
This example will sum the total bases in a if the end of any overlapping b interval is odd:
def bedder_oddb(fragment) -> int:
"return len(a) if the end of any overlapping b interval is odd"
if all(b.stop % 2 == 0 for b in fragment.b): return 0
return fragment.a.stop - fragment.a.startThen we can use it to output a custom column:
$ bedder intersect -a a.bed -b b.bed -g g.fai --python col.py -c py:oddb
chr1 10 20 chr1 0 11 chr1 18 22 10
chr1 15 25 chr1 18 22 0
bedder will call bedder_oddb for each fragment in the output. A fragment is determined by --a-piece which can be, for example, piece where it’s split to the pieces of of a that overlap a b interval, or whole where the entire a interval becomes the fragment.
Note that, for this example, the first row returns 10 because the end of the first b interval is odd. And second row returns 0 because the only overlapping b interval has an even end. The python wrapper allows us to access INFO and other fields from a VCF and other columns from a BED file.
VCF
Let’s add an example test.vcf with a single variant (and appropriate VCF header):
chr1 14 rs1 G A 60 PASS DP=321;AF=0.5
As an example of how to use/access VCFs in python, we add the following to col.py:
def bedder_info_dp(fragment) -> int:
"""INFO.DP"""
for b in fragment.b:
v = b.vcf()
if v.filters[0] != "PASS": return -1
return v.info("DP")
return 0This will extract the DP field from the first overlapping variant and return it as an int. We use bedder to intersect with a.bed from above:
bedder intersect -a a.bed -b test.vcf -g g.fai --python col.py -c "py:info_dp"
with the output:
chr1 10 20 chr1 13 14 321
where the final column has the DP value of 321.
VCF query
Using the same test.vcf, we show how a VCF column is added when the query is a VCF.
The command: bedder intersect -a test.vcf -b a.bed -g g.fai --python col.py -c "py:oddb" which uses oddb from above, generates the output:
##fileformat=VCFv4.2
##FILTER=<ID=PASS,Description="All filters passed">
##contig=<ID=chr1,length=248956422>
##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Depth">
##INFO=<ID=AF,Number=A,Type=Float,Description="Allele Frequency">
##INFO=<ID=oddb,Number=1,Type=Integer,Description="oddb">
#CHROM POS ID REF ALT QUAL FILTER INFO
chr1 14 rs1 G A 60 PASS DP=321;AF=0.5;oddb=0
where oddb has been added to the header and to the INFO field of the variant.
Conclusions
Bedder is approaching beta quality; there are likely lots of rough edges, but they should be easily fixed. We are interested in feedback.
Grab a binary from https://github.com/quinlan-lab/bedder-rs/releases/latest, rename it to bedder add it to your PATH. Make sure to have python 3.13 or higher installed along with the headers either from here or using uv as: uv python install 3.13. You can help us by thinking about the following and letting us know:
- What else is needed in the python functions?
- How can we tune the index strategy?
- How might we support custom formats without change to the source? (perhaps by allowing a python interface that can meet the
Positionedtrait.) - Any bugs!