filter
Keep variants matching an expression. The fast path reads raw VCF lines and parses only the fields referenced by the expression; matching records are written as raw bytes without re-serialization.
Synopsis
    vcfkit filter [OPTIONS] --expression <EXPR> [INPUT]

Options
| Flag | Description |
|---|---|
| `-e, --expression <EXPR>` | Filter expression (required) |
| `-o, --output <FILE>` | Output file (default: stdout) |
| `-v, --invert` | Invert: keep records that do NOT match |
| `-q, --quiet` | Suppress progress bar and stats |
Examples
    # Rare variants
    vcfkit filter -e "INFO/AF < 0.01" input.vcf

    # High quality PASS variants
    vcfkit filter -e "QUAL > 30 && FILTER == 'PASS'" input.vcf

    # Substring match (contains)
    vcfkit filter -e "INFO/CSQ ~ 'missense'" input.vcf

    # Non-PASS variants (inverted filter)
    vcfkit filter -e "FILTER == 'PASS'" --invert input.vcf

    # Chromosome + position range
    vcfkit filter -e "CHROM == 'chr17' && POS >= 43044295 && POS <= 43125483" input.vcf

    # Compound expression
    vcfkit filter -e "INFO/AF < 0.05 && QUAL >= 50 && FILTER == 'PASS'" input.vcf > output.vcf

    # From stdin
    bcftools view input.bcf | vcfkit filter -e "INFO/DP > 10"

Expression language
Fields
| Field | Type | Notes |
|---|---|---|
| `INFO/<key>` | Per-header type | e.g., INFO/AF, INFO/DP, INFO/CSQ |
| `FORMAT/<key>` | Per-header type | First sample only |
| `CHROM` | String | e.g., 'chr1' |
| `POS` | Integer | 1-based |
| `QUAL` | Float | Missing (.) evaluates to false |
| `FILTER` | String | e.g., 'PASS' |
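
To make the "first sample only" rule for `FORMAT/<key>` concrete, here is a minimal Python sketch of how such a lookup could work on a raw VCF line. It is an illustration under assumptions, not vcfkit's implementation: the helper name `first_sample_format_value` is hypothetical, and the sketch assumes a tab-separated line whose 9th column is FORMAT and whose 10th column is the first sample.

```python
from typing import Optional


def first_sample_format_value(line: str, key: str) -> Optional[str]:
    """Return the raw FORMAT/<key> value of the first sample, or None if absent.

    Hypothetical helper for illustration; later sample columns are ignored.
    """
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 10:
        # No FORMAT column or no sample columns on this record.
        return None
    keys = cols[8].split(":")      # FORMAT column, e.g. "GT:DP"
    values = cols[9].split(":")    # first sample column, e.g. "0/1:12"
    if key not in keys:
        return None
    idx = keys.index(key)
    # A sample column may omit trailing FORMAT fields.
    return values[idx] if idx < len(values) else None
```

For a record with `GT:DP` in FORMAT and samples `0/1:12` and `1/1:30`, a lookup of `DP` in this sketch returns the string `"12"` from the first sample and never looks at the second.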
Operators
| Operator | Meaning |
|---|---|
| `<`, `<=`, `>`, `>=`, `==`, `!=` | Comparison |
| `&&`, `\|\|`, `!` | Logical |
| `~` | Substring match (contains) |
| `!~` | Substring non-match |
Literals
    42      # integer
    3.14    # float
    'chr1'  # string (single quotes)

Type coercion
Fields declared as Type=Float in the VCF header are parsed as f64 for numeric
comparisons, Type=Integer fields as i64, and Type=String fields (including FILTER) as strings.
A missing value (.) evaluates to false in all comparisons.
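
The coercion rules above can be sketched in Python as follows. This is an illustrative model only: the function names `coerce` and `less_than` are hypothetical, Python's `float` and `int` stand in for f64 and i64, and returning `False` when a string is compared numerically is an assumption of this sketch rather than documented behavior.

```python
from typing import Optional, Union

Value = Union[int, float, str]


def coerce(raw: str, header_type: str) -> Optional[Value]:
    """Convert a raw field value according to its declared header Type."""
    if raw == ".":
        return None            # missing value
    if header_type == "Float":
        return float(raw)      # stands in for f64
    if header_type == "Integer":
        return int(raw)        # stands in for i64
    return raw                 # Type=String, including FILTER


def less_than(raw: str, header_type: str, threshold: float) -> bool:
    """Evaluate `field < threshold`; a missing value never matches."""
    value = coerce(raw, header_type)
    if value is None or isinstance(value, str):
        return False
    return value < threshold
```

With this model, `less_than(".", "Float", 0.01)` is false, and so would be the opposite comparison on the same missing value, which is what "evaluates to false in all comparisons" implies.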
Multi-allelic INFO fields
INFO fields with Number=A (one value per ALT allele) use any-element semantics:
INFO/AF < 0.01 matches if any ALT allele has AF < 0.01.
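
As a rough sketch of any-element semantics, the Python function below checks a comma-separated Number=A value against a threshold. The name `any_alt_less_than` is hypothetical, and skipping per-allele `.` entries is an assumption made here to stay consistent with the rule that missing values never match.

```python
def any_alt_less_than(raw_value: str, threshold: float) -> bool:
    """Return True if at least one per-allele value is below the threshold."""
    for item in raw_value.split(","):
        if item == ".":
            continue  # assumed: a missing per-allele value never matches
        if float(item) < threshold:
            return True
    return False
```

Applied to the AF values used in this section, `any_alt_less_than("0.05,0.003", 0.01)` is true because the second allele qualifies, while `any_alt_less_than("0.05,0.12", 0.01)` is false.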
    INFO/AF=0.05,0.003  → INFO/AF < 0.01 matches (0.003 < 0.01)
    INFO/AF=0.05,0.12   → INFO/AF < 0.01 does not match

bcftools equivalence
    # vcfkit
    vcfkit filter -e "INFO/AF < 0.01 && FILTER == 'PASS'" input.vcf

    # bcftools
    bcftools view -i 'INFO/AF < 0.01 && FILTER == "PASS"' input.vcf

The expression syntax is similar. Key difference: vcfkit uses single quotes for string literals; bcftools uses double quotes.
Performance
On 1000 Genomes chr22 (1.1M records): 422 ms vs. 1,695 ms for bcftools (4.0× faster).
The fast path reads raw lines. For each line, it parses only the INFO fields referenced in the expression and skips all other fields. Matching records are written as raw bytes; non-matching records are discarded. The VCF header is parsed once with noodles to get INFO type metadata.
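
The idea behind the fast path can be illustrated with a short Python sketch hard-wired to the expression `INFO/AF < threshold`. It is not vcfkit's code: the function names are hypothetical, header lines are assumed to pass through unchanged, and AF is assumed to be a Float field (the real tool takes that from the parsed header).

```python
from typing import Iterable, Iterator, Optional


def info_value(info_col: str, key: str) -> Optional[str]:
    """Scan the INFO column for one key without parsing the other entries."""
    prefix = key + "="
    for entry in info_col.split(";"):
        if entry.startswith(prefix):
            return entry[len(prefix):]
    return None


def filter_af_below(lines: Iterable[str], threshold: float) -> Iterator[str]:
    """Yield header lines and records where any AF value is below threshold."""
    for line in lines:
        if line.startswith("#"):
            yield line  # assumed: header lines are passed through as-is
            continue
        # INFO is the 8th column; stop splitting once it has been isolated.
        cols = line.split("\t", 8)
        if len(cols) < 8:
            continue  # malformed record, skipped in this sketch
        raw = info_value(cols[7].rstrip("\n"), "AF")
        if raw is None:
            continue  # key absent: treated as missing, so no match
        if any(v != "." and float(v) < threshold for v in raw.split(",")):
            yield line  # emit the original line, not a re-serialized record
```

The point of the design is that a matching record is never rebuilt from parsed fields: the original line is emitted untouched, so the only per-record work is locating one INFO key and converting its value.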