10-minute guide
Introduction
filterx is a command-line tool to filter lines from files. It is inspired by grep, awk, and bioawk.
Features
- 🚀 Filter lines by column-based expressions
- 🎨 Support for multiple input formats, e.g. vcf/sam/fasta/fastq/gff/bed/csv/tsv
- 🎉 Cross-platform support
- 📦 Easy to install
- 📚 Rich documentation
Installation
Install by Cargo
cargo install filterx
# or
cargo install --git https://github.com/dwpeng/filterx.git
Install from Pip
Download from Github Release
GitHub prebuilt binaries
https://github.com/dwpeng/filterx/releases
Quick Start
example.csv
name,age
Alice,20
Bob,30
Charlie,40
Filter by Column Value
TIP
By default, filterx assumes that the csv file does not contain a header. If the file contains a header, add the -H parameter.
one filter condition
filterx csv example.csv -H -e 'age > 25'

# Output
# Bob,30
# Charlie,40
example-without-header.csv
Alice,20
Bob,30
Charlie,40
filterx can also filter a csv file without a header. Use col to reference a column by index. The column index starts from 0, so in the following example col(1) refers to the second column in the csv file.
filterx csv example-without-header.csv -e 'col(1) > 25'

# Output
# Bob,30
# Charlie,40
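For readers coming from awk, the same headerless filter can be sketched as below. This is only a comparison, not a description of how filterx works internally, and it assumes a plain comma-separated file with no quoted fields. The main difference to keep in mind is indexing: awk numbers fields from 1, so col(1) in filterx corresponds to $2 in awk.

```shell
# Recreate the headerless sample file from this guide.
cat > example-without-header.csv <<'EOF'
Alice,20
Bob,30
Charlie,40
EOF

# awk fields are 1-based: $2 is the second column, i.e. col(1) in filterx.
awk -F, '$2 > 25' example-without-header.csv
# Bob,30
# Charlie,40
```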
multiple filter conditions
TIP
filterx does not output the header by default. If you want to output the header, add the --oH parameter.
filterx csv example.csv -H --oH -e 'age > 25 and name == "Bob"'
# Output
# name,age
# Bob,30
TIP
You can pass multiple expressions to filterx by using -e multiple times. A line is kept only if it satisfies all of them.
filterx csv example.csv -H -e 'age > 25' -e 'name == "Bob"'
# equivalent to
filterx csv example.csv -H -e 'age > 25 and name == "Bob"'
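As a point of comparison, the combined condition can be sketched in awk, where && plays the role of and. The sketch assumes a simple csv with no quoted fields and skips the header by line number, which is an awk idiom rather than anything filterx does.

```shell
# Recreate the sample file from this guide.
cat > example.csv <<'EOF'
name,age
Alice,20
Bob,30
Charlie,40
EOF

# NR > 1 skips the header line; both conditions must hold for a row to print.
awk -F, 'NR > 1 && $2 > 25 && $1 == "Bob"' example.csv
# Bob,30
```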
Filter by built-in functions
filter by length
filterx csv example.csv -H -e 'len(name) > 4'

# Output
# Charlie,40
Column manipulation
create a new column by using the alias function
filterx csv example.csv -H -e 'alias(new_col) = age + 10'

# Output
# Alice,20,30
# Bob,30,40
# Charlie,40,50
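To make the effect of the derived column concrete, here is a rough awk counterpart that appends age + 10 to every data row. It is a sketch for comparison only and assumes a csv with no quoted fields. awk has no named columns, so the new value is simply printed as a third field.

```shell
# Recreate the sample file from this guide.
cat > example.csv <<'EOF'
name,age
Alice,20
Bob,30
Charlie,40
EOF

# OFS=, keeps the output comma-separated; $2 + 10 is the derived value.
awk -F, -v OFS=, 'NR > 1 { print $1, $2, $2 + 10 }' example.csv
# Alice,20,30
# Bob,30,40
# Charlie,40,50
```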
select columns by using the select function
filterx csv example.csv -H --oH -e 'select(age, name)'
# Output
# age,name
# 20,Alice
# 30,Bob
# 40,Charlie
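Selecting and reordering columns has a close awk analogue, shown here only as a comparison and assuming unquoted csv fields. Because awk addresses columns by position, the header line is reordered by the same rule as the data rows.

```shell
# Recreate the sample file from this guide.
cat > example.csv <<'EOF'
name,age
Alice,20
Bob,30
Charlie,40
EOF

# Print the second field (age) first, then the first field (name).
awk -F, -v OFS=, '{ print $2, $1 }' example.csv
# age,name
# 20,Alice
# 30,Bob
# 40,Charlie
```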
Row manipulation
process the first N rows
filterx csv example.csv -H -e 'head(1)' -e 'print("{name}")'

# Output
# Alice
process the last N rows
filterx csv example.csv -H -e 'tail(1)' -e 'print("{name}")'

# Output
# Charlie
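The row selection above resembles a pipeline of standard Unix tools. The sketch below is a comparison only, assuming unquoted csv fields: the header is dropped explicitly, since head and tail know nothing about csv headers.

```shell
# Recreate the sample file from this guide.
cat > example.csv <<'EOF'
name,age
Alice,20
Bob,30
Charlie,40
EOF

# tail -n +2 drops the header line; cut -d, -f1 keeps only the name column.
tail -n +2 example.csv | head -n 1 | cut -d, -f1   # first data row: Alice
tail -n +2 example.csv | tail -n 1 | cut -d, -f1   # last data row: Charlie
```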
Filter by Regular Expression
filterx uses the in operator to filter by a regular expression pattern. The pattern goes on the left and the column on the right.
filterx csv example.csv -H -e '"^B" in name'

# Output
# Bob,30
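For comparison, awk expresses the same kind of match with the ~ operator, with the column on the left and the pattern on the right. This sketch assumes unquoted csv fields. It makes no claim about which regular expression dialect filterx uses; a simple anchor such as ^B behaves the same in most dialects.

```shell
# Recreate the sample file from this guide.
cat > example.csv <<'EOF'
name,age
Alice,20
Bob,30
Charlie,40
EOF

# Keep data rows whose first field starts with "B".
awk -F, 'NR > 1 && $1 ~ /^B/' example.csv
# Bob,30
```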
Format Output
filterx uses the print function to format output. The print function supports Python-like string formatting: put a column name inside {} to insert its value.
filterx csv example.csv -H \
  -e 'age > 25' \
  -e 'print("this is {name}")'

# Output
# this is Bob
# this is Charlie
filterx csv example.csv -H \
  -e 'age > 25' \
  -e 'print("this is {name}, age is {age}")'

# Output
# this is Bob, age is 30
# this is Charlie, age is 40
filterx also supports function calls inside the print format string.
filterx csv example.csv -H \
  -e 'age > 25' \
  -e 'print("{name}: length is {len(name)}")'

# Output
# Bob: length is 3
# Charlie: length is 7
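Readers used to awk may find it helpful to see the same formatting written with printf. This is a comparison sketch assuming unquoted csv fields: where filterx interpolates {name} and {len(name)} by column name, awk takes positional fields and an explicit format string.

```shell
# Recreate the sample file from this guide.
cat > example.csv <<'EOF'
name,age
Alice,20
Bob,30
Charlie,40
EOF

# Filter on age, then format the name and its length.
awk -F, 'NR > 1 && $2 > 25 { printf "%s: length is %d\n", $1, length($1) }' example.csv
# Bob: length is 3
# Charlie: length is 7
```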
Sequence manipulation
example.fasta
>seq1
ACGT
>seq2
ACGTACGT
filter by sequence length and compute GC content
filterx fasta example.fasta -e 'len(seq) > 5' -e 'gc(seq) >= 0.5' -e 'print("{name}: {gc(seq)}")'

# Output
# seq2: 0.5
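To show where the value 0.5 comes from, the sketch below computes GC content in awk as the number of G and C bases divided by the sequence length. It assumes that this is the definition behind gc, which matches the output above (4 of the 8 bases in ACGTACGT are G or C), and that each record has a single sequence line, as in example.fasta.

```shell
# Recreate the sample FASTA file from this guide.
cat > example.fasta <<'EOF'
>seq1
ACGT
>seq2
ACGTACGT
EOF

awk '
  /^>/ { name = substr($0, 2); next }   # header line: remember the record name
  {
    seq = $0
    n = length(seq)                     # sequence length before any substitution
    gc = gsub(/[GCgc]/, "", seq)        # gsub returns how many G/C bases it matched
    if (n > 5) printf "%s: %s\n", name, gc / n
  }
' example.fasta
# seq2: 0.5
```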
Inplace edit
Use a function whose name ends with _ to modify the value in place.
# inplace edit
filterx fasta example.fasta -e 'lower_(seq)'

# Output
# >seq1
# acgt
# >seq2
# acgtacgt
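For comparison, the same transformation can be sketched in awk: header lines pass through unchanged and sequence lines are lowercased. As with the filterx output shown above, the result is written to standard output. The sketch assumes every line starting with > is a header and every other line is sequence.

```shell
# Recreate the sample FASTA file from this guide.
cat > example.fasta <<'EOF'
>seq1
ACGT
>seq2
ACGTACGT
EOF

# Print header lines as they are; lowercase every other line.
awk '/^>/ { print; next } { print tolower($0) }' example.fasta
# >seq1
# acgt
# >seq2
# acgtacgt
```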