10-minute guide

Introduction

filterx is a command-line tool for filtering lines from files. It is inspired by grep, awk, and bioawk.

Features

  • 🚀 Filter lines by column-based expression

  • 🎨 Support multiple input formats e.g. vcf/sam/fasta/fastq/gff/bed/csv/tsv

  • 🎉 Cross-platform support

  • 📦 Easy to install

  • 📚 Rich documentation

Installation

Install with Cargo

cargo
cargo install filterx
# or
cargo install --git https://github.com/dwpeng/filterx.git

Install with pip

pip
pip install filterx

Download from GitHub Releases

GitHub prebuilt binary
https://github.com/dwpeng/filterx/releases

Quick Start

example.csv
name,age
Alice,20
Bob,30
Charlie,40

Filter by Column Value

TIP

filterx assumes that a csv file has no header row. If the file has one, add the -H flag.

one filter condition

filterx csv example.csv -H -e 'age > 25'

# Output
# Bob,30
# Charlie,40

example-without-header.csv
Alice,20
Bob,30
Charlie,40

filterx can also filter a csv file that has no header. Use col to reference a column by index. Column indices start at 0, so in the following example col(1) refers to the second column of the csv file.

filterx csv example-without-header.csv -e 'col(1) > 25'

# Output
# Bob,30
# Charlie,40
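
The zero-based indexing described above is easy to mix up with awk, which counts fields from 1. As a rough cross-check (plain awk, not filterx, and not a statement about how filterx is implemented), the following sketch should select the same rows. The variable names `csv` and `out` are illustrative only.

```shell
# The headerless sample from example-without-header.csv, held in a variable.
csv=$(printf 'Alice,20\nBob,30\nCharlie,40')

# awk numbers fields from 1, so the zero-based col(1) corresponds to awk's $2.
out=$(printf '%s\n' "$csv" | awk -F, '$2 > 25')
printf '%s\n' "$out"
```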

multiple filter conditions

TIP

filterx does not output the header by default. To output the header, add the --oH flag.

filterx csv example.csv -H --oH -e 'age > 25 and name == "Bob"'
# Output
# name,age
# Bob,30

TIP

You can pass multiple expressions to filterx by using -e multiple times.

filterx csv example.csv -H -e 'age > 25' -e 'name == "Bob"'
# equivalent to
filterx csv example.csv -H -e 'age > 25 and name == "Bob"'
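
One way to picture the equivalence above: applying two filters one after the other keeps the same rows as a single filter that joins both conditions with a logical AND. The sketch below illustrates the idea with two chained awk processes versus one combined awk condition. It is an analogy only and assumes nothing about filterx internals; `chained` and `combined` are made-up variable names.

```shell
# example.csv contents, including the header row.
csv=$(printf 'name,age\nAlice,20\nBob,30\nCharlie,40')

# Two filters in sequence: drop the header and keep age > 25, then keep name == "Bob".
chained=$(printf '%s\n' "$csv" | awk -F, 'NR > 1 && $2 > 25' | awk -F, '$1 == "Bob"')

# One filter with both conditions joined by &&.
combined=$(printf '%s\n' "$csv" | awk -F, 'NR > 1 && $2 > 25 && $1 == "Bob"')

printf '%s\n%s\n' "$chained" "$combined"
```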

Filter by built-in functions

filter by length

filterx csv example.csv -H -e 'len(name) > 4'

# Output
# Charlie,40

Column manipulation

create a new column with the alias function

filterx csv example.csv -H -e 'alias(new_col) = age + 10'

# Output
# Alice,20,30
# Bob,30,40
# Charlie,40,50
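
For comparison, the same derived column can be produced with plain awk by appending a computed field to each data row. This is only a sketch of the expected result under the sample data above, not filterx's own mechanism.

```shell
# example.csv contents, including the header row.
csv=$(printf 'name,age\nAlice,20\nBob,30\nCharlie,40')

# Skip the header (NR > 1) and append age + 10 as a third comma-separated field.
out=$(printf '%s\n' "$csv" | awk -F, -v OFS=, 'NR > 1 { print $0, $2 + 10 }')
printf '%s\n' "$out"
```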

select columns with the select function

filterx csv example.csv -H --oH -e 'select(age, name)'
# Output
# age,name
# 20,Alice
# 30,Bob
# 40,Charlie

Row manipulation

process the first N rows

filterx csv example.csv -H -e 'head(1)' -e 'print("{name}")'

# Output
# Alice

process the last N rows

filterx csv example.csv -H -e 'tail(1)' -e 'print("{name}")'

# Output
# Charlie

Filter by Regular Expression

filterx uses the in operator to filter by a regular expression pattern.

filterx csv example.csv -H -e '"^B" in name'

# Output
# Bob,30
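
Here the pattern `^B` is anchored to the start of the name column, so only names beginning with "B" match. The awk sketch below applies the same anchored pattern to the first field as a sanity check on the expected output. It is an awk analogy, assuming the sample data above, and makes no claim about which regex engine filterx uses.

```shell
# example.csv contents, including the header row.
csv=$(printf 'name,age\nAlice,20\nBob,30\nCharlie,40')

# Skip the header, then keep rows whose first field starts with "B".
out=$(printf '%s\n' "$csv" | awk -F, 'NR > 1 && $1 ~ /^B/')
printf '%s\n' "$out"
```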

Format Output

filterx uses the print function to format output. The print function supports Python-like string formatting: use {} to insert a value.

filterx csv example.csv -H \
     -e 'age > 25' \
     -e 'print("this is {name}")'

# Output
# this is Bob
# this is Charlie

filterx csv example.csv -H \
     -e 'age > 25' \
     -e 'print("this is {name}, age is {age}")'

# Output
# this is Bob, age is 30
# this is Charlie, age is 40

filterx also supports functions in the print function.

filterx csv example.csv -H \
     -e 'age > 25' \
     -e 'print("{name}: length is {len(name)}")'

# Output
# Bob: length is 3
# Charlie: length is 7
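
Readers used to awk may find it helpful to see the template above next to awk's printf, where each {} placeholder becomes a format specifier plus an argument. This is a comparison sketch in plain awk under the sample data, not filterx syntax.

```shell
# example.csv contents, including the header row.
csv=$(printf 'name,age\nAlice,20\nBob,30\nCharlie,40')

# {name} maps to %s with $1; {len(name)} maps to %d with length($1).
out=$(printf '%s\n' "$csv" |
  awk -F, 'NR > 1 && $2 > 25 { printf "%s: length is %d\n", $1, length($1) }')
printf '%s\n' "$out"
```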

Sequence manipulation

example.fasta
1>seq1
2ACGT
3>seq2
4ACGTACGT

filter by sequence length and compute GC content

filterx fasta example.fasta -e 'len(seq) > 5' -e 'gc(seq) >= 0.5' -e 'print("{name}: {gc(seq)}")'

# Output
# seq2: 0.5
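
To make the numbers in this example concrete, the sketch below computes the length and GC fraction of each record with plain awk, taking GC content to mean the share of G and C bases in the sequence. Two assumptions apply: each sequence sits on a single line, as in example.fasta, and lowercase g/c are counted too. It is a hand calculation aid, not filterx's gc implementation.

```shell
# example.fasta contents.
fasta=$(printf '>seq1\nACGT\n>seq2\nACGTACGT')

out=$(printf '%s\n' "$fasta" | awk '
  /^>/ { name = substr($0, 2); next }   # header line: remember the record name
  {
    seq = $0
    n = length(seq)
    gc = gsub(/[GCgc]/, "", seq)        # gsub returns how many bases it replaced
    if (n > 0) print name, n, gc / n    # name, sequence length, GC fraction
  }')
printf '%s\n' "$out"
```

Both records have a GC fraction of 0.5 (2 of 4 and 4 of 8 bases), and only seq2 is longer than 5 bases.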

In-place edit

Functions whose names end with _ modify the value in place.

# in-place edit
filterx fasta example.fasta -e 'lower_(seq)'

# Output
# >seq1
# acgt
# >seq2
# acgtacgt
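
In the output above only the sequence lines change while the header lines stay as they are. A minimal awk sketch of that same transformation, assuming single-line sequences as in example.fasta, looks like this. It reproduces the expected output only and does not describe how filterx applies lower_.

```shell
# example.fasta contents.
fasta=$(printf '>seq1\nACGT\n>seq2\nACGTACGT')

# Pass header lines through unchanged; lowercase every other line.
out=$(printf '%s\n' "$fasta" | awk '/^>/ { print; next } { print tolower($0) }')
printf '%s\n' "$out"
```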