Remove tRNA and rRNA from RNA-Seq data

2017-10-06

When analyzing RNA-Seq data, reads derived from rRNA and tRNA can be removed from the sequencing files. Here, I briefly describe how to do this step using ERNE.

Prepare software

Download software from

Unzip the archive and move erne-create and erne-filter to ~/local/bin.

Install SeqKit with conda install -c bioconda seqkit to use its rmdup command, which removes duplicated records from FASTA files.
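Before starting, it can help to confirm that all three tools are reachable from the shell. The following is a minimal sketch, not part of the original workflow: check_tools is a hypothetical helper name, and the PATH line assumes the binaries were placed in ~/local/bin as described above.

```shell
# Make sure ~/local/bin is searched for executables in this shell session.
export PATH="$HOME/local/bin:$PATH"

# check_tools is a hypothetical helper: it reports every given command
# that cannot be found on PATH. Returns 0 if all are found, 1 otherwise.
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage for this workflow:
# check_tools erne-create erne-filter seqkit
```

If a tool is reported as missing, check that it was moved to ~/local/bin (for ERNE) or that the conda environment containing SeqKit is active.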

Prepare rRNA and tRNA sequences


rRNA sequences were downloaded from SILVA.



tRNA sequences were downloaded from GtRNAdb.


Combine the two FASTA files and remove duplicates

Combine the two FASTA files into contaminate_rna.fa, then remove duplicated records.

cat GtRNAdb-all-tRNAs.fa SILVA_128_LSUParc_tax_silva.fasta > contaminate_rna.fa
seqkit rmdup contaminate_rna.fa -o contaminate_rna_uniq.fa
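A quick sanity check is to compare the number of records before and after deduplication. The sketch below counts FASTA header lines with grep; count_fasta_records is a hypothetical helper name, and the file names are the ones used in the commands above. Note that seqkit rmdup compares sequence IDs by default; its -s/--by-seq option compares the sequences themselves, which may be closer to what is wanted here.

```shell
# count_fasta_records is a hypothetical helper: it prints the number of
# records in a FASTA file by counting lines that start with ">".
count_fasta_records() {
    # grep -c exits with status 1 when the count is 0, so ignore that status.
    grep -c '^>' "$1" || true
}

# Report counts for the combined and deduplicated files if they exist.
for f in contaminate_rna.fa contaminate_rna_uniq.fa; do
    if [ -f "$f" ]; then
        printf '%s\t%s\n' "$f" "$(count_fasta_records "$f")"
    fi
done
```

The second count should be less than or equal to the first; the difference is the number of duplicated records that were dropped.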

Prepare alignment file

erne-create --output-prefix contaminate_rna --fasta contaminate_rna_uniq.fa & # takes about three hours

Run erne-filter to remove rRNA/tRNA

erne-filter --contamination-reference contaminate_rna.ebh --threads 20 --query1 test_trimmed.fq --output-prefix rmcontac&

As a result, the clean file rmcontac_1.fastq has 55167445 sequences, while the original file has 55301662 sequences, so about 99.76% of reads were retained (134217 reads removed). This step is probably more important for small RNA sequencing.
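The read counts and the retained percentage can be reproduced with a few lines of shell. This is a minimal sketch, assuming uncompressed FASTQ files with exactly four lines per record; count_fastq_reads and percent_retained are hypothetical helper names.

```shell
# count_fastq_reads is a hypothetical helper: it prints the number of reads
# in an uncompressed FASTQ file, assuming exactly 4 lines per record.
count_fastq_reads() {
    awk 'END { print int(NR / 4) }' "$1"
}

# percent_retained is a hypothetical helper: given the read counts before
# and after filtering, it prints the percentage kept with two decimals.
percent_retained() {
    awk -v before="$1" -v after="$2" \
        'BEGIN { printf "%.2f\n", 100 * after / before }'
}

# Using the counts reported above:
percent_retained 55301662 55167445   # prints 99.76

# With the actual files, the counts could be obtained like this:
# before=$(count_fastq_reads test_trimmed.fq)
# after=$(count_fastq_reads rmcontac_1.fastq)
# percent_retained "$before" "$after"
```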