When analyzing RNA-Seq data, reads derived from rRNA and tRNA can be removed from the sequencing files before downstream analysis. Here, I briefly describe how to do this step using ERNE.

Prepare software

Download software from https://sourceforge.net/projects/erne/files/2.1.1/

Unzip the archive and move erne-create and erne-filter to ~/local/bin (make sure this directory is on your PATH).

Install SeqKit to use its rmdup command, which removes duplicated FASTA records:

conda install -c bioconda seqkit
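Before moving on, it can help to confirm that the three executables are actually reachable from the shell. The following is a minimal sketch; check_tools is a hypothetical helper name chosen for illustration and is not part of ERNE or SeqKit.

```shell
# Sketch: report any required executable that is not on PATH.
# check_tools is a hypothetical helper, not shipped with ERNE or SeqKit.
check_tools() {
    missing=0
    for tool in "$@"; do
        # command -v succeeds only if the shell can resolve the name
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    # Return 0 when everything was found, 1 otherwise
    return "$missing"
}
```

Calling check_tools erne-create erne-filter seqkit prints one line per missing tool and returns a non-zero status if any of them cannot be found.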

Prepare rRNA and tRNA sequences

rRNA

rRNA sequences were downloaded from SILVA.

wget https://www.arb-silva.de/fileadmin/silva_databases/current/Exports/SILVA_128_LSUParc_tax_silva.fasta.gz

tRNA

tRNA sequences were downloaded from GtRNAdb.

wget http://gtrnadb2009.ucsc.edu/download/tRNAs/GtRNAdb-all-tRNAs.fa.gz

Combine the two FASTA files and remove duplicates

Decompress both downloads (for example with gunzip), concatenate them into contaminate_rna.fa, and then remove duplicated records with seqkit rmdup, writing the result to contaminate_rna_uniq.fa.

cat GtRNAdb-all-tRNAs.fa SILVA_128_LSUParc_tax_silva.fasta > contaminate_rna.fa
cat contaminate_rna.fa | seqkit rmdup > contaminate_rna_uniq.fa
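A quick sanity check after this step is to compare the number of FASTA records before and after deduplication. The sketch below only counts header lines with grep; the function names count_fasta_records and report_dedup are hypothetical helpers introduced for illustration.

```shell
# Sketch: count FASTA records by counting header lines (those starting with ">").
# Both function names are hypothetical helpers, not part of SeqKit.
count_fasta_records() {
    grep -c '^>' "$1"
}

# Print the record counts of the combined and deduplicated files,
# plus how many records were dropped in between.
report_dedup() {
    before=$(count_fasta_records "$1")
    after=$(count_fasta_records "$2")
    echo "records before: $before, after: $after, removed: $((before - after))"
}
```

For the files in this post, the call would be report_dedup contaminate_rna.fa contaminate_rna_uniq.fa.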

Prepare the alignment index file

erne-create --output-prefix contaminate_rna --fasta contaminate_rna_uniq.fa &  # takes about three hours

Run erne-filter to remove rRNA/tRNA

erne-filter --contamination-reference contaminate_rna.ebh --threads 20 --query1 test_trimmed.fq --output-prefix rmcontac &
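When several trimmed FASTQ files need the same treatment, one option is to generate the erne-filter command lines first and inspect them before running anything. The sketch below only prints commands; print_filter_cmds is a hypothetical helper, and the output prefix scheme (rmcontac_ plus the input file name without .fq) is an assumption made for illustration. The flags are the same ones used in the single-file command above.

```shell
# Sketch: print (do not run) one erne-filter command per input FASTQ file.
# print_filter_cmds and the "rmcontac_<name>" prefix scheme are assumptions.
# Usage: print_filter_cmds <reference.ebh> <threads> <file1.fq> [file2.fq ...]
print_filter_cmds() {
    ref=$1
    threads=$2
    shift 2
    for fq in "$@"; do
        # Derive an output prefix from the input file name, dropping ".fq"
        prefix="rmcontac_$(basename "$fq" .fq)"
        echo "erne-filter --contamination-reference $ref --threads $threads --query1 $fq --output-prefix $prefix"
    done
}
```

Once the printed lines look right, they can be piped to sh, for example print_filter_cmds contaminate_rna.ebh 20 *_trimmed.fq | sh. This simple version assumes file names without spaces.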

As a result, the clean file rmcontac_1.fastq has 55167445 sequences, while the original file has 55301662 sequences, so about 99.76% of reads were retained. This step is probably more important for small RNA sequencing.
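The read counts and the retained percentage can be reproduced with a few lines of shell. This is a minimal sketch that assumes the standard four-lines-per-record FASTQ layout; count_fastq_reads and retained_pct are hypothetical helper names.

```shell
# Sketch: count reads in an uncompressed FASTQ file.
# Assumes exactly 4 lines per record (no wrapped sequence lines).
count_fastq_reads() {
    lines=$(wc -l < "$1")
    echo $((lines / 4))
}

# Percentage of reads kept, rounded to two decimals.
# Usage: retained_pct <reads_after_filtering> <reads_before_filtering>
retained_pct() {
    awk -v kept="$1" -v total="$2" 'BEGIN { printf "%.2f\n", 100 * kept / total }'
}
```

With the numbers reported above, retained_pct 55167445 55301662 prints 99.76, which corresponds to 134217 reads removed.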