Rice RAPDB to MSU7 ID conversion

Ji Huang 2019-04-03

In this post, I generate a conversion table between RAPDB gene IDs and MSU7 gene IDs.

Some online tools can do the conversion, but it is handier to have a local version. One online tool is the RAP-DB ID Converter. Another is the OryzaExpress ID converter.

The conversion table was downloaded from RAPDB on 2019-04-03.

0. Set up environment.

library(tidyverse)

1. Read data

# read raw downloaded table.
c_table <- read_tsv("./RAP-MSU_2019-03-22.txt", col_names = c("RAPDB", "MSU"))

2. Process table.

For each RAPDB gene ID, get the corresponding MSU7 gene IDs. Each MSU7 transcript ID is converted to a gene ID by removing the trailing .[digits] suffix.

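# split comma-separated transcripts into one row each, strip the ".[digits]"
# isoform suffix, and keep one row per RAPDB gene / MSU7 gene pair.
cp_table <- c_table %>% separate_rows(MSU, sep = ",") %>% 
    mutate(MSU7 = str_remove(MSU, "\\.\\d+$")) %>% 
    select(-MSU) %>% 
    distinct(RAPDB, MSU7)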
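
To make the effect of this step concrete, here is a minimal sketch on a made-up three-row table. The IDs are invented for illustration and are not taken from the downloaded file; the object names `toy` and `toy_gene` are likewise assumptions.

```r
library(dplyr)
library(tidyr)
library(stringr)
library(tibble)

# Toy input in the same two-column layout as the raw file.
# These IDs are invented for illustration only.
toy <- tibble(
  RAPDB = c("Os01g0100100", "Os01g0100200", "None"),
  MSU   = c("LOC_Os01g01010.1,LOC_Os01g01010.2", "None", "LOC_Os01g01019.1")
)

toy_gene <- toy %>%
  separate_rows(MSU, sep = ",") %>%               # one transcript per row
  mutate(MSU7 = str_remove(MSU, "\\.\\d+$")) %>%  # drop the ".1", ".2" suffix
  select(-MSU) %>%
  distinct(RAPDB, MSU7)                           # collapse isoforms of one gene

print(toy_gene)
```

The two isoforms of the first gene collapse into a single row, and the "None" placeholders pass through unchanged because they carry no numeric suffix.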

There are 45967 RAPDB genes and 55802 MSU7 genes.
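
The post does not show how these totals were computed. One possible way is `n_distinct()`; since it is not stated whether the figures include the "None" placeholder, the sketch below counts both ways on a toy stand-in for `cp_table` (the table `toy_cp` and its IDs are invented for illustration).

```r
library(dplyr)
library(tibble)

# Toy stand-in for cp_table; IDs are invented for illustration.
toy_cp <- tibble(
  RAPDB = c("Os01g0100100", "Os01g0100200", "None", "Os01g0100300"),
  MSU7  = c("LOC_Os01g01010", "None", "LOC_Os01g01019", "LOC_Os01g01010")
)

# Distinct values per column, counting "None" as if it were an ID.
n_rap_all <- n_distinct(toy_cp$RAPDB)
n_msu_all <- n_distinct(toy_cp$MSU7)

# Same counts with the "None" placeholder excluded.
n_rap <- n_distinct(setdiff(toy_cp$RAPDB, "None"))
n_msu <- n_distinct(setdiff(toy_cp$MSU7, "None"))
```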

3. Save table.

# the output file is passed by position, since newer readr versions renamed `path` to `file`.
write_tsv(cp_table, "./RAPDB_MSU_ID_conversion_20190403.txt")
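
Once saved, the table can serve as a local converter. The following is a rough sketch of one way to do that, assuming a two-column TSV like the one written above; the helper name `rap_to_msu` is hypothetical, and the toy IDs and temporary file are only there to keep the example self-contained.

```r
library(dplyr)
library(readr)
library(tibble)

# Write a toy conversion table to a temporary file (invented IDs).
tmp <- tempfile(fileext = ".txt")
write_tsv(
  tibble(
    RAPDB = c("Os01g0100100", "Os01g0100100", "Os01g0100200"),
    MSU7  = c("LOC_Os01g01010", "LOC_Os01g01020", "None")
  ),
  tmp
)

# Read it back, forcing both columns to character.
conv <- read_tsv(tmp, col_types = "cc")

# Hypothetical helper: look up MSU7 gene IDs for a vector of RAPDB IDs.
# IDs missing from the table come back with NA in the MSU7 column.
rap_to_msu <- function(ids, table) {
  tibble(RAPDB = ids) %>%
    left_join(table, by = "RAPDB")
}

res <- rap_to_msu(c("Os01g0100100", "Os99g9999999"), conv)
print(res)
```

A left join keeps every queried ID, so one-to-many mappings show up as repeated rows and unknown IDs stay visible as NA rather than being silently dropped.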

4. Some exploratory analysis.

  1. How many RAPDB genes don’t have MSU7 genes, as indicated by None in the MSU column?

Answer: 12282.

sum(cp_table$MSU7 == "None")

  2. How many MSU7 genes don’t have RAPDB IDs, as indicated by None in the RAPDB column?

Answer: 22991.

sum(cp_table$RAPDB == "None")

  3. How many RAPDB genes have multiple MSU7 genes?

Answer: 430.

# drop the "None" placeholder, then count RAPDB IDs mapped to more than one MSU7 ID.
cp_table %>% filter(RAPDB != "None") %>% count(RAPDB) %>% 
    filter(n > 1) %>% nrow()

  4. How many MSU7 genes have multiple RAPDB genes?

Answer: 1233.

# drop the "None" placeholder, then count MSU7 IDs mapped to more than one RAPDB ID.
cp_table %>% filter(MSU7 != "None") %>% count(MSU7) %>% 
    filter(n > 1) %>% nrow()
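
Beyond the counts, it can be useful to list the one-to-many pairs themselves. A minimal sketch using `add_count()`, shown on a toy stand-in for `cp_table` (the names `toy_cp`, `multi_rap`, and `n_msu`, and all IDs, are invented for illustration):

```r
library(dplyr)
library(tibble)

# Toy stand-in for cp_table; IDs are invented for illustration.
toy_cp <- tibble(
  RAPDB = c("Os01g0100100", "Os01g0100100", "Os01g0100200", "None", "None"),
  MSU7  = c("LOC_Os01g01010", "LOC_Os01g01020", "None",
            "LOC_Os01g01019", "LOC_Os01g01030")
)

# Keep only real ID pairs, then retain RAPDB genes with more than one MSU7 gene.
multi_rap <- toy_cp %>%
  filter(RAPDB != "None", MSU7 != "None") %>%
  add_count(RAPDB, name = "n_msu") %>%   # number of MSU7 genes per RAPDB gene
  filter(n_msu > 1) %>%
  arrange(RAPDB)

print(multi_rap)
```

Swapping the roles of `RAPDB` and `MSU7` in the `add_count()` call would list MSU7 genes with multiple RAPDB genes instead.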