In this post, I generate a conversion table between RAPDB gene IDs and MSU7 gene IDs.
Some online tools can already do the conversion, but it’s handier to have a local version. One is the RAP-DB ID Converter; another is the OryzaExpress ID Converter.
The conversion table was downloaded from RAPDB on 2019-04-03.
0. Set up the environment.
library(tidyverse)
1. Read data.
# read raw downloaded table.
c_table <- read_tsv("./RAP-MSU_2019-03-22.txt", col_names = c("RAPDB", "MSU"))
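To make the expected input concrete, here is a minimal sketch that parses a few made-up lines in what I assume is the layout of the RAP-MSU file: two tab-separated columns with no header, a RAPDB gene ID on the left and a comma-separated list of MSU7 transcript IDs (or None) on the right. The rows are illustrative, not copied from the real file, and base R read.delim() stands in for read_tsv() so the snippet has no package dependencies.

```r
# Hypothetical toy rows in the assumed RAP-MSU layout:
# RAPDB gene ID <TAB> comma-separated MSU7 transcript IDs (or "None")
raw_text <- paste(
  "Os01g0100100\tLOC_Os01g01010.1,LOC_Os01g01010.2",
  "Os01g0100200\tLOC_Os01g01019.1",
  "Os01g0100300\tNone",
  "None\tLOC_Os01g01030.1",
  sep = "\n"
)

# The file has no header line, so name the columns explicitly,
# mirroring col_names = c("RAPDB", "MSU") in the read_tsv() call
raw_table <- read.delim(text = raw_text, header = FALSE,
                        col.names = c("RAPDB", "MSU"),
                        stringsAsFactors = FALSE)
print(raw_table)
```

Note that "None" is read as an ordinary string, not as NA, because it is not in the default na.strings.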
2. Process the table.
Get the MSU7 gene IDs corresponding to each RAPDB gene ID. MSU7 transcript IDs (e.g. LOC_Os01g01010.1) are converted to gene IDs by removing the trailing .[digits] suffix.
# one row per MSU7 transcript, then strip the ".<digits>" suffix to get gene IDs
cp_table <- c_table %>%
  separate_rows(MSU, sep = ",") %>%
  mutate(MSU7 = str_remove(MSU, "\\.\\d+$")) %>%
  select(-MSU) %>%
  distinct(RAPDB, MSU7)
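For readers without the tidyverse, the same three steps (split the comma-separated list, strip the transcript suffix, deduplicate) can be sketched in base R. This is an equivalent I wrote for illustration on hypothetical toy rows, not the code used to build the real table; strsplit() plays the role of separate_rows(), sub() of str_remove(), and unique() of distinct().

```r
# Toy input in the assumed raw layout (illustrative IDs only)
c_table <- data.frame(
  RAPDB = c("Os01g0100100", "Os01g0100300", "None"),
  MSU   = c("LOC_Os01g01010.1,LOC_Os01g01010.2", "None", "LOC_Os01g01030.1"),
  stringsAsFactors = FALSE
)

# 1. Split the comma-separated MSU7 transcript list: one row per transcript
msu_list <- strsplit(c_table$MSU, ",", fixed = TRUE)
long <- data.frame(
  RAPDB = rep(c_table$RAPDB, lengths(msu_list)),
  MSU   = unlist(msu_list),
  stringsAsFactors = FALSE
)

# 2. Strip the trailing ".<digits>" transcript suffix to get the MSU7 gene ID;
#    "None" has no suffix and passes through unchanged
long$MSU7 <- sub("\\.[0-9]+$", "", long$MSU)

# 3. Drop the transcript column and keep unique RAPDB/MSU7 pairs
cp_table <- unique(long[, c("RAPDB", "MSU7")])
rownames(cp_table) <- NULL
print(cp_table)
```

The two transcripts LOC_Os01g01010.1 and LOC_Os01g01010.2 collapse into a single gene-level pair, which is why the deduplication step is needed.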
There are 45967 RAPDB genes and 55802 MSU7 genes.
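Totals like these can be obtained by counting distinct IDs in each column. A minimal sketch on a toy table, assuming the None placeholder should not count as a gene; that exclusion and the helper name count_genes() are my own choices, not taken from the original analysis.

```r
# Toy conversion table (illustrative IDs only)
cp_table <- data.frame(
  RAPDB = c("Os01g0100100", "Os01g0100100", "Os01g0100300", "None", "None"),
  MSU7  = c("LOC_Os01g01010", "LOC_Os01g01020", "None",
            "LOC_Os01g01030", "LOC_Os01g01040"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: number of distinct IDs, ignoring the "None" placeholder
# (setdiff() already returns unique values)
count_genes <- function(ids) length(setdiff(ids, "None"))

n_rapdb <- count_genes(cp_table$RAPDB)  # Os01g0100100, Os01g0100300
n_msu7  <- count_genes(cp_table$MSU7)   # four LOC_ IDs
cat("RAPDB genes:", n_rapdb, "\nMSU7 genes:", n_msu7, "\n")
```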
3. Save the table.
write_tsv(cp_table, "./RAPDB_MSU_ID_conversion_20190403.txt")
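Once saved, the table can serve as the local converter. Below is a minimal base R sketch of a lookup: it writes a toy table to a temporary tab-separated file (standing in for the real output file), reads it back, and maps IDs from one column to the other. The function convert_ids() is hypothetical; it returns a list so that one-to-many mappings are preserved.

```r
# Toy conversion table standing in for RAPDB_MSU_ID_conversion_20190403.txt
cp_table <- data.frame(
  RAPDB = c("Os01g0100100", "Os01g0100200", "Os01g0100200", "Os01g0100300"),
  MSU7  = c("LOC_Os01g01010", "LOC_Os01g01019", "LOC_Os01g01020", "None"),
  stringsAsFactors = FALSE
)

# Round-trip through a tab-separated file with a header and no quoting
path <- tempfile(fileext = ".txt")
write.table(cp_table, path, sep = "\t", quote = FALSE, row.names = FALSE)
conv <- read.delim(path, stringsAsFactors = FALSE)

# Hypothetical helper: map IDs from column `from` to column `to`.
# Each element holds all matches; unknown IDs give character(0).
convert_ids <- function(ids, conv_table, from = "RAPDB", to = "MSU7") {
  hits <- lapply(ids, function(id) conv_table[[to]][conv_table[[from]] == id])
  names(hits) <- ids
  hits
}

res <- convert_ids(c("Os01g0100200", "Os01g0100300", "Os99g9999999"), conv)
str(res)
```

Swapping from = "MSU7", to = "RAPDB" gives the reverse direction. Os99g9999999 is a deliberately fake ID to show the empty result for unknown input.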
4. Some exploratory analysis.
- How many RAPDB genes don’t have an MSU7 gene, as indicated by None in the MSU7 column?
Answer: 12282.
sum(cp_table$MSU7 == "None")
- How many MSU7 genes don’t have a RAPDB ID, as indicated by None in the RAPDB column?
Answer: 22991.
sum(cp_table$RAPDB == "None")
- How many RAPDB genes have multiple MSU7 genes?
Answer: 430.
# exclude the "None" placeholder, then count RAPDB genes with more than one MSU7 gene
cp_table %>% filter(RAPDB != "None") %>% count(RAPDB) %>%
  filter(n > 1) %>% nrow()
- How many MSU7 genes have multiple RAPDB genes?
Answer: 1233.
# exclude the "None" placeholder, then count MSU7 genes with more than one RAPDB gene
cp_table %>% filter(MSU7 != "None") %>% count(MSU7) %>%
  filter(n > 1) %>% nrow()
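The two multi-mapping questions can be combined by labelling every RAPDB/MSU7 pair with the cardinality of its two ends. A minimal base R sketch on a toy table with placeholder IDs (R1, M1, and so on are hypothetical, not real gene IDs); the labels read in the RAPDB to MSU7 direction, so "one-to-many" means one RAPDB gene linked to several MSU7 genes.

```r
# Toy conversion table with placeholder IDs (illustrative only)
cp_table <- data.frame(
  RAPDB = c("R1", "R2", "R2", "R3", "R4", "None", "None"),
  MSU7  = c("M1", "M2", "M3", "M4", "M4", "M5", "M6"),
  stringsAsFactors = FALSE
)

# Keep only pairs where both sides are real IDs
pairs <- cp_table[cp_table$RAPDB != "None" & cp_table$MSU7 != "None", ]

# Number of partners for each ID on each side
n_msu_per_rap <- table(pairs$RAPDB)
n_rap_per_msu <- table(pairs$MSU7)

# How many genes map to more than one partner (the two questions above)
sum(n_msu_per_rap > 1)
sum(n_rap_per_msu > 1)

# Label each pair by cardinality, read as RAPDB -> MSU7
rap_multi <- as.vector(n_msu_per_rap[pairs$RAPDB]) > 1
msu_multi <- as.vector(n_rap_per_msu[pairs$MSU7]) > 1
pairs$type <- ifelse(rap_multi & msu_multi, "many-to-many",
              ifelse(rap_multi, "one-to-many",
              ifelse(msu_multi, "many-to-one", "one-to-one")))
print(pairs)
table(pairs$type)
```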