Tuesday, February 10, 2015

KEGG Data Errors. I am pissed off!!

I am working on my PhD thesis day and night, but after running my calculations I found the results were not as promising as I had expected. To build a predictive model you need a training set and a test set, so I made both. I was checking some pathway-disease associations from my model. I used a dataset from CPDB (ConsensusPathDB), which collects pathways from different databases and is a nice resource for enrichment studies and network analysis.

As my results were not very promising, I went to check whether the dataset was OK. I am predicting the association of an OMIM disease, glaucoma, with pathways. In the test set I had various Reactome pathways but not the KEGG pathway hsa03008, which was unexpected. I went to the pathway page and saw that it mentions glaucoma. I was specifically interested in the OPTN gene because it is one of the primary genes for the disease. If you go to the KEGG disease page for glaucoma, you can see that OPTN is listed there. But on the KEGG page for OPTN I didn't find any pathway associations, and on the hsa03008 pathway page I didn't see the gene mentioned either. The other essential genes listed on the disease page (MYOC, CYP1B1, NTF4) were not linked to any pathways either; only WDR36 was.
People analyzing this incompletely mapped data are missing a lot of information, and even data analysts end up interpreting it wrongly.
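The check I did by hand can be sketched in a few lines of Python. This is only a rough illustration: the gene list is the one from the disease page mentioned above, while the gene-to-pathway map is a toy dictionary typed in from what I saw (I am assuming WDR36's link is to hsa03008), not something downloaded from KEGG. The function names are made up for this example.

```python
# Genes listed on the KEGG disease page for glaucoma, as named in this post.
DISEASE_GENES = ["OPTN", "MYOC", "CYP1B1", "NTF4", "WDR36"]

# Toy gene -> pathway map (an assumption for illustration, not KEGG output).
# Only WDR36 had a pathway link; it is assumed here to be hsa03008.
GENE_TO_PATHWAYS = {"WDR36": {"hsa03008"}}


def unmapped_genes(disease_genes, gene_to_pathways):
    """Return the disease genes that have no pathway association at all."""
    return [g for g in disease_genes if not gene_to_pathways.get(g)]


def genes_in_pathway(disease_genes, gene_to_pathways, pathway_id):
    """Return the disease genes that are linked to one given pathway."""
    return [g for g in disease_genes
            if pathway_id in gene_to_pathways.get(g, set())]


missing = unmapped_genes(DISEASE_GENES, GENE_TO_PATHWAYS)
covered = genes_in_pathway(DISEASE_GENES, GENE_TO_PATHWAYS, "hsa03008")

print("Disease genes with no pathway link:", missing)
print("Disease genes found in hsa03008:", covered)
```

With a real gene-to-pathway table in place of the toy dictionary, the same two functions would flag every disease gene that silently drops out of a pathway-based test set.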

The KEGG guys need to map the data the right way and provide the right information. This was just one small example; there are others too, which is why I am so pissed off!!
