Sunday, November 25, 2012

Fingerprints, fragments, side effects of drugs...

After a long time I am writing a blog post. The material looked quite interesting to me to study, and the results indicate that some essential fingerprints are related to a compound's side effect profile, as well as to its therapeutic area and to the target similarity between two compounds.

So let's see what the study tells...

I mapped around 746 compounds between DrugBank and SIDER, first by CID and then by compound name, checked the matches manually, and made a final dataset for my study. The recent SIDER release provides 4500 side effect profiles. The side effect profiles are used to create a compound-side effect binary matrix.
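A minimal sketch of how such a binary matrix could be built from a list of (compound, side effect) pairs. The compound and side effect names here are made-up placeholders, not entries from DrugBank or SIDER, and the function name `build_binary_matrix` is only illustrative.

```python
def build_binary_matrix(pairs):
    """Turn (compound, side_effect) pairs into a 0/1 matrix.

    Rows are compounds, columns are side effects, both sorted by name.
    """
    compounds = sorted({c for c, _ in pairs})
    effects = sorted({e for _, e in pairs})
    row = {c: i for i, c in enumerate(compounds)}
    col = {e: j for j, e in enumerate(effects)}
    matrix = [[0] * len(effects) for _ in compounds]
    for c, e in pairs:
        matrix[row[c]][col[e]] = 1
    return compounds, effects, matrix


# Hypothetical toy data, for illustration only.
pairs = [
    ("cmpd_A", "se_1"),
    ("cmpd_A", "se_3"),
    ("cmpd_B", "se_2"),
    ("cmpd_B", "se_3"),
    ("cmpd_C", "se_1"),
]
compounds, effects, matrix = build_binary_matrix(pairs)
for name, bits in zip(compounds, matrix):
    print(name, bits)
```

The same function would work for a compound-target or compound-ATC code matrix, since only the second element of each pair changes.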

The side effects are used to calculate compound similarity using matrix multiplication and normalization. The algorithm I used is discussed in the Metapath paper. It provides a fast and efficient way to calculate the Dice coefficient, which I used as the similarity measure; more at Rajarshi Guha's blog.
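The idea of getting the Dice coefficient from a matrix product can be sketched as follows. This assumes the PathSim form of the normalization, s(i, j) = 2 * M[i][j] / (M[i][i] + M[j][j]) with M = A * A^T, which for a 0/1 matrix A is the Dice coefficient. It is written in plain Python for readability and is not the exact code used for the study; on the real data a matrix library would be the natural choice.

```python
def dice_similarity(matrix):
    """Pairwise Dice similarity between the rows of a 0/1 matrix.

    shared = A * A^T, so shared[i][j] counts the columns that rows i
    and j have in common, and shared[i][i] counts the 1s in row i.
    """
    n = len(matrix)
    shared = [
        [sum(a * b for a, b in zip(matrix[i], matrix[j])) for j in range(n)]
        for i in range(n)
    ]
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            denom = shared[i][i] + shared[j][j]
            # Two empty rows share nothing; define their similarity as 0.
            sim[i][j] = 2.0 * shared[i][j] / denom if denom else 0.0
    return sim


# Hypothetical 3 compounds x 3 side effects matrix.
toy = [[1, 0, 1], [0, 1, 1], [1, 0, 0]]
sim = dice_similarity(toy)
for row in sim:
    print([round(v, 3) for v in row])
```

Because the whole computation is one matrix product plus a normalization by the diagonal, the same routine applies unchanged to the compound-target and compound-ATC code matrices.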

From the final data of 746 compounds I removed metals and compounds with a molecular weight greater than 1000, and ultimately came down to 728 compounds.

I also created a compound-target matrix and a compound-ATC code matrix (data collected from DrugBank), and calculated the similarity of compounds by side effect, ATC code and protein target, each using the metapath algorithm.

Earlier work by Yamanishi on predicting side effects from substructure profiles, and on drug target prediction using learning models, produced very good papers. One important paper was from Campillos on drug target identification using side effect similarity; it focused on similar side effect profiles, which are kind of similar to off-target profiles. Another paper (SLAP) from our lab at Indiana University, which uses semantics to identify targets, was a good improvement in integrating multiple heterogeneous networks and predicting targets.

For my work I used PubChem fingerprints, MACCS keys (166), ECFP4, ECFP6, FCFP4 and FCFP6 to find how strongly these fingerprints relate to side effects, ATC codes and proteins.

I used Tanimoto similarity for the fingerprint-based similarity of compounds and the PathSim similarity for the side effect, ATC and protein similarity of compounds. After performing a simple correlation study I found that PubChem fingerprints correlate most strongly with side effects: the correlation is only about 0.16, but it is statistically significant (p-value < 2.2e-16). The same holds for MACCS keys and the extended connectivity fingerprints. Below I provide the correlation matrix of the different fingerprints, side effect, ATC code and protein similarity.
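The comparison described above can be sketched like this: compute one similarity per compound pair under each measure, then correlate the two lists. Fingerprints are represented as sets of on-bit positions, and all compound names, bits and side effects are hypothetical placeholders. Pearson correlation is assumed here, since the post does not say which correlation coefficient was used.

```python
from itertools import combinations
from math import sqrt


def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints given as sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def dice(a, b):
    """Dice coefficient of two sets (here: side effect profiles)."""
    total = len(a) + len(b)
    return 2.0 * len(a & b) / total if total else 0.0


def pearson(xs, ys):
    """Plain Pearson correlation of two equally long lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


# Hypothetical fingerprints (on-bit positions) and side effect profiles.
fingerprints = {
    "cmpd_A": {1, 2, 3, 4},
    "cmpd_B": {1, 2, 3},
    "cmpd_C": {7, 8},
    "cmpd_D": {3, 4, 7},
}
side_effects = {
    "cmpd_A": {"se_1", "se_2"},
    "cmpd_B": {"se_1", "se_2", "se_3"},
    "cmpd_C": {"se_4"},
    "cmpd_D": {"se_2", "se_4"},
}

# One similarity value per unordered compound pair, in the same order.
pairs = list(combinations(sorted(fingerprints), 2))
fp_sims = [tanimoto(fingerprints[a], fingerprints[b]) for a, b in pairs]
se_sims = [dice(side_effects[a], side_effects[b]) for a, b in pairs]

r = pearson(fp_sims, se_sims)
print(len(pairs), "pairs, correlation =", round(r, 3))
```

With 728 compounds there are 728 * 727 / 2 = 264,628 compound pairs, so even a weak correlation can come out with a very small p-value. The size of the coefficient, not the p-value, is what says how much of the side effect similarity the fingerprint explains.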

[Figure: PubChem fingerprint similarity distribution]

[Figure: Side effect distribution]

The correlation matrix is shown below.

           sideffect pubchem   maccs     ECFP4     ECFP6     FCFP4     FCFP6     atc       protein
sideffect  1         0.160535  0.156551  0.142548  0.140111  0.125838  0.12807   0.055374  0.097784
pubchem    0.160535  1         0.585017  0.604037  0.566609  0.617526  0.576539  0.018368  0.132186
maccs      0.156551  0.585017  1         0.523484  0.491311  0.477387  0.434868  0.014732  0.169165
ECFP4      0.142548  0.604037  0.523484  1         0.990811  0.757584  0.777071  0.096187  0.289458
ECFP6      0.140111  0.566609  0.491311  0.990811  1         0.747782  0.787673  0.102716  0.295364
FCFP4      0.125838  0.617526  0.477387  0.757584  0.747782  1         0.981399  0.094613  0.263978
FCFP6      0.12807   0.576539  0.434868  0.777071  0.787673  0.981399  1         0.106107  0.283874
atc        0.055374  0.018368  0.014732  0.096187  0.102716  0.094613  0.106107  1         0.166476
protein    0.097784  0.132186  0.169165  0.289458  0.295364  0.263978  0.283874  0.166476  1

Among the fingerprints, ECFP and FCFP show the best correlation with protein similarity (about 0.26 to 0.30) and with ATC code similarity (about 0.09 to 0.11). On the other hand, the PubChem fingerprint shows the best correlation with side effects (about 0.16).
A question arises here: if the correlation is this low, how well is the approach in Yamanishi's paper able to predict the true relations? Can different substructure methods correlate with the side effects?

Well, I am still thinking about what is behind the side effects and the impact of the right substructures on side effects.