Thursday, December 13, 2012

Drug Repurposing Explorer

Following on from my previous work, I have made a tool for drug repurposing that uses side effects, shared protein targets, fingerprints (PubChem, ECFP6) and also ROCS (3D).

I collected the side effect information for 727 drugs from SIDER, and the target information and ATC codes from DrugBank. We also manually added some of the drugs' ATC codes from the KEGG database. The disease information for the 727 drugs was obtained from the Comparative Toxicogenomics Database (CTD) [8]. CTD is a curated resource of chemical-gene, chemical-disease, gene-disease and chemical-ontology interactions, curated by hand by the CTD biocurators. We extracted only the chemical-disease associations related to therapeutic areas, which helps to identify the diseases a drug could treat and whether new diseases could be identified. We then converted the side effect and target data into binary (0/1) matrices, giving drug-side effect and drug-target matrices.
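A minimal sketch of the binarization step, assuming the SIDER side effects (or DrugBank targets) have already been parsed into (drug, feature) pairs. The function name `binary_matrix` and the toy annotations are hypothetical; the same function serves for both the drug-side effect and the drug-target matrix.

```python
import numpy as np


def binary_matrix(pairs):
    """Build a drug x feature 0/1 matrix from (drug, feature) pairs.

    Rows and columns are sorted so the layout is reproducible.
    Returns the matrix together with the row (drug) and column (feature) labels.
    """
    drugs = sorted({d for d, _ in pairs})
    features = sorted({f for _, f in pairs})
    row = {d: i for i, d in enumerate(drugs)}
    col = {f: j for j, f in enumerate(features)}
    m = np.zeros((len(drugs), len(features)), dtype=np.int8)
    for d, f in pairs:
        m[row[d], col[f]] = 1  # a repeated pair just sets the same cell again
    return m, drugs, features


# Hypothetical toy annotations, not real SIDER/DrugBank records
pairs = [("drugA", "nausea"), ("drugA", "headache"),
         ("drugB", "nausea"), ("drugC", "rash")]
m, drugs, features = binary_matrix(pairs)
print(drugs)     # ['drugA', 'drugB', 'drugC']
print(features)  # ['headache', 'nausea', 'rash']
print(m)
```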

A user searching for a drug in the Drug Repurposing Explorer should already know the drug and what it is used for.


If you find anything interesting using this tool, please mail your results to or . This will help us write a good manual for users searching the database.

Sunday, November 25, 2012

Fingerprints, fragments and side effects of drugs...

I am writing a blog post after a long time. The material looks quite interesting to me: the results indicate that certain fingerprints are related to the side effect profile, as well as to the compounds' therapeutic area and to the target similarity between two compounds.

So let's see what the study tells us.

I mapped around 746 compounds between DrugBank and SIDER, first by PubChem CID and then by compound name, checked them manually and made a final dataset for my study. The recent SIDER release provides 4500 side effect profiles. The side effect profiles are used to create a compound-side effect binary matrix.

The side effects are used to calculate compound similarity using matrix multiplication and normalization; the algorithm I used is discussed in the Metapath paper. It provides a fast and efficient way to calculate the Dice coefficient, which I used to calculate the similarity (more on this on Rajarshi Guha's blog).
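The matrix-multiplication trick can be sketched as below, assuming a binary compound x side effect matrix `A`. With `M = A @ A.T`, `M[i, j]` counts the side effects shared by compounds i and j and `M[i, i]` is compound i's total, so `2 * M[i, j] / (M[i, i] + M[j, j])` gives the Dice coefficient for every pair in one step. For the symmetric compound-side effect-compound meta-path this is also the PathSim score. This is an illustrative NumPy sketch with a hypothetical function name, not the author's code.

```python
import numpy as np


def pathsim_dice(a):
    """Pairwise Dice / PathSim similarity for a binary compound x feature matrix.

    m[i, j] = number of features shared by compounds i and j, and
    m[i, i] = number of features of compound i, so
    2 * m[i, j] / (m[i, i] + m[j, j]) is the Dice coefficient.
    """
    a = np.asarray(a, dtype=float)
    m = a @ a.T                        # shared-feature counts from one matrix product
    diag = np.diag(m)
    denom = diag[:, None] + diag[None, :]
    with np.errstate(divide="ignore", invalid="ignore"):
        # pairs where neither compound has any feature get similarity 0
        sim = np.where(denom > 0, 2.0 * m / denom, 0.0)
    return sim


a = np.array([[1, 1, 0],
              [0, 1, 0],
              [0, 0, 1]])
print(np.round(pathsim_dice(a), 3))
```

The same function applies unchanged to the compound-target and compound-ATC code matrices.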

From the 746 compounds, I removed metals and compounds with a molecular weight greater than 1000, which brought the final dataset down to 728 compounds.

I also created a compound-target matrix and a compound-ATC code matrix (data collected from DrugBank), from which I calculated the similarity of compounds by side effects, ATC codes and proteins, respectively, using the metapath algorithm.

Earlier research by Yamanishi on predicting side effects from substructure profiles and on drug-target prediction using learning models produced very good papers. Another important paper was by Campillos et al. on drug target identification using side effect similarity, which relies on the idea that a similar side effect profile reflects a similar off-target profile. Another paper (SLAP) from our lab at Indiana University used semantics to identify targets and was a good improvement, integrating multiple heterogeneous networks to predict targets.

For my work I used PubChem fingerprints, MACCS keys (166), ECFP4, ECFP6, FCFP4 and FCFP6 to find how strongly these fingerprints relate to the side effects, ATC codes and proteins.

I used Tanimoto similarity for the fingerprint-based similarity of compounds and PathSim similarity for the side effects, ATC codes and proteins of compounds. A simple correlation study showed that PubChem fingerprints are the most correlated with side effects; the correlation is only about 0.16, but it is statistically significant (p-value < 2.2e-16), and the same holds for MACCS keys and the extended connectivity fingerprints. Below I provide the correlation matrix of the different fingerprint, side effect, ATC code and protein similarities.

PubChem fingerprint similarity distribution

Side effect distribution

The correlation matrix is shown below:

             side effect PubChem     MACCS       ECFP4       ECFP6       FCFP4       FCFP6       ATC         protein
side effect  1           0.160535    0.156551    0.142548    0.140111    0.125838    0.12807     0.055374    0.097784
PubChem      0.160535    1           0.585017    0.604037    0.566609    0.617526    0.576539    0.018368    0.132186
MACCS        0.156551    0.585017    1           0.523484    0.491311    0.477387    0.434868    0.014732    0.169165
ECFP4        0.142548    0.604037    0.523484    1           0.990811    0.757584    0.777071    0.096187    0.289458
ECFP6        0.140111    0.566609    0.491311    0.990811    1           0.747782    0.787673    0.102716    0.295364
FCFP4        0.125838    0.617526    0.477387    0.757584    0.747782    1           0.981399    0.094613    0.263978
FCFP6        0.12807     0.576539    0.434868    0.777071    0.787673    0.981399    1           0.106107    0.283874
ATC          0.055374    0.018368    0.014732    0.096187    0.102716    0.094613    0.106107    1           0.166476
protein      0.097784    0.132186    0.169165    0.289458    0.295364    0.263978    0.283874    0.166476    1

ECFP and FCFP fingerprints show the highest correlation with protein similarity (about 0.26 to 0.30) and ATC code similarity (about 0.09 to 0.11), whereas the PubChem fingerprint shows the best correlation with side effect similarity (0.16).
A question arises here: if the correlation is this low, how well is Yamanishi's approach able to predict the true relations? Can different substructure methods correlate with side effects at all?

Well, I am still thinking about what lies behind side effects and the impact of the right substructures on them.
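The correlation study in this post can be sketched roughly as follows, assuming fingerprints are stored as 0/1 NumPy arrays (one row per compound) and that a second similarity matrix, such as the side effect PathSim matrix, is already available. Tanimoto similarity is computed for every pair of fingerprints, the unique pairs (upper triangle) of both matrices are flattened, and Pearson's r is taken between the two vectors. The function names and toy data are hypothetical, not the code behind the table above; for a p-value, `scipy.stats.pearsonr` returns both r and p.

```python
import numpy as np


def tanimoto_matrix(fps):
    """Pairwise Tanimoto similarity for a compounds x bits 0/1 matrix."""
    fps = np.asarray(fps, dtype=float)
    common = fps @ fps.T                      # bits set in both fingerprints
    counts = fps.sum(axis=1)
    union = counts[:, None] + counts[None, :] - common
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(union > 0, common / union, 0.0)


def pairwise_correlation(sim_a, sim_b):
    """Pearson r between two similarity matrices over unique pairs (i < j)."""
    iu = np.triu_indices_from(sim_a, k=1)     # upper triangle, diagonal excluded
    return np.corrcoef(sim_a[iu], sim_b[iu])[0, 1]


# Hypothetical toy data: 3 compounds, 4-bit fingerprints
fps = np.array([[1, 1, 0, 1],
                [1, 1, 0, 0],
                [0, 0, 1, 1]])
side_effect_sim = np.array([[1.0, 0.8, 0.1],
                            [0.8, 1.0, 0.2],
                            [0.1, 0.2, 1.0]])
fp_sim = tanimoto_matrix(fps)
print(round(pairwise_correlation(fp_sim, side_effect_sim), 3))
```

Using only the upper triangle avoids counting each pair twice and keeps the trivial self-similarities of 1 out of the correlation.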

Sunday, August 12, 2012

Kinase inhibitor polypharmacology: can it be studied using molecular weight and lipophilicity?

I came across a post on an FBDD blog, and also in LinkedIn groups, about the effect of lipophilicity and molecular weight on promiscuity. Peter referred to a very good paper by Michael Hann and Andrew Leach, "Molecular complexity and fragment-based drug discovery: ten years on". This review suggests the importance of lipophilicity, molecular weight, positive charge and heavy atom count in promiscuity. According to the paper, molecular complexity plays a crucial role in polypharmacology.
My question here is: what is meant by molecular complexity?

In the paper, molecular weight and heavy atom count are used as measures of molecular complexity. Several studies have been made: for example, 75000 compounds from Pfizer were tested against 220 assays, and promiscuity was shown to decrease with molecular weight. In contrast, Novartis ran 160 HTS assays and found a positive effect of molecular weight on promiscuity. They also found that compounds containing a carboxylic acid showed significantly higher selectivity. Beyond that, Springthorpe's seminal work proposed that increased lipophilicity and the presence of a basic moiety play an important role in promiscuity. The paper also mentions that people at Roche associated the presence of positively charged groups with increased promiscuity.

Based on the Leach paper, I studied 72 known kinase inhibitors that were assayed against 442 different kinases by Davis et al. The 72 inhibitors are available at .

I used AlogP for lipophilicity, molecular weight, polar surface area and heavy atom count against the selectivity score of each compound at 3 uM. The kinase selectivity score at 3 uM is defined as the number of kinases the compound binds at 3 uM divided by the total number of kinase domains queried. The selectivity scores are distributed between 0.2 and 0.7. The data is available as an additional file in . They observed: "The lowest selectivity scores, and therefore the greatest selectivity, were observed for the MEK inhibitors AZD-6244/ARRY-886 and CI-1040, the MET inhibitor SGX-523, the CSF1R inhibitor GW-2580 and the ERBB2/EGFR inhibitor lapatinib (Tykerb)."
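The selectivity score is a simple ratio. A minimal sketch, assuming a compound's profile is stored as a dict of kinase name to Kd in nM, and that "bound at 3 uM" means a Kd below 3000 nM; the dict layout, the function name and the values are hypothetical.

```python
def selectivity_score(kd_nm, threshold_nm=3000.0):
    """S(threshold) = kinases bound with Kd < threshold / kinases tested.

    kd_nm maps kinase name -> Kd in nM; None marks a kinase that was
    tested but showed no measurable binding.
    """
    if not kd_nm:
        raise ValueError("no kinases tested")
    bound = sum(1 for kd in kd_nm.values() if kd is not None and kd < threshold_nm)
    return bound / len(kd_nm)


# Hypothetical profile over five kinase domains
profile = {"K1": 12.0, "K2": 450.0, "K3": 2900.0, "K4": 8000.0, "K5": None}
print(selectivity_score(profile))         # S(3 uM): 3 of 5 kinases bound
print(selectivity_score(profile, 300.0))  # S(300 nM): 1 of 5 kinases bound
```

Lowering the threshold to 300 nM gives the stricter score used for the second distribution.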

The figure below shows an interesting result. For lipophilicity there is a negative correlation, but for all the other properties it is positive; the correlation is highest for molecular weight (around 0.31), while for PSA and heavy atom count it is below 0.1. Both figures display the same type of distribution of compounds at 300 nM and 3 uM, in which lower selectivity is observed for different classes of kinase inhibitors.


Most of the compounds studied had a selectivity score below 0.2, which reflects the selectivity of class I and class II inhibitors. There is not much difference between class I and class II inhibitors: class I was designed for big gatekeepers such as phenylalanine, and class II for small and medium gatekeepers. The figure below gives the distribution of selectivity scores for the different classes of inhibitors from the Davis paper.

I think more compounds need to be explored to study kinase inhibitor selectivity and promiscuity. Does this signify that kinases are selective for compounds?
However, I still have a question: what other properties come into molecular complexity when dealing with promiscuity?

Saturday, August 11, 2012

PCA Plot

Continuing from the last post, I have done some PCA analysis using 12 different physicochemical property descriptors: molecular refractivity, atom polarizabilities, bond polarizabilities, hydrogen bond donors and acceptors, Petitjean number, topological polar surface area, number of rotatable bonds, lipophilicity (XLogP), molecular weight, topological shape and geometrical shape.

I used the CDK, developed in part by Rajarshi Guha, to calculate the descriptors and made the plot using ggplot2.

Quite impressively, the compounds that were screened fall in the same region as the PknB inhibitors. Some outliers, such as mitoxantrone and staurosporine, are scattered across the plot, but the majority of the predicted compounds lie close to the PknB inhibitors.
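The post does not show the analysis code. A minimal PCA sketch in Python (the original work used CDK descriptors and ggplot2) could standardize each descriptor column, since molecular weight and donor counts live on very different scales, and then project the compounds onto the leading principal components with NumPy's SVD. The random descriptor matrix is hypothetical toy data, and z-scoring is an assumption; the post does not say whether the descriptors were scaled.

```python
import numpy as np


def pca(x, n_components=2):
    """Project rows of x onto the top principal components.

    Columns are z-scored first so that large-valued descriptors such as
    molecular weight do not dominate small ones such as donor counts.
    Returns (scores, explained_variance_ratio).
    """
    x = np.asarray(x, dtype=float)
    std = x.std(axis=0, ddof=1)
    std[std == 0] = 1.0                      # leave constant columns unscaled
    z = (x - x.mean(axis=0)) / std
    _, s, vt = np.linalg.svd(z, full_matrices=False)
    scores = z @ vt[:n_components].T         # coordinates to plot (PC1, PC2, ...)
    var = s**2 / (s**2).sum()
    return scores, var[:n_components]


rng = np.random.default_rng(0)
descriptors = rng.normal(size=(30, 12))      # 30 compounds x 12 descriptors (toy data)
scores, ratio = pca(descriptors)
print(scores.shape, ratio.round(3))
```

The `scores` array is what would be handed to a plotting library, with known inhibitors and screened compounds coloured differently.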

Monday, August 6, 2012

Pharmacophore study for Mycobacterium tuberculosis Ser/Thr kinase (PknB)

PknB Study

This post gives you an idea of the recent research I have done on Mtb PknB. Previously I reported some predicted inhibitors; this time I am updating the research done to identify more inhibitors. I am in India from June 21 to August 20, 2012, under an IUSSTF science and technology scholarship at the Birla Institute of Technology, Hyderabad campus.


Some work on PknB inhibitors had previously been done here (paper under publication) in collaboration with OSDD. They identified some inhibitors active below 1 micromolar. Quite impressive!
But the problem with PknB is that the inhibitors are not able to inhibit growth in cell-based studies.

Why is this so?

Is it due to the mycolic acid barrier, which doesn't let compounds cross the membrane? I am not sure about this, but more research needs to be done on drug permeation.

I set PknB as my target, as I had been doing earlier, and a recent paper by the Lougheed team helped me study the compounds and their pharmacophores. They previously reported another group of inhibitors in a paper, which were not as potent as the current ones. My study helped me identify the main pharmacophore behind these structures.

PknB is a kinase, so one would expect a typical kinase-like pharmacophore with hydrogen bond donors and acceptors and a hydrophobic moiety, as in class I type inhibitors. Thanks to Jae Hong Shin for a good presentation on kinase inhibitors, which helped me study the inhibitors of PknB.
Kinase inhibitors and their pharmacophoric positions.

From the review by Zhang et al., "Targeting cancer with small molecule kinase inhibitors", shown in the picture above, we can see most of the inhibitors and their scaffolds. The basic pharmacophore, given below, has two donors, one acceptor and a big hydrophobic moiety.

Different types of kinase pharmacophores.

List of published PknB inhibitors.

The features of the kinase pharmacophore closely resemble the inhibitors identified by the Lougheed team. I collected 62 inhibitors with IC50 values and generated an E-pharmacophore based on compound VIII, given in the figure below. E-pharmacophores are based on the energetic contribution of groups of atoms to receptor binding and are generated from the Glide XP descriptors. After docking the 62 compounds, compound I showed the best docking score, followed by compound VIII and then mitoxantrone. I generated pharmacophores for these three compounds.
The pharmacophores for these compounds resembled the typical kinase class I type pharmacophore. I did some enrichment studies (BEDROC scores and AUC) and found that the pharmacophore for compound VIII was the most appropriate for PknB so far, given the dataset I collected. If anyone wants the dataset of 62 compounds, email me at . For the enrichment study, 1000 decoys were collected from Schrodinger's website. The pharmacophore had high enrichment at 1%, 2% and 5% of the database hits.
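For readers who want to reproduce this kind of enrichment analysis, here is a minimal sketch of the enrichment factor at a given fraction of the ranked database and of BEDROC, following Truchon and Bayly's RIE/BEDROC definitions (RIE rescaled between its minimum and maximum possible values, as RDKit's `rdkit.ML.Scoring.Scoring.CalcBEDROC` does). The input is assumed to be a list of 0/1 activity labels already sorted from best to worst score; `alpha=20` is a common choice but an assumption here, since the post does not state the value used.

```python
import math


def enrichment_factor(labels, fraction):
    """EF in the top `fraction` of a ranked list of 0/1 activity labels."""
    n_total = len(labels)
    n_actives = sum(labels)
    if n_actives == 0:
        raise ValueError("no actives in the list")
    k = max(1, round(n_total * fraction))   # size of the top slice
    hits = sum(labels[:k])
    return (hits / k) / (n_actives / n_total)


def bedroc(labels, alpha=20.0):
    """BEDROC: RIE rescaled between its minimum and maximum possible values.

    `labels` must be sorted from best to worst score (rank 1 first).
    """
    n_total = len(labels)
    n_actives = sum(labels)
    if n_actives == 0:
        return 0.0
    ra = n_actives / n_total
    # RIE: exponentially weighted sum over active ranks (1-based), divided by
    # the expected value of that sum when actives are placed at random.
    sum_exp = sum(math.exp(-alpha * (i + 1) / n_total)
                  for i, y in enumerate(labels) if y)
    random_sum = ra * (1 - math.exp(-alpha)) / (math.exp(alpha / n_total) - 1)
    rie = sum_exp / random_sum
    rie_max = (1 - math.exp(-alpha * ra)) / (ra * (1 - math.exp(-alpha)))
    rie_min = (1 - math.exp(alpha * ra)) / (ra * (1 - math.exp(alpha)))
    if rie_max == rie_min:                  # every compound is active
        return 1.0
    return (rie - rie_min) / (rie_max - rie_min)


# 1000 ranked compounds with 10 actives at the very top (ideal ranking)
ideal = [1] * 10 + [0] * 990
print(enrichment_factor(ideal, 0.01))   # the maximum EF possible at 1%
print(round(bedroc(ideal), 3))
```

BEDROC is 1 when all actives are ranked first and 0 when they are all ranked last, which makes values such as those reported here easy to compare across methods.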
The paper by Salam et al. on E-pharmacophores mentions choosing sites with scores greater than -1 kcal/mol, but I have a different scenario here: the aromatic ring score was less than -1 kcal/mol, and one extra donor also had a score less than -1 kcal/mol. Still, my selected pharmacophore gave very good enrichment results.
I am not sure whether sites should be selected based on their energetic contribution or not, so I selected the sites based on the structure of a typical kinase pharmacophore. The picture below represents my pharmacophore. The previous, unpublished work did not consider the extra donor D6 on the bottom right. This donor has a notable effect: for most compounds where this donor site is present, the IC50 value drops below 0.1 micromolar, which indicates that it is one of the important sites.

The docking pose of compound VIII is also given below, along with the pharmacophore.

I have done data fusion of the structure- and ligand-based methods (pharmacophore, Glide and ROCS) using sum score, sum rank and reciprocal rank. Reciprocal rank performs remarkably well, giving one of the best enrichments, with a BEDROC value of 0.875 and an RIE of 12.73. The AUC at 1%, 2% and 5% was 0.71, 0.75 and 0.81. The next best data fusion method was sum score, with a BEDROC of 0.785. Both data fusion methods performed better than the individual virtual screening methods.
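The three fusion rules can be sketched as below, assuming each method's output has been turned into a dict of compound to score where higher means better (Glide scores, which are more negative for better poses, would need their sign flipped first). Sum score adds min-max normalized scores, sum rank adds ranks (lower is better) and reciprocal rank adds 1/rank; some published variants use 1/(k + rank) with a constant k, which is not done here. The compound IDs and scores are hypothetical, and the post does not say how the scores were normalized.

```python
def ranks(scores):
    """Rank compounds 1..n by descending score (ties broken by compound ID)."""
    order = sorted(scores, key=lambda c: (-scores[c], c))
    return {c: r for r, c in enumerate(order, start=1)}


def minmax(scores):
    """Rescale one method's scores to the 0..1 range."""
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0                  # all-equal scores: avoid dividing by zero
    return {c: (s - lo) / span for c, s in scores.items()}


def fuse(methods, rule):
    """Fuse several {compound: score} dicts (higher = better); best compound first."""
    compounds = set.intersection(*(set(m) for m in methods))
    if rule == "sum_score":
        normed = [minmax(m) for m in methods]
        fused = {c: sum(n[c] for n in normed) for c in compounds}
        return sorted(compounds, key=lambda c: (-fused[c], c))
    rk = [ranks(m) for m in methods]
    if rule == "sum_rank":                   # lower summed rank is better
        fused = {c: sum(r[c] for r in rk) for c in compounds}
        return sorted(compounds, key=lambda c: (fused[c], c))
    if rule == "reciprocal_rank":            # higher summed 1/rank is better
        fused = {c: sum(1.0 / r[c] for r in rk) for c in compounds}
        return sorted(compounds, key=lambda c: (-fused[c], c))
    raise ValueError(f"unknown fusion rule: {rule}")


# Hypothetical scores; the Glide values are sign-flipped so that higher is better
pharmacophore = {"c1": 0.9, "c2": 0.7, "c3": 0.2}
glide = {"c1": 8.1, "c2": 9.5, "c3": 5.0}
rocs = {"c1": 1.4, "c2": 1.1, "c3": 0.9}
for rule in ("sum_score", "sum_rank", "reciprocal_rank"):
    print(rule, fuse([pharmacophore, glide, rocs], rule))
```

The fused, best-first list can then be turned into 0/1 activity labels and scored with BEDROC in the same way as a single method.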

I have done some screening of the Asinex screening library using the data fusion methods and will post some of the results in the next post.

Happy reading!

Friday, March 9, 2012

The figure above shows compounds used in cardiac therapy, some of which are antihypertensives: 48 compounds taken from DrugBank. The drug associations are based on SLAP, a powerful tool for drug-target prediction. The compounds are scored based on their semantic association with 1863 targets. After pruning, 1500 targets remain, but further pruning is required to remove unnecessary targets for these compounds. Below is the distribution of compounds with respect to targets (quite a big plot).
The idea is to find the proteins most associated with the cardiac therapy drugs and antihypertensives. Are these compounds promiscuous, or do they show polypharmacology? Some of them are natural compounds, and such compounds easily get associated with many proteins.

Friday, January 27, 2012

Improved map of 27 PknB inhibitors, made in Gephi...
My Facebook network
Some important properties:
Average weighted degree: 22.44
Network diameter: 10
Average path length: 3.87
Average clustering coefficient: 0.553
Modularity: 0.697

Degree Report
Average Degree: 22.444

Graph Distance Report
Network Interpretation: undirected
Diameter: 10
Radius: 0
Average Path length: 3.8770262057454894
Number of shortest paths: 220486

Clustering Coefficient Metric Report
Network Interpretation: undirected
Average Clustering Coefficient: 0.553
Total triangles: 39886
The Average Clustering Coefficient is the mean value of individual coefficients.
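To make the last line of the report concrete: in an undirected graph, a node's local clustering coefficient is the fraction of pairs of its neighbours that are themselves connected, 2T_i / (k_i (k_i - 1)), where T_i is the number of triangles through node i and k_i its degree, and the average is the mean over all nodes. A minimal pure-Python sketch follows; scoring nodes with fewer than two neighbours as 0 is an assumption here, since tools differ on this point and the report does not say what Gephi does. The same loop also gives the average degree and the total triangle count.

```python
from itertools import combinations


def graph_stats(edges):
    """Average degree, average clustering coefficient and triangle count
    for an undirected graph given as (u, v) edge pairs."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue                          # ignore self-loops
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    if not adj:
        return 0.0, 0.0, 0
    local = []
    triangle_corners = 0
    for node, nbrs in adj.items():
        k = len(nbrs)
        # number of links between pairs of this node's neighbours
        t = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        triangle_corners += t
        local.append(2 * t / (k * (k - 1)) if k >= 2 else 0.0)
    n = len(adj)
    avg_degree = sum(len(nbrs) for nbrs in adj.values()) / n
    # each triangle is counted once at each of its three corners
    return avg_degree, sum(local) / n, triangle_corners // 3


# A triangle a-b-c plus a pendant node d attached to a
print(graph_stats([("a", "b"), ("b", "c"), ("a", "c"), ("a", "d")]))
```

In this toy graph node a has three neighbours but only one linked pair, so its coefficient is 1/3, while b and c score 1 and d scores 0.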