
Showing posts with label data science. Show all posts

Wednesday, February 25, 2015

Link Prediction using Network based Inference - A quick matrix based implementation

I explored a paper by Zhou et al. that uses the Network-Based Inference (NBI) method to predict missing links in a bipartite network, and I spent a while thinking about how to implement it with simple matrix operations. I have taken the picture below from the Zhou paper to explain the idea. Given the bipartite graph, resources are transferred in two phases: the resource on the X set of nodes (x, y, z) is first distributed to the Y set of nodes and then flows back to X. This process lets us define a technique for calculating the weight matrix W. In 2010 a modified version of this approach was proposed in "Solving the apparent diversity-accuracy dilemma of recommender systems", which uses a hybrid algorithm that combines the functions defined in NBI and HeatS through a parameter called λ.
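The two-phase transfer can be written with a couple of matrix products. What follows is a minimal sketch, not the code from the original post. It assumes A is the n x m adjacency matrix with the X nodes as rows, and it assumes the usual normalisation from the two papers: W[i, j] = (1 / (k_i^(1 - λ) * k_j^λ)) * Σ_l A[i, l] * A[j, l] / k_l, where k_i and k_j are X-node degrees and k_l is a Y-node degree. With λ = 1 this reduces to NBI and with λ = 0 to HeatS. The function name `nbi_weights` and the toy matrix are made up for illustration.

```r
# Hypothetical helper: weight matrix for NBI (lambda = 1), HeatS (lambda = 0)
# or the hybrid in between. A is n x m with X nodes as rows.
nbi_weights <- function(A, lambda = 1) {
  kx <- rowSums(A)                      # degree of each X node
  ky <- colSums(A)                      # degree of each Y node
  inv_ky <- ifelse(ky > 0, 1 / ky, 0)   # avoid dividing by zero degree

  # M[i, j] = sum_l A[i, l] * A[j, l] / ky[l]
  # (inv_ky * t(A) scales row l of t(A) by 1 / ky[l])
  M <- A %*% (inv_ky * t(A))

  # denom[i, j] = kx[i]^(1 - lambda) * kx[j]^lambda
  denom <- outer(kx^(1 - lambda), kx^lambda)
  W <- M / denom
  W[!is.finite(W)] <- 0                 # isolated X nodes get zero weight
  W
}

# Toy bipartite graph: 3 X nodes (rows) and 4 Y nodes (columns).
A <- matrix(c(1, 1, 0, 0,
              1, 0, 1, 0,
              0, 1, 1, 1),
            nrow = 3, byrow = TRUE)

W_nbi   <- nbi_weights(A, lambda = 1)
W_heats <- nbi_weights(A, lambda = 0)
print(round(W_nbi, 3))
```

A quick sanity check on the sketch: for λ = 1 every column of W sums to 1 (resource is conserved), and for λ = 0 every row sums to 1 (each node takes an average).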


In this post I am going to implement the algorithm using simple matrix methods in R. Interested readers should see those publications for the mathematical details. Before going further: if we are given a weight matrix W (calculated using one of the algorithms above) and the adjacency matrix A of the bipartite network, we can compute the recommendation matrix R using the equation below, where W is an n x n matrix and A is an n x m matrix.

    R = W · A      (1)

For each column of R, the scores are then sorted in descending order to give the ranked recommendation list.
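As a small illustration of equation (1) and the sorting step, here is a sketch in base R. The matrices are toy values invented for the example (W here is an arbitrary 3 x 3 matrix, not one computed by NBI), and the helper name `rank_candidates` is hypothetical. Masking the links that already exist in A before sorting is my own assumption; skip that line if you want to rank everything.

```r
# Toy adjacency matrix: 3 X nodes (rows) by 4 Y nodes (columns).
A <- matrix(c(1, 1, 0, 0,
              1, 0, 1, 0,
              0, 1, 1, 1),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("x", "y", "z"), c("u1", "u2", "u3", "u4")))

# Arbitrary n x n weight matrix, only for demonstration.
W <- matrix(c(0.50, 0.30, 0.20,
              0.25, 0.50, 0.25,
              0.10, 0.30, 0.60),
            nrow = 3, byrow = TRUE,
            dimnames = list(rownames(A), rownames(A)))

# Equation (1): the recommendation matrix is a plain matrix product.
R <- W %*% A

# Rank the X nodes for one column of R, highest score first.
rank_candidates <- function(R, A, col) {
  scores <- R[, col]
  scores[A[, col] == 1] <- NA       # assumption: hide links we already know
  sort(scores, decreasing = TRUE)   # sort() drops the NA entries
}

ranked <- rank_candidates(R, A, "u4")
print(ranked)
```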

We use this kind of calculation in chemogenomics predictions and on other bipartite data. For drug-target prediction we can use the sequence similarity matrix as W and the drug-target adjacency matrix as A to obtain target recommendations based on sequence similarity. Similarly, W can be a compound similarity matrix and A the bipartite compound-target matrix; equation (1) then gives recommendations of compounds for a sequence of interest. This matrix trick just blew my mind! Isn't it cool?
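The one thing to watch in both cases is orientation: the rows of A have to be the same entities that W compares. A rough sketch with invented names and similarity values (T1..T3, D1..D2 are placeholders, not real targets or drugs):

```r
targets <- c("T1", "T2", "T3")
drugs   <- c("D1", "D2")

# Known drug-target links, stored as targets x drugs.
A_td <- matrix(c(1, 0,
                 0, 1,
                 0, 0),
               nrow = 3, byrow = TRUE, dimnames = list(targets, drugs))

# Case 1: W is a target (sequence) similarity matrix, so A must be targets x drugs.
S_target <- matrix(c(1.0, 0.8, 0.1,
                     0.8, 1.0, 0.3,
                     0.1, 0.3, 1.0),
                   nrow = 3, byrow = TRUE, dimnames = list(targets, targets))
R_targets <- S_target %*% A_td      # targets x drugs: target scores per drug

# Case 2: W is a compound similarity matrix, so A must be drugs x targets.
S_drug <- matrix(c(1.0, 0.6,
                   0.6, 1.0),
                 nrow = 2, byrow = TRUE, dimnames = list(drugs, drugs))
A_dt <- t(A_td)
R_drugs <- S_drug %*% A_dt          # drugs x targets: compound scores per target

print(R_targets)
print(R_drugs)
```

In this toy data D1 is only linked to T1, but T2 is very similar to T1, so T2 gets a high score for D1. In the same way D2 picks up a score for T1 because it resembles D1.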

Now for the functions, which are given below. If you use the code, do let me know how it works for you. My next post will be about integrating similarity matrix information along with the degree information into W.







Thursday, December 18, 2014

Adverse Effect mining of FDA data Part 1

The US Food and Drug Administration (FDA) Adverse Event Reporting System (FAERS, formerly AERS) is a database that contains information on adverse event and medication error reports submitted to the FDA. The database is designed to support the FDA's post-marketing safety surveillance program for drug and therapeutic biologic products. In this post I have used the 2012 quarter 4 data to visualize adverse reactions and to compare the adverse reactions for a group of drugs (grouped by ATC code). One important point to note: the data is submitted from different countries, which means a drug can appear under different names (brand names). So searching for "Aspirin" alone is not enough; you need to search by all the alternate brand names of aspirin. For that I have created a brand-name dataset from the DrugBank data, which I will use for this example.
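To make the brand-name point concrete, here is a small base R sketch. The data frame, its column names (`primaryid`, `drugname`) and the list of alternate names are all made up for illustration; the real lookup comes from the brand-name dataset mentioned above. The helper `match_any_name` is hypothetical. It upper-cases both sides and uses fixed (non-regex) matching so that names containing characters like "." or "(" are matched literally.

```r
# Illustrative alternate names for one drug (not the actual DrugBank list).
aspirin_names <- c("Aspirin", "Ecotrin", "Bayer Aspirin", "Acetylsalicylic acid")

# Toy stand-in for the FAERS drug table.
drug <- data.frame(
  primaryid = c(101, 102, 103, 104),
  drugname  = c("ASPIRIN", "Ecotrin 81 mg", "IBUPROFEN", "bayer aspirin"),
  stringsAsFactors = FALSE
)

# TRUE for every row whose drug name contains any of the alternate names,
# ignoring case.
match_any_name <- function(drugname, names) {
  haystack <- toupper(drugname)
  hits <- lapply(toupper(names), function(nm) grepl(nm, haystack, fixed = TRUE))
  Reduce(`|`, hits)
}

is_aspirin <- match_any_name(drug$drugname, aspirin_names)
print(drug[is_aspirin, ])
```

Searching for "Aspirin" alone would miss the "Ecotrin 81 mg" row here.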

Datasets used:
  • ATC drug group class data
  • Drug brand names data
  • 2012 FDA quarter 4 dataset
R packages used:
  • foreach
  • doParallel
  • rCharts (for visualization)
I used foreach with a parallel backend because querying the data frames on a single processor takes a long time; running the queries in parallel cuts the time considerably. I am using the NVD3 library in rCharts, the package developed by Ramnath Vaidyanathan, for the interactive pyramid plots. The code for this work is available here. To simplify things I ran a single query, "Aspirin", on the drug data frame, using grepl with ignore.case and drug_seq = 1 (which indicates the primary medication); see lines 51-63 of the Initial.R script. Aspirin is categorized under 3 different classes: Blood and blood forming organs, Alimentary tract and metabolism, and Nervous system. So I extracted all the drugs in each category, pulled the records for each of those drugs from the drug data frame, merged the results and visualized them as the interactive pyramid plots below. I have some further ideas for this: calculating statistics by class, categorizing further using MeSH classes, and creating a Shiny app.
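The query-and-merge step looks roughly like the sketch below. This is not the Initial.R script itself: the toy data frame, the column names (`primaryid`, `drug_seq`, `drugname`), the drug list and the two-worker cluster are all assumptions for illustration. It needs the foreach and doParallel packages installed.

```r
library(foreach)
library(doParallel)

# Toy stand-in for the FAERS drug table; column names are assumed.
drug <- data.frame(
  primaryid = 1:6,
  drug_seq  = c(1, 2, 1, 1, 1, 2),
  drugname  = c("ASPIRIN", "ASPIRIN", "Warfarin",
                "aspirin 81MG", "HEPARIN", "WARFARIN"),
  stringsAsFactors = FALSE
)

# Hypothetical list of drugs belonging to one ATC class.
class_drugs <- c("aspirin", "warfarin", "heparin")

cl <- makeCluster(2)        # two workers, just for the example
registerDoParallel(cl)

# One query per drug, run in parallel; the per-drug results are merged
# with rbind. A drug with no hits returns NULL, which rbind ignores.
hits <- foreach(d = class_drugs, .combine = rbind) %dopar% {
  keep <- grepl(d, drug$drugname, ignore.case = TRUE) & drug$drug_seq == 1
  if (any(keep)) {
    data.frame(query = d, drug[keep, ], stringsAsFactors = FALSE)
  } else {
    NULL
  }
}

stopCluster(cl)

print(hits)
print(table(hits$query))    # report counts per drug
```

On a table this small the parallel version is slower than a plain loop because of the cluster start-up cost; the gain only shows up with the full quarterly files and a long list of drug names.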

If you want the brand-name data with the ATC code classes, leave a comment below with your email address, or send me an email.

Class Blood and blood forming organs (access the full charts at http://abhik1368.github.io/plots/classB.html)


Class Nervous system (http://abhik1368.github.io/plots/classN.html)


Class Alimentary tract and metabolism (http://abhik1368.github.io/plots/classA.html)