Friday, October 24, 2014

Lean Markov Chain Clustering in R

Markov Chain Clustering (MCL) is a fast, scalable, unsupervised clustering algorithm based on simulating flow in graphs. The algorithm finds clusters in a graph by means of random walks. It relies on two operators, expansion and inflation. "Expansion takes the power of a stochastic matrix using the normal matrix product. Inflation takes the Hadamard power of a matrix (taking powers entrywise), followed by a scaling step, such that the resulting matrix is stochastic again, i.e. the matrix elements (on each column) correspond to probability values." More information can be found here and here. The R code below describes how to perform it step by step. A nice explanation is also presented here.
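To make the two operators concrete independently of the R code, here is a minimal pure-Python sketch of the expansion/inflation loop. It is an illustration under stated assumptions, not the post's implementation: the function names, the added self-loops, the default parameters, and the 1e-6 membership threshold are all choices made for this example.

```python
def normalize_columns(m):
    """Scale every column so that it sums to 1 (column-stochastic)."""
    n = len(m)
    out = [row[:] for row in m]
    for j in range(n):
        total = sum(m[i][j] for i in range(n))
        if total > 0:
            for i in range(n):
                out[i][j] = m[i][j] / total
    return out


def expand(m, power=2):
    """Expansion: raise the matrix to `power` with the normal matrix product."""
    n = len(m)
    result = m
    for _ in range(power - 1):
        result = [[sum(result[i][k] * m[k][j] for k in range(n))
                   for j in range(n)] for i in range(n)]
    return result


def inflate(m, r=2.0):
    """Inflation: entrywise (Hadamard) power, then rescale the columns."""
    return normalize_columns([[x ** r for x in row] for row in m])


def mcl(adjacency, expansion=2, inflation=2.0, max_iter=100, tol=1e-9):
    """Return clusters as sorted lists of node indices."""
    n = len(adjacency)
    # Assumption: add self-loops before normalizing, a common MCL convention.
    m = [[float(adjacency[i][j]) + (1.0 if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    m = normalize_columns(m)
    for _ in range(max_iter):
        nxt = inflate(expand(m, expansion), inflation)
        delta = max(abs(nxt[i][j] - m[i][j])
                    for i in range(n) for j in range(n))
        m = nxt
        if delta < tol:
            break
    # Assumption: each row with entries above a small threshold marks an
    # attractor, and its non-negligible columns are the cluster members.
    clusters = set()
    for i in range(n):
        members = frozenset(j for j in range(n) if m[i][j] > 1e-6)
        if members:
            clusters.add(members)
    return sorted(sorted(c) for c in clusters)


# Two triangles joined by the single edge 2-3.
bridged = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
print(mcl(bridged))
```

A larger inflation value generally yields finer-grained clusters, so it is the parameter to tune first.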

Wednesday, October 1, 2014

Link Prediction using Bipartite Networks

Predicting missing links in networks is of practical significance in many fields, including social networks, biological networks, and food networks. The Adamic-Adar index refines the simple counting of common neighbors by assigning more weight to less-connected neighbors, as given by the equation below. Other similarity indices exist as well.

$$ s_{xy} = \sum_{z \in \Gamma(x) \cap \Gamma(y)} \frac{1}{\log k_z} $$

Here \Gamma(x) is the set of neighbors of node x and k_z is the degree of node z.

The code takes a bipartite graph as input (stored in a text file as an adjacency list) and computes the Adamic/Adar similarity of each non-neighboring node pair. The similarity is computed from the degrees of the intermediate nodes. The output is written to a text file containing three fields per row: score, Proteins, and Drugs. The same approach can also be applied to other bipartite networks.
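As a rough illustration of this computation, the Python sketch below scores every non-adjacent protein-drug pair. In a bipartite graph a protein and a drug never share a direct neighbor, so the sketch assumes one possible adaptation: an intermediate node z counts for the pair (p, d) when z is a drug linked to p that also shares at least one protein with d. That adaptation, the function name `adamic_adar_bipartite`, and the in-memory edge list (instead of parsing the adjacency-list file, whose exact format is not shown) are assumptions of this example and may differ from the actual code.

```python
import math
from collections import defaultdict


def adamic_adar_bipartite(edges):
    """Score non-adjacent (protein, drug) pairs.

    `edges` is an iterable of (protein, drug) tuples. Returns a list of
    (score, protein, drug) rows, highest score first.
    """
    drugs_of = defaultdict(set)     # protein -> linked drugs
    proteins_of = defaultdict(set)  # drug -> linked proteins
    for protein, drug in edges:
        drugs_of[protein].add(drug)
        proteins_of[drug].add(protein)

    rows = []
    for p in drugs_of:
        for d in proteins_of:
            if d in drugs_of[p]:
                continue  # already linked, nothing to predict
            score = 0.0
            for z in drugs_of[p]:
                # Assumed adaptation: z is an intermediate node if it
                # shares at least one protein with the candidate drug d.
                if proteins_of[z] & proteins_of[d]:
                    # z is linked to p and to a protein of d (which cannot
                    # be p), so its degree is at least 2 and log() > 0.
                    score += 1.0 / math.log(len(proteins_of[z]))
            if score > 0:
                rows.append((score, p, d))
    rows.sort(key=lambda r: (-r[0], r[1], r[2]))
    return rows


example_edges = [("P1", "D1"), ("P1", "D2"), ("P2", "D2"), ("P2", "D3")]
for score, protein, drug in adamic_adar_bipartite(example_edges):
    print(f"{score:.4f}\t{protein}\t{drug}")
```

Intermediate nodes with low degree contribute more to the score, which mirrors the weighting idea behind the index.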

After the calculation, the predicted links are stored in an output file, and the highest-scoring predictions can be obtained by sorting on the first column. The bipartite data (inputdata.txt) and code are available at Git.
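Sorting on the first column could look like the short Python sketch below. It assumes whitespace-separated rows of the form score, protein, drug, and skips any line whose first field is not a number (such as a header). The function name `top_links` and the separator are assumptions, since the output file's exact layout is not shown.

```python
def top_links(lines, n=10):
    """Return the n highest-scoring (score, protein, drug) rows."""
    rows = []
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # ignore blank or malformed lines
        try:
            score = float(parts[0])
        except ValueError:
            continue  # ignore a header row such as "score Proteins Drugs"
        rows.append((score, parts[1], parts[2]))
    rows.sort(key=lambda r: r[0], reverse=True)
    return rows[:n]


sample = ["score Proteins Drugs", "0.5 P1 D3", "1.4 P2 D1", "0.9 P3 D2"]
for row in top_links(sample, n=2):
    print(row)
```

To apply it to a real file, pass an open file object as `lines`, since iterating over one yields its lines.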