Open Chemical Information and Research: October 2014

Friday, October 24, 2014

Lean Markov Chain Clustering in R

Markov Chain Clustering (MCL) is fast scalable supervised clustering algorithm based on information flow in graphs. The algorithm finds cluster in graphs by random walks. It uses two important operators one is the inflation and other the expansion. "Expansion takes the power of a stochastic matrix using the normal matrix product. Inflation takes the Hadamard power of a matrix (taking powers entrywise), followed by a scaling step, such that the resulting matrix is stochastic again, i.e. the matrix elements (on each column) correspond to probability values." More information can be found here and here . Below the R code describes how to perform it step by step. Also a nice explanation is presented here .

	# Algorithm to perform MCL Clustering in R
	# Add the identity matrix to the matrix which indicates self loops
	add.selfloops <- function (M) {
	LM<-M+diag(dim(M)[1])
	return (LM);
	}

	# Inflation step of MCL
	inflate <- function (M,inf) {
	M <- M^(inf)
	return (M)
	}

	# Normalize the matrix by column
	norm <- function (M) {
	col.sum <- apply(M,2,sum)
	M <- t(M) / col.sum
	return (t(M))

	}

	# MCL procedure
	mcl <- function (M, # Matrix
	inf, # Inflation value
	iter, # Number of iterations
	verbose = T
	)
	{
	for (i in 1:iter) {
	old.M <- M
	M.norm <- norm(M)
	M <- M.norm%*%M.norm
	M <- inflate(M, inf)
	M <- norm(M)
	# Check to reach steady state vector
	if (sum(old.M == M) == dim(M)[1]*dim(M)[2]) {
	break;
	}
	if (verbose) {
	print (paste ("iteration", i));
	}
	}
	return (M);
	}


	# get the grouping using the matrix names after running the iteration


	group.mcl.clusters <- function (M) {
	M.names <- row.names(M)
	clustered.nodes <- vector(mode = "logical", length = dim(M)[1])
	for (i in 1:dim(M)[1]) {
	nodes <- M.names[which(M[i,] != 0)]
	if (length(nodes) > 0 && !clustered.nodes[which(M[i,] != 0)]) {
	print (nodes)
	clustered.nodes[which(M[i,] != 0)] = T
	}
	}
	}

	## End of functions

	## Example
	## Consider a unipartite network example protein protein interaction data or social network data.
	data <- as.matrix(read.csv("mcl_data.csv", header = T, row.names = 1))
	data <- as.matrix(data)
	data <- add.selfloops(data);
	inf <- 2
	iter<-200
	# Launch MCL

	clusters <- mcl(data,inf,iter, verbose = T);
	group.mcl.clusters(clusters)

view raw mclust.R hosted with ❤ by GitHub

Wednesday, October 1, 2014

Link Prediction using Bipartite Networks .

Missing link prediction of networks is of practical significance in modern science like in Social Networks , Biological Networks and Food networks and lots others. Adamic-Adar index refines the simple counting of common neighbors by assigning the lower connected neighbors more weights which is given by the equation below. More on the other indexes are ,

1 . Link Prediction in Complex Networks

2 . Predicting missing links via local information

3. Link prediction problem in social networks

∑w∈Γ(u)∩Γ(v)1log|Γ(w)|

The code takes a bipartite graph as input (stored as a text file in an adjacency list) and computes the Adamic/Adar similarity of each non-neighboring node pair. The similarity is computed using the degree of the intermediate nodes. The output file is written as a text file containing three fields per row score , Proteins and Drugs. However this can be applied to other bipartite networks also.

After calculation the predicted links are stored in an output file and the highest predicted links can be obtained by sorting the first column. The Bipartite data (inputdata.txt) and code is avialable at Git.

Open Chemical Information and Research

Pages

Friday, October 24, 2014

Lean Markov Chain Clustering in R

Wednesday, October 1, 2014

Link Prediction using Bipartite Networks .

Followers

Blog Archive