Pages

Sunday, September 14, 2014

Some R codes and Examples to perform Random Walks on a network.

My research is mainly focused on Drug Target Prediction and Drug to Target to Adverse events prediction. I use Random walk on different heterogeneous networks for this. There are several papers which came up like .

1) Kohler 2008 (Walking the Interactome for Prioritization of Candidate Disease Genes)
2) The power of protein interaction networks for associating genes with diseases
3) Chen Etal ( Drug Target Prediction with Random walk with restart)
4) Genome-wide inferring gene–phenotype relationship by walking on the heterogeneous network


Here i will post some codes in R on how to perform a Normal Random walk on a small network using two method one proposed by Kohler (2008) and another proposed by Vanunu (2010).
The method iteratively simulates random transitions of a walker from a node to a randomly selected neighbour node and at any time step the walk can be restarted depending on a predefined probability. Random walk with restarts is slightly different than PageRank with priors in the way that it normalizes the link weights. The convergence is decided when a probability difference is less than 10e-10 between two consecutive time steps which is calculated by L1 / Frobenius norm. The loop breaks when the difference is less than value. The function assumes two types of method based 

1 Kohler -> which is just a simple random walk
2 Vanunu - > Which is a modification of random walk with restarts such that the link weight is normalized not only by number of outgoing edges but also by number of incoming edges. They mentioned "We chose to normalize the weight of an edge by the degrees of its end-points, since the latter relate to the probability of observing an edge between the same end-points in a random network with the same node degrees".

There are no such parameters applied here and also similarity matrices . This is the heart of the code of Random walk and one can modify on top of this as much as they can.



# Parameter r: restart probability
r<-0.8
# convergence cutoff
conv_cut<-1e-10
RWR <- function(M, p_0, r,conv_cut, prop=FALSE) {
# use Network propagation when prop=TRUE
if(prop) {
w<-colSums(M)
# Degree weighted adjacency matrix
W<-matrix(rep(1,nrow(M)*ncol(M)), nrow(M), ncol(M))
for (i in 1:nrow(M)) {
for (j in 1:ncol(M)) {
v<-sqrt(w[i]*w[j])
W[i,j]<-v
W[j,i]<-v
}
}
W<-M/W
} else {
# Convert M to column-normalized adjacency matrix
W=scale(M,center=F,scale=colSums(M))
}
# Assign equal probabilities to seed nodes
p_0<-p_0/sum(p_0)
p_t<-p_0
# Iterate till convergance is met
while (TRUE ) {
# Calculate new proabalities
p_tx<- (1-r) * W %*% p_t + r * p_0
# Check convergance
print (norm(p_tx-p_t))
if ( norm(p_tx-p_t) < conv_cut ) {
break
}
p_t<-p_tx
}
return(p_tx)
}
# Generate the interaction matrix its a toy set
M<-matrix(rep(0,25),ncol=5)
M[1,]=c(0,1,0,0,0)
M[2,]=c(1,0,1,1,0)
M[3,]=c(0,1,0,1,0)
M[4,]=c(0,1,1,0,1)
M[5,]=c(0,0,0,1,0)
#Initialize the seed nodes
p_0<-t(t(c(0,1,1,0,0)))
# With out propagation
RW<-RWR(M, p_0, r,conv_cut)
print(RW)
# With Propagation
RW<-RWR(M, p_0, r,conv_cut,prop=TRUE)
print(RW)
view raw random_walk.R hosted with ❤ by GitHub


Hope this code makes life simple for some people when they try to understand it :)

Thursday, September 11, 2014

Some function for VS: AUC , BEDROC and RIE in R

Below are some R functions to compute Area under the curve , Robust Initial Enhancement Metric and Boltzmann-Enhanced Discrimination of ROC which is implemented in

Truchon et al. Evaluating Virtual Screening Methods: Good and Bad Metrics for the "Early Recognition" J. Chem. Inf. Model. (2007) 47, 488-508.

These metrics use for the early recognition problem in virtual screening.

AUC
# x = a vector for scores
# y = a vector of labels
function (x, y, decreasing = TRUE, top = 1)
{
if (length(x) != length(y)) {
stop(paste("Length of scores does not match with labels."))
}
N <- length(y)
n <- sum(y == 1)
x_p <- -Inf
area <- 0
fp = tp = fp_p = tp_p = 0
ord <- order(x, decreasing = decreasing)
for (i in seq_along(ord)) {
j <- ord[i]
if (x[j] != x_p) {
if (fp >= (N - n) * top) {
rat <- ((N - n) * top - fp_p)/(fp - fp_p)
area <- area + rat * (fp - fp_p) * (tp + tp_p)/2
return(area/(n * (N - n) * top))
}
area <- area + (fp - fp_p) * (tp + tp_p)/2
x_p <- x[j]
fp_p <- fp
tp_p <- tp
}
if (y[j] == 1) {
tp <- tp + 1
}
else {
fp <- fp + 1
}
}
area <- area + (fp - fp_p) * (tp + tp_p)/2
return(area/(n * (N - n)))
}
view raw auc.R hosted with ❤ by GitHub


RIE

# x = a vector of scores
# y = a vector of labels
function (x, y, decreasing = TRUE, alpha = 20)
{
if (length(x) != length(y)) {
stop(paste("The length of scores should be equal to number of labels."))
}
N <- length(y)
n <- length(which(y == 1))
ord <- order(x, decreasing = decreasing)
m_rank <- which(y[ord] == 1)
s <- sum(exp(-alpha * m_rank/N))
ra <- n/N
ri <- (N - n)/N
random_sum <- (n/N) * (1 - exp(-alpha))/(exp(alpha/N) - 1)
return(s/random_sum)
}
view raw rie.R hosted with ❤ by GitHub


BEDROC

# x = a vector of scores
# y = a vector of labels
function (x, y, decreasing = TRUE, alpha = 20)
{
if (length(x) != length(y)) {
stop(paste("The length of scores should be equal to number of labels."))
}
N <- length(y)
n <- length(which(y == 1))
ord <- order(x, decreasing = decreasing)
m_rank <- which(y[ord] == 1)
s <- sum(exp(-alpha * m_rank/N))
ra <- n/N
ri <- (N - n)/N
random_sum <- ra * exp(-alpha/N) * (1 - exp(-alpha))/(1 -
exp(-alpha/N))
fac <- ra * sinh(alpha/2)/(cosh(alpha/2) - cosh(alpha/2 -
alpha * ra))
cte = 1/(1 - exp(alpha * ri))
return(s/random_sum * fac + cte)
}
view raw bedroc.R hosted with ❤ by GitHub