On Bioinformatics:
Tribe MCL Algorithm

Today in our journal club we discussed a paper on the Tribe-MCL algorithm, which uses the Markov Clustering algorithm (MCL) to find protein families. Some of the main concerns I have are:
a) They say that they have an expertise in detecting domain architecture (domain architecture means the sequence of conserved domains found in a protein; refer to the CDART paper for a definition). This expertise overcomes a problem that previous algorithms had, which detected domains and not the domain architecture. In that case, two proteins in the same protein family, sharing the same conserved domain, often ended up in totally different categories of biochemical function. We need to discuss how exactly the MCL algorithm overcomes this problem.
b) The second thing they discuss is that they use an algorithm called CAST to remove low-complexity regions (and not promiscuous domains). I need to know how this low-complexity region removal differs from blastp's -F NO flag.
c) What are protein domains in general? How similar or different are they where sequence similarity is concerned?

d) They talk about a Markov matrix (which is a stochastic, or transition, matrix). What does transition matrix mean? It is a matrix in which every column is a probability vector, i.e. the entries of each column add up to 1. One of the ways they achieve this is by forming an n x n similarity matrix and then giving each column a percentage weighting, by dividing each value in a column by the sum of that column's values. Here is an example of how they do it:

120 20 40 40
20 120 30 40
30 40 110 22
30 32 3 110

The new matrix, with each entry divided by its column sum (200, 212, 183 and 212), will be

120/200 20/212 40/183 40/212
20/200 120/212 30/183 40/212
30/200 40/212 110/183 22/212
30/200 32/212 3/183 110/212
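
To check the arithmetic, here is a minimal Python sketch of the column normalization described above. It is only my own illustration, not code from the paper: the function name column_normalize and the variable names are made up for this post, and it assumes no column sums to zero.

```python
from fractions import Fraction


def column_normalize(matrix):
    """Divide every entry by the sum of its column (assumes no zero column sum)."""
    n_rows = len(matrix)
    n_cols = len(matrix[0])
    # Sum each column of the similarity matrix.
    col_sums = [sum(matrix[r][c] for r in range(n_rows)) for c in range(n_cols)]
    # Exact fractions make it easy to see that each column adds up to 1.
    return [
        [Fraction(matrix[r][c], col_sums[c]) for c in range(n_cols)]
        for r in range(n_rows)
    ]


# The 4x4 similarity matrix from the example above.
similarity = [
    [120, 20, 40, 40],
    [20, 120, 30, 40],
    [30, 40, 110, 22],
    [30, 32, 3, 110],
]

markov = column_normalize(similarity)

# Each column of the result should now be a probability vector.
column_totals = [sum(row[c] for row in markov) for c in range(len(markov[0]))]

for row in markov:
    print("  ".join(str(value) for value in row))
print("column totals:", [str(total) for total in column_totals])
```

Fractions are used instead of floats so the column totals come out as exactly 1 rather than something like 0.9999999; for a real similarity matrix with thousands of proteins, floats (or a sparse matrix library) would be the more practical choice.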

One quick link on Markov matrices that I found is
http://www.sosmath.com/matrix/markov/markov.html

One obvious philosophical question that pops up: does this trend of basing every association on sequence similarity lead us to any non-hypothetical results? I think that the whole bioinformatics field (or at least the part of it I know about) is pushing itself towards a Bayesian game where the a priori is that if two DNA fragments have a similar sequence, they have a similar X (where X is a set that is growing at a pretty fast pace). A nice thing that Joao said is that whatever we are looking at is hypothetical, based on the same old hypothesis...
