I am a biologist with some maths background... but not enough knowledge to solve this problem, so I would be really grateful if someone could help (and explain it at a level that a biologist might understand).
I have a large symmetrical matrix (1250 x 1250) which is made up of pairwise correlations between 1250 variables, such that row A = 1, A1B2, A1B3, ... A1B1250 and so on for all 1250 rows.
I would like to calculate modules of related variables, with no prior expectation about how many modules the 1250 variables will fall into.
I have some vague instructions that I should use the absolute correlation values to calculate edge weights, then use edge weights to determine clustering. The problem is that these instructions are too vague for me to make sense of. Specifically, I would like to know:
- How are edge weights calculated for this kind of symmetrical matrix?
- How are edge weights used to cluster variables?
I'd be very grateful for any help offered, either specific formula and explanation, or simply pointing me in the direction of another useful resource.
Thanks very much in advance,
Anna
1) Those very same pairwise correlation values can be used as weights. When people say "weights", they mean a measure of similarity or dissimilarity between any pair of elements. You use that measure to guide the clustering into groups whose elements are "very similar" among themselves (what that means must be clarified for each clustering problem) and "very different" from those of other groups (likewise). In short, you seek high intra-cluster similarity and low inter-cluster similarity.
Thus you cannot blindly use any measure as weights; it depends on what the measure stands for. For instance, you might want to ignore the sign of the correlations (that is, use their absolute values) if anti-correlation (such as a Pearson correlation of $-1$) is not a significant distinction in your problem.
This is the step where your knowledge of the biological problem you are trying to solve comes into play.
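To make point 1 concrete, here is a minimal sketch in Python with NumPy. It assumes you have decided that the sign of a correlation does not matter for your problem. The function name `edge_weights_from_correlations` and the toy matrix are made up for illustration; the choice of $1 - |r|$ as a dissimilarity is one common convention, not the only possible one.

```python
import numpy as np

def edge_weights_from_correlations(corr):
    """Turn a symmetric correlation matrix into edge weights and distances.

    Hypothetical helper for illustration: the weight is |r| (similarity)
    and the distance is 1 - |r| (dissimilarity).
    """
    corr = np.asarray(corr, dtype=float)
    weights = np.abs(corr)            # strong +/- correlation -> heavy edge
    np.fill_diagonal(weights, 0.0)    # a variable has no edge to itself
    distances = 1.0 - np.abs(corr)    # strong correlation -> small distance
    np.fill_diagonal(distances, 0.0)  # distance from a variable to itself
    return weights, distances

# Toy 4 x 4 correlation matrix (made-up numbers, not real data).
toy_corr = np.array([
    [ 1.0,  0.9, -0.8,  0.1],
    [ 0.9,  1.0, -0.7,  0.0],
    [-0.8, -0.7,  1.0,  0.2],
    [ 0.1,  0.0,  0.2,  1.0],
])
weights, distances = edge_weights_from_correlations(toy_corr)
```

With this convention, variables 1 and 3 of the toy matrix (correlation $-0.8$) get a heavy edge, just as a strongly positive pair would. If anti-correlated variables should end up in different modules in your problem, you would keep the sign instead.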
2) One possible unsupervised method is hierarchical clustering. See the Wikipedia page https://en.wikipedia.org/wiki/Hierarchical_clustering . See also http://www.ibm.com/developerworks/library/os-weka2/index.html
You may want to look for software that does this for you, such as Weka, which is mentioned in the link above: http://www.cs.waikato.ac.nz/ml/weka/
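If you would rather script it, hierarchical clustering can also be sketched in Python with SciPy (my suggestion here, as an alternative to Weka). The example below assumes the $1 - |r|$ distance, average linkage, and a cut-off of 0.5; all three are illustrative choices that you would need to tune for your data. The function name `modules_from_correlations` and the toy matrix are hypothetical. Cutting the tree at a distance threshold means you do not have to fix the number of modules in advance.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def modules_from_correlations(corr, max_distance=0.5):
    """Assign each variable to a module by hierarchical clustering.

    Hypothetical helper: max_distance is an arbitrary cut-off, not a
    recommended value.
    """
    corr = np.asarray(corr, dtype=float)
    dist = 1.0 - np.abs(corr)              # dissimilarity from |r|
    dist = np.clip(dist, 0.0, None)        # guard against rounding below 0
    dist = (dist + dist.T) / 2.0           # enforce exact symmetry
    np.fill_diagonal(dist, 0.0)
    condensed = squareform(dist, checks=False)   # square -> condensed form
    tree = linkage(condensed, method="average")  # build the dendrogram
    # Cut the tree: variables joined below max_distance share a label.
    return fcluster(tree, t=max_distance, criterion="distance")

# Toy 5 x 5 correlation matrix (made-up numbers, not real data).
toy_corr = np.array([
    [ 1.0,  0.9, -0.8,  0.1,  0.0 ],
    [ 0.9,  1.0, -0.7,  0.0,  0.1 ],
    [-0.8, -0.7,  1.0,  0.2,  0.1 ],
    [ 0.1,  0.0,  0.2,  1.0,  0.85],
    [ 0.0,  0.1,  0.1,  0.85, 1.0 ],
])
labels = modules_from_correlations(toy_corr, max_distance=0.5)
print(labels)  # one module label per variable
```

On the toy matrix the first three variables fall into one module and the last two into another. For a 1250 x 1250 matrix the same call should work unchanged; plotting the tree with `scipy.cluster.hierarchy.dendrogram` can help you choose a sensible cut-off.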