I will give a brief overview of the Omics program area of Arbor Research Collaborative for Health. I will then focus on the project that I am currently most excited about: clustering of complex structures.
Biological systems are complex structures with multiple interacting parts. Thus, correlation matrices describing variable interdependency in such structures provide key information for comparison and classification. Classification based on correlation matrices could supplement classification based on variable values, because the former reveals similarities in system structures, while the latter relies on the similarities in system states.
Existing methods for comparing correlation matrices reduce the differences between two matrices to a single number, a similarity measure or a distance, which is calculated by various methods, including random skewers and the T- and S-statistics. This reductionist approach has at least two limitations: (a) one number cannot adequately represent multidimensional differences; and (b) pairwise distances admit only hierarchical clustering, while other clustering methods require vectors representing the multidimensional attributes of each object.
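To make the single-number reduction concrete, here is a minimal sketch of a random-skewers style similarity in Python with NumPy. It follows one common formulation (apply the same random unit vectors to both matrices and average the cosine between the two response vectors); the function name, the number of skewers, and the seed are illustrative assumptions, not necessarily the variant used in the work described here.

```python
import numpy as np

def random_skewers(A, B, n_skewers=1000, seed=0):
    """Mean cosine similarity between the responses of A and B to random unit vectors.

    Hypothetical helper for illustration; A and B are square matrices of equal size.
    """
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    rng = np.random.default_rng(seed)
    # Random directions ("skewers"), one per row, normalized to unit length.
    skewers = rng.standard_normal((n_skewers, A.shape[0]))
    skewers /= np.linalg.norm(skewers, axis=1, keepdims=True)
    # Row i of each product is the matrix applied to skewer i.
    resp_a = skewers @ A.T
    resp_b = skewers @ B.T
    cos = np.sum(resp_a * resp_b, axis=1) / (
        np.linalg.norm(resp_a, axis=1) * np.linalg.norm(resp_b, axis=1)
    )
    return float(cos.mean())
```

Whatever the size of the two matrices, the result is one scalar per pair, which is the limitation noted in (a).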
The solution that we developed is to create a “snake” vector, formed by tracing a serpentine path through the off-diagonal terms of the matrix. The “snake” vector captures information on the interactions among attribute variables and thus represents the system structure. The idea is developed further by combining “snakes” with vectors representing the system state and its overall properties, which yields a “dragon” vector. To avoid redundancy, the combined vector is transformed by applying principal component analysis, with weights proportional to the percentage of variance explained by each component.
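The construction can be sketched in Python with NumPy as follows. This is an illustration under stated assumptions, not the authors' implementation: the exact serpentine path is not specified here, so the sketch walks the upper triangle row by row and reverses every other row; the weighting is read as scaling unit-variance component scores by the fraction of variance each component explains. All function names are hypothetical.

```python
import numpy as np

def snake_vector(corr):
    """Flatten the upper off-diagonal terms of a symmetric matrix along a serpentine path.

    Assumed path: row by row through the upper triangle, reversing odd-numbered rows.
    """
    corr = np.asarray(corr, dtype=float)
    n = corr.shape[0]
    parts = []
    for i in range(n - 1):
        row = corr[i, i + 1:]
        parts.append(row if i % 2 == 0 else row[::-1])
    return np.concatenate(parts)

def dragon_vectors(corr_matrices, state_vectors):
    """Append each system's state vector to its snake vector (one row per system)."""
    snakes = np.array([snake_vector(c) for c in corr_matrices])
    return np.hstack([snakes, np.asarray(state_vectors, dtype=float)])

def weighted_pca_scores(X):
    """PCA via SVD; unit-variance scores scaled by the fraction of variance explained.

    Assumes the rows of X are not all identical (total variance must be nonzero).
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    standardized = U * np.sqrt(X.shape[0] - 1)
    return standardized * explained
```

Each row of the output is an ordinary feature vector for one system, so it can be passed to vector-based clustering methods rather than only to hierarchical clustering on pairwise distances.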
The suggested approach to clustering complex structures will be illustrated with examples ranging from brain connectivity matrices, to correlation matrices based on microbiome data, to macroeconomic data.