Description
Clustering is a computational technique for assigning objects to groups of similar elements. It is widely used for business data interpretation, natural language analysis, and image processing. Typical bioinformatics applications are the detection of homologous proteins and the identification of co-expressed genes. Here, we introduce Transitivity Clustering and its accompanying software framework TransClust, a homogeneous data partitioning method based on Weighted Transitive Graph Projection. It aims to unravel hidden transitive substructures in a given similarity graph deduced from a pairwise similarity measure. Transitivity Clustering is an efficient technique that is capable of processing hundreds of thousands of data points while remaining robust against outliers and noise. A single, intuitive density parameter with provable properties determines the number and the size of the clusters. In addition, we present extensions of the underlying graph model that create hierarchies and overlaps, as well as comparisons against alternative clustering approaches and real-world application cases.
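To make the idea of Weighted Transitive Graph Projection concrete, the following is a minimal sketch under stated assumptions, not the TransClust implementation. It assumes the commonly used cost model in which an edge exists when a pairwise similarity exceeds the density threshold, and adding or removing an edge costs the absolute difference between that similarity and the threshold. The function names (`projection_cost`, `set_partitions`, `exact_transitive_projection`) are hypothetical, and the exhaustive search over all partitions is only feasible for a handful of objects; it stands in for the efficient heuristics a real tool would need.

```python
from itertools import combinations


def projection_cost(sim, threshold, labels):
    """Cost of turning the thresholded similarity graph into the given clustering.

    Assumed cost model: an edge (u, v) exists when sim[u][v] > threshold;
    adding or deleting an edge costs |sim[u][v] - threshold|.
    """
    cost = 0.0
    for u, v in combinations(range(len(labels)), 2):
        excess = sim[u][v] - threshold
        same_cluster = labels[u] == labels[v]
        if same_cluster and excess < 0:
            cost += -excess  # missing edge inside a cluster must be added
        elif not same_cluster and excess > 0:
            cost += excess   # edge between clusters must be removed
    return cost


def set_partitions(n):
    """Yield every partition of n items as a list of cluster labels."""
    def extend(labels, max_label):
        if len(labels) == n:
            yield list(labels)
            return
        # Each item joins an existing cluster or opens exactly one new one.
        for label in range(max_label + 2):
            labels.append(label)
            yield from extend(labels, max(max_label, label))
            labels.pop()
    yield from extend([], -1)


def exact_transitive_projection(sim, threshold):
    """Brute-force the cheapest clustering (tiny inputs only)."""
    best = min(set_partitions(len(sim)),
               key=lambda labels: projection_cost(sim, threshold, labels))
    return best, projection_cost(sim, threshold, best)


if __name__ == "__main__":
    # Hypothetical similarity matrix for four objects.
    sim = [
        [1.0, 0.9, 0.8, 0.1],
        [0.9, 1.0, 0.4, 0.2],
        [0.8, 0.4, 1.0, 0.1],
        [0.1, 0.2, 0.1, 1.0],
    ]
    for threshold in (0.5, 0.85):
        print(threshold, exact_transitive_projection(sim, threshold))
```

In this toy input, raising the threshold from 0.5 to 0.85 splits one three-object cluster into smaller ones, which illustrates how a single density parameter can control the number and size of clusters.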