Description
Fuzzy clustering is a mature field of study concerned with the unsupervised grouping of observations into homogeneous clusters. It is a prerequisite for a diverse set of problems in computer science and other fields, and it has traditionally been concerned with relational notions of homogeneity. Fuzzy clustering also developed under the assumption that the number of clusters present in an input set of data items, denoted $C$ in this work, is known with a great deal of certainty. As clustering algorithms made significant progress towards separating a known number of clusters from the data, it became apparent that this assumption is both non-trivial and critically important. Subsequent works, including the important category of VAT algorithms, attempted, with varying degrees of success, to produce a usable estimate of the number of clusters before any clustering is performed. Nevertheless, determining the number of clusters appropriate for an arbitrary problem instance remains a challenge to date, particularly when a non-relational and generic notion of homogeneity is applicable. In this paper, we argue that the number of clusters present in a set of data items that corresponds to a physical phenomenon is in fact not a deterministic quantity to be known. In other words, the problem is not to find "the $C$ clusters" present in a given set of data items but, instead, to discover clusters in the data and to allow the user to select how many of the discovered clusters are applicable to their case. That is, $C$ is not an entity to be sought; rather, we argue, it is a mirage of certainty that has afflicted the fuzzy clustering literature. Thus, we eliminate $C$ from the model and utilize the results of multiple independent executions of a robustified single-cluster clustering model, which we aggregate using a non-relational, class-independent framework.
This process results in many clusters, for each of which prominence and weight indicators are calculated.
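The general idea of running a single-cluster model many times and aggregating the outcomes can be illustrated with a minimal sketch. It is not the model proposed in the paper: the abstract does not specify the robust loss, the aggregation framework, or the definitions of prominence and weight. The sketch swaps in a Gaussian-kernel reweighted mean (a mean-shift-style update) as the single-cluster step and merges runs that converge to nearby centers. The names `single_cluster`, `discover_clusters`, `scale`, and `merge_tol`, as well as the choice of "fraction of runs" as prominence and "sum of memberships" as weight, are assumptions made purely for illustration.

```python
import math
import random


def single_cluster(points, start, scale=1.0, iters=50):
    """Fit one cluster robustly from a starting point (illustrative stand-in).

    Each iteration recomputes the center as a mean weighted by a Gaussian
    kernel, so points far from the current center have almost no influence.
    """
    center = tuple(start)
    for _ in range(iters):
        weights = [
            math.exp(-math.dist(p, center) ** 2 / (2.0 * scale ** 2))
            for p in points
        ]
        total = sum(weights)
        center = tuple(
            sum(w * p[d] for w, p in zip(weights, points)) / total
            for d in range(len(center))
        )
    # Hypothetical "weight" indicator: total membership at the final center.
    weight = sum(
        math.exp(-math.dist(p, center) ** 2 / (2.0 * scale ** 2))
        for p in points
    )
    return center, weight


def discover_clusters(points, runs=30, scale=1.0, merge_tol=0.5, seed=0):
    """Run the single-cluster fit independently many times and aggregate.

    No cluster count C is supplied. Runs whose centers fall within
    merge_tol of an already discovered center are counted as repeat hits.
    """
    rng = random.Random(seed)
    found = []  # each entry: {"center", "hits", "weight"}
    for _ in range(runs):
        center, weight = single_cluster(points, rng.choice(points), scale)
        for entry in found:
            if math.dist(entry["center"], center) < merge_tol:
                entry["hits"] += 1
                break
        else:
            found.append({"center": center, "hits": 1, "weight": weight})
    # Hypothetical "prominence" indicator: fraction of runs that hit the cluster.
    for entry in found:
        entry["prominence"] = entry["hits"] / runs
    return sorted(found, key=lambda e: e["prominence"], reverse=True)
```

Under these assumptions, the caller receives every discovered cluster ranked by prominence and chooses how many to keep, for example `discover_clusters(points)[:k]`, instead of fixing a cluster count before clustering.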