Description
When faced with the reality of a study, the researcher usually has many variables measured or observed on a collection of individuals, wants to study them jointly, and turns to Data Analysis. He then faces a wide variety of techniques and must select the one best suited to his data but, above all, to his scientific objective.
The researcher must first consider whether he assigns equal importance to all his variables, that is, whether none of them stands out as the main dependent variable in the research objective. If so, because he is simply handling a set of diverse characteristics observed and collected in his sample, he can treat them as a whole with what could be called descriptive or interdependence analysis techniques (unsupervised learning techniques, in Machine Learning terminology).
He can do so with two different orientations. The first is to reduce the dimension of a data table that is excessively large because of the high number of variables it contains, keeping a few fictitious variables that, although not observed, are combinations of the real ones and synthesize most of the information in the data. Here he must also take into account the type of variables involved: for quantitative variables, the techniques that allow this treatment are Principal Component Analysis and Factor Analysis; for qualitative variables, he will use Correspondence Analysis.
The other possible approach, when faced with a collection of variables with no outstanding dependent variable, is to classify the individuals into more or less homogeneous groups according to the profile they present. In that case he will use Cluster Analysis, in which the groups, not defined in advance, are shaped by the variables he chooses. Other segmentation techniques can also be used.
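As a minimal illustration of the dimension-reduction orientation, assuming only two quantitative variables, the sketch below computes the first principal component in plain Python (the book itself works in R). The function name `pca_2d` and the toy data are hypothetical; instead of a general PCA routine, it uses the closed-form eigen-decomposition of a 2 x 2 covariance matrix. The resulting scores form one new, unobserved variable built as a linear combination of the centred originals, and `explained` is the share of total variance it retains.

```python
import math


def pca_2d(xs, ys):
    """Hypothetical helper: first principal component of two quantitative variables.

    Uses the closed-form eigen-decomposition of the 2 x 2 sample covariance
    matrix. Assumes the variables are not both constant (total variance > 0).
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # Sample covariance matrix [[a, b], [b, c]]
    a = sum((x - mx) ** 2 for x in xs) / (n - 1)
    c = sum((y - my) ** 2 for y in ys) / (n - 1)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    # Eigenvalues of a symmetric 2 x 2 matrix: mean of the diagonal +/- radius
    mean = (a + c) / 2
    radius = math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    lam1, lam2 = mean + radius, mean - radius
    # Eigenvector of the largest eigenvalue (the first principal axis)
    if b != 0:
        v = (lam1 - c, b)
    elif a >= c:
        v = (1.0, 0.0)
    else:
        v = (0.0, 1.0)
    norm = math.hypot(*v)
    v = (v[0] / norm, v[1] / norm)
    # Scores: the synthetic variable, a linear combination of centred originals
    scores = [v[0] * (x - mx) + v[1] * (y - my) for x, y in zip(xs, ys)]
    # Share of the total variance captured by the first component
    explained = lam1 / (lam1 + lam2)
    return v, scores, explained


if __name__ == "__main__":
    # Hypothetical toy data: y is exactly 2x, so one component carries all the variance
    v, scores, explained = pca_2d([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
    print(f"{explained:.3f}")  # prints 1.000
```

With real data the variables are rarely perfectly correlated, so `explained` falls below 1; the closer it is to 1, the less information is lost by keeping a single synthetic variable.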
Throughout this book, most of the unsupervised learning techniques for classification are developed from both a methodological and a practical point of view, with applications in the R software. The most important segmentation techniques are discussed in depth, such as hierarchical and non-hierarchical Cluster Analysis, Simple and Multiple Correspondence Analysis, Multidimensional Scaling and pattern recognition, as well as Kohonen Neural Networks, Convolutional Neural Networks (CNN), Hopfield Neural Networks, Autoencoders, Anomaly Detection and Transfer Learning. For every topic, a methodological introduction is followed by illustrative examples and exercises solved with R.