Understanding the SNN Input Parameters and How They Affect the Clustering Results

Guilherme Moreira, Maribel Yasmina Santos, João Moura Pires, João Galvão: Understanding the SNN Input Parameters and How They Affect the Clustering Results. In: International Journal of Data Warehousing and Mining (IJDWM), 11 (3), pp. 26–48, 2015.

Abstract

Huge amounts of data are available for analysis in today's organizations, which face several challenges when trying to analyze the generated data with the aim of extracting useful information. This analytical capability needs to be enhanced with tools capable of dealing with big data sets without making the analytical process an arduous task. Clustering is commonly used in the data analysis process, as this technique does not require any prior knowledge about the data. However, clustering algorithms usually require one or more input parameters that influence the clustering process and the results that can be obtained. This work analyses the relation between the three input parameters of the SNN (Shared Nearest Neighbor) clustering algorithm, k, Eps and MinPts, providing a comprehensive understanding of the relationships identified between them. This work also proposes specific guidelines for defining appropriate input parameters, optimizing the processing time, as the number of trials needed to achieve appropriate results can be substantially reduced.
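The abstract names the three SNN input parameters k, Eps and MinPts without showing where each one enters the algorithm. The following is a minimal Python sketch, assuming a generic variant of SNN clustering: Euclidean k-nearest-neighbour lists that exclude the point itself, shared-neighbour similarity counted only between mutual neighbours, SNN density defined as the number of points whose similarity is at least Eps, and core points defined as those with density of at least MinPts. It is not the authors' implementation and does not encode the guidelines proposed in the paper; the function names `knn_lists`, `snn_similarity` and `snn_cluster` are hypothetical.

```python
import math
from itertools import combinations


def knn_lists(points, k):
    """For each point, the set of indices of its k nearest neighbours.

    Assumption of this sketch: Euclidean distance, the point itself excluded.
    """
    neighbours = []
    for i, p in enumerate(points):
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        neighbours.append({j for _, j in dists[:k]})
    return neighbours


def snn_similarity(neighbours):
    """Shared-neighbour counts between every pair of points.

    Assumption of this sketch: the count is kept only when i and j appear
    in each other's k-NN list; otherwise the similarity is 0.
    """
    n = len(neighbours)
    sim = [[0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        if j in neighbours[i] and i in neighbours[j]:
            sim[i][j] = sim[j][i] = len(neighbours[i] & neighbours[j])
    return sim


def snn_cluster(points, k, eps, min_pts):
    """Hypothetical helper: label each point with a cluster id, or -1 for noise."""
    sim = snn_similarity(knn_lists(points, k))
    n = len(points)
    # SNN density: how many other points have similarity >= eps with point i.
    density = [sum(1 for j in range(n) if j != i and sim[i][j] >= eps) for i in range(n)]
    # Core points: SNN density of at least min_pts.
    core = [i for i in range(n) if density[i] >= min_pts]
    labels = [-1] * n
    cluster_id = 0
    # Core points linked by similarity >= eps share a cluster (flood fill).
    for c in core:
        if labels[c] != -1:
            continue
        labels[c] = cluster_id
        stack = [c]
        while stack:
            u = stack.pop()
            for v in core:
                if labels[v] == -1 and sim[u][v] >= eps:
                    labels[v] = cluster_id
                    stack.append(v)
        cluster_id += 1
    # Non-core points join their most similar core point if that similarity
    # reaches eps; all remaining points stay labelled as noise (-1).
    core_set = set(core)
    for i in range(n):
        if i in core_set or not core:
            continue
        best = max(core, key=lambda c: sim[i][c])
        if sim[i][best] >= eps:
            labels[i] = labels[best]
    return labels
```

On two well-separated groups of five points plus one distant outlier, `snn_cluster(pts, k=4, eps=2, min_pts=3)` labels the groups 0 and 1 and the outlier -1. Raising `min_pts` to 5, or `eps` to 4, turns every point into noise: in this variant a point has non-zero similarity to at most k others, and a shared-neighbour count can be at most k - 1, which illustrates how the three parameters constrain one another.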

BibTeX

@article{moreira2015understanding,
title = {Understanding the SNN Input Parameters and How They Affect the Clustering Results},
author = {Guilherme Moreira and Maribel Yasmina Santos and João Moura Pires and João Galvão},
url = {http://www.igi-global.com/article/understanding-the-snn-input-parameters-and-how-they-affect-the-clustering-results/129523},
doi = {10.4018/IJDWM.2015070102},
year = {2015},
date = {2015-06-01},
journal = {International Journal of Data Warehousing and Mining (IJDWM)},
volume = {11},
number = {3},
pages = {26--48},
publisher = {IGI Global},
abstract = {Huge amounts of data are available for analysis in today's organizations, which face several challenges when trying to analyze the generated data with the aim of extracting useful information. This analytical capability needs to be enhanced with tools capable of dealing with big data sets without making the analytical process an arduous task. Clustering is commonly used in the data analysis process, as this technique does not require any prior knowledge about the data. However, clustering algorithms usually require one or more input parameters that influence the clustering process and the results that can be obtained. This work analyses the relation between the three input parameters of the SNN (Shared Nearest Neighbor) clustering algorithm, k, Eps and MinPts, providing a comprehensive understanding of the relationships identified between them. This work also proposes specific guidelines for defining appropriate input parameters, optimizing the processing time, as the number of trials needed to achieve appropriate results can be substantially reduced.},
keywords = {Clustering, Spatial Clustering},
pubstate = {published},
tppubtype = {article}
}