José Guilherme Moreira, December 2013

Input parameters self-tuning on the SNN algorithm

Maribel Yasmina Santos (superv.), Universidade do Minho, December 2013.

Keywords: density-based clustering, SNN, shared nearest neighbour, input parameters tuning

Abstract: Recent technological developments have led to an ever-increasing rate of data collection. Organisations face several challenges when they try to analyse this vast amount of data with the aim of extracting useful information. This analytical capacity needs to be enhanced with tools capable of dealing with big data sets without making the analytical process a difficult task. Clustering is often used, as this technique does not require any a priori knowledge about the data. However, clustering algorithms usually require one or more input parameters that influence the clustering process and the results that can be obtained.
This work analyses the relation between the three input parameters of the SNN (Shared Nearest Neighbour) algorithm through extensive brute-force executions and finds some strong relations between them. These findings support the proposal of a heuristic for identifying and suggesting the SNN input parameters. The proposed heuristic is validated using data sets different from the ones used for its development.
The solution is useful because it spares the user considerable time spent on trial-and-error executions. It suggests to the user an initial clustering result of good quality that, while not definitive, is a good starting point for the clustering analysis.
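
To make the idea of brute-force executions over the three SNN input parameters concrete, the following is a minimal sketch in Python. It is not the implementation used in this work. The abstract does not name the parameters; the sketch assumes the names commonly used for SNN (k for the neighbourhood size, Eps for the similarity threshold, MinPts for the density threshold). The function names, the mutual-neighbour similarity rule, and the number of clusters recorded for each combination are illustrative assumptions.

```python
from itertools import product
from math import dist


def knn_lists(points, k):
    """Return, for each point, the set of indices of its k nearest neighbours."""
    neighbours = []
    for i, p in enumerate(points):
        order = sorted((j for j in range(len(points)) if j != i),
                       key=lambda j: dist(p, points[j]))
        neighbours.append(set(order[:k]))
    return neighbours


def snn_cluster(points, k, eps, min_pts):
    """Simplified SNN clustering: one label per point, -1 meaning noise."""
    n = len(points)
    nn = knn_lists(points, k)

    def sim(i, j):
        # Assumed rule: count shared neighbours only when i and j
        # appear in each other's k-nearest-neighbour lists.
        if i in nn[j] and j in nn[i]:
            return len(nn[i] & nn[j])
        return 0

    # Density of a point: number of points whose similarity to it is >= eps.
    density = [sum(1 for j in range(n) if j != i and sim(i, j) >= eps)
               for i in range(n)]
    core = [i for i in range(n) if density[i] >= min_pts]

    labels = [-1] * n
    next_id = 0
    for c in core:
        if labels[c] != -1:
            continue
        # Grow a cluster over core points linked by similarity >= eps.
        labels[c] = next_id
        stack = [c]
        while stack:
            cur = stack.pop()
            for other in core:
                if labels[other] == -1 and sim(cur, other) >= eps:
                    labels[other] = next_id
                    stack.append(other)
        next_id += 1

    # Attach each remaining point to its most similar core point, if any.
    for i in range(n):
        if labels[i] == -1 and core:
            best = max(core, key=lambda c: sim(i, c))
            if sim(i, best) >= eps:
                labels[i] = labels[best]
    return labels


def brute_force(points, ks, eps_values, min_pts_values):
    """Run the sketch for every parameter combination (hypothetical helper)."""
    results = {}
    for k, eps, min_pts in product(ks, eps_values, min_pts_values):
        # In this sketch, thresholds above k can never be satisfied.
        if eps > k or min_pts > k:
            continue
        labels = snn_cluster(points, k, eps, min_pts)
        results[(k, eps, min_pts)] = len({l for l in labels if l != -1})
    return results
```

Calling brute_force(points, range(3, 10), range(1, 10), range(1, 10)) on a small data set returns the number of clusters found for each (k, Eps, MinPts) combination. A table of this kind is the sort of raw material in which relations between the parameters could be looked for.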