Dynamic analytics for spatial data with an incremental clustering approach

Fernando Mendes, Maribel Yasmina Santos, João Moura Pires: Dynamic analytics for spatial data with an incremental clustering approach. In: Data Mining Workshops (ICDMW), 2013 IEEE 13th International Conference on, pp. 552–559, IEEE 2013.

Abstract

Several clustering algorithms have been extensively used to analyze vast amounts of spatial data. One of these algorithms is SNN (Shared Nearest Neighbor), a density-based algorithm with several advantages for this type of data: it can identify clusters of different shapes, sizes and densities, and it can deal with noise. Given that data are usually collected progressively over time, incremental clustering approaches are required when clustering results must be updated as new data become available. This paper proposes SNN++, an incremental clustering algorithm based on SNN. Its performance and the quality of the resulting clusters are compared with those of SNN. The results show that SNN++ yields the same results as SNN and that the incremental feature was added to SNN without any computational penalty. Moreover, the experimental results show that processing huge amounts of data in increments considerably decreases the number of distances that must be computed to identify the points’ nearest neighbors.
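The abstract relies on two mechanisms: SNN clustering itself, and reusing earlier work so that new increments need fewer distance computations for nearest-neighbor search. The Python sketch that follows illustrates both under stated assumptions. It is not the authors' SNN++ algorithm. `IncrementalKNN` and `snn_cluster` are hypothetical names. SNN similarity is counted only for mutual k-nearest-neighbor pairs, a point is core when it has at least `min_pts` links with similarity of at least `eps`, and the incremental part simply caches pairwise distances, so each batch only evaluates distances that involve new points.

```python
import math
from itertools import combinations


class IncrementalKNN:
    """Exact k-nearest-neighbor lists for points that arrive in batches.

    Hypothetical helper for illustration; not the SNN++ data structure.
    Pairwise distances are cached, so a new batch only costs new-vs-old and
    new-vs-new distance evaluations instead of a full recomputation.
    """

    def __init__(self, k):
        self.k = k
        self.points = []
        self.dist = {}           # (i, j) with i < j -> Euclidean distance
        self.distance_count = 0  # distance evaluations performed so far

    def add_batch(self, batch):
        start = len(self.points)
        self.points.extend(batch)
        for j in range(start, len(self.points)):
            for i in range(j):   # only pairs involving a new point j
                self.dist[(i, j)] = math.dist(self.points[i], self.points[j])
                self.distance_count += 1

    def knn(self, i):
        """Indices of the k nearest neighbors of point i (cache lookups only)."""
        others = [j for j in range(len(self.points)) if j != i]
        others.sort(key=lambda j: (self.dist[(min(i, j), max(i, j))], j))
        return set(others[: self.k])


def snn_cluster(knn_lists, eps, min_pts):
    """Label points with cluster ids (-1 = noise) from their kNN sets."""
    n = len(knn_lists)
    # SNN similarity: number of shared neighbors, only for mutual kNN pairs.
    strong = {p: [] for p in range(n)}  # links with similarity >= eps
    for p, q in combinations(range(n), 2):
        if q in knn_lists[p] and p in knn_lists[q]:
            s = len(knn_lists[p] & knn_lists[q])
            if s >= eps:
                strong[p].append((s, q))
                strong[q].append((s, p))
    # SNN density = number of strong links; dense points are core points.
    core = {p for p in range(n) if len(strong[p]) >= min_pts}
    labels = [-1] * n
    cluster = 0
    for seed in sorted(core):  # connect core points through strong links
        if labels[seed] != -1:
            continue
        labels[seed] = cluster
        stack = [seed]
        while stack:
            p = stack.pop()
            for _, q in strong[p]:
                if q in core and labels[q] == -1:
                    labels[q] = cluster
                    stack.append(q)
        cluster += 1
    # Non-core points join their most similar core neighbor; others are noise.
    for p in range(n):
        if p not in core:
            links = [(s, q) for s, q in strong[p] if q in core]
            if links:
                labels[p] = labels[max(links)[1]]
    return labels


if __name__ == "__main__":
    index = IncrementalKNN(k=4)
    index.add_batch([(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5)])
    index.add_batch([(10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5), (50, 50)])
    knn_lists = [index.knn(i) for i in range(len(index.points))]
    print(snn_cluster(knn_lists, eps=3, min_pts=3))
    # → [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, -1]
    print(index.distance_count)  # → 55 (rerunning from scratch after each batch: 10 + 55 = 65)
```

In this toy run the two dense groups become clusters 0 and 1 and the isolated point at (50, 50) is labeled noise. Caching every pairwise distance costs quadratic memory; it is used here only to make the saving in distance evaluations easy to count, and a practical implementation would more likely rely on a spatial index.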

BibTeX

@inproceedings{mendes2013dynamic,
title = {Dynamic analytics for spatial data with an incremental clustering approach},
author = {Fernando Mendes and Maribel Yasmina Santos and João Moura Pires},
url = {http://dx.doi.org/10.1109/ICDMW.2013.169},
year = {2013},
date = {2013-01-01},
booktitle = {Data Mining Workshops (ICDMW), 2013 IEEE 13th International Conference on},
pages = {552--559},
organization = {IEEE},
abstract = {Several clustering algorithms have been extensively used to analyze vast amounts of spatial data. One of these algorithms is SNN (Shared Nearest Neighbor), a density-based algorithm with several advantages for this type of data: it can identify clusters of different shapes, sizes and densities, and it can deal with noise. Given that data are usually collected progressively over time, incremental clustering approaches are required when clustering results must be updated as new data become available. This paper proposes SNN++, an incremental clustering algorithm based on SNN. Its performance and the quality of the resulting clusters are compared with those of SNN. The results show that SNN++ yields the same results as SNN and that the incremental feature was added to SNN without any computational penalty. Moreover, the experimental results show that processing huge amounts of data in increments considerably decreases the number of distances that must be computed to identify the points’ nearest neighbors.},
keywords = {Clustering, Incremental Clustering, Shared Nearest Neighbour, Spatial data},
pubstate = {published},
tppubtype = {inproceedings}
}