Density-Based Clustering Based on Hierarchical Density Estimates

Campello, Ricardo J. G. B.; Moulavi, Davoud; Sander, Joerg

doi:10.1007/978-3-642-37456-2_14

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7819)

Included in the following conference series:

Pacific-Asia Conference on Knowledge Discovery and Data Mining

Abstract

We propose a theoretically and practically improved density-based, hierarchical clustering method that provides a clustering hierarchy from which a simplified tree of significant clusters can be constructed. To obtain a “flat” partition consisting of only the most significant clusters (possibly corresponding to different density thresholds), we propose a novel cluster stability measure, formalize the problem of maximizing the overall stability of the selected clusters, and formulate an algorithm that computes an optimal solution to this problem. We demonstrate that our approach outperforms the current state-of-the-art density-based clustering methods on a wide variety of real-world data.
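
The selection step mentioned in the abstract (choosing non-overlapping clusters from the tree so that total stability is maximal) can be illustrated with a small sketch. The abstract does not define the stability measure or give the algorithm's details, so the code below is only an illustration under stated assumptions: each node of the simplified cluster tree already carries a precomputed stability score, the root is not itself eligible for selection, and ties are resolved in favor of the parent. The names `Node`, `select_clusters`, and `flat_partition`, as well as the toy stability values, are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    """A cluster in a (hypothetical) simplified cluster tree."""
    name: str
    stability: float  # assumed to be precomputed; values below are made up
    children: List["Node"] = field(default_factory=list)


def select_clusters(node: Node) -> Tuple[float, List[Node]]:
    """Return the best total stability in this subtree and the clusters achieving it.

    Bottom-up rule: keep the node itself if its stability is at least the
    best total obtainable from its descendants; otherwise keep the descendants.
    """
    if not node.children:
        return node.stability, [node]
    child_total = 0.0
    child_selected: List[Node] = []
    for child in node.children:
        total, selected = select_clusters(child)
        child_total += total
        child_selected.extend(selected)
    if node.stability >= child_total:
        return node.stability, [node]
    return child_total, child_selected


def flat_partition(root: Node) -> Tuple[float, List[Node]]:
    """Select clusters below the root (the root is assumed not to be eligible)."""
    total = 0.0
    selected: List[Node] = []
    for child in root.children:
        t, s = select_clusters(child)
        total += t
        selected.extend(s)
    return total, selected


# Toy tree with illustrative stability values (not taken from the paper).
root = Node("root", 0.0, [
    Node("A", 5.0, [Node("A1", 1.5), Node("A2", 2.0)]),
    Node("B", 2.0, [Node("B1", 1.75), Node("B2", 1.25)]),
])

total, selected = flat_partition(root)
print([n.name for n in selected], total)
```

In this toy tree, cluster A is kept whole because its own stability exceeds the combined stability of its children, while B is replaced by B1 and B2. The selected clusters therefore sit at different levels of the hierarchy, which is the sense in which a flat partition may correspond to different density thresholds.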

Author information

Authors and Affiliations

Dept. of Computing Science, University of Alberta, Edmonton, AB, Canada
Ricardo J. G. B. Campello, Davoud Moulavi & Joerg Sander

Editor information

Editors and Affiliations

School of Computing Science, Simon Fraser University, 8888 University Drive, V5A 1S6, Burnaby, BC, Canada
Jian Pei
Dept. of Computer Science and Information Engineering, Institute of Medical Informatics, National Cheng Kung University, Tainan, Taiwan
Vincent S. Tseng
Faculty of Engineering and Information Technology, University of Technology Sydney, Broadway, P.O. Box 123, 2007, Sydney, NSW, Australia
Longbing Cao & Guandong Xu
Asian Office of Aerospace Research and Development (AOARD), Air Force Office of Scientific Research (AFOSR), Air Force Research Laboratory USA, Osaka University, 7-23-17 Roppongi, 106-0032, Minato-ku, Tokyo, Japan
Hiroshi Motoda

About this paper

Cite this paper

Campello, R.J.G.B., Moulavi, D., Sander, J. (2013). Density-Based Clustering Based on Hierarchical Density Estimates. In: Pei, J., Tseng, V.S., Cao, L., Motoda, H., Xu, G. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2013. Lecture Notes in Computer Science, vol 7819. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-37456-2_14

DOI: https://doi.org/10.1007/978-3-642-37456-2_14
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-37455-5
Online ISBN: 978-3-642-37456-2
eBook Packages: Computer Science, Computer Science (R0)
