Implementation-induced Inconsistency and Nondeterminism in Deterministic Clustering Algorithms

Document Type

Conference Proceeding

Publication Date

10-1-2020

Abstract

A deterministic clustering algorithm is designed to always produce the same clustering solution on a given input. Therefore, users of clustering implementations (toolkits) naturally assume that implementations of a deterministic clustering algorithm A behave deterministically, that is: (1) two different implementations I1 and I2 of A are interchangeable, producing the same clustering on a given input D, and (2) an implementation produces the same clustering solution when run repeatedly on D. We challenge these assumptions. Specifically, we analyzed clustering behavior on 528 datasets, three deterministic algorithms (Affinity Propagation, DBSCAN, Hierarchical Agglomerative Clustering), and the deterministic portion of a fourth (K-means), as implemented in various toolkits; in total, we examined 13 algorithm-toolkit combinations. We found that different implementations of deterministic clustering algorithms make different choices, e.g., in default parameter settings, noise insertion, and handling of input dataset characteristics. As a result, clustering solutions for a fixed algorithm-dataset combination can differ across runs (nondeterminism) and across toolkits (inconsistency). We expose several root causes of such behavior. We show that remedying these root causes improves determinism, increases consistency, and can even improve efficiency. Our approach and findings can benefit developers, testers, and users of clustering algorithms.
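
The run-to-run nondeterminism and cross-toolkit inconsistency described above can be checked mechanically. The sketch below is illustrative only (it is not the paper's artifact): it runs one implementation twice on the same input and compares two independent implementations of Ward-linkage hierarchical agglomerative clustering. The toolkits (scikit-learn, SciPy), the synthetic dataset, and all parameter values are assumptions for illustration, not the 528 datasets or 13 algorithm-toolkit combinations studied in the paper.

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.datasets import make_blobs
from sklearn.cluster import DBSCAN, AgglomerativeClustering
from sklearn.metrics import adjusted_rand_score

# Illustrative synthetic data; the paper's benchmark datasets are not reproduced here.
X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

# (1) Determinism check: run the same implementation twice on the same input
#     and verify that the two labelings are identical.
labels_run1 = DBSCAN(eps=0.5, min_samples=5).fit_predict(X)
labels_run2 = DBSCAN(eps=0.5, min_samples=5).fit_predict(X)
print("identical across runs:", np.array_equal(labels_run1, labels_run2))

# (2) Consistency check: compare two implementations of the same algorithm
#     (Ward-linkage hierarchical agglomerative clustering) from different toolkits.
sk_labels = AgglomerativeClustering(n_clusters=3, linkage="ward").fit_predict(X)
sp_labels = fcluster(linkage(X, method="ward"), t=3, criterion="maxclust")
print("cross-toolkit agreement (ARI):", adjusted_rand_score(sk_labels, sp_labels))

An adjusted Rand index below 1.0 in step (2), or a mismatch in step (1), would signal exactly the inconsistency or nondeterminism the paper investigates.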

Identifier

85091581734 (Scopus)

ISBN

978-1-7281-5777-1

Publication Title

Proceedings - 2020 IEEE 13th International Conference on Software Testing, Verification and Validation (ICST 2020)

External Full Text Location

https://doi.org/10.1109/ICST46399.2020.00032

First Page

231

Last Page

242

Grant

W911NF-13-2-0045

Fund Ref

Army Research Laboratory
