Billion-scale Detection of Isomorphic Nodes
Document Type
Conference Proceeding
Publication Date
1-1-2023
Abstract
This paper presents an algorithm for detecting isomorphism among attributed, high-degree nodes. High-degree isomorphic nodes rarely arise by chance and often indicate duplicated entities or data-processing errors. Because isomorphic nodes are, by definition, topologically indistinguishable, they can be problematic in graph ML tasks. The algorithm takes a parallel, 'degree-bounded' approach: it fingerprints each node's local properties with a hash, so that the search is restricted to nodes that share a hash-defined bucket, minimising the number of comparisons. The method scales to graphs with billions of nodes and edges. Finally, we present examples of isomorphic-node oddities identified in real-world data.
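The hash-and-bucket idea in the abstract can be illustrated with a small sequential sketch. This is not the authors' implementation. It rests on several assumptions. First, two nodes count as isomorphic when they carry the same attribute and have identical neighbour sets. Second, 'degree-bounded' is read as skipping nodes below a hypothetical `min_degree` threshold. Third, each node's fingerprint is Python's built-in `hash` of its attribute, degree and neighbour set. Because different keys can collide on the same hash, candidates in a bucket are confirmed by exact comparison. The names `isomorphic_node_groups` and `build_adjacency` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Set, Tuple


def build_adjacency(edges: Iterable[Tuple[Hashable, Hashable]]) -> Dict[Hashable, Set[Hashable]]:
    """Build an undirected adjacency map (node -> set of neighbours)."""
    adjacency: Dict[Hashable, Set[Hashable]] = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    return dict(adjacency)


def isomorphic_node_groups(
    adjacency: Dict[Hashable, Set[Hashable]],
    node_attr: Dict[Hashable, Hashable],
    min_degree: int = 2,
) -> List[List[Hashable]]:
    """Group nodes with the same attribute and the same neighbour set.

    Hypothetical sketch: nodes below `min_degree` are skipped (our reading of
    'degree-bounded'), and exact comparisons happen only inside hash buckets.
    """
    buckets: Dict[int, List[Hashable]] = defaultdict(list)
    for node, neighbours in adjacency.items():
        if len(neighbours) < min_degree:
            continue  # low-degree nodes are not considered
        # Fingerprint of the node's local properties.
        fingerprint = hash((node_attr.get(node), len(neighbours), frozenset(neighbours)))
        buckets[fingerprint].append(node)

    groups: List[List[Hashable]] = []
    for candidates in buckets.values():
        remaining = list(candidates)
        while remaining:
            head = remaining.pop(0)
            key = (node_attr.get(head), frozenset(adjacency[head]))
            # Hashes can collide, so confirm each candidate exactly.
            same = [n for n in remaining if (node_attr.get(n), frozenset(adjacency[n])) == key]
            if same:
                groups.append([head] + same)
            remaining = [n for n in remaining if n not in same]
    return groups


# Toy example: users "a" and "b" link to the same items; items "x" and "y"
# are linked from the same users.
example_edges = [
    ("a", "x"), ("a", "y"), ("a", "z"),
    ("b", "x"), ("b", "y"), ("b", "z"),
    ("c", "x"), ("c", "y"),
]
example_attrs = {"a": "user", "b": "user", "c": "user",
                 "x": "item", "y": "item", "z": "item"}
example_adjacency = build_adjacency(example_edges)

if __name__ == "__main__":
    print(isomorphic_node_groups(example_adjacency, example_attrs))
```

Bucketing means exact comparisons are only made between nodes that share a fingerprint, rather than between all pairs of nodes. One consequence of the neighbour-set definition assumed here is that two adjacent nodes never match, because each appears in the other's neighbour set. A fuller treatment might exclude the pair itself from the comparison.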
Identifier
85169299295 (Scopus)
ISBN
9798350311990
Publication Title
2023 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW 2023)
External Full Text Location
https://doi.org/10.1109/IPDPSW59300.2023.00046
First Page
230
Last Page
233
Grant
2109988
Fund Ref
National Science Foundation
Recommended Citation
Cappelletti, Luca; Fontana, Tommaso; Reese, Justin; and Bader, David A., "Billion-scale Detection of Isomorphic Nodes" (2023). Faculty Publications. 2303.
https://digitalcommons.njit.edu/fac_pubs/2303