Algebraic-datatype taint tracking, with applications to understanding Android identifier leaks
Document Type
Conference Proceeding
Publication Date
8-20-2021
Abstract
Current taint analyses track flows from sources to sinks and report the results simply as source → sink pairs, or flows. This is imprecise and ineffective in many real-world scenarios; examples include taint sources that are mutually exclusive, or flows that combine sources (e.g., IMEI and MAC address concatenated, hashed, and leaked vs. IMEI and MAC address hashed separately and leaked separately). These shortcomings are particularly acute in the context of Android, where sensitive identifiers can be combined, processed, and then leaked in complicated ways. To address these issues, we introduce a novel algebraic-datatype taint analysis that generates rich yet concise taint signatures involving AND, XOR, and hashing, akin to algebraic (product and sum) types. We implemented our approach as a static analysis for Android that derives app leak signatures: an algebraic representation of how, and where, hardware/software identifiers are manipulated before being exfiltrated to the network. We perform six empirical studies of algebraic-datatype taint tracking on 1,000 top apps from Google Play and their embedded libraries, including: discerning between "raw" and hashed flows, which eliminates a source of imprecision in current analyses; finding apps and libraries that go against Google Play's guidelines by (ab)using hardware identifiers; showing that third-party code, rather than app code, is the predominant source of leaks; exposing potential de-anonymization practices; and quantifying how apps have become more privacy-friendly over the past two years.
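To make the idea of a taint signature concrete, the following is a minimal illustrative sketch in Python, not the authors' implementation (which is a static analysis for Android). It models a signature as a small algebraic datatype with AND as a product, XOR as a sum, and a hash wrapper. The names `Source`, `And`, `Xor`, `Hash`, `render`, and `is_raw` are assumptions introduced here for illustration only.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Source:
    """A taint source such as a hardware identifier (hypothetical name)."""
    name: str


@dataclass(frozen=True)
class And:
    """Product: both operands flow together, e.g. concatenated identifiers."""
    left: "Sig"
    right: "Sig"


@dataclass(frozen=True)
class Xor:
    """Sum: exactly one operand flows, e.g. mutually exclusive sources."""
    left: "Sig"
    right: "Sig"


@dataclass(frozen=True)
class Hash:
    """The operand is hashed before it reaches the sink."""
    inner: "Sig"


Sig = Union[Source, And, Xor, Hash]


def render(sig: Sig) -> str:
    """Print a signature in prefix notation, e.g. HASH(AND(IMEI, MAC))."""
    if isinstance(sig, Source):
        return sig.name
    if isinstance(sig, And):
        return f"AND({render(sig.left)}, {render(sig.right)})"
    if isinstance(sig, Xor):
        return f"XOR({render(sig.left)}, {render(sig.right)})"
    if isinstance(sig, Hash):
        return f"HASH({render(sig.inner)})"
    raise TypeError(f"not a signature: {sig!r}")


def is_raw(sig: Sig) -> bool:
    """True if at least one source can reach the sink without being hashed."""
    if isinstance(sig, Source):
        return True
    if isinstance(sig, Hash):
        return False
    if isinstance(sig, (And, Xor)):
        return is_raw(sig.left) or is_raw(sig.right)
    raise TypeError(f"not a signature: {sig!r}")
```

Under these assumptions, the two cases from the abstract stay distinct: `Hash(And(Source("IMEI"), Source("MAC")))` describes a single leak of the hashed concatenation, whereas `Hash(Source("IMEI"))` and `Hash(Source("MAC"))` are two separate signatures. A plain source → sink pair would report IMEI and MAC reaching the network in both cases and could not tell them apart.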
Identifier
85116251941 (Scopus)
ISBN
9781450385626
Publication Title
ESEC/FSE 2021: Proceedings of the 29th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering
External Full Text Location
https://doi.org/10.1145/3468264.3468550
First Page
70
Last Page
82
Grant
CNS-1617584
Fund Ref
National Science Foundation
Recommended Citation
Rahaman, Sydur; Neamtiu, Iulian; and Yin, Xin, "Algebraic-datatype taint tracking, with applications to understanding Android identifier leaks" (2021). Faculty Publications. 3872.
https://digitalcommons.njit.edu/fac_pubs/3872