Author ORCID Identifier
0000-0001-7530-7865
Document Type
Dissertation
Date of Award
12-31-2024
Degree Name
Doctor of Philosophy in Data Science - (Ph.D.)
Department
Data Science
First Advisor
Aritra Dasgupta
Second Advisor
Chase Qishi Wu
Third Advisor
Mengnan Du
Fourth Advisor
Salam Daher
Fifth Advisor
Soumya Kundu
Abstract
This dissertation takes a process-centric, stakeholder-first perspective on handling analytical uncertainty: the form of uncertainty that confronts data analysts' insight-generation processes in high-consequence decision-making scenarios. The cost of an incorrect decision differs vastly depending on whether data is used for movie recommendations, personal data is used to drive insights, or data-driven modeling is used to drive real-time decisions for maintaining the health of a power grid. This dissertation examines analytical uncertainty in two real-world scenarios: i) how leakage of sensitive information can be prevented during the open data release process, with data custodians as the stakeholders, and ii) how errors in energy forecasting models can be detected or prevented when those models are deployed in power systems, with grid operators as the stakeholders. Across both scenarios, it investigates how interactive visualization workflows can empower the respective data stakeholders to reveal privacy vulnerabilities in open datasets and improve trust in AI forecasting models within the power sector. The first contribution is a systematic analysis of existing visual analytics methods for addressing data privacy, together with an examination of research gaps and future opportunities. Building on this foundation, an ethical hacking exercise was conducted to identify vulnerabilities in the open data ecosystem, leading to the second contribution: the development of the PRIVEE workflow, which enables data defenders to assess the disclosure risks associated with open datasets. The effectiveness of PRIVEE is demonstrated through case studies conducted in collaboration with domain experts.
Recognizing the need to understand the utility of linked datasets, the third contribution presents an algorithm for a utility metric and the VALUE interface, which allows users to explore the utility of joining datasets across more than 100 open data portals. Because of the various factors involved and the many ways in which multiple datasets can be joined, exploring these joins can quickly escalate into a combinatorial explosion. Thus, as the fourth contribution, this dissertation explores how visual analytic interventions can help balance privacy and utility in multi-way joins through the web-based interface LinkLens. Finally, the dissertation extends these principles to the energy sector, contributing to the development of the Forte application, which helps grid operators evaluate AI model performance. By equipping stakeholders across disparate domains with interactive visualization workflows, this work enhances human-data trust and supports informed decision-making.
Recommended Citation
Bhattacharjee, Kaustav, "Interactive visualization workflows for mitigating analytical uncertainty" (2024). Dissertations. 1802.
https://digitalcommons.njit.edu/dissertations/1802
Included in
Cataloging and Metadata Commons, Data Science Commons, Information Security Commons, Management Information Systems Commons