"Interactive visualization workflows for mitigating analytical uncertai" by Kaustav Bhattacharjee

Author ORCID Identifier

0000-0001-7530-7865

Document Type

Dissertation

Date of Award

12-31-2024

Degree Name

Doctor of Philosophy in Data Science - (Ph.D.)

Department

Data Science

First Advisor

Aritra Dasgupta

Second Advisor

Chase Qishi Wu

Third Advisor

Mengnan Du

Fourth Advisor

Salam Daher

Fifth Advisor

Soumya Kundu

Abstract

This dissertation takes a process-centric and stakeholder-first perspective on handling analytical uncertainty: the form of uncertainty that confronts data analysts' insight-generation processes in high-consequence decision-making scenarios. The consequences of an incorrect decision differ vastly depending on whether data is used for movie recommendations, personal data is used to drive insights, or data-driven modeling is used to drive real-time decisions for maintaining the health of a grid. This dissertation looks at analytical uncertainty in two real-world scenarios: i) how sensitive information leakage can be prevented during the open data release process, with data custodians being the stakeholders, and ii) how errors in energy forecasting can be detected or prevented when forecasting models are deployed in power systems, with grid operators being the stakeholders. Across both scenarios, this dissertation investigates how interactive visualization workflows can empower the respective data stakeholders to reveal privacy vulnerabilities in open datasets and improve trust in AI forecasting models within the power sector. The first contribution is a systematic analysis of existing visual analytics methods for addressing data privacy, which examines research gaps and future opportunities. Building on this foundation, an ethical hacking exercise was conducted to identify vulnerabilities in the open data ecosystem, leading to the second contribution of this dissertation: the development of the PRIVEE workflow, which enables data defenders to assess disclosure risks associated with open datasets. This dissertation showcases the effectiveness of PRIVEE through case studies in collaboration with domain experts.
Recognizing the need to understand the utility of linked datasets, the third contribution presents the algorithm for a utility metric and the VALUE interface, allowing users to explore the utility of joining datasets across over 100 open data portals. Such exploration can quickly escalate into a combinatorial explosion because of the various factors involved in joining multiple datasets in different ways. Thus, as the fourth contribution, this dissertation explores how visual analytic interventions can help balance privacy and utility factors in the context of multi-way joins through the web-based interface LinkLens. Finally, the dissertation extends these principles to the energy sector, contributing to the development of the Forte application, which helps grid operators evaluate AI model performance. By equipping stakeholders across disparate domains with interactive visualization workflows, this work enhances human-data trust and informed decision-making.
