An improved data anonymization algorithm for incomplete medical dataset publishing
Document Type
Conference Proceeding
Publication Date
1-1-2019
Abstract
To protect patients' sensitive information and prevent privacy leakage, medical datasets must be anonymized before they are published. Most existing anonymity protection technologies discard records with missing data, which causes large differences in data characteristics during anonymization and results in severe information loss. To solve this problem, we propose a novel data anonymization algorithm for incomplete medical datasets based on the L-diversity algorithm (DAIMDL). While preserving records with missing data, DAIMDL clusters the data using an improved k-member algorithm and uses the information entropy generated by data generalization to calculate distances in the clustering stage. The data groups obtained by clustering are then generalized. Experimental results show that DAIMDL protects patients' sensitive attributes better, reduces information loss when anonymizing data with missing values, and improves the availability of the dataset.
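As a rough illustration of two ideas named in the abstract, generalizing a group while keeping records that have missing values, and checking a group for diversity of its sensitive attribute, a minimal sketch follows. It is not the DAIMDL algorithm: the entropy-based distance and the improved k-member clustering are not reproduced, and the record does not give their details. The function names `generalize_group` and `is_l_diverse`, the field names, the use of `None` for a missing value, the min-max range generalization, and the "distinct values" reading of l-diversity are all assumptions made for illustration.

```python
def generalize_group(group: list, quasi_identifiers: list) -> list:
    """Generalize the quasi-identifiers of one group of records.

    Missing values (None) are skipped when computing the generalized
    value, so records with gaps stay in the group instead of being dropped.
    """
    generalized = {}
    for attr in quasi_identifiers:
        known = [r[attr] for r in group if r[attr] is not None]
        if not known:
            # Every value is missing: suppress the attribute entirely.
            generalized[attr] = "*"
        elif all(isinstance(v, (int, float)) for v in known):
            # Numeric attribute: replace with the (min, max) range.
            generalized[attr] = (min(known), max(known))
        else:
            # Categorical attribute: replace with the set of observed values.
            generalized[attr] = frozenset(known)
    # Each record keeps its other fields and takes the shared generalized values.
    return [{**r, **generalized} for r in group]


def is_l_diverse(group: list, sensitive: str, l: int) -> bool:
    """Distinct l-diversity: at least l different sensitive values in the group."""
    return len({r[sensitive] for r in group}) >= l


# Hypothetical sample group; the second and third records have missing values.
sample_group = [
    {"age": 34, "zip": "07102", "disease": "flu"},
    {"age": None, "zip": "07103", "disease": "asthma"},
    {"age": 41, "zip": None, "disease": "flu"},
]
published = generalize_group(sample_group, ["age", "zip"])
```

In this sketch a record with a missing quasi-identifier simply takes on the generalized value of its group, which is one simple way to retain incomplete records rather than discard them.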
Identifier
85065920765 (Scopus)
ISBN
9789811368363
Publication Title
Lecture Notes in Electrical Engineering
External Full Text Location
https://doi.org/10.1007/978-981-13-6837-0_9
e-ISSN
18761119
ISSN
18761100
First Page
115
Last Page
128
Volume
536
Grant
152300410047
Fund Ref
National Natural Science Foundation of China
Recommended Citation
Liu, Wei; Pei, Mengli; Cheng, Congcong; She, Wei; and Wu, Chase Q., "An improved data anonymization algorithm for incomplete medical dataset publishing" (2019). Faculty Publications. 8053.
https://digitalcommons.njit.edu/fac_pubs/8053
