An improved data anonymization algorithm for incomplete medical dataset publishing

Document Type

Conference Proceeding

Publication Date

1-1-2019

Abstract

To protect patients' sensitive information and prevent privacy leakage, medical datasets must be anonymized before they are published. Most existing anonymization techniques discard records with missing data, which introduces large differences in data characteristics during anonymization and results in severe information loss. To solve this problem, we propose DAIMDL, a novel data anonymization algorithm for incomplete medical datasets based on the L-diversity algorithm. While preserving records with missing data, DAIMDL clusters the data using an improved k-member algorithm and uses the information entropy generated by data generalization to calculate distance in the clustering stage. The data groups obtained by clustering are then generalized. The experimental results show that DAIMDL protects patients' sensitive attributes better, reduces information loss when anonymizing data with missing values, and improves the availability of the dataset.
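The pipeline described in the abstract (keep incomplete records, cluster with an entropy-based distance, enforce L-diversity, then generalize each group) can be illustrated with a minimal Python sketch. This is not the DAIMDL algorithm from the paper, whose exact formulas are not given in this record. The sketch assumes that missing values are stored as `None` and skipped when measuring entropy, that the "distance" of a record to a group is the summed Shannon entropy of the quasi-identifier columns after adding it, that L-diversity means at least `l` distinct sensitive values per group, and that the last field of each record is the sensitive attribute. The function names `entropy`, `group_cost`, `is_l_diverse`, `cluster`, and `generalize` are hypothetical.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (bits) of the known values; None (missing) is skipped."""
    known = [v for v in values if v is not None]
    if not known:
        return 0.0
    total = len(known)
    counts = Counter(known)
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def group_cost(group, n_qi):
    """Assumed distance: sum of entropies over the quasi-identifier columns."""
    return sum(entropy([r[i] for r in group]) for i in range(n_qi))


def is_l_diverse(group, l):
    """Distinct L-diversity: at least l different sensitive values (last field)."""
    return len({r[-1] for r in group}) >= l


def cluster(records, k, l, n_qi):
    """Greedy k-member-style grouping that keeps records with missing values."""
    remaining = list(records)
    groups = []
    while len(remaining) >= k:
        group = [remaining.pop(0)]  # seed choice is simplified to "first left"
        while remaining and (len(group) < k or not is_l_diverse(group, l)):
            seen = {r[-1] for r in group}
            # While the group is not yet diverse, prefer new sensitive values.
            candidates = [r for r in remaining if r[-1] not in seen]
            if len(seen) >= l or not candidates:
                candidates = remaining
            best = min(candidates, key=lambda r: group_cost(group + [r], n_qi))
            remaining.remove(best)
            group.append(best)
        groups.append(group)
    if not groups:  # fewer than k records in total: one undersized group
        return [remaining] if remaining else []
    for r in remaining:  # attach leftovers to the cheapest existing group
        min(groups, key=lambda g: group_cost(g + [r], n_qi)).append(r)
    return groups


def generalize(group, n_qi):
    """Replace each quasi-identifier by the set of known values in the group."""
    labels = []
    for i in range(n_qi):
        known = sorted({str(r[i]) for r in group if r[i] is not None})
        labels.append("|".join(known) if known else "*")
    return [tuple(labels) + (r[-1],) for r in group]


records = [
    ("30-39", "476", "flu"),
    ("30-39", None, "cancer"),   # missing ZIP prefix is kept, not discarded
    ("40-49", "479", "flu"),
    ("40-49", "479", "asthma"),
    ("30-39", "476", "asthma"),
    (None, "479", "cancer"),     # missing age band is kept, not discarded
]
groups = cluster(records, k=3, l=2, n_qi=2)
published = [generalize(g, 2) for g in groups]
```

In this toy run the two incomplete records end up in groups whose known values they agree with, so they are published with the group's generalized quasi-identifiers instead of being dropped. If the data contains fewer than `l` distinct sensitive values, this sketch cannot guarantee diversity and a group may absorb all remaining records.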

Identifier

85065920765 (Scopus)

ISBN

9789811368363

Publication Title

Lecture Notes in Electrical Engineering

External Full Text Location

https://doi.org/10.1007/978-981-13-6837-0_9

e-ISSN

18761119

ISSN

18761100

First Page

115

Last Page

128

Volume

536

Grant

152300410047

Fund Ref

National Natural Science Foundation of China

