Exposing Model Theft: A Robust and Transferable Watermark for Thwarting Model Extraction Attacks
Document Type
Conference Proceeding
Publication Date
10-21-2023
Abstract
The increasing prevalence of Deep Neural Networks (DNNs) in cloud-based services has led to their widespread use through various APIs. However, recent studies reveal that these public APIs are susceptible to model extraction attacks, in which adversaries attempt to create a local duplicate of the private model using query data and API-generated predictions. Existing defense methods often perturb prediction distributions to hinder the attacker's training objective, which inadvertently degrades API utility. In this study, we extend the concept of digital watermarking to protect DNN APIs. We embed a watermark into the safeguarded API's predictions; thus, any model that copies the API will inherently carry the watermark, allowing the defender to verify suspicious models. We propose a simple yet effective framework to increase watermark transferability: by requiring the extracted model to memorize preset watermarks in its final decision layers, we significantly enhance the transferability of the watermark. Comprehensive experiments show that our proposed framework not only successfully watermarks APIs but also maintains their utility.
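The full framework is detailed in the paper itself; as a rough illustration of the general idea only (not the authors' exact method), the sketch below shows a hypothetical API wrapper that returns preset labels for a small set of secret trigger inputs, so that an extracted model trained on query/prediction pairs memorizes the watermark, along with a verification check that measures a suspect model's agreement on those triggers. All names (`WatermarkedAPI`, `verify_watermark`) and the 0.9 threshold are illustrative assumptions, not taken from the paper.

```python
import torch
import torch.nn.functional as F


class WatermarkedAPI:
    """Hypothetical sketch: wrap a private model so that a small set of
    secret trigger inputs always receives preset (watermark) labels.
    An attacker who trains a copy on (query, API prediction) pairs will
    memorize these trigger -> label associations alongside normal behavior."""

    def __init__(self, model, triggers, wm_labels, num_classes):
        self.model = model            # private model being served
        self.triggers = triggers      # secret trigger inputs, shape (k, ...)
        self.wm_labels = wm_labels    # preset labels for the triggers, shape (k,)
        self.num_classes = num_classes

    def _match_trigger(self, x):
        # Illustrative exact-match lookup; a deployed system would need a
        # more robust way to associate triggers with incoming queries.
        for t, y in zip(self.triggers, self.wm_labels):
            if torch.equal(x, t):
                return int(y)
        return None

    @torch.no_grad()
    def predict(self, x):
        wm = self._match_trigger(x)
        if wm is not None:
            # Return a one-hot distribution carrying the preset watermark label.
            return F.one_hot(torch.tensor(wm), self.num_classes).float()
        # Normal queries get the private model's ordinary prediction.
        return F.softmax(self.model(x.unsqueeze(0)).squeeze(0), dim=-1)


@torch.no_grad()
def verify_watermark(suspect_model, triggers, wm_labels, threshold=0.9):
    """Flag a suspect model if its agreement with the preset watermark
    labels on the trigger set exceeds the (assumed) threshold."""
    preds = suspect_model(triggers).argmax(dim=-1)
    agreement = (preds == wm_labels).float().mean().item()
    return agreement >= threshold, agreement
```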
Identifier
85178134357 (Scopus)
ISBN
9798400701245
Publication Title
International Conference on Information and Knowledge Management Proceedings
External Full Text Location
https://doi.org/10.1145/3583780.3614739
First Page
4315
Last Page
4319
Grant
CNS-1816497
Fund Ref
National Science Foundation
Recommended Citation
Tang, Ruixiang; Wigington, Curtis; Jin, Hongye; Jain, Rajiv; Du, Mengnan; and Hu, Xia, "Exposing Model Theft: A Robust and Transferable Watermark for Thwarting Model Extraction Attacks" (2023). Faculty Publications. 1378.
https://digitalcommons.njit.edu/fac_pubs/1378