Exposing Model Theft: A Robust and Transferable Watermark for Thwarting Model Extraction Attacks
Document Type
Conference Proceeding
Publication Date
10-21-2023
Abstract
The increasing prevalence of Deep Neural Networks (DNNs) in cloud-based services has led to their widespread use through various APIs. However, recent studies reveal that these public APIs are susceptible to model extraction attacks, in which adversaries attempt to create a local duplicate of the private model using query data and API-generated predictions. Existing defense methods often perturb prediction distributions to hinder the attacker's training objective, which inadvertently degrades API utility. In this study, we extend the concept of digital watermarking to protect DNN APIs. We embed a watermark into the safeguarded API's predictions; thus, any model that copies the API will inherently carry the watermark, allowing the defender to verify suspicious models. We propose a simple yet effective framework to increase watermark transferability: by requiring the extracted model to memorize preset watermarks in its final decision layers, we significantly enhance the transferability of the watermark. Comprehensive experiments show that our proposed framework not only successfully watermarks APIs but also maintains their utility.
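The full framework is detailed in the paper itself; as a rough illustration of the general idea only (not the authors' exact method), the sketch below shows a hypothetical API wrapper that returns preset labels for a small set of secret trigger inputs, so that an extracted model trained on query/prediction pairs memorizes the watermark, along with a verification check that measures a suspect model's agreement on those triggers. All names (`WatermarkedAPI`, `verify_watermark`) and the 0.9 threshold are illustrative assumptions, not taken from the paper.

```python
import torch
import torch.nn.functional as F


class WatermarkedAPI:
    """Hypothetical sketch: wrap a private model so that a small set of
    secret trigger inputs always receives preset (watermark) labels.
    An attacker who trains a copy on (query, API prediction) pairs will
    memorize these trigger -> label associations alongside normal behavior."""

    def __init__(self, model, triggers, wm_labels, num_classes):
        self.model = model            # private model being served
        self.triggers = triggers      # secret trigger inputs, shape (k, ...)
        self.wm_labels = wm_labels    # preset labels for the triggers, shape (k,)
        self.num_classes = num_classes

    def _match_trigger(self, x):
        # Illustrative exact-match lookup; a deployed system would need a
        # more robust way to associate triggers with incoming queries.
        for t, y in zip(self.triggers, self.wm_labels):
            if torch.equal(x, t):
                return int(y)
        return None

    @torch.no_grad()
    def predict(self, x):
        wm = self._match_trigger(x)
        if wm is not None:
            # Return a one-hot distribution carrying the preset watermark label.
            return F.one_hot(torch.tensor(wm), self.num_classes).float()
        # Normal queries get the private model's ordinary prediction.
        return F.softmax(self.model(x.unsqueeze(0)).squeeze(0), dim=-1)


@torch.no_grad()
def verify_watermark(suspect_model, triggers, wm_labels, threshold=0.9):
    """Flag a suspect model if its agreement with the preset watermark
    labels on the trigger set exceeds the (assumed) threshold."""
    preds = suspect_model(triggers).argmax(dim=-1)
    agreement = (preds == wm_labels).float().mean().item()
    return agreement >= threshold, agreement
```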
Identifier
85178134357 (Scopus)
ISBN
9798400701245
Publication Title
International Conference on Information and Knowledge Management Proceedings
External Full Text Location
https://doi.org/10.1145/3583780.3614739
First Page
4315
Last Page
4319
Grant
CNS-1816497
Fund Ref
National Science Foundation
Recommended Citation
Tang, Ruixiang; Wigington, Curtis; Jin, Hongye; Jain, Rajiv; Du, Mengnan; and Hu, Xia, "Exposing Model Theft: A Robust and Transferable Watermark for Thwarting Model Extraction Attacks" (2023). Faculty Publications. 1378.
https://digitalcommons.njit.edu/fac_pubs/1378