DeMinify: Neural Variable Name Recovery and Type Inference
Document Type
Conference Proceeding
Publication Date
11-30-2023
Abstract
To avoid exposing original source code, variable names in code deployed in the wild are often replaced with short, meaningless names, which makes the code difficult to understand and analyze. We introduce DeMinify, a deep learning (DL)-based approach that formulates this recovery problem as the prediction of missing features in a Graph Convolutional Network-Missing Features model. The graph represents both the relations among the variables and the relations among their types, and the names or types of some nodes are missing. Moreover, DeMinify leverages dual-task learning to propagate the mutual impact between learning the variable names and learning their types. We conducted experiments to evaluate DeMinify on both name recovery and type prediction, using a Python dataset with 180k methods and a JavaScript (JS) dataset with 322k files. For variable name prediction, DeMinify correctly predicts a variable's name with a single suggested name in 76.7% of the cases in Python code and 81.6% of the cases in JS code. DeMinify achieves relative improvements of 15.3%-40.7% and 7.7%-49.7% in top-1 accuracy over the state-of-the-art variable name recovery approaches for Python and JS code, respectively. It also achieves relative improvements of 14.5%-51.9% in top-1 accuracy over existing type prediction approaches. Our experimental results show that learning data types helps improve variable name recovery and vice versa.
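To make the "missing features in a graph" formulation concrete, the following is a minimal sketch, not the authors' model. It swaps in plain neighbor averaging for the Graph Convolutional Network-Missing Features layer, and the function name `propagate_missing`, the toy graph, and the two-dimensional feature vectors are all hypothetical. It only illustrates the idea that a node whose name feature is missing can be estimated from related nodes whose features are known.

```python
from typing import Dict, List, Optional, Tuple

Vector = List[float]


def propagate_missing(
    features: Dict[str, Optional[Vector]],
    edges: List[Tuple[str, str]],
) -> Dict[str, Optional[Vector]]:
    """Fill each missing node feature with the mean of its neighbors' known features.

    This is a simplified stand-in (an assumption for illustration), not the
    GCN-based layer described in the abstract.
    """
    # Build an undirected adjacency map from the edge list.
    neighbors: Dict[str, set] = {node: set() for node in features}
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)

    filled: Dict[str, Optional[Vector]] = {}
    for node, vec in features.items():
        if vec is not None:
            # Known features are copied through unchanged.
            filled[node] = list(vec)
            continue
        known = [
            features[n] for n in sorted(neighbors[node]) if features[n] is not None
        ]
        if not known:
            # No informative neighbor: the feature stays missing.
            filled[node] = None
            continue
        dim = len(known[0])
        filled[node] = [sum(v[i] for v in known) / len(known) for i in range(dim)]
    return filled


# Hypothetical toy graph: the minified variable "a" is related to two
# variables whose (made-up) name embeddings are known.
features = {
    "a": None,
    "count": [1.0, 0.0],
    "total": [0.0, 1.0],
}
edges = [("a", "count"), ("a", "total")]
result = propagate_missing(features, edges)
```

In this sketch the missing node `a` receives the average of its two neighbors' vectors, while the known nodes are left untouched. A learned model would replace the averaging with trained graph convolution weights and, under dual-task learning, would maintain a second set of features for types.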
Identifier
85180556140 (Scopus)
ISBN
9798400703270
Publication Title
ESEC/FSE 2023: Proceedings of the 31st ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering
External Full Text Location
https://doi.org/10.1145/3611643.3616368
First Page
758
Last Page
770
Grant
CNS-2120386
Fund Ref
National Science Foundation
Recommended Citation
Li, Yi; Yadavally, Aashish; Zhang, Jiaxing; Wang, Shaohua; and Nguyen, Tien N., "DeMinify: Neural Variable Name Recovery and Type Inference" (2023). Faculty Publications. 1310.
https://digitalcommons.njit.edu/fac_pubs/1310