PromSec: Prompt Optimization for Secure Generation of Functional Source Code with Large Language Models (LLMs)

Document Type

Conference Proceeding

Publication Date

12-9-2024

Abstract

The capability of generating high-quality source code using large language models (LLMs) reduces software development time and costs. However, recent literature and our empirical investigation in this work show that while LLMs can generate functioning code, they tend to introduce security vulnerabilities, limiting their potential. This problem stems mainly from their training on massive open-source corpora that exhibit insecure and inefficient programming practices. Automatic optimization of LLM prompts for generating secure, functional code is therefore a pressing need. This paper introduces PromSec, an algorithm for prompt optimization for secure and functional code generation using LLMs. PromSec combines 1) code vulnerability clearing using a generative adversarial graph neural network, dubbed gGAN, to fix and reduce security vulnerabilities in generated code, and 2) code generation using an LLM, in an interactive loop in which the outcome of the gGAN drives the LLM with enhanced prompts to generate secure code while preserving its functionality. Introducing a new contrastive-learning approach in the gGAN, we formulate the code clearing-and-generation loop as a dual-objective optimization problem, enabling PromSec to substantially reduce the number of LLM inferences. As a result, PromSec becomes a cost-effective and practical solution for generating secure, functional code. Extensive experiments conducted on Python and Java code datasets confirm that PromSec effectively enhances code security while upholding its intended functionality. Our experiments show that even a comprehensively applied state-of-the-art approach falls short of addressing all vulnerabilities in the code, whereas PromSec resolves each of them. Moreover, PromSec achieves more than an order-of-magnitude reduction in operational time, number of LLM queries, and security-analysis costs. Furthermore, prompts optimized with PromSec for one LLM are transferable to other LLMs across programming languages and generalize to vulnerabilities unseen in training. This study presents an essential step toward improving the trustworthiness of LLMs for secure and functional code generation, significantly enhancing their large-scale integration into real-world software development practices.
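
The abstract describes an iterative loop that alternates LLM code generation with gGAN-based vulnerability clearing, feeding the cleared result back as an enhanced prompt. The Python sketch below illustrates one plausible shape of such a loop; it is an assumption-laden illustration, and every helper name (generate_code, scan, ggan_clear, to_prompt) is a hypothetical placeholder passed in by the caller, not the authors' API.

    # Minimal sketch of a PromSec-style generate-and-clear loop.
    # All helpers are supplied as callables, so the sketch stays
    # self-contained; their names and signatures are illustrative only.
    from typing import Callable, List, Tuple

    def promsec_loop(
        prompt: str,
        generate_code: Callable[[str], str],   # LLM inference on a prompt
        scan: Callable[[str], List[str]],      # static security analyzer report
        ggan_clear: Callable[[str], str],      # gGAN vulnerability clearing (abstracted)
        to_prompt: Callable[[str, str], str],  # derive an enhanced prompt from cleared code
        max_iters: int = 5,
    ) -> Tuple[str, str]:
        """Alternate LLM generation and vulnerability clearing until the
        scanner reports no issues or the iteration budget is exhausted."""
        code = generate_code(prompt)
        for _ in range(max_iters):
            issues = scan(code)
            if not issues:
                break                          # no reported vulnerabilities: stop early
            cleared = ggan_clear(code)         # reduce vulnerabilities, keep intent
            prompt = to_prompt(prompt, cleared)  # enhanced prompt drives the next query
            code = generate_code(prompt)
        return code, prompt

In the paper, the clearing step operates on a graph representation of the code and the loop is trained with a dual-objective contrastive loss; the sketch abstracts both behind ggan_clear. The early break reflects the abstract's claim that the loop reduces the number of LLM inferences needed.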

Identifier

85215532935 (Scopus)

ISBN

9798400706363

Publication Title

CCS 2024 - Proceedings of the 2024 ACM SIGSAC Conference on Computer and Communications Security

External Full Text Location

https://doi.org/10.1145/3658644.3690298

First Page

2266

Last Page

2279
