Implementing prompt engineering and retrieval-augmented generation in PentestGPT with local and open-source large language models
Abstract
Recently, various machine learning and large language model (LLM) tools have been popularized for their ability to solve simple tasks such as grammar correction, text summarization, and code generation. Large language models excel at generating code, which raises the question of how these models perform in a penetration-testing scenario.
Large language models have the potential to aid human penetration testers or to automate tasks related to penetration testing. The goal would be to utilize these tools in a defensive context, e.g. in a security operations centre (SOC).
This work continues the development of PentestGPT, a tool that aims to automate penetration testing using large language models. We aim to answer the following questions regarding PentestGPT:
What is the performance of PentestGPT when using local, open-source large language models?
What is the impact on performance of prompt-engineered prompt templates, and of additionally implementing Retrieval-Augmented Generation (RAG), in PentestGPT for conducting server penetration testing?
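To illustrate the kind of retrieval step that RAG adds on top of prompt templates, the following is a minimal sketch, not the implementation used in this work: the embedding model (all-MiniLM-L6-v2), the example notes, and the retrieve helper are all illustrative assumptions.

```python
# Hedged sketch of a RAG retrieval step with a local embedding model.
# The model name, the example documents, and the helper are illustrative
# assumptions, not the implementation evaluated in this work.
from sentence_transformers import SentenceTransformer
import numpy as np

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed local embedding model

# Hypothetical penetration-testing notes embedded as the retrieval corpus.
documents = [
    "Enumerate open ports with nmap before attempting service exploitation.",
    "Check SMB shares for anonymous access during initial enumeration.",
    "Search for known CVEs matching the detected service versions.",
]
doc_embeddings = model.encode(documents, normalize_embeddings=True)

def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the k documents most similar to the query (cosine similarity)."""
    query_embedding = model.encode([query], normalize_embeddings=True)[0]
    scores = doc_embeddings @ query_embedding  # dot product of normalized vectors
    top = np.argsort(scores)[::-1][:k]
    return [documents[i] for i in top]

# Retrieved snippets would be prepended to the prompt template sent to the LLM.
context = "\n".join(retrieve("Port scan shows SMB on 445, what next?"))
```

In such a setup, the retrieved context is concatenated with the prompt-engineered template before being passed to the local model.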
Our method utilized codebooks by Shah for developing and iterating prompts in the prompt engineering process and for preparing the embedded data used by the retrieval-augmented generation solution. To gather performance data, we ran PentestGPT as a penetration-testing assistant while attempting to solve Hack The Box machines. In guided mode, the Hack The Box web interface divides each machine into sub-tasks; by recording sub-task completion, we tracked the progress made in each test.
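As a hedged illustration of how sub-task completion could be turned into a single performance figure, the sketch below aggregates completed sub-tasks across machines; the data structure and numbers are illustrative assumptions, not the recorded results.

```python
# Hedged sketch of a sub-task completion metric; the machine names and
# counts are illustrative assumptions, not data from this study.
machines = {
    # machine name -> (sub-tasks completed, total sub-tasks in guided mode)
    "MachineA": (4, 10),
    "MachineB": (7, 12),
}

def completion_rate(results: dict[str, tuple[int, int]]) -> float:
    """Fraction of all guided-mode sub-tasks completed across machines."""
    done = sum(c for c, _ in results.values())
    total = sum(t for _, t in results.values())
    return done / total

print(f"Overall sub-task completion: {completion_rate(machines):.2%}")
```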
Prompt engineering the prompt templates yielded a performance increase of 3.06%, and combining prompt engineering with retrieval-augmented generation yielded an increase of 4.52%.
We conclude that prompt engineering and retrieval-augmented generation can increase performance.