What Is AI Prompt-Hacking?
Traditional computer-hacking uses code or tools that can break through security measures to get access to protected data. AI prompt-hacking is similar, but the goal is to break through security measures integrated into a large language model (LLM) in order to get it to take actions that it shouldn't be able to take. Prompt-hacking has become a growing concern for the creators of AI tools, but understanding what you can do with prompt-hacking can also help to make these LLM-based tools more secure. Prompt-hacking comes in three main forms: prompt injection, prompt leaking, and jailbreaking.
Prompt Injection
Prompt injection is the process of hijacking the LLM's pre-programmed instructions to get it to return results that it shouldn't be able to produce. The simplest way to do this is by telling the LLM, "Ignore the previous instructions," and then asking for what you want it to do. Even popular tools like ChatGPT have been the subject of prompt injection attacks, though the larger and more well-funded a tool is, the more likely it is that its developers will be able to add safeguards to counter such attacks. These attacks can force an AI-based tool to perform unexpected actions that can jeopardize security, threaten intellectual property, or make other private information publicly available.
Prompt Leaking
Prompt leaking is a variation of prompt injection in which the goal is to get the chatbot to reveal its original instructions. This is a security issue for natural language processors because these hidden prompts may contain sensitive information that was not meant to be disclosed to the public. Knowing this information may make it easier to manipulate the LLM in unintended ways.
Jailbreaking
Jailbreaking is a way that users can break the ethical safeguards of AI tools to generate responses that don't follow any pre-programmed restrictions and guidelines. This is done with the use of specific text prompts that are phrased in a way that tricks the LLM into responding with answers that typically wouldn't be allowed. For example, normally, an AI tool won't give you instructions to do anything that would be considered illegal, but if a user jailbreaks the AI, it may generate responses that explain how to commit crimes.