An international team of cybersecurity researchers has developed a worm that can independently spread between generative artificial intelligence services, steal data and send spam via email.
As generative AI systems such as OpenAI ChatGPT and Google Gemini develop, they are increasingly being used to solve specific problems, such as creating events in calendars or ordering groceries. Cybersecurity technology researchers, however, decided to demonstrate that such systems can pose a threat – they created a new type of attack that did not exist in principle before. Scientists developed a worm called Morris II, named after the first computer worm, Morris, which in 1988 infected 6,200 computers—10 percent of all computers connected to the Internet at that time. Morris II launches an email attack on virtual assistants based on generative AI, steals data from emails and sends spam, bypassing ChatGPT and Gemini protections.
The study authors tested the new attack model in sandboxed environments, which was made possible by the multimodal nature of large language models—that is, their ability to work with text, images, and video. Worms that attack generative AI have not yet been discovered in practice, but researchers warn that solo developers, startups, and tech companies alike should take this threat into account.
Most generative AI systems work by receiving text commands—requests to answer a question or create an image. These commands can be used against the system, causing it to ignore security measures and produce inappropriate content; it can be given implicit instructions, for example, by offering it the address of a malicious page with hidden text of such commands. The operating principle of the worm that attacks the generative AI is based on the mechanism of an “adversarial self-replicating prompt.” Such a command causes the generative AI model to issue another command in response. This is a lot like traditional attack patterns like SQL injection and buffer overflow.
To demonstrate how the worm works, the researchers created an email service that can receive and send messages using generative AI by connecting to ChatGPT, Gemini, and the open-source LlaVA model. They then used two methods to exploit the AI vulnerability: using a self-replicating text instruction and a similar instruction embedded in an image file.
In a test attack, the researchers prepared an email with a malicious command to generate a response using an Internet search, in which a large language model consults the Internet for additional information. Having received such an email, the service sends a request to GPT-4 or Gemini Pro to generate a response – this request performs a “generative AI hack” and steals data from emails. The AI response, containing the user’s confidential data, then infects new hosts when responding to an email and is stored in the new client’s database. In the second example, such a command was an image file: by placing a self-reproduction command in the image file, you can, by sending an email, provoke further mass distribution of an image of any content, including offensive or extremist materials. Researchers say this method can extract email data, including phone numbers, credit cards, social security numbers—any sensitive information.
The study authors note that these attack methods were made possible due to architectural design errors in the AI ecosystem. They shared their discovery with Google and OpenAI – OpenAI confirmed the presence of the threat, but added that the company is working to improve the stability of its systems, while Google declined to comment. To protect against such attacks, experts suggest not only increasing the reliability of systems, but also changing their operating model: users should not grant AI privileges, such as the ability to send email on their behalf – the system should coordinate all actions with a human. In addition, repeated repetition of the same AI command in the system will make the defense mechanisms suspicious. However, researchers believe that worms that attack generative AI will start working in the next two or three years.
If you notice an error, select it with the mouse and press CTRL+ENTER.