SPY Lab researchers first to peek inside ChatGPT’s black box

In a world first, researchers from the SPY Lab, led by Professor Florian Tramèr, together with collaborators, have succeeded in extracting secret information about the large language model behind ChatGPT. The team responsibly disclosed the results of their “model stealing attack” to OpenAI, which immediately implemented countermeasures to protect the model.

The details of large language models behind chatbots like ChatGPT are largely kept secret by their owners. Image: Adobe Stock (AI generated)

Researchers from the group of Professor Florian Tramèr have devised and executed an inexpensive attack on production-grade large language models (LLMs) through the models’ publicly available application programming interfaces (APIs), the interfaces that software developers commonly use to communicate with applications. The successful attack shows that popular chatbots like ChatGPT can be made to reveal secret information about their underlying models’ parameters, as sketched below. The work was done in collaboration with researchers at Google DeepMind, the University of Washington, UC Berkeley and McGill University.

“Our work represents the first successful attempt at learning some information about the parameters of an LLM chatbot”, Tramèr said. Although the information his team gained from the attack was limited, Tramèr points out that future attacks of this kind could be more sophisticated and therefore more dangerous.
