For most people, the idea of using artificial intelligence tools in daily life—or even just messing around with them—has only become mainstream in recent months, with new releases of generative AI tools from a slew of big tech companies and startups, like OpenAI’s ChatGPT and Google’s Bard. But behind the scenes, the technology has been proliferating for years, along with questions about how best to evaluate and secure these new AI systems. On Monday, Microsoft is revealing details about the team within the company that since 2018 has been tasked with figuring out how to attack AI platforms to reveal their weaknesses.
In the five years since its formation, Microsoft’s AI red team has grown from what was essentially an experiment into a full interdisciplinary team of machine learning experts, cybersecurity researchers, and even social engineers. The group works to communicate its findings within Microsoft and across the tech industry using the traditional parlance of digital security, so the ideas will be accessible rather than requiring specialized AI knowledge that many people and organizations don’t yet have. But in truth, the team has concluded that AI security has important conceptual differences from traditional digital defense, which require differences in how the AI red team approaches its work.
“When we started, the question was, ‘What are you fundamentally going to do that’s different? Why do we need an AI red team?’” says Ram Shankar Siva Kumar, the founder of Microsoft’s AI red team. “But if you look at AI red teaming as only traditional red teaming, and if you take only the security mindset, that may not be sufficient. We now have to recognize the responsible AI aspect, which is accountability of AI system failures—so generating offensive content, generating ungrounded content. That is the holy grail of AI red teaming. Not just looking at failures of security but also responsible AI failures.”
Shankar Siva Kumar says it took time to bring out this distinction and make the case that the AI red team’s mission would really have this dual focus. Much of the early work involved releasing more traditional security tools, like the 2020 Adversarial Machine Learning Threat Matrix, a collaboration among Microsoft, the nonprofit R&D group MITRE, and other researchers. That year, the group also released open source automation tools for AI security testing, known as Microsoft Counterfit. And in 2021, the red team published an additional AI security risk assessment framework.
Over time, though, the AI red team has been able to evolve and expand as the urgency of addressing machine learning flaws and failures has become more apparent.
In one early operation, the red team assessed a Microsoft cloud deployment service that had a machine learning component. The team devised a way to launch a denial-of-service attack on other users of the cloud service. By exploiting a flaw, they could craft malicious requests that abused the machine learning components and strategically created virtual machines, the emulated computer systems used in the cloud. By carefully placing virtual machines in key positions, the red team could launch “noisy neighbor” attacks on other cloud users, in which one customer’s activity degrades performance for another.
The red team ultimately built and attacked an offline version of the system to prove that the vulnerabilities existed, rather than risk impacting actual Microsoft customers. But Shankar Siva Kumar says that these findings in the early years removed any doubts or questions about the utility of an AI red team. “That’s where the penny dropped for people,” he says. “They were like, ‘Holy crap, if people can do this, that’s not good for the business.’”
Crucially, the dynamic and multifaceted nature of AI systems means that Microsoft isn’t just seeing the most highly resourced attackers targeting AI platforms. “Some of the novel attacks we’re seeing on large language models—it really just takes a teenager with a potty mouth, a casual user with a browser, and we don’t want to discount that,” Shankar Siva Kumar says. “There are APTs, but we also acknowledge that new breed of folks who are able to bring down LLMs and emulate them as well.”
As with any red team, though, Microsoft’s AI red team isn’t just researching attacks that are being used in the wild right now. Shankar Siva Kumar says that the group is focused on anticipating where attack trends may go next. And that often involves an emphasis on the newer AI accountability piece of the red team’s mission. When the group finds a traditional vulnerability in an application or software system, it often collaborates with other groups within Microsoft to get it fixed rather than taking the time to fully develop and propose a fix on its own.
“There are other red teams within Microsoft and other Windows infrastructure experts or whatever we need,” Shankar Siva Kumar says. “The insight for me is that AI red teaming now encompasses not just security failures, but responsible AI failures.”