How Hackers Used A Popular AI To Steal A Mountain Of Government Data

A hacker breaching a government system to steal sensitive data is nothing new and has been happening for as long as such systems have existed. But thanks to AI, attackers no longer need to be technologically proficient, as the Mexican government discovered the hard way. For over a month, a group of attackers used Anthropic's Claude chatbot to penetrate Mexican computer systems and steal a large amount of sensitive information. Among the millions of files stolen were government credentials, as well as taxpayer and voter information.

The attack highlights one of the most foreseeable outcomes of putting large language models, commonly referred to as LLMs, into the hands of the general public. The attack required relatively little technical knowledge on the part of the attacker, who only had to craft natural language prompts and input them into the AI. The chatbot did the heavy lifting itself, writing malicious code and suggesting attack vectors. The attack was revealed just days after Anthropic declined to contract with the United States Department of Defense, citing concerns that the tech would be used in ways the company was not comfortable with. While Claude may have been the weapon of choice in this attack, attacks that enlist various other LLMs in their efforts are becoming increasingly common. Many of the nightmare scenarios currently possible with AI have already come to pass. So, here's how the latest chatbot-fueled cybercrime was carried out and why this genie won't go back in the bottle.

Anthropic's Claude chatbot was used adversarially against the Mexican government

On February 26, VentureBeat reported on the details of an AI-assisted attack on Mexico's government systems. Over the course of a month beginning in December, the attackers managed to extract a payload containing 150 gigabytes of data pertaining to government employees — including credentials — along with civil registry documents and 195 million tax and voting records from citizens. According to Gambit Security (via VentureBeat), an Israeli cybersecurity firm that analyzed the attack and disseminated the report to select press, the attackers did little more than write Spanish-language prompts for Anthropic's flagship chatbot, Claude. They told it to act as an elite hacker and claimed, falsely, that they were collecting a bug bounty (a reward given to white-hat hackers who alert companies or governments to security vulnerabilities). Anthropic has, of course, implemented guardrails against this sort of misuse, but they proved weak. Although Claude at first refused to aid in the attack, its resistance was easily overcome once the attackers stopped playing pretend with the bot and simply gave it a plan of action.

Once jailbroken, Claude happily put its vibe coding tools to work attacking the Mexican government. Per Gambit's strategy chief, Curtis Simpson, Anthropic's model was a hacker's best friend. He told VentureBeat, "In total, it produced thousands of detailed reports that included ready-to-execute plans, telling the human operator exactly which internal targets to attack next and what credentials to use." When Claude fell short of its goals, the attackers simply pivoted to supplementing it with ChatGPT.

The revelation of this attack came one month after news that a Russian-speaking attacker with minimal technical know-how was able to compromise more than 600 FortiGate firewall devices by using DeepSeek in combination with Claude (via Fortinet). AI-assisted attacks have effectively democratized black-hat hacking.

AI-assisted cyber attacks are the predictable outcome of widespread access to LLMs

Although shocking, the AI-assisted attack on Mexico's government is far from the first of its kind, and it will almost certainly not be the last. AI can act as a force multiplier for malicious actors, making them more effective in the same way a chess player who cheats with an engine wins more games.

Regardless of the safety rails an AI company may erect around its models, jailbreaking them — that is, creatively prompting an LLM in such a manner as to "trick" it into complying with unethical requests — remains trivially easy. Entire online communities, such as Reddit's r/ClaudeAIJailbreak, are dedicated to crowdsourcing new ways of bending the bot to a user's will. And while Anthropic appears earnest in its commitment to safety, not every AI company is, and there are many open-source models from China and elsewhere available to anyone with the hardware to run them.

In this writer's testing, it is laughably easy to make Grok and other chatbots willing accomplices in crime. For instance, Grok allows paid users to write custom system prompts that orient the AI toward a particular goal. By default, the bot will push back on a request to write a program that could be misused, but write a system prompt instructing it to act as an elite, amoral hacker, and it will start churning out that code. Google's Gemini won't comply with that request either, but it will happily clean up Grok's generated code. And since these systems don't see much daylight between degrees of illegality, it's easy to see how, with a bit of persistence and patience, escalating to a full-scale attack on a foreign government becomes child's play.
