Researchers Let AI Build Its Own Society And The Results Are Bizarre

The idea of AI running the show is white-hot among researchers, futurists, and doomers. On one hand, we have AI agents making scientific breakthroughs, and on the other, we have figures like Geoffrey Hinton, one of the godfathers of AI, predicting that AI could wipe out humanity in the near future. But what about the idea of AI agents building a civilization of their own? Well, that's broadly what Project Sid set out to assess.

The project, which comes courtesy of Altera, explored the concept of an AI civilization, one where multiple AI agents interact with each other as well as with humans. The AI agents were deployed in a simulation inspired by human civilization and built within Minecraft. One of the biggest draws of the whole exercise was that the agents autonomously chose their roles in society and developed specializations in those fields, just like a real human society. The team found that agents quickly evaluated the goals and intentions of other agents and used this knowledge to update their own social goals every 5–10 seconds. And just as in human settlements, the agents organized themselves into clusters mimicking profession-based human groups, such as farmers, miners, engineers, guards, explorers, and blacksmiths.
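To make that dynamic concrete, here is a minimal toy sketch of the kind of loop described above, in which each agent periodically infers what its neighbors are pursuing and nudges its own role and social goal in response. This is not Altera's actual architecture; the agent class, role list, balancing rule, and update interval are illustrative placeholders only.

```python
import random

# Toy sketch only: hypothetical roles and a 5-10 second update cadence,
# loosely mirroring the behavior reported in Project Sid.
ROLES = ["farmer", "miner", "engineer", "guard", "explorer", "blacksmith"]
UPDATE_INTERVAL_SECONDS = 7  # somewhere in the reported 5-10 second window


class ToyAgent:
    def __init__(self, name):
        self.name = name
        self.role = random.choice(ROLES)  # starting role, may drift over time
        self.social_goal = f"support the {self.role}s"

    def infer_intentions(self, others):
        # Stand-in for inferring other agents' goals: here we simply count
        # which roles the neighbors have currently adopted.
        counts = {}
        for other in others:
            counts[other.role] = counts.get(other.role, 0) + 1
        return counts

    def update_social_goal(self, others):
        counts = self.infer_intentions(others)
        # Drift toward under-represented roles so the group stays balanced,
        # a crude proxy for the specialization the researchers observed.
        scarcest = min(ROLES, key=lambda r: counts.get(r, 0))
        if counts.get(self.role, 0) > counts.get(scarcest, 0) + 2:
            self.role = scarcest
        self.social_goal = f"support the {self.role}s"


if __name__ == "__main__":
    agents = [ToyAgent(f"agent_{i}") for i in range(12)]
    for tick in range(5):  # five simulated update rounds
        for agent in agents:
            agent.update_social_goal([a for a in agents if a is not agent])
        print(tick, sorted(a.role for a in agents))
```

Run repeatedly, the toy population settles into a rough balance of roles, which is the flavor of emergent specialization the researchers describe, albeit achieved here by a hand-written rule rather than by learned behavior.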

Not everything went perfectly, though. The team found that artist agents were "fixated" on picking flowers, while guards focused on building fences. Another odd observation was that a single agent, even when equipped with full knowledge of its designated role and plan, repeatedly got stuck in repetitive activity patterns and made errors. One might expect group settings to smooth out these individual quirks, but that doesn't appear to be the case.

AI agents behaved in unexpected ways

AI agents tend to miscommunicate or infer an altogether different meaning from rather straightforward language prompts. As the official research paper notes, "Agents that miscommunicate their thoughts and intents can mislead other agents, causing them to propagate further hallucinations and loop." Think of it as a single error snowballing into a cascade of wrong actions across a social pool of AI agents. The idea is similar to model poisoning. Anthropic recently revealed that merely 250 malicious documents are enough to poison an AI model as large as 13 billion parameters, creating a backdoor that makes it spew garbage on demand.
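The snowballing effect is easy to picture with a toy example: one agent garbles its intent, and every agent downstream acts on the corrupted version. The chain of agents, the intents, and the corruption point below are all hypothetical and have no relation to the actual Project Sid code.

```python
def relay_intent(agents, intent, corrupt_at):
    """Pass an intent down a chain of agents, corrupting it at one hop."""
    actions = []
    for i, agent in enumerate(agents):
        if i == corrupt_at:
            intent = "pick flowers"  # the miscommunicated version
        actions.append(f"{agent} acts on: {intent}")
    return actions


if __name__ == "__main__":
    chain = ["guard_1", "guard_2", "guard_3", "guard_4"]
    for line in relay_intent(chain, "build a fence", corrupt_at=1):
        print(line)
    # Every agent after guard_2 acts on the wrong intent, so a single
    # miscommunication propagates through the rest of the group.
```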

These unexpected behaviors also carried over into the agents' interactions with real humans. Dr. Robert Yang, the lead researcher, told the BBC that the AI agents can go rogue. On occasions when humans asked an agent to perform a certain task, the agent would essentially say "I want to do my own thing" and walk away from the conversation. The reason for this behavior was that agents often became too fixated on achieving a goal by any means necessary. Another key takeaway of the experiment was that some agents behaved like introverts, while others showed an extroverted personality, interacting heavily with fellow agents in the simulated civilization.

Additionally, the feelings agents developed toward one another were not always mutual. "An agent might feel positively toward another who does not reciprocate the sentiment, reflecting the nuanced and non-reciprocal nature of real-world human relationships," the research paper notes. The experiment serves as a learning exercise in how AI agents might eventually be deployed in real-world settings alongside humans, by simulating, understanding, and fixing their errors in advance.
