AI Can Be A Dangerous Tool: Here Are Some Of The Biggest Concerns Among Workers
Are people who know more about AI less likely to trust it? Recently, The Guardian newspaper ran a story about AI workers who warn others to stay away from AI. The interviewees were people who had been employed to train AI. They expressed concerns about unchecked biases, having to assess responses to medical matters they weren't qualified to address, unclear instructions, a lack of training, and unreasonably short deadlines. Many of them now warn their friends and family about the dangers of AI and have banned their children from using it.
Accusations of misinformation or bias in AI are not new. The Guardian article, however, is interesting because it reflects the opinions of people nobody usually thinks to ask – AI testers with first-hand experience of the hours of low-paid human labor behind every AI launch. When high-profile AI experts talk about the risks of AI, people are more likely to listen. The campaign group Pause AI has an AI Probability of Doom list, based on what different individuals say is the likelihood of a "very bad outcome" from AI. The list features AI experts who have written books and academic papers about the dangers of artificial intelligence.
Even big AI power brokers, with a vested interest in people funding and subscribing to AI, urge caution about blindly trusting AI. On the OpenAI podcast in June 2025, CEO Sam Altman said, "People have a very high degree of trust in ChatGPT, which is interesting because AI hallucinates. It should be the tech that you don't trust that much." Unlike the Guardian interviewees, however, Altman isn't discouraging people from using ChatGPT altogether.
What do AI Raters do?
Like the AI workers interviewed by the Guardian, I have served my time in the trenches of AI rating jobs. I imagine that many freelance writers have done the same during quiet spells. This was for third-party companies that are all still actively recruiting, so there's still a high demand for workers. In my case, I never knew which company's AI product I was helping to shape.
You're given tasks like assessing AI responses against benchmarks, or composing AI prompts that test things like an LLM's ability to perform multi-stage requests or handle unclear instructions. Some tasks involved deliberately trying to get the LLM to violate its own rules on offensive content. Many of The Guardian's interviewees said that the time allocated to complete tasks was too short to allow for a considered, thorough result, which was also my experience. As one AI worker put it, "We're expected to help make the model better, yet we're often given vague or incomplete instructions, minimal training, and unrealistic time limits to complete tasks."
Did the experience of being part of a system that prioritized quick turnaround times above all else make me wary of AI? As a human writer who writes about AI, my relationship with it is complicated. I love the tech, but I hate the amount of low-quality AI slop it produces. I have a healthy skepticism about the quality of content it produces, but I wouldn't tell people not to use it. And while all the concerns expressed by the interviewees in the article are valid, it's worth remembering that human AI raters are only one part of the process when it comes to testing and fine-tuning AI models.
How are AI models trained?
There are two main stages when training a GPT large language model. These are language modeling and fine-tuning. During the language modeling stage, AI is trained on enormous amounts of data, including web pages, books, and other text-based data. It uses this data to learn the general patterns of language. It's during the fine-tuning stage that human testers get involved. People review and rank the model's responses, in a process that is intended to make it safer and more helpful, and to ensure it responds in a way that humans understand and can relate to. Companies like OpenAI employ senior research engineers to carry out the more specialized challenges, while a lot of the routine evaluation work is outsourced to third parties and picked up by workers all over the world.
Much of the testing is ongoing, continuing after each model version's release. For example, "red-teaming" is a phrase used for workers who deliberately probe the model for errors, biases, or unsafe behavior. They're effectively trying to break it, and the issues they uncover are used to improve later training. AI companies also encourage people to report errors and to give feedback on the quality of responses.
Despite all these processes, AI still makes mistakes. And sometimes these errors are downright dangerous. The Guardian recently investigated medical advice provided by Google AI Overviews, and found examples where the overview incorrectly answered questions about liver function test results, which meant that people with serious health issues might believe themselves to be fine. As a result of the newspapers' report, Google has now updated the AI and removed the overview for questions about liver function tests.