Google Slapped With Lawsuit Over Data Used To Train Its AI

The number of lawsuits over the data used to train AI models is growing quickly, and this time Google is in the crosshairs. The company has been hit with a lawsuit that takes aim at its use of publicly available internet data to train its AI models, which power tools like Bard. DeepMind, the once-independent company that Google acquired years ago and merged with the Google Brain team in April, is also named in the lawsuit, which claims that Google "has been secretly stealing everything ever created and shared on the internet" for use as training data.

The news comes only days after OpenAI was slapped with (another) lawsuit involving its own models, in that case the GPT-3.5 and GPT-4 models that power ChatGPT. Authors including comedian Sarah Silverman accused OpenAI in that lawsuit of violating their copyrights by including their books in training data without permission. The lawsuit further suggested that OpenAI may have used illegal shadow libraries to source the books.

The sticky issue of using public data to train AI

The seemingly sudden explosion of publicly available chatbots built on very capable large language models (LLMs) has raised uncomfortable questions about the nature of copyright and how creators can be properly involved in (or, at least, compensated for) the AI-training process. At the heart of the matter are the datasets used to train AI models, which can include everything from content scraped from random blogs to scientific journals, libraries of published books, social media platforms, and more. Some companies that hold vast quantities of human-generated content, such as Reddit and Twitter, have scrambled to ensure they're paid for that data.

While big companies fight it out with lawsuits, many people indirectly swept up in the matter don't have the resources to individually challenge tech giants, which is where class action lawsuits may come into play. It's no surprise, then, that Google is facing a proposed class action suit that seeks, among other things, to have the company hit pause on providing commercial access to its AI models.

The legal action comes from Clarkson Law Firm. One of the attorneys on the case, Tim Giordano, explained the reasoning in a statement to CNN: "Google needs to understand that 'publicly available' has never meant free to use for any purpose. Our personal information and our data is our property, and it's valuable, and nobody has the right to just take it and use it for any purpose." Alphabet, Google, and DeepMind had not commented on the lawsuit at the time of writing.