What simply occurred? It isn’t simply OpenAI that retains being sued for allegedly scraping content material to coach its methods. Reddit is suing Anthropic over claims it scraped content material generated by discussion board customers to coach its Claude chatbot with out consent.
In a grievance filed in San Francisco this week, Reddit claims that Anthropic deliberately skilled its LLMs on content material created by Reddit customers with out requesting consent, thereby violating Reddit’s person settlement.
The submitting provides that Anthropic accessed Reddit greater than 100,000 instances, regardless of the AI agency stating that its bots had been blocked from scraping knowledge from the positioning. It provides that Anthropic has been coaching its Claude chatbot on Reddit knowledge since at the least December 2021. The go well with truly features a screenshot showing to point out Claude acknowledging that it was skilled on Reddit person knowledge.
Reddit signed agreements with Google and OpenAI in 2024, value $60 million and $70 million per yr respectively, that enable them entry to posts created by Reddit’s 100+ million each day customers. This enables the posts to seem in solutions offered by the businesses’ respective chatbots. However Anthropic allegedly prefers to not pay for this type of factor.
“Anthropic refused to interact,” the grievance states. “Thus, whereas different AI giants have entered into licensing agreements and agreed to respect customers’ selections (together with by deleting posts that Redditors selected to delete), Anthropic has not.”
Reddit challenges Anthropic’s assertion that it’s “the white knight of the AI business.”
“This case is in regards to the two faces of Anthropic: the general public face that makes an attempt to ingratiate itself into the buyer’s consciousness with claims of righteousness and respect for boundaries and the legislation, and the non-public face that ignores any guidelines that intervene with its makes an attempt to additional line its pockets.”
Reddit CEO Steve Huffman beforehand complained that blocking firms unwilling to pay for knowledge harvesting has been “an actual ache within the ass,” which is why it modified its robots.txt file to exclude bots and crawlers that did not have permission to entry its knowledge.
Anthropic mentioned that it disagrees with Reddit’s claims and can defend itself vigorously.
A Reddit spokesperson mentioned, “We imagine within the Open Web – that doesn’t give Anthropic the appropriate to scrape Reddit content material unlawfully, exploit it for billions of {dollars} in revenue, and disrespect the rights and privateness of our customers.”
“This is not a misunderstanding, it is a sustained effort to extract worth from Reddit whereas ignoring authorized and moral boundaries.”
“We’re submitting this lawsuit according to our Public Content material Coverage and as our closing choice to pressure Anthropic to cease its illegal practices and abide by its claimed values.”
Anthropic isn’t any stranger to lawsuits. In 2024, writers and journalists Andrea Bartz, Charles Graeber, and Kirk Wallace Johnson filed a class-action in opposition to the agency over claims it used pirated copies of their work to coach Claude.
Anthropic has additionally been combating a case in opposition to Common Music that began in 2023. The AI agency is alleged to have used lyrics from at the least 500 songs by musicians, together with Beyoncé, the Rolling Stones and the Seaside Boys, to coach Claude with out permission. In March, a choose rejected a preliminary bid from the music publishers to dam their lyrics from getting used this manner.