OpenAI accidentally deleted potential evidence in New York Times copyright lawsuit

21.11.2024 21:18

Mashable

OpenAI may have accidentally deleted important data related to its ongoing copyright lawsuit brought by the New York Times.

As first reported by TechCrunch, counsel for the Times and its co-plaintiff Daily News sent a letter to the judge overseeing the case, detailing how "an entire week’s worth of its experts' and lawyers' work" was "irretrievably lost." OpenAI had provided the plaintiffs with two dedicated virtual machines for researching alleged instances of copyright infringement. According to the letter, on Nov. 14, "programs and search result data stored on one of the dedicated virtual machines was erased by OpenAI engineers."

The Times has accused OpenAI, along with Microsoft, which uses OpenAI's models for its Bing AI chatbot, of copyright infringement for training the models on paywalled and unauthorized content. The lawsuit detailed multiple instances of "near-verbatim" copying in ChatGPT responses. OpenAI has disputed this claim, saying its models were trained on publicly available data and that this use therefore qualifies as fair use under copyright law. The case hinges on the Times being able to prove that OpenAI's models copied and used its content without compensation or credit.

OpenAI was able to recover most of the erased data, but the "folder structure and file names" of the work were unrecoverable, rendering the data unusable. Now, the plaintiffs' counsel must start their evidence gathering from scratch. In the letter, counsel affirmed that there's "no reason to believe [the erasure] was intentional," but also pointed out that "OpenAI is in the best position to search its own datasets." The AI company has avoided sharing any detail about its training data.

Similar copyright claims have been filed against OpenAI, but a lawsuit from Raw Story and AlterNet was recently dismissed because the plaintiffs could not prove enough harm to support their claims. Meanwhile, OpenAI has struck licensing deals with several media companies to use their work for training and to provide ChatGPT responses with citations. Recently, Adweek reported that OpenAI is paying publishing giant Dotdash Meredith at least $16 million a year to license its content.