ChatGPT-maker braces for fight with New York Times and authors on 'fair use' of copyrighted works

Matt O'Brien
Tuesday 09 January 2024 21:25 GMT

A barrage of high-profile lawsuits in a New York federal court will test the future of ChatGPT and other artificial intelligence products that wouldn't be so eloquent had they not ingested huge troves of copyrighted human works.

But are AI chatbots — in this case, widely commercialized products made by OpenAI and its business partner Microsoft — breaking copyright and fair competition laws? Professional writers and media outlets will face a difficult fight to win that argument in court.

“I would like to be optimistic on behalf of the authors, but I’m not. I just think they have an uphill battle here,” said copyright attorney Ashima Aggarwal, who used to work for academic publishing giant John Wiley & Sons.

One lawsuit comes from The New York Times. Another from a group of well-known novelists such as John Grisham, Jodi Picoult and George R.R. Martin. A third from bestselling nonfiction writers, including an author of the Pulitzer Prize-winning biography on which the hit movie “Oppenheimer” was based.

THE LAWSUITS

Each of the lawsuits makes different allegations, but they all center on the San Francisco-based company OpenAI “building this product on the back of other people’s intellectual property,” said attorney Justin Nelson, who is representing the nonfiction writers and whose law firm is also representing the Times.

“What OpenAI is saying is that they have a free ride to take anybody else’s intellectual property really since the dawn of time, as long as it’s been on the internet,” Nelson said.

The Times sued in December, arguing that ChatGPT and Microsoft's Copilot are competing with the same outlets they are trained on and diverting web traffic away from the newspaper and other copyright holders who depend on advertising revenue generated from their content to keep producing their journalism. It also provided evidence of the chatbots spitting out Times articles word-for-word. At other times the chatbots falsely attributed misinformation to the paper in a way it said damaged its reputation.

One senior federal judge is so far presiding over all three cases, as well as a fourth from two more nonfiction authors who filed another lawsuit last week. U.S. District Judge Sidney H. Stein has been at the Manhattan-based court since 1995 when he was nominated by then-President Bill Clinton.

THE RESPONSE

OpenAI and Microsoft haven't yet filed formal counter-arguments on the New York cases, but OpenAI made a public statement this week describing the Times lawsuit as “without merit” and saying that the chatbot's ability to regurgitate some articles verbatim was a “rare bug.”

“Training AI models using publicly available internet materials is fair use, as supported by long-standing and widely accepted precedents,” said a Monday blog post from the company. It went on to suggest that the Times “either instructed the model to regurgitate or cherry-picked their examples from many attempts.”

OpenAI cited licensing agreements made last year with The Associated Press, the German media company Axel Springer and other organizations as offering a glimpse into how the company is trying to support a healthy news ecosystem. OpenAI is paying an undisclosed fee to license AP’s archive of news stories. The New York Times was engaged in similar talks before deciding to sue.

OpenAI said last year that access to AP's “high-quality, factual text archive” would improve the capabilities of its AI systems. But its blog post this week downplayed the importance of news content for AI training, arguing that large language models learn from an “enormous aggregate of human knowledge” and that “any single data source — including The New York Times — is not significant for the model’s intended learning.”

WHO'S GOING TO WIN?

Much of the AI industry's argument rests on the “fair use” doctrine of U.S. copyright law that allows for limited uses of copyrighted materials such as for teaching, research or transforming the copyrighted work into something different.

So far, courts have largely sided with tech companies in interpreting how copyright laws should treat AI systems. In a defeat for visual artists, a federal judge in San Francisco last year dismissed much of the first big lawsuit against AI image-generators, though allowed some of the case to proceed. Another California judge shot down comedian Sarah Silverman's arguments that Facebook parent Meta infringed on the text of her memoir to build its AI model.

Subsequent cases filed over the past year have brought more detailed evidence, but Aggarwal said when it comes to using copyrighted content to train AI systems that deliver a “small portion of that to users, the courts just don’t seem inclined to find that to be copyright infringement.”

Most tech companies cite as precedent Google’s success in beating back legal challenges to its online book library. The U.S. Supreme Court in 2016 let stand lower court rulings that rejected authors’ claim that Google’s digitizing of millions of books and showing snippets of them to the public amounted to copyright infringement.

But judges interpret fair use arguments on a case-by-case basis and it is “actually very fact-dependent,” depending on economic impact and other factors, said Cathy Wolfe, an executive at the Dutch firm Wolters Kluwer who also sits on the board of the Copyright Clearance Center, which helps negotiate print and digital media licenses in the U.S.

“Just because something is free on the internet, on a website, doesn’t mean you can copy it and email it, let alone use it to conduct commercial business,” Wolfe said. “Who’s going to win, I don’t know, but I’m certainly a proponent for protecting copyright for all of us. It drives innovation.”

BEYOND THE COURTS

Some media outlets and other content creators are looking beyond the courts and calling for lawmakers or the U.S. Copyright Office to strengthen copyright protections for the AI era. A panel of the U.S. Senate Judiciary Committee will hear testimony Wednesday from media executives and advocates in a hearing dedicated to AI's effect on journalism.

Roger Lynch, chief executive of the Conde Nast magazine chain, plans to tell senators that generative AI companies “are using our stolen intellectual property to build tools of replacement.”

“We believe that a legislative fix can be simple — clarifying that the use of copyrighted content in conjunction with commercial Gen AI is not fair use and requires a license,” says a copy of Lynch's prepared remarks.
