Comedian Sarah Silverman has joined class-action lawsuits against OpenAI and, separately, Meta, alleging copyright infringement. According to court documents, Silverman asserts that the companies used her copyrighted work without consent to train their artificial intelligence (AI) models.

AI Trained on Unauthorized Works

Silverman, along with authors Christopher Golden and Richard Kadrey, filed the suits in the U.S. District Court for the Northern District of California, San Francisco Division, on a recent Friday. The lawsuits accuse the companies of illicitly copying and ingesting the authors’ works, notably Silverman’s memoir “The Bedwetter”, from illegitimate online “shadow libraries” housing thousands of texts.

The suit against Meta draws on the company’s own research paper about its large language model, LLaMA. The paper, publicly available since February, acknowledges that text from The Pile formed part of the training dataset. That admission, the lawsuit argues, implies the use of copyrighted content from those shadow libraries.

The plaintiffs seek damages and propose changes to the LLaMA and ChatGPT models to prevent further infringement.

Training of ChatGPT Questioned

The source of training data for OpenAI’s ChatGPT remains unclear. However, the lawsuit alleges that ChatGPT’s ability to produce summaries of the plaintiffs’ works indicates it was trained on their copyrighted material.

For instance, the suit includes sample text that ChatGPT generated when asked to summarize “The Bedwetter”. The plaintiffs argue that the accuracy of the summary points to direct training on the copyrighted work.

Ongoing Legal Challenges Against AI Training

The lawyers for the three authors, Joseph Saveri and Matthew Butterick, are no strangers to this type of dispute. They are currently engaged in other litigation involving GitHub Copilot, an AI-powered coding assistant, and an image generator from Stability AI.

On their website, the attorneys claim that copyrighted content forms a large part of the training data used by both OpenAI and Meta, and that this material was used without permission, recognition, or compensation.

Implications for AI and Copyright Laws

These lawsuits are among a growing number of legal cases that could shape how AI systems are trained and what role copyright law plays in that process.

Digital media and intellectual property specialist Robert deBrauwere, from the law firm Pryor Cashman, predicts more cases will follow.