Following The New York Times' lawsuit against OpenAI, three US digital news outlets filed copyright infringement lawsuits against OpenAI on February 28th. They argue that OpenAI violated the Digital Millennium Copyright Act, threatening journalists' livelihoods while profiting from their hard work.
The controversy surrounding the copyright infringement lawsuit between OpenAI and The New York Times persists. On February 29th, Latest noted that several news media outlets have recently filed copyright infringement lawsuits against OpenAI.
According to the Associated Press, the digital news outlets The Intercept, Raw Story, and AlterNet officially joined the fight against the unauthorized use of news content by artificial intelligence, filing copyright infringement lawsuits against OpenAI on February 28th.
These media organizations stated that thousands of their articles were used by OpenAI to train chatbots without permission or payment, essentially “piggybacking” on their news reports.
However, the three media outlets suing OpenAI did not provide specific examples of the stories they claim were stolen.
Media Collectively Accuse OpenAI
Annie Chabel, CEO of The Intercept, stated, “At a time when newsrooms nationwide are suffering from financial cuts, OpenAI is benefiting from our content. We hope to send a strong signal to AI developers through this lawsuit.”
According to The Guardian on February 28th, John Byrne, CEO of Raw Story and AlterNet, said in a statement, “Raw Story believes news organizations must bravely stand up to confront OpenAI, because OpenAI violates the Digital Millennium Copyright Act, threatening journalists’ livelihoods while profiting from their hard work.”
John Byrne added that the lawsuits from Raw Story and AlterNet do not name Microsoft as a defendant. Neither OpenAI nor Microsoft has responded so far.
The legal battle over the chatbot ChatGPT facing technology giant Microsoft and its generative artificial intelligence partner OpenAI began with The New York Times.
In December 2023, The New York Times sued OpenAI and its partner Microsoft for copyright infringement, accusing both companies of using millions of its articles without permission to train artificial intelligence models. The newspaper did not specify the damages it was seeking, but argued that the defendants should be held responsible for the “illegal reproduction and use of The New York Times’ unique and valuable works” and the associated “billions of dollars in statutory and actual damages.”
On February 26, 2024, OpenAI asked a US federal court to dismiss parts of The New York Times’ lawsuit, claiming that the newspaper had hired someone to hack OpenAI’s products, repeatedly generating content from The New York Times’ articles in violation of OpenAI’s terms of use, and had then used this misleading evidence to support its case.
Responding to OpenAI’s accusations, Ian Crosby, a lawyer representing The New York Times, said on February 27th that the newspaper had merely used OpenAI’s products to look for evidence that they illegally copied The New York Times’ copyrighted works, not engaged in so-called “hacking activities.”
Furthermore, The New York Times asked that both companies destroy any chatbot models and training data built on its copyrighted material.
Data Issues Caused by Large Models
This wave of lawsuits against OpenAI reflects the media industry’s broader anxiety about artificial intelligence. The chief concern is that generative artificial intelligence will compete with established publishers and become internet users’ primary source of information, further eroding media advertising revenue and undermining the quality of online news.
Zhang Zhaolong, a well-known security expert who has worked in information security for nearly 20 years, recently told Latest that generative large models carry inherent risks in deployment. “Large models are knowledge-based models. If the training data can be inferred effectively, sensitive data from the original training set can be reconstructed.” He added that such reconstruction can typically recover roughly 80% of the data, and even in weaker cases still around 50%.
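As a rough illustration of the kind of reconstruction probe Zhang alludes to, the sketch below feeds a small open model the beginning of a text suspected to be in its training set and measures how much of the continuation it reproduces verbatim. This is a minimal sketch, not a description of any specific attack: the model name ("gpt2") and the word-overlap heuristic are illustrative assumptions.

```python
# Minimal sketch of a training-data extraction probe (illustrative only):
# prompt a causal language model with the start of a text suspected to be
# in its training set, then measure verbatim overlap with the real text.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "gpt2"  # illustrative stand-in for whichever model is being probed
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL)

def extraction_probe(prefix: str, reference: str, max_new_tokens: int = 50) -> float:
    """Fraction of the reference's words the model reproduces in order."""
    inputs = tokenizer(prefix, return_tensors="pt")
    output = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        do_sample=False,  # greedy decoding surfaces memorized continuations
        pad_token_id=tokenizer.eos_token_id,
    )
    continuation = tokenizer.decode(
        output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )
    ref_words = reference.split()
    gen_words = continuation.split()
    matches = sum(1 for r, g in zip(ref_words, gen_words) if r == g)
    return matches / max(len(ref_words), 1)

# A high overlap score suggests the model has memorized the passage.
score = extraction_probe(
    prefix="Four score and seven years ago",
    reference="our fathers brought forth on this continent, a new nation",
)
print(f"verbatim overlap: {score:.0%}")
```

Published extraction attacks work along similar lines at much larger scale, sampling many candidate prefixes and ranking the outputs by model confidence.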
What data problems will generative large models bring as they develop? The first, Zhang Zhaolong said, is illegal collection: the essence of model training is “feeding data,” and open platforms that are uncertified, or that operate with illicit intent, may abuse the data and personal information they collect.
In addition, the security of large models themselves deserves attention. Zhang Zhaolong noted that even if a model itself is secure, attackers can still use it to reverse-engineer the original training data, which may include private information and records of personal behavior as well as government, industry, and enterprise data. Whether data protection is in place, whether data sources are legal, whether the data will be abused, and how it is managed and controlled are all questions that demand attention.
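One lightweight form of such reverse engineering is a loss-based membership-inference test: a model tends to assign lower loss (higher confidence) to text it was trained on than to unseen text of similar style. The sketch below illustrates the idea under the same illustrative assumptions as above; the model name and the single stylistic control are stand-ins, not a reference to any specific system.

```python
# Sketch of a loss-based membership-inference check (illustrative only):
# compare the model's average next-token loss on a candidate text against
# its loss on freshly written text of similar style.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "gpt2"  # illustrative stand-in
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL)
model.eval()

def sequence_loss(text: str) -> float:
    """Average next-token cross-entropy the model assigns to `text`."""
    ids = tokenizer(text, return_tensors="pt")["input_ids"]
    with torch.no_grad():
        out = model(ids, labels=ids)  # labels are shifted internally
    return out.loss.item()

candidate = sequence_loss("A paragraph suspected of being in the training data.")
control = sequence_loss("A newly written paragraph on a similar subject.")
# Markedly lower loss on the candidate is weak evidence of membership;
# real studies calibrate this against many controls and reference models.
print(f"candidate loss {candidate:.2f} vs control loss {control:.2f}")
```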
Digital media critics and professors at Peking University’s School of Journalism and Communication observe that The New York Times’ lawsuit against OpenAI and Microsoft, along with similar cases, shows courts grappling with artificial intelligence’s complex impact on copyright, privacy, and data-use law, in a legal landscape that is still taking shape. The lawsuit highlights the delicate balance between promoting artificial intelligence innovation and protecting copyright.
As artificial intelligence increasingly demonstrates the ability to generate human-like content, a hard question follows: to what extent can existing content be used for artificial intelligence development without violating copyright law? Whatever the outcome of The New York Times’ lawsuit against OpenAI, the case may have a lasting impact on the industry, shaping how artificial intelligence companies, content creators, and legal experts navigate the complex interaction between the technology and copyright law. It also underscores the importance of ethical considerations in artificial intelligence development, and the need for responsible, lawful use of the technology.