Here come the lawyers.
Last week, the New York Times sued Microsoft and OpenAI, in which Microsoft has invested billions of dollars and counting, for copyright violations. The Times claims that Microsoft's genAI-based Copilot and OpenAI's ChatGPT, which powers Copilot, were trained on millions of Times articles without the paper's permission.
It goes on to argue that those tools (and Microsoft’s search engine, Bing) “now compete with the news outlet as a source of reliable information.”
The Times isn't seeking a specific amount of damages, at least not yet. Ultimately, though, it wants a lot: "billions of dollars in statutory and actual damages" for the "unlawful copying and use of The Times's uniquely valuable works."
Beyond that, the filing demands that Microsoft and OpenAI destroy both the datasets used to train the tools and the tools themselves.
This isn't the first lawsuit claiming AI companies violated copyrights in building their chatbots, and it won't be the last. But it is the Big Kahuna: the Times is among the best-known newspapers in the world and the gold standard in journalism. Its suit could prove to be among the most influential lawsuits of the computer and internet age, perhaps the most influential.
That’s because the outcome could well determine the future of generative AI.
Who’s right here? Is the Times just grubbing for money, and using the lawsuit to negotiate a better rights deal with Microsoft and OpenAI for use of its articles? Or is it standing up for the rights of all copyright holders, no matter how small, against the onslaught of the AI titans?
What’s in the lawsuit?
To understand what's at stake, let's first take a closer look at the underlying technology and at the suit itself. GenAI chatbots like Copilot and ChatGPT are built on large language models (LLMs), which must be trained on tremendous amounts of data to be effective and useful. The more data, the better. Just as important is the quality of that data: the better the training data, the better the genAI results.
Microsoft and OpenAI use content available on the internet to train their tools, regardless of whether that content is public domain information, open source data, or copyrighted material; it all gets ingested by the great, hungry maw of genAI. That means millions and millions of articles from the Times and myriad other publications are used for training.
Microsoft and OpenAI contend that using those articles and other copyrighted material is covered by the fair use doctrine. Fair use is an exceedingly complicated and confusing legal concept, and an unending stream of lawsuits is needed to decide what counts as fair use and what doesn't. It is widely open to interpretation.
That’s why the Times lawsuit is so important. It will determine whether all genAI tools, not just those owned by Microsoft and OpenAI, can continue to be trained on copyrighted material….
2024-01-13 09:00:04
Source from www.computerworld.com