Large language models (LLMs), the algorithmic platforms on which generative AI (genAI) tools like ChatGPT are built, are highly inaccurate when connected to corporate databases and are becoming less transparent, according to two studies.
One study by Stanford University showed that as LLMs continue to ingest massive amounts of information and grow in size, the genesis of the data they use is becoming harder to track down. That, in turn, makes it difficult for businesses to know whether they can safely build applications that use commercial genAI foundation models and for academics to rely on them for research.
It also makes it more difficult for lawmakers to design meaningful policies to rein in the powerful technology, and “for consumers to understand model limitations or seek redress for harms caused,” the Stanford study said.
Foundation models, the broader category that includes LLMs such as GPT and LLaMA as well as image generators like DALL-E, have surged to prominence over the past year and transformed artificial intelligence (AI), giving many of the companies experimenting with them a boost in productivity and efficiency. But those benefits come with a heavy dollop of uncertainty.
“Transparency is an essential precondition for public accountability, scientific innovation, and effective governance of digital technologies,” said Rishi Bommasani, society lead at Stanford’s Center for Research on Foundation Models. “A lack of transparency has long been a problem for consumers of digital technologies.”
For example, deceptive online ads and pricing, unclear wage practices in ride-sharing, dark patterns that trick users into unknowing purchases, and myriad transparency failures around content moderation have created a vast ecosystem of mis- and disinformation on social media, Bommasani noted.
“As transparency around commercial [foundation models] wanes, we face similar sorts of threats to consumer protection,” he said.
For example, OpenAI, which has the word “open” right in its name, has clearly stated that it will not be transparent about most aspects of its flagship model, GPT-4, the Stanford researchers noted.
To assess transparency, Stanford brought together a team that included researchers from MIT and Princeton to design a scoring system called the Foundation Model Transparency Index (FMTI). It evaluates 100 different aspects or indicators of transparency, including how a company builds a foundation model, how it works, and how it is used downstream.
The Stanford study evaluated 10 LLMs and found the mean transparency score was just 37%. LLaMA scored highest, with a transparency rating of 52%; it was followed by GPT-4 and PaLM 2, which scored 48% and 47%, respectively.
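To give a concrete sense of how an index of this kind produces such percentages, the sketch below scores each model as the share of yes/no transparency indicators it satisfies and then averages across models. It is an illustrative toy, not the FMTI's actual methodology; the model names, indicator names, and values are hypothetical.

```python
# Illustrative sketch only (hypothetical indicators and data, not the real FMTI):
# a model's score is the percentage of transparency indicators it satisfies,
# and the index also reports the mean score across all evaluated models.

models = {
    "Model A": {"training_data_disclosed": True,  "compute_disclosed": False, "downstream_use_reported": True},
    "Model B": {"training_data_disclosed": False, "compute_disclosed": False, "downstream_use_reported": True},
}

def transparency_score(indicators: dict) -> float:
    """Fraction of indicators satisfied, expressed as a percentage."""
    return 100 * sum(indicators.values()) / len(indicators)

scores = {name: transparency_score(ind) for name, ind in models.items()}
mean_score = sum(scores.values()) / len(scores)

for name, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {score:.0f}%")
print(f"Mean transparency score: {mean_score:.0f}%")
```

The real index works over 100 indicators per model rather than the handful shown here, but the arithmetic behind a "37% mean" is the same kind of averaging.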
“If you don’t have transparency, regulators can’t even pose the right questions, let alone take action in these areas,” Bommasani said.
Meanwhile, almost all senior bosses (95%) believe genAI tools are regularly used by employees, with more than half (53%) saying…