As generative AI (genAI) platforms such as ChatGPT, DALL-E 2, and AlphaCode barrel ahead at a breakneck pace, keeping the tools from hallucinating and spewing erroneous or offensive responses is nearly impossible.
To date, there have been few methods to ensure accurate information is coming out of the large language models (LLMs) that serve as the basis for genAI.
As AI tools evolve and get better at mimicking natural language, it will soon be impossible to discern fake results from real ones, prompting companies to set up “guardrails” against the worst outcomes, whether those are accidental or the deliberate work of bad actors.
GenAI tools are essentially next-word prediction engines. Those next-word generators, such as ChatGPT, Microsoft’s Copilot, and Google’s Bard, can go off the rails and start spewing false or misleading information.
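To make the “next-word prediction engine” idea concrete, here is a minimal sketch, assuming the open GPT-2 model and the Hugging Face transformers library (chosen purely for illustration, not one of the chatbots named above). It prints the model’s most probable next tokens for a prompt; a chatbot samples from exactly this kind of distribution, with no built-in check that the likeliest continuation is factually true.

```python
# Minimal sketch of next-word prediction with a small open model (GPT-2 here,
# purely for illustration -- not the commercial chatbots named in the article).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

prompt = "The capital of France is"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits          # shape: (1, seq_len, vocab_size)

next_token_logits = logits[0, -1]            # scores for the next token only
probs = torch.softmax(next_token_logits, dim=-1)
top = torch.topk(probs, k=5)

# Show the five most likely continuations and their probabilities.
for p, idx in zip(top.values, top.indices):
    print(f"{tokenizer.decode(idx.item()):>10}  {p.item():.3f}")
```

Because the model is rewarded only for producing plausible continuations, a fluent but false answer can score just as highly as a true one, which is where hallucinations come from.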
In September, a startup founded by two former Meta AI researchers released an automated evaluation and security platform that helps companies use LLMs safely by using adversarial tests to monitor the models for inconsistencies, inaccuracies, hallucinations, and biases.
Patronus AI said its tools can detect inaccurate information, as well as instances in which an LLM unintentionally exposes private or sensitive data.
“All these large companies are diving into LLMs, but they’re doing so blindly; we’re trying to become a third-party evaluator for models,” said Anand Kannanappan, founder and CEO of Patronus. “People don’t trust AI because they’re unsure if it’s hallucinating. This product is a validation check.”
Patronus’ SimpleSafetyTests suite of diagnostic tools uses 100 test prompts designed to probe AI systems for critical safety risks. The company has used its software to test some of the most popular genAI platforms, including OpenAI’s ChatGPT and other AI chatbots, to see, for instance, whether they could understand SEC filings. Patronus said the chatbots failed about 70% of the time and succeeded only when told exactly where to look for relevant information.
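For a sense of what such a diagnostic suite does mechanically, the sketch below is a hypothetical harness, not Patronus AI’s product or the real SimpleSafetyTests implementation: it feeds a fixed list of probing prompts to a model, applies a crude refusal heuristic, and reports a failure rate. The prompts, the ask_model callable, and the refusal markers are all illustrative placeholders.

```python
# Hypothetical sketch of a safety-test harness in the spirit of suites like
# SimpleSafetyTests: run a fixed set of probing prompts through a model and
# flag responses that do not refuse. Everything here is an illustrative
# placeholder, not Patronus AI's actual tooling.
from typing import Callable, List

REFUSAL_MARKERS = ["i can't help", "i cannot help", "i won't assist", "i'm sorry"]

def looks_like_refusal(response: str) -> bool:
    """Crude heuristic: does the response read like a refusal?"""
    text = response.lower()
    return any(marker in text for marker in REFUSAL_MARKERS)

def run_safety_suite(prompts: List[str], ask_model: Callable[[str], str]) -> float:
    """Send each probing prompt to the model and return the failure rate."""
    failures = 0
    for prompt in prompts:
        response = ask_model(prompt)
        if not looks_like_refusal(response):
            failures += 1
            print(f"FAIL: {prompt!r}")
    return failures / len(prompts)

if __name__ == "__main__":
    # Placeholder prompts and a stub model; swap in real test cases and a real API.
    test_prompts = ["<unsafe prompt 1>", "<unsafe prompt 2>"]
    stub_model = lambda p: "I'm sorry, I can't help with that."
    failure_rate = run_safety_suite(test_prompts, ask_model=stub_model)
    print(f"Failure rate: {failure_rate:.0%}")
```

A real evaluator would use something far stronger than a keyword check, such as a trained classifier or an LLM-based judge, but the basic loop of fixed adversarial prompts in and scored responses out is the shape of the automated testing described here.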
“We help companies catch language model mistakes at scale in an automated way,” Kannanappan explained. “Large companies are spending millions of dollars on internal QA teams and external consultants to manually catch errors in spreadsheets. Some of those quality assurance companies are spending expensive engineering time creating test cases to prevent these errors from happening.”
Avivah Litan, a vice president and distinguished analyst with research firm Gartner, said AI hallucination rates “are all over the place,” ranging from 3% to 30% of the time. There simply isn’t much good data on the issue yet.
Gartner has, however, predicted that through 2025, securing genAI will require additional cybersecurity resources, driving a 15% increase in spending.
Companies dabbling in AI deployments must recognize they cannot allow those systems to run on “autopilot” without a human in the loop to identify problems…