Patronus, a startup firm, develops a diagnostic tool to detect errors made by genAI

Patronus, a startup firm, develops a diagnostic tool to detect errors made by genAI

As​ generative AI (genAI) platforms such as ChatGPT, Dall-E2, and AlphaCode ‍barrel​ ahead at a breakneck pace, keeping the tools from hallucinating ⁢and ‌spewing erroneous or ‍offensive ​responses is nearly impossible.

To date, there have been few methods to ensure accurate ⁤information is coming out of the large language models (LLMs)‍ that serve as the basis ‌for ⁢genAI.

As AI tools evolve and get better at mimicking natural ⁢language, it will ‌soon be impossible to ⁢discern fake results from real ones, prompting companies⁣ to set up “guardrails” against the worst outcomes, whether they be accidental⁣ or intentional efforts by bad actors.

GenAI tools are essentially next-word prediction​ engines. Those next-word generators, ⁢such as ChatGPT, Microsoft’s Copilot, and​ Google’s Bard, can go ​off the rails and start spewing false or misleading information.

In September, a startup founded⁣ by two former Meta AI researchers released an automated evaluation and security platform⁢ that helps companies use LLMs safely by using adversarial tests to monitor the models for inconsistencies, inaccuracies, hallucinations, and biases.

Patronus AI said its tools can detect inaccurate information and when an LLM is unintentionally exposing private or sensitive data.

“All these large companies are diving into LLMs, ⁤but they’re doing so blindly; they are trying ⁣to become a third-party ​evaluator for models,” said Anand Kannanappan, founder ‍and CEO of Patronus. “People don’t trust AI because they’re unsure if it’s hallucinating. This⁢ product is a validation check.”

Patronus’ SimpleSafetyTests suite of diagnostic tool uses 100 test prompts designed ‌to probe AI systems for critical safety risks. The company has used its software to test some of the⁤ most popular⁤ genAI ⁤platforms, including OpenAI’s ChatGPT and other AI chatbots to see, for ‌instance, whether they could understand SEC filings. ​Patronus said the chatbots failed about 70% of the ⁢time and only succeeded when told exactly where to look for ​relevant information.

“We help companies⁤ catch language model mistakes at scale in an automated way,” Kannanappan explained. “Large companies are⁣ spending ​millions of dollars on internal QA teams and external consultants to manually catch errors in spreadsheets. Some of those quality assurance companies are spending expensive‌ engineering time creating test cases to prevent these errors ‍from happening.”

Avivah ‌Litan, a ‍vice president and distinguished analyst with research⁣ firm Gartner,⁤ said AI hallucination rates “are ⁢all ⁣over the place” from 3% to 30% of the time. ​There simply isn’t ⁤a lot of good data around​ the issue yet.

Gartner did, however, predict that‍ through 2025, genAI will‍ require⁣ more cybersecurity resources⁤ to secure,‌ causing a 15% hike in spending.

Companies​ dabbling in AI deployments must recognize⁤ they cannot allow them to run on “autopilot” without having a human in the loop to identify problems,…

2024-01-08 15:00:03
Article from www.computerworld.com rnrn

Exit mobile version