Anthropic’s Claude 2.1 LLM Enhances Performance with Turbocharging, Provides Beta Tool Access

Anthropic has upped the ante for how much information a large language model (LLM) can consume at once, announcing on Tuesday that its just-released Claude 2.1 has a context window of 200,000 tokens. That’s roughly the equivalent of 150,000 words or more than 500 printed pages of information, Anthropic said.

The latest Claude version also is more accurate than its predecessor, has a lower price, and includes beta tool use, the company said in its announcement.

The new model powers Anthropic’s Claude generative AI chatbot, so both free and paying users can take advantage of most of Claude 2.1’s improvements. However, the 200,000 token context window is for paying Pro users, while free users still have a 100,000 token limit, significantly higher than GPT-3.5’s 16,000.

Claude 2.1’s beta tool feature will allow developers to integrate APIs and defined functions with the Claude model, similar to what’s been available in OpenAI’s models.

Claude’s previous 100,000 token context window had been significantly ahead of OpenAI in that metric until earlier this month, when OpenAI announced a preview version of GPT-4 Turbo with a 128,000 token context window. However, only ChatGPT Plus customers with $20/month subscriptions can access that model in chatbot form. (Developers can pay per usage for access to the GPT-4 API.)

While a large context window (the amount of data a model can process at one time) looks compelling if you have a large document or other information, it’s not clear that LLMs can process large amounts of data as well as they handle information in smaller chunks. Greg Kamradt, an AI practitioner and entrepreneur who’s been tracking this issue, has run what he calls “needle in a haystack” analysis to see whether tiny pieces of information within a large document are actually found when the LLM is queried. He repeats the tests, placing a random statement at various positions in a large document that is fed into the LLM and then querying the model about it.

“At 200K tokens (nearly 470 pages), Claude 2.1 was able to recall facts at some document depths,” he posted on X (formerly Twitter), noting that he had been granted early access to Claude 2.1. “Starting at ~90K tokens, performance of recall at the bottom of the document started to get increasingly worse.” GPT-4 did not have perfect recall at its largest context either.

Running the tests on Claude 2.1 cost about $1,000 in API calls (Anthropic offered credits so he could run the same tests he had done on GPT-4).

His conclusions: How you craft your prompts matters, don’t assume information will always be retrieved, and smaller inputs will yield better results.
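To make the “needle in a haystack” idea concrete, here is a minimal Python sketch of that kind of test. It is not Kamradt’s actual code: the function names, the filler text, and the `truncating_model` stand-in (which simply ignores everything past the first 1,000 characters) are all assumptions made for illustration. A real run would replace the stand-in with a call to an LLM API.

```python
from typing import Callable, Dict, List


def build_document(sentences: List[str], needle: str, depth: float) -> str:
    """Place `needle` at a fractional depth of the document (0.0 = top, 1.0 = bottom)."""
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be between 0.0 and 1.0")
    position = round(len(sentences) * depth)
    return " ".join(sentences[:position] + [needle] + sentences[position:])


def run_needle_test(
    sentences: List[str],
    needle: str,
    question: str,
    expected: str,
    ask_model: Callable[[str, str], str],
    depths: List[float],
) -> Dict[float, bool]:
    """Ask the same question with the needle at each depth; record whether it was recalled."""
    results: Dict[float, bool] = {}
    for depth in depths:
        document = build_document(sentences, needle, depth)
        answer = ask_model(document, question)
        # Crude scoring: the expected fact must appear somewhere in the answer.
        results[depth] = expected.lower() in answer.lower()
    return results


def truncating_model(document: str, question: str) -> str:
    """Hypothetical stand-in for an LLM call: it only 'reads' the first 1,000 characters."""
    visible = document[:1000]
    return "The code is 7421." if "7421" in visible else "I could not find that."
```

Calling `run_needle_test` with a list of filler sentences, a needle such as “The secret code is 7421.”, and a range of depths returns a mapping from depth to a found/not-found flag, which is the raw material for the kind of recall-by-depth chart described above.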

In fact, many developers seeking to query information from large amounts of data create applications that split that data into smaller pieces in order to improve retrieval results, even if the context window would allow more.
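The splitting approach can be sketched in a few lines of Python. This is an illustrative assumption rather than any particular product’s method: the chunk size, the overlap, and the naive keyword-overlap ranking are placeholders. Production systems typically rank chunks with embeddings and a vector index instead.

```python
from typing import List


def split_into_chunks(text: str, max_words: int = 200, overlap: int = 20) -> List[str]:
    """Split text into word-based chunks, repeating `overlap` words between neighbors."""
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be non-negative and smaller than max_words")
    words = text.split()
    step = max_words - overlap
    chunks: List[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # The last chunk already reaches the end of the text.
    return chunks


def top_chunks(chunks: List[str], query: str, k: int = 3) -> List[str]:
    """Rank chunks by how many query words they share (a stand-in for embedding search)."""
    query_terms = set(query.lower().split())
    ranked = sorted(
        chunks,
        key=lambda chunk: len(query_terms & set(chunk.lower().split())),
        reverse=True,
    )
    return ranked[:k]
```

Only the top-ranked chunks would then be sent to the model along with the question, which keeps each prompt small regardless of how large the context window is.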

Looking at the new model’s accuracy, in tests with what Anthropic called “a large set of complex, factual questions that probe known weaknesses in…

2023-11-24 18:41:02
Article from www.computerworld.com
