As demand for generative AI grows, cloud service providers such as Microsoft, Google, and AWS, along with large language model (LLM) providers such as OpenAI, have all reportedly considered developing their own custom chips for AI workloads.
Speculation that some of these companies, notably OpenAI and Microsoft, are developing such chips in response to chip shortages has dominated headlines for the past few weeks.
While OpenAI is rumored to be looking to acquire a firm to further its chip-design plans, Microsoft is reportedly working with AMD to produce a custom chip, code-named Athena.
Google and AWS have both already developed their own chips for AI workloads: Google's Tensor Processing Units (TPUs) and AWS' Trainium and Inferentia chips.
But what factors are driving these companies to make their own chips? The answer, according to analysts and experts, lies in the cost of processing generative AI queries and the efficiency of currently available chips, mainly graphics processing units (GPUs). Nvidia's A100 and H100 GPUs currently dominate the AI chip market.
“GPUs are probably not the most efficient processor for generative AI workloads and custom silicon might help their cause,” said Nina Turner, research manager at IDC.
GPUs are general-purpose devices that happen to be hyper-efficient at matrix multiplication, the math at the heart of AI workloads, noted Dan Hutcheson, vice chairman of TechInsights.
“They are very expensive to run. I would think these companies are going after a silicon processor architecture that’s optimized for their workloads, which would attack the cost issues,” Hutcheson said.
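To make the point concrete, consider what the essential math of AI reduces to in practice. Below is a minimal sketch in Python with NumPy; the layer width and token count are illustrative numbers chosen for this example, not figures from the article:

```python
import numpy as np

# Illustrative sizes only: a 4096-wide hidden layer processing
# 4,096 tokens' worth of activations in one batch.
tokens, hidden = 4096, 4096

x = np.random.randn(tokens, hidden).astype(np.float32)  # activations
w = np.random.randn(hidden, hidden).astype(np.float32)  # one weight matrix

# Transformer-style models spend most of their time on products like this;
# this single multiply costs about 2 * tokens * hidden^2, roughly 1.4e11 FLOPs.
y = x @ w
print(f"FLOPs for one matmul: {2 * tokens * hidden * hidden:.2e}")
```

A GPU executes this one operation extremely well, but it also carries general-purpose hardware that a chip built only for these multiplies would not need, which is the efficiency gap Hutcheson is describing.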
Using custom silicon, according to Turner, may allow companies such as Microsoft and OpenAI to cut back on power consumption and improve compute interconnect or memory access, thereby lowering the cost of queries.
OpenAI spends approximately $694,444 per day, or about 0.36 cents per query, to operate ChatGPT, according to a report from research firm SemiAnalysis.
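As a back-of-the-envelope check on those figures, here is a quick sketch using only the two numbers from the report:

```python
# The two figures cited from the SemiAnalysis report.
daily_cost = 694_444      # USD per day to operate ChatGPT
cost_per_query = 0.0036   # USD per query (0.36 cents)

# Implied daily query volume: roughly 193 million queries per day.
queries_per_day = daily_cost / cost_per_query
print(f"Implied queries per day: {queries_per_day:,.0f}")
```

At that volume, even small per-query efficiency gains from custom silicon would compound into significant daily savings.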
“AI workloads don’t exclusively require GPUs,” Turner said, adding that while GPUs are great at parallel processing, other architectures and accelerators are better suited to some of these AI operations.
Other advantages of custom silicon include control over access to chips and designing elements specifically for LLMs to improve query speed, Turner said.
Developing custom chips is not easy
Some analysts likened the move to design custom silicon to Apple's strategy of producing its own chips for its devices. Just as Apple switched from general-purpose processors to custom silicon to improve the performance of its devices, generative AI service providers are looking to specialize their chip architectures, said Glenn O'Donnell, research director at Forrester.
“Despite Nvidia’s GPUs being so wildly popular right now, they too are…
Source: www.networkworld.com