# Nvidia Plans New Chip to Speed AI Processing, Shake Up Computing Market

Under pressure from rivals, the chip giant is set to offer a new product focused on rapid processing of AI queries for ‘inference’ demand.
Nvidia plans to unveil a new processor specially tailored to help OpenAI and other customers build faster, more efficient tools, a major shake-up to its business that is poised to reset the AI race.
The company is designing a new system for “inference” computing, a form of processing that allows AI models to respond to queries, according to people familiar with the plans. The new platform, set to be revealed at Nvidia’s GTC developer conference in San Jose next month, will incorporate a chip designed by the startup Groq, the people said.
Inference computing has been the subject of intense industry competition. Rivals Google and Amazon have designed chips that compete with Nvidia’s flagship systems. And the explosion of autonomous coding in the tech workforce has created demand for new chips that can more efficiently handle complex AI-related tasks.
OpenAI has agreed to become one of the largest customers of the new processor, some of the people said, representing a major win for Nvidia. The ChatGPT maker, which is one of Nvidia’s largest customers, has spent the past few months shopping for more efficient alternatives to Nvidia’s chips, and signed a deal with a chip startup last month that provides it with new options.
Earlier Friday, OpenAI alluded to the new processor when it announced it would sign up for a major purchase of “dedicated inference capacity” from Nvidia, alongside a $30 billion investment from the chip giant. It also signed a major new deal to use Amazon’s Trainium chips.
Nvidia has dominated the business of designing and selling GPUs—graphics processing units—a type of processor that can perform billions of simple tasks simultaneously. But for the first time since the start of the AI boom, it is confronting the limits of its flagship product. As the market shifts towards inference, Nvidia is feeling pressure from some customers to produce chips that can more efficiently power AI applications.
The company’s powerful Hopper, Blackwell and Rubin series GPUs are considered best-in-class for training gigantic AI models and command top prices. Most analysts estimate that Nvidia controls 90% or more of the GPU market.
Nvidia Chief Executive Jensen Huang has long claimed that Nvidia’s GPUs lead the market for both training and inference, and that this versatility is a key appeal of the product.
But over the past year, demand for advanced computing has shifted from training to inference as companies deploy AI agents and other tools that they hope will upend hundreds of industries and generate enormous profits from subscription fees. Agents are AI systems that act relatively autonomously to carry out tasks on behalf of users.
Many companies that build and operate AI agents find that GPUs are too costly, consume too much energy and aren’t as well-suited to actually running their models. With the meteoric rise of agentic AI, Nvidia is under pressure to develop inference chips that are less expensive and more energy-efficient.
Last month, OpenAI signed a multibillion-dollar computing partnership with Cerebras, which offers an inference-focused chip that its CEO Andrew Feldman says is faster than Nvidia’s GPUs. OpenAI entered into negotiations with Cerebras last fall after its engineers asked for a faster inference chip for agentic coding applications, The Journal previously reported.
Nvidia agreed to pay $20 billion late last year to license key technologies from Groq and hire its top leadership, including founder Jonathan Ross, in one of Silicon Valley’s largest-ever “acqui-hire” deals, The Wall Street Journal reported.
Groq designed chips that use a different architecture from Nvidia’s, called “language processing units,” which are highly efficient for inference functions. So far, however, Nvidia has kept mum about how it intends to use Groq’s technology.
AI inference computing is divided into two main tasks: pre-fill, in which the model interprets a user prompt, and decode, in which the model generates a response, one word at a time. Pre-fill is usually the faster of the two, while decode tends to be especially slow for larger AI models.
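The asymmetry between the two phases can be sketched in code. The following is a minimal toy model, not a real inference engine: `prefill` and `decode` are hypothetical stand-ins for neural-network forward passes, and the "KV cache" here is just a list. The point it illustrates is that pre-fill processes the whole prompt in one parallel pass, while decode must run one sequential step per generated token, so decode dominates latency as outputs grow.

```python
# Toy sketch of the two inference phases described above.
# Real systems run transformer forward passes with a key-value cache;
# both are replaced here with trivial placeholders.

def prefill(prompt_tokens):
    """Pre-fill: the entire prompt is processed in a single parallel
    pass, so the sequential cost is one step regardless of length."""
    kv_cache = list(prompt_tokens)  # state that decode will extend
    sequential_steps = 1
    return kv_cache, sequential_steps

def decode(kv_cache, num_new_tokens):
    """Decode: tokens are produced one at a time, each step depending
    on the last, so sequential cost grows with output length."""
    steps = 0
    for _ in range(num_new_tokens):
        next_token = len(kv_cache)  # placeholder for model sampling
        kv_cache.append(next_token)
        steps += 1
    return kv_cache, steps

cache, prefill_steps = prefill([101, 102, 103, 104])
cache, decode_steps = decode(cache, 32)
print(prefill_steps, decode_steps)  # 1 sequential step vs. 32
```

Under this (simplified) model, a 4-token prompt costs one sequential step while a 32-token reply costs 32, which is why hardware optimized for fast sequential generation matters for agentic workloads.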
Coding applications have emerged as one of the most important—and profitable—uses of enterprise AI, with Anthropic’s Claude Code generally regarded as the market leader. But Anthropic relies primarily on chips designed by Amazon Web Services and Alphabet’s Google Cloud unit, rather than by Nvidia, to power its models.
One of Claude’s closest competitors, however, is OpenAI’s fast-growing Codex tool. The ChatGPT-maker plans to use the new Nvidia system to improve Codex, people familiar with the matter said.
Typically, Nvidia has paired its Vera chips, which are central processing units, or CPUs, with its Rubin GPUs in powerful data center servers, but some large customers have found that certain agentic AI workloads can be run more efficiently on CPUs alone.
This month, Nvidia announced an expanded partnership with Meta Platforms that included the first-ever significant CPU-only deployment to support Meta’s ad-targeting AI agents. The deal offered an early window into Nvidia’s strategy to look beyond the GPU to lock up pockets of the AI market.