Microsoft Azure delivers game-changing performance for generative AI inference

Microsoft Azure has delivered industry-leading results for AI inference workloads among cloud service providers in the latest MLPerf Inference results published by MLCommons. Azure achieved these results with the new NC H100 v5 series virtual machines (VMs), powered by NVIDIA H100 NVL Tensor Core GPUs. The results reinforce Azure's commitment to designing AI infrastructure optimized for training and inferencing in the cloud.

The evolution of generative AI models

Generative AI models are rapidly growing in size and complexity, reflecting an industry-wide trend toward ever-larger architectures. Industry-standard benchmarks and cloud-native workloads keep pushing the boundaries, with models now reaching billions and even trillions of parameters. A prime example is the recent addition of Llama 2 to the benchmark suite. With 70 billion parameters, it is MLPerf's largest test of generative AI to date (figure 1). The leap is clear when compared with earlier industry standards such as the large language model GPT-J, which has more than 10x fewer parameters. This rapid growth reflects the evolving demands and ambitions of the AI industry, as customers tackle increasingly complex tasks and expect more sophisticated outputs.

Tailored to the dense, generative inferencing needs of models like Llama 2, the Azure NC H100 v5 VMs mark a significant leap forward in performance for generative AI applications. Their purpose-built design makes them a strong choice for organizations seeking to harness AI reliably and efficiently. With the NC H100 v5 series, customers can expect enhanced capabilities from their AI infrastructure that help them tackle complex tasks with ease.

Figure 1: Evolution of the size of the models in the MLPerf Inference benchmarking suite, which now reach up to 70 billion parameters.

The transition to larger models requires a different class of hardware, one able to hold these models on fewer GPUs. This shift creates a unique opportunity for high-end systems and highlights the capabilities of solutions like the NC H100 v5 series. As the industry embraces the era of mega-models, the NC H100 v5 series is ready to meet the challenges of tomorrow's AI workloads, offering strong performance and scalability as model sizes keep expanding.

Enhanced performance with purpose-built AI infrastructure

The NC H100 v5 series is built on purpose-built infrastructure, with a hardware configuration that delivers remarkable performance gains over its predecessors. Each GPU in the series has 94GB of HBM3 memory. This translates into a 17.5% increase in memory size (94 GB versus 80 GB) and a 64% increase in memory bandwidth over previous generations. Powered by NVIDIA H100 NVL PCIe GPUs and 4th-generation AMD EPYC™ Genoa processors, these virtual machines offer up to 2 GPUs, up to 96 non-multithreaded AMD EPYC Genoa processor cores, and 640 GiB of system memory.

In today's announcement from MLCommons, the NC H100 v5 series debuted its performance results in the MLPerf Inference v4.0 benchmark suite. The most noteworthy result is a 46% performance gain over competing products equipped with 80GB GPUs (figure 2). This gain is attributed solely to the 17.5% larger memory (94 GB) of the NC H100 v5 series, which lets it fit large models efficiently into fewer GPUs. For smaller models such as GPT-J, with 6 billion parameters, the new NC H100 v5 delivers a notable 1.6x speedup over the previous generation (NC A100 v4). This is particularly advantageous for customers with dense inferencing jobs, because they can run multiple tasks in parallel faster and more efficiently while using fewer resources.
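
Why can a 17.5% memory increase produce a much larger throughput gain? A back-of-the-envelope Python sketch helps show the effect of memory headroom. It rests on assumptions of our own that do not come from the MLPerf submissions, whose precision settings this article does not state. It assumes Llama 2 70B weights stored at a hypothetical 1 byte per parameter, and it ignores activations, the KV cache, and runtime overhead. The sketch does not reproduce the 46% figure. It only compares how much memory is left after the weights are loaded.

```python
import math

# Illustrative assumptions only, not taken from the MLPerf submissions:
# Llama 2 70B weights at a hypothetical BYTES_PER_PARAM bytes each,
# ignoring activations, KV cache, and runtime/driver overhead.
PARAMS = 70e9
BYTES_PER_PARAM = 1  # hypothetical, e.g. an 8-bit weight format
GB = 1e9  # decimal gigabytes


def weight_gb(params: float, bytes_per_param: float) -> float:
    """Size of the model weights in GB."""
    return params * bytes_per_param / GB


def gpus_needed(model_gb: float, gpu_gb: float) -> int:
    """Minimum number of GPUs whose combined memory can hold the weights."""
    return math.ceil(model_gb / gpu_gb)


def headroom_gb(model_gb: float, gpu_gb: float) -> float:
    """Memory left over across those GPUs once the weights are loaded."""
    return gpus_needed(model_gb, gpu_gb) * gpu_gb - model_gb


model_gb = weight_gb(PARAMS, BYTES_PER_PARAM)
for gpu_gb in (80, 94):
    print(f"{gpu_gb} GB GPU: {gpus_needed(model_gb, gpu_gb)} GPU(s), "
          f"{headroom_gb(model_gb, gpu_gb):.0f} GB left over")
print(f"Capacity increase: {94 / 80 - 1:.1%}")
```

Under these assumptions, a single GPU of either size holds the weights. The 94 GB GPU, however, leaves 24 GB free instead of 10 GB, which is 2.4x the room for batching and cached state. At a hypothetical 2 bytes per parameter (140 GB), both sizes need two GPUs, leaving 48 GB versus 20 GB free.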

Figure 2: Azure results on the Llama 2 model (70 billion parameters) from MLPerf Inference v4.0, March 2024 (4.0-0004 and 4.0-0068).

Performance delivering a competitive edge

The performance gains matter beyond the comparison with previous generations of similar infrastructure. In the MLPerf benchmark results, Azure's NC H100 v5 series VMs also stand out against other cloud computing submissions. Compared with cloud offerings that have less memory per accelerator, such as 16GB, the NC H100 v5 series VMs deliver a substantial performance boost. With nearly six times the memory per accelerator (94 GB versus 16 GB), they achieve a speedup of 8.6x to 11.6x (figure 3). This works out to roughly 50% to 100% more performance per byte of GPU memory. These results show the series' ability to set the performance standard in cloud computing and give organizations a robust solution for their evolving computational requirements.
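
The per-byte figure comes from dividing the speedup by the memory ratio. The minimal Python sketch below reproduces that arithmetic using only the numbers in the paragraph above. The function name `per_byte_gain` is our own.

```python
# Memory per accelerator in GB, as stated in the article.
NC_H100_V5_GB = 94
COMPARISON_GB = 16


def per_byte_gain(speedup: float, memory_ratio: float) -> float:
    """Relative throughput gain per byte of GPU memory.

    A speedup exactly equal to the memory ratio gives 0.0 (no gain per byte).
    """
    return speedup / memory_ratio - 1


memory_ratio = NC_H100_V5_GB / COMPARISON_GB  # 5.875, i.e. "nearly six times"
for speedup in (8.6, 11.6):  # speedup range reported for figure 3
    print(f"{speedup}x speedup: {per_byte_gain(speedup, memory_ratio):+.0%} per byte")
```

The results are about +46% and +97% per byte, which the article rounds to 50% to 100%.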

Figure 3: Performance results on the GPT-J model (6 billion parameters) from MLPerf Inference v4.0, March 2024, on Azure NC H100 v5 (4.0-0004) and an offering with 16GB of memory per accelerator (4.0-0045), with one accelerator each. The throughput of the NC H100 v5 virtual machine is up to 11.6 times higher.

In conclusion, the launch of the NC H100 v5 series marks a significant milestone in Azure's pursuit of innovation in cloud computing. Its outstanding performance, advanced hardware, and seamless integration with the Azure ecosystem help organizations make full use of generative AI inference workloads. The latest MLPerf Inference v4.0 results show the NC H100 v5 series excelling in the most demanding AI workloads and setting a new performance standard for the industry. Microsoft also announced at the NVIDIA GPU Technology Conference (GTC) that it will continue to bring even more powerful GPUs to the cloud, such as the NVIDIA Grace Blackwell GB200 Tensor Core GPUs, further expanding what organizations can achieve with AI in the cloud.

Learn more about Azure generative AI