MLCommons today released the MLPerf 4.0 inference benchmark results, once again demonstrating the relentless pace of software and hardware improvements.
As generative AI continues to grow in popularity, there is a clear need for vendor-neutral performance benchmarks, and that is what the MLPerf benchmark suite from MLCommons provides. There are multiple MLPerf benchmarks, of which training and inference are the most useful. The new MLPerf 4.0 inference results are the first update to the inference benchmark since the MLPerf 3.1 results were released in September 2023.
Needless to say, a lot has been going on in the AI world over the past six months, and major hardware vendors including Nvidia and Intel have been busy improving hardware and software to further optimize inference. MLPerf 4.0 inference results show significant improvements for both Nvidia and Intel technologies.
The MLPerf inference benchmark suite itself has also changed. MLPerf 3.1 included a large language model (LLM), the 6-billion-parameter GPT-J model, for a text summarization task. With the new MLPerf 4.0 benchmark, the popular open Llama 2 70-billion-parameter model is benchmarked on a question-and-answer (Q&A) task. MLPerf 4.0 also includes, for the first time, a gen AI image generation benchmark with Stable Diffusion.
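For readers unfamiliar with what these workloads look like in practice, the sketch below is a rough illustration (not the official MLPerf harness) of a summarization-style prompt against GPT-J and a Q&A-style prompt against Llama 2, using the Hugging Face transformers library. The model identifiers and prompts are assumptions for illustration only.

```python
# Illustrative sketch only -- not the MLPerf inference harness.
# Assumes the Hugging Face `transformers` library and access to the
# model repos named below (the Llama 2 repo is gated).
from transformers import pipeline

# Text summarization: the task MLPerf measures with GPT-J 6B.
summarizer = pipeline(
    "text-generation",
    model="EleutherAI/gpt-j-6b",   # assumed model id
    device_map="auto",
)
article = "MLCommons today released the MLPerf 4.0 inference benchmark results ..."
print(summarizer(f"Summarize the following article:\n{article}\nSummary:",
                 max_new_tokens=128)[0]["generated_text"])

# Question answering: the task MLPerf 4.0 measures with Llama 2 70B.
qa = pipeline(
    "text-generation",
    model="meta-llama/Llama-2-70b-chat-hf",  # assumed, gated model id
    device_map="auto",
)
print(qa("Question: What does the MLPerf inference benchmark measure?\nAnswer:",
         max_new_tokens=64)[0]["generated_text"])
```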
“MLPerf is truly an industry-standard benchmark that helps improve the speed, efficiency and accuracy of artificial intelligence,” David Kanter, founder and executive director of MLCommons, said in a press release.
Why AI benchmarks matter
There are more than 8,500 performance results in MLCommons’ latest benchmark, testing various combinations and permutations of hardware, software and AI inference use cases. Kanter emphasized that the MLPerf benchmarking process serves a real purpose.
“To remind people of the principles behind benchmarks: really, our goal is to establish good measures of AI performance,” he said. “The point is, once we can measure these things, we can start improving them.”
Another goal of MLCommons is to help align the entire industry. Benchmark results come from tests conducted on different hardware and software using similar data sets and configuration parameters. All submitters to a given test can see each other’s results, so any issues other submitters raise can be resolved.
Ultimately, the point of a standardized way to measure AI performance is to enable businesses to make informed decisions.
“This helps provide buyers with information that helps them make decisions and understand how the system, whether it’s on-premises, cloud or embedded, performs the workload in question,” Kanter said. “If you want to buy a system to run large language model inference, you can use benchmarks to help guide you on what those systems should look like.”
Nvidia triples AI inference performance on the same hardware
Nvidia once again dominates the MLPerf benchmark with a series of impressive results.
While new hardware is expected to deliver better performance, Nvidia was also able to squeeze more performance out of its existing hardware. Leveraging Nvidia’s open source TensorRT-LLM inference technology, Nvidia nearly tripled GPT-J LLM text summarization inference performance on its H100 Hopper GPU.
Dave Salvator, Nvidia’s director of accelerated computing products, emphasized during a briefing with media and analysts that the performance improvements were accomplished in just six months.
“We’ve invested and been able to double the performance that we’ve seen, and we’re very, very pleased with the results,” Salvator said. “Our engineering team will continue to work hard to find ways to get more performance from the Hopper architecture.”
Nvidia just released its latest generation of Blackwell GPUs, the successor to the Hopper architecture, at GTC last week. In response to VentureBeat’s questions, Salvator said he wasn’t sure when Blackwell-based GPUs would be benchmarked with MLPerf, but he hoped it would be soon.
Even before any Blackwell benchmarks, the MLPerf 4.0 results mark the debut of H200 GPU results, which improve further on the H100’s inference capabilities. When using Llama 2 for inference evaluation, H200 results are 45% faster than H100.
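MLPerf reports results under standardized scenarios with fixed datasets and accuracy targets, but as a rough, hedged illustration of how throughput gains like these show up in practice, the sketch below times tokens per second for a generation workload with the Hugging Face transformers API. It is not the MLPerf harness, and the model id is an assumption.

```python
# Rough tokens-per-second timing sketch -- not the official MLPerf
# methodology, which uses fixed scenarios, datasets and accuracy targets.
import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "EleutherAI/gpt-j-6b"  # assumed model id for illustration
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

prompt = "Summarize: MLCommons released new MLPerf 4.0 inference results ..."
inputs = tok(prompt, return_tensors="pt").to(model.device)

start = time.perf_counter()
out = model.generate(**inputs, max_new_tokens=256, do_sample=False)
elapsed = time.perf_counter() - start

new_tokens = out.shape[-1] - inputs["input_ids"].shape[-1]
print(f"{new_tokens / elapsed:.1f} tokens/sec")  # compare across hardware/software stacks
```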
Intel reminds industry that CPUs are still important for inference
Intel is also very active in the MLPerf 4.0 benchmark with its Habana Gaudi AI accelerator and Xeon CPU technology.
For Gaudi, Intel’s real-world performance results trailed those of the Nvidia H100, although the company claimed it offered a better price-performance ratio. Perhaps even more interesting are the impressive gains in inference delivered by fifth-generation Intel Xeon processors.
In a briefing with media and analysts, Intel Xeon AI product director Ronak Shah said that across a range of MLPerf categories, the fifth-generation Intel Xeon is 1.42 times faster at inference than the previous fourth-generation Intel Xeon. Looking specifically at the GPT-J LLM text summarization use case, the fifth-generation Xeon is 1.9 times faster.
“We recognize that for many enterprise customers who are deploying AI solutions, they will be doing so in a hybrid general-purpose and AI environment,” Shah said. “So we designed CPUs that combine powerful general-purpose capabilities with powerful AI capabilities with our AMX engine.”
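As a hedged sketch of what CPU-based LLM inference on a Xeon might look like in user code, the example below runs generation in bfloat16 (the data type AMX accelerates) and applies the Intel Extension for PyTorch optimization entry point. The model id is an assumption, and this is not Intel’s MLPerf submission code.

```python
# Hedged sketch of CPU inference with bfloat16, the data type AMX accelerates.
# Not Intel's MLPerf submission code; requires `intel_extension_for_pytorch`.
import torch
import intel_extension_for_pytorch as ipex
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "EleutherAI/gpt-j-6b"  # assumed model id for illustration
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
model = ipex.optimize(model, dtype=torch.bfloat16)  # apply CPU-side optimizations

inputs = tok("Summarize: Intel's fifth-generation Xeon improves inference ...",
             return_tensors="pt")
with torch.no_grad(), torch.cpu.amp.autocast(dtype=torch.bfloat16):
    out = model.generate(**inputs, max_new_tokens=64)
print(tok.decode(out[0], skip_special_tokens=True))
```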