As the hype and momentum behind generative AI continue to grow, so too does the performance of the underlying systems that enable machine learning (ML) training.
MLCommons today announced the latest set of results for its MLPerf Training 3.0 benchmark, which aims to provide an industry-standard set of measurements for ML model training performance. MLCommons is an open engineering consortium focused on ML benchmarks, datasets and best practices to accelerate the development of AI. The group maintains a series of ML benchmarks, including MLPerf Inference, which was last updated in April; its MLPerf Training 2.1 results were released in November 2022.
The big new inclusion with MLPerf Training 3.0 is the introduction of testing for training large language models (LLMs), specifically starting with GPT-3. The addition of LLMs to the benchmark suite comes at a critical time as organizations build out generative AI technologies.
Overall, the latest round of training benchmarks includes more than 250 performance results from 16 vendors, including ASUSTek, Microsoft Azure, Dell, Fujitsu, GIGABYTE, H3C, IEI, Intel and Habana Labs, Krai, Lenovo, Nvidia, CoreWeave + Nvidia, Quanta Cloud Technology, Supermicro and xFusion.
MLPerf Training 3.0 builds on the previous round of the benchmarking suite and introduces a large language model (LLM) performance benchmark alongside existing workloads that measure the training of deep learning models such as convolutional networks, recurrent networks and object detection models. The suite aims to measure the performance of AI training systems in real-world scenarios and to bring a new level of transparency to AI training performance.
The MLPerf Training 3.0 suite is the first to include an LLM workload, allowing submitters to measure how their training systems perform on this class of model. This is an important advancement, as it gives researchers and vendors a standardized way to quantify the benefit of their work on AI development and optimization.
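MLPerf Training scores are reported as time-to-train: the wall-clock time a system needs to train a model to a predefined quality target. The snippet below is a minimal, hypothetical Python sketch of that idea; the training and evaluation stubs, the quality threshold and the function names are placeholders for illustration only, not MLCommons reference code.

```python
import random
import time

# Illustrative quality threshold; not an official MLPerf target value.
TARGET_QUALITY = 0.75

def train_one_epoch(state):
    # Stand-in for a real training epoch; nudges model quality upward.
    state["quality"] += random.uniform(0.05, 0.15)

def evaluate(state):
    # Stand-in for a validation pass returning the current quality metric.
    return state["quality"]

def time_to_train(target=TARGET_QUALITY):
    # Measure wall-clock time until the (dummy) model reaches the target.
    state = {"quality": 0.0}
    start = time.perf_counter()
    epochs = 0
    while evaluate(state) < target:
        train_one_epoch(state)
        epochs += 1
    elapsed = time.perf_counter() - start
    return {"epochs": epochs, "seconds": elapsed}

if __name__ == "__main__":
    print(time_to_train())
```

In a real submission, the training loop, dataset and quality metric are defined by the benchmark rules for each workload; the sketch only shows why time-to-train gives a single comparable number across very different systems.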
The results also highlight a remarkable surge in AI training performance across submissions. Compared with earlier rounds, MLPerf Training 3.0 shows that complex deep learning models can now be trained more quickly than ever before, reflecting improvements in both hardware accelerators and the software frameworks and libraries that drive them.
MLPerf Training 3.0 is an important advancement in both AI performance benchmarking and the development of powerful deep learning training systems. With the introduction of an LLM workload, it is now possible to measure the performance of AI training systems in a realistic generative AI scenario. That transparency can enable further breakthroughs in deep learning training and pave the way for the future of AI.