One of the tricky parts of keeping up with innovation in AI hardware is that each vendor showcases applications and benchmarks that bring out the best in its own products. Since the field is relatively new, there haven’t been any good, broad benchmarks to use for comparison. ImageNet is one perennial favorite, but with many new applications and network architectures being deployed, simple object recognition in 2D images doesn’t tell us much about which hardware is fastest, or best, for other workloads.
Now, a team of industry heavyweights, including Google, Intel, Baidu, and Nvidia, has stepped up to meet the need with an early version (currently v0.5) of MLPerf, a benchmarking suite for machine learning that includes training a variety of networks. Nvidia announced that it has topped the initial results, but digging into the details shows it was pretty much the only game in town. If nothing else, that shows how dominant the GPU maker has been in the AI market.
MLPerf v0.5: Covering the Training Waterfront
MLPerf currently consists of tests that time network training in seven application areas, starting with the classic standby of training ResNet-50 on ImageNet. It adds lightweight and heavyweight object detection (COCO), recurrent and non-recurrent translation (WMT English-German), recommendation (MovieLens-20M), and reinforcement learning (Mini Go). The only platform with results for all seven is the reference submission run on a Pascal P100. Inferencing benchmarks are planned for future versions.
Most of Nvidia’s results were run on one or more DGX-1 or DGX-2 supercomputers, and Google’s were run on its v2 and v3 TPU processors. Intel submitted some ImageNet times for its SKX 8180, but none of them were very competitive. However, systems using its $10K, 28-core SKX 8180 were the only competitive entries in the reinforcement learning category. That category is likely to be short-lived once a non-CPU-bound version of that benchmark is available.
With enough high-end GPUs, neural network training is a lot less painful. These results are anywhere from 2x to 10x faster than the fastest single-node results.
One big issue with the results so far is that they don’t reveal anything about price or power. For example, while Google’s TPU results don’t quite match the fastest runs on Nvidia’s best and most expensive GPUs, it is quite possible they offer great value. You can see the current results, meager as they are, online. Hopefully, there will be a lot more data points soon.
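To make the price point concrete, here is a minimal sketch of how time-to-train results could be normalized by cost. MLPerf v0.5 does not publish any of this: the system names, training times, and hourly prices below are hypothetical placeholders, and the helpers cost_to_train and rank_by_cost are illustrative names, not part of any MLPerf tooling.

```python
from dataclasses import dataclass


@dataclass
class Run:
    system: str              # hypothetical system label
    minutes_to_train: float  # hypothetical time-to-train figure
    dollars_per_hour: float  # hypothetical rental price for the whole system


def cost_to_train(run: Run) -> float:
    """Dollars spent on one training run: hours used times hourly price."""
    return run.minutes_to_train / 60.0 * run.dollars_per_hour


def rank_by_cost(runs: list[Run]) -> list[Run]:
    """Order runs from cheapest to most expensive per training run."""
    return sorted(runs, key=cost_to_train)


# Placeholder numbers only; none of these come from the MLPerf results.
runs = [
    Run("system_a", minutes_to_train=60.0, dollars_per_hour=48.0),
    Run("system_b", minutes_to_train=90.0, dollars_per_hour=24.0),
]

for run in rank_by_cost(runs):
    print(f"{run.system}: ${cost_to_train(run):.2f} per training run")
```

With these made-up inputs, the slower system comes out cheaper per training run, which is exactly the kind of trade-off a raw time-to-train table hides. A power-normalized ranking would follow the same pattern with watts in place of dollars.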
What This Really Shows Is That Nvidia Hardware Dominates AI
As a practical matter, most AI training is done on Nvidia hardware, not just because of the price-performance of its GPUs, but also because of the prevalence of CUDA-based tools. So while Google submitted some benchmarks for its TPUs, its chips were originally built for inferencing, and only in the latest generations have they started to be used for training tasks. Similarly, AMD is so far nowhere to be found in the benchmark results, although it is one of the listed supporters of the MLPerf effort, so that will presumably change. In the meantime, Nvidia is pretty much competing with itself.