Quick Look: Comparing Vulkan & DX12 API Overhead on 3DMark
Earlier this week the crew over at Futuremark released a major update to their API Overhead testing tool, which is built into the larger 3DMark testing suite. The API Overhead tool, first rolled out in 2015, is a relatively straightforward test that throws an increasingly large number of draw calls at a system to see how many calls it can sustain. The primary purpose of the tool is to show off the vast improvement in draw call performance afforded by modern, low-level APIs that can efficiently spread their work over multiple threads, as opposed to classic APIs like DirectX 11, which are essentially single-threaded and have a high degree of overhead within that sole thread.
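To make the idea of ramping up draw calls until the system can no longer keep up a bit more concrete, here is a toy model in Python. It is only a sketch under stated assumptions and not Futuremark's implementation: the function name, the linear per-call cost model, the fixed per-frame cost, the step size and the 30 fps cutoff are all hypothetical values chosen for illustration.

```python
def sustained_draw_calls(cost_per_call_us, fixed_frame_cost_us=500.0,
                         min_fps=30.0, step=1000):
    """Toy model: raise the number of draw calls per frame until the
    modeled frame rate would fall below min_fps, then report the
    throughput of the last passing step in draw calls per second.

    All parameters are illustrative assumptions, not measured values.
    """
    frame_budget_us = 1_000_000.0 / min_fps  # longest frame time still allowed
    best_calls = 0
    calls = step
    # Modeled frame time = fixed per-frame cost + calls * per-call cost.
    while fixed_frame_cost_us + calls * cost_per_call_us <= frame_budget_us:
        best_calls = calls
        calls += step
    if best_calls == 0:
        return 0.0
    frame_time_us = fixed_frame_cost_us + best_calls * cost_per_call_us
    return best_calls / (frame_time_us / 1_000_000.0)


if __name__ == "__main__":
    # Hypothetical per-call costs: a high-overhead path vs. a low-overhead path.
    for label, cost in [("high overhead", 0.5), ("low overhead", 0.05)]:
        rate = sustained_draw_calls(cost)
        print(f"{label}: {rate / 1e6:.1f}M draw calls per second")
```

In this model the sustainable throughput is governed almost entirely by the per-call cost, which is the quantity an overhead test of this kind is trying to expose.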
The latest iteration of the API Overhead test, now up to version 1.5, has added support for Vulkan, making it one of the first feature-level benchmarks to add support for the API. Khronos’s take on a low-level API – and a descendant of sorts of Mantle – Vulkan has been available now for just a bit over a year. However, outside of a very successful outing with Doom, in the PC realm it has been flying somewhat under the radar, as few other games have (meaningfully) implemented support for the API thus far. By the end of 2017 we should be seeing some wider support for the API, but for the moment it’s still in the process of finding its footing among PC developers.
In any case, like OpenGL versus Direct3D 9/10/11 before it, there’s a lot of curiosity (and arguments) over which API is better. Now that Futuremark is supporting the API for their Overhead test, let’s take a quick look at how the two APIs compare here, and whether one API offers lower overhead than the other.
|CPU:||Intel Core i7-4960X @ 4.2GHz|
|Motherboard:||ASRock Fatal1ty X79 Professional|
|Power Supply:||Corsair AX1200i|
|Hard Disk:||Samsung SSD 840 EVO (750GB)|
|Memory:||G.Skill RipjawZ DDR3-1866 4 x 8GB (9-10-9-26)|
|Case:||NZXT Phantom 630 Windowed Edition|
|Video Cards:||NVIDIA GeForce GTX 1080 Ti Founders Edition; NVIDIA GeForce GTX 1060 Founders Edition; AMD Radeon RX 480 8GB|
|Video Drivers:||NVIDIA Release 378.92; AMD Radeon Software Crimson 17.3.3|
|OS:||Windows 10 Pro|
As a reminder for the API Overhead test, this is not a cross-system test or even a cross-GPU test. The purpose of the test is solely to measure overhead within a single setup. In practice it’s something of a combined GPU and driver test, as depending on where the bottleneck lies, the limiting factor can be CPU overhead from the driver or just outright hitting the limits of the GPU’s command processor.
The purpose of the test is to compare API performance on a single system. It should not be used to compare component performance across different systems. Specifically, this test should not be used to compare graphics cards, since the benefit of reducing API overhead is greatest in situations where the CPU is the limiting factor.
With that out of the way, let’s start somewhere in the middle of the pack with the GeForce GTX 1060 6GB.
Going into this I was not expecting Vulkan and DX12 overhead to be meaningfully different, so having run the GTX 1060 6GB first, it definitely caught me by surprise that Vulkan’s overhead was so much lower. The net result is that using the Vulkan API, the GTX 1060 can sustain 26.4M draw calls per second, 32% more than DirectX 12. And to be sure, this is consistent across multiple runs.
However before jumping to conclusions, let’s take a look at a couple of other cards.
AMD’s fastest GCN4 card, and the card most comparable to the GTX 1060, is the Radeon RX 480 8GB. Running the API Overhead test on this card produces something notably different from the GTX 1060. Rather than finding Vulkan well in the lead, we get something closer to the inverse, though by a far smaller margin: DX12 holds a small lead at 26M draw calls per second versus 24.9M for Vulkan.
The two obvious differences here are the GPU and driver – in other words, the two things that matter the most – which is why cross-GPU results are not directly comparable. However, it does go to show that whatever is causing Vulkan to perform better on the GTX 1060 is not a consistent factor. Switching things up can easily put Vulkan performance on the back foot.
Finally, to give the test as much GPU power as possible, I’ve also gone ahead and run it on NVIDIA’s recently released GTX 1080 Ti. This gives us a datapoint where GPU bottlenecking has been reduced as much as possible, and also gives us another datapoint with NVIDIA’s driver set.
The results still put Vulkan in the lead, but not by anywhere near what we saw on the GTX 1060. 32.4M calls versus 31.3M calls is a much narrower difference of roughly 4% between the APIs. What this does hint at is that on NVIDIA cards, the Vulkan API path has an edge in overhead, but even within just the NVIDIA ecosystem it’s not a massive difference. As with the AMD RX 480, for the GTX 1080 Ti this is essentially a draw between the two APIs. Which, to be fair, is what we’d expect to find.
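For readers who want to check the margins, the short Python sketch below recomputes the relative leads from the rounded figures quoted in the text. The variable and function names are just illustrative. The GTX 1060's DX12 figure is not quoted directly, so the last line only derives the value implied by the stated 26.4M Vulkan result and 32% lead; it is not a measured number.

```python
# Draw calls per second, in millions, as quoted (rounded) in the text.
results = {
    "GTX 1080 Ti": {"Vulkan": 32.4, "DX12": 31.3},
    "RX 480": {"Vulkan": 24.9, "DX12": 26.0},
}


def lead(a, b):
    """Return how much larger a is than b, in percent."""
    return (a - b) / b * 100.0


for card, r in results.items():
    # Sort the two API names by throughput, highest first.
    winner, loser = sorted(r, key=r.get, reverse=True)
    print(f"{card}: {winner} leads {loser} by {lead(r[winner], r[loser]):.1f}%")

# GTX 1060: Vulkan = 26.4M with a stated 32% lead, so the implied
# (not measured) DX12 figure is 26.4 / 1.32, or about 20.0M.
implied_dx12_gtx1060 = 26.4 / 1.32
print(f"GTX 1060: implied DX12 result is about {implied_dx12_gtx1060:.1f}M")
```

With these rounded inputs the GTX 1080 Ti margin works out to about 3.5% in Vulkan's favor and the RX 480 margin to about 4.4% in DX12's favor, so both sit in the same low single-digit range, in contrast to the 32% gap on the GTX 1060.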
Overall, the latest 3DMark API Overhead benchmark proved both reassuring and more interesting than I was expecting. At a high level, neither Vulkan nor DirectX 12 holds a consistent lead with regard to API overhead, which indicates that both AMD and NVIDIA have done a good job optimizing their drivers and runtimes for these APIs. And more to the point, no matter which of the two is used, it’s still vastly more efficient than DirectX 11, to the point where the draw call throughput is significantly greater than anything a developer could hope to use in the real world.
However, the GTX 1060 results present an interesting anomaly, with the Vulkan API path showcasing noticeably lower overhead. I do have to stress that this is absolutely academic – these low-level feature tests are designed to test one small aspect of a GPU/system, and game performance won’t be anything like this – but it is an unexpected find that hints that Vulkan and DX12 may not be so neck-and-neck at all times on NVIDIA cards. The question we’re left to ponder is whether this is a product of NVIDIA’s drivers, or if there’s something at the API level that just maps a bit better to NVIDIA’s command processor…