
In the beginning, the GPU's job was to render video for the display. Back then, data flowed primarily in one direction: from CPU to GPU, and then to the display.

In 2007, NVidia released CUDA, a radical shift. CUDA let data flow from the CPU to the GPU, be processed on the GPU, and then return to the CPU. This allowed the GPU's highly parallel architecture to take on tasks poorly suited to the CPU's more serial nature. This was before the Ai trend.
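For those who haven't seen it, here is a minimal sketch of that round trip in CUDA C++: copy data to the GPU, run a parallel kernel, copy the result back. (Illustrative only; the names are ours and error handling is omitted.)

```
// Minimal CUDA round trip: CPU -> GPU -> CPU. Error checking omitted.
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Each GPU thread adds one pair of elements, in parallel.
__global__ void add(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;                 // one million elements
    const size_t bytes = n * sizeof(float);

    // Host (CPU) buffers.
    float* h_a = (float*)malloc(bytes);
    float* h_b = (float*)malloc(bytes);
    float* h_c = (float*)malloc(bytes);
    for (int i = 0; i < n; ++i) { h_a[i] = 1.0f; h_b[i] = 2.0f; }

    // Device (GPU) buffers.
    float *d_a, *d_b, *d_c;
    cudaMalloc(&d_a, bytes);
    cudaMalloc(&d_b, bytes);
    cudaMalloc(&d_c, bytes);

    // CPU -> GPU.
    cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice);

    // Process on the GPU, one thread per element.
    const int block = 256;
    const int grid = (n + block - 1) / block;
    add<<<grid, block>>>(d_a, d_b, d_c, n);

    // GPU -> CPU.
    cudaMemcpy(h_c, d_c, bytes, cudaMemcpyDeviceToHost);
    printf("c[0] = %.1f\n", h_c[0]);       // prints 3.0

    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    free(h_a); free(h_b); free(h_c);
    return 0;
}
```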


Apple followed NVidia in 2009 with OpenCL, a framework with similar intent. Unlike CUDA, OpenCL is an open standard compatible with both AMD and NVidia hardware. Apple had a history of switching between AMD/ATI and NVidia graphics in the Mac, so this made a lot of sense. Apple deprecated OpenCL in 2018 (macOS 10.14 Mojave) in favor of Metal.

OpenCL has been essentially the only choice for compute-based software that aims to be cross-platform. Despite the deprecation, macOS can still run it. Windows can run it. Linux can run it. OpenCL still runs on Apple Silicon, AMD, Intel, and NVidia graphics. Is cross-platform reason enough to pick OpenCL for the software of the future?

Yet on NVidia cards, CUDA performance usually beats OpenCL. Combined with superior development tools, that made CUDA the preferred platform for code that doesn't aim to run cross-platform. NVidia cards have also outsold AMD's for decades, a lead usually attributed to reputedly superior drivers and other optimizations.

Apple appeared to part ways with NVidia around 2013; the last Mac to feature NVidia graphics was the 15″ MacBook Pro from mid-2014. The change deepened the divide between CUDA and OpenCL. High-end PCs offered more CUDA cores than Macs anyway, which had made the PC the preferred platform for CUDA applications even before the split.

AMD knows that CUDA is a competitive advantage for NVidia. In the age of Ai, it's easy to understand why AMD might want CUDA-based applications to run on AMD hardware and erode that advantage.

ZLUDA seemed like AMD’s big break. This translation layer would allow CUDA applications to run on AMD and Intel dGPUs and broaden compatibility across platforms. AMD even funded these efforts. NVidia was understandably not happy about it.

AMD says its own ROCm compute platform is a priority. Released in 2016, ROCm hasn't had as much time to mature as CUDA. It has faced criticism for poor documentation and only recently became compatible with Windows. We believe Windows support is critical for mainstream adoption of the NPU, and for justifying the existence of our first ever "Ai PC" build. AMD has promoted the "Ai PC" and its NPU in consumer-grade hardware for local inference.
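Part of ROCm's pitch is HIP, a runtime API that deliberately mirrors CUDA's, which is why porting (or translating) CUDA code to AMD hardware is plausible at all. Here is a hedged sketch of the same kind of round trip in HIP; note how closely the names track their CUDA counterparts. (Illustrative only; error handling omitted.)

```
// The same CPU -> GPU -> CPU round trip via ROCm's HIP API.
// Each call mirrors a CUDA counterpart (noted in comments).
#include <cstdio>
#include <cstdlib>
#include <hip/hip_runtime.h>

// Kernel syntax is identical to CUDA's.
__global__ void scale(float* x, float s, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= s;
}

int main() {
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);

    float* h_x = (float*)malloc(bytes);
    for (int i = 0; i < n; ++i) h_x[i] = 1.5f;

    float* d_x;
    hipMalloc(&d_x, bytes);                              // cudaMalloc
    hipMemcpy(d_x, h_x, bytes, hipMemcpyHostToDevice);   // cudaMemcpy

    const int block = 256;
    scale<<<(n + block - 1) / block, block>>>(d_x, 2.0f, n);

    hipMemcpy(h_x, d_x, bytes, hipMemcpyDeviceToHost);   // cudaMemcpy
    printf("x[0] = %.1f\n", h_x[0]);                     // prints 3.0

    hipFree(d_x);                                        // cudaFree
    free(h_x);
    return 0;
}
```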

Microsoft wants vendor-agnostic compute, too. Windows aims to support valuable features across as much hardware as possible. That's part of why Microsoft developed DirectML, with language and branding more specific to Ai. DirectML is well suited to consumer applications where an NPU is available and lower latency is required.

ZLUDA is effectively dead, ROCm is young, and DirectML is younger still. It will take serious effort, strong hardware and software partnerships, and time to unseat CUDA as the default compute platform. But NVidia GPUs remain in short supply. If AMD can align its hardware and software correctly, it has a unique opportunity in both the data center and consumer markets. We hope for strong competition in both spaces.

Cross-platform solutions are usually best for consumer choice. Proprietary platforms help the corporations that create them, but most users would rather not be bound to a particular vendor. Getting developers to switch platforms is not easy, especially since NVidia maintains a massive share of both the consumer and data center markets. The success of the NPU depends on software that can run on consumer-grade hardware as well as on servers in a data center.

CUDA is still winning. Most Ai models are optimized to train and infer on CUDA. They run faster there, and despite Apple Silicon's superior idle states, CUDA can be more computationally efficient than OpenCL. This could change, but when?

Full disclosure: Devin owns shares of NVDA and graphics cards from both NVidia and AMD.