Nvidia is the market leader in artificial intelligence computing hardware. The green corporation makes billions, and new devices ship by the ton. At the same time, both AMD and Intel offer alternatives. The latter has a powerful compute accelerator called Gaudi 2, which rarely makes the news but has high potential. Stability AI, the developer of the famous Stable Diffusion model, has shared its tests comparing Intel's Gaudi 2 with Nvidia's H100. Surprisingly, the Intel accelerator delivered better results.
When running the new Stable Diffusion 3 model, the Intel Gaudi 2 accelerator showed exceptional results. For the test, a model with 2 billion parameters was run on two nodes with 16 accelerators each. The Gaudi 2 configuration processed images 56% faster than the H100 setup, and compared with the older Nvidia A100, the gap grows to 2.4 times.
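As a back-of-the-envelope sketch, the two reported speedups can be combined to see what they imply about the H100 versus the A100 in this particular test. The factors below come straight from the figures above; this is illustrative arithmetic, not a new benchmark.

```python
# Relative throughput factors taken from the reported results:
gaudi2_vs_h100 = 1.56   # "56% faster" than the H100 setup
gaudi2_vs_a100 = 2.4    # "2.4 times" the A100 setup

# Implied H100 advantage over the A100 in this same test:
h100_vs_a100 = gaudi2_vs_a100 / gaudi2_vs_h100
print(f"Implied H100 vs A100: about {h100_vs_a100:.2f}x")
```

In other words, these numbers imply the H100 was roughly 1.5x the A100 here, with Gaudi 2 ahead of both.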
Stability AI also presented results for its Stable Beluga 2.5 model with 70 billion parameters, which is based on LLaMA 2. Running in plain PyTorch, without extra optimizations, a configuration of 256 Gaudi 2 accelerators delivered an average throughput of 116,777 tokens per second, 28% faster than the comparable A100 configuration.
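The 28% figure also lets us estimate the A100 configuration's implied throughput. Again, this is simple arithmetic on the reported numbers, not additional test data.

```python
# Reported Gaudi 2 throughput and its stated advantage over the A100:
gaudi2_tokens_per_sec = 116_777
speedup_vs_a100 = 1.28  # "28% faster"

# Implied throughput of the equivalent A100 configuration:
a100_tokens_per_sec = gaudi2_tokens_per_sec / speedup_vs_a100
print(f"Implied A100 throughput: ~{a100_tokens_per_sec:,.0f} tokens/s")
```

That works out to roughly 91,000 tokens per second for the A100 setup in the same test.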
The Intel Gaudi 2 is built on a powerful chip of Intel's own architecture. It is designed for heterogeneous computing and is equipped with 24 tensor processor cores, 48 MB of SRAM, 96 gigabytes of HBM2e memory, and 24 integrated 100 Gigabit Ethernet ports. The large memory capacity may be one of the factors behind Gaudi 2's success in these tests: the standard Nvidia H100 accelerator carries only 80 gigabytes, which can be limiting for large AI models. But remember that the green giant has already announced the H200 with 141 GB of HBM3e memory.
Source:
Wccftech