Information about NVIDIA's Ada Lovelace GPU architecture has been leaking for a while now, including the specific configurations of the next-generation AD10* series chips that will power the GeForce RTX 40 series graphics cards. Now we can talk specifically about the next-generation graphics chip itself.
NVIDIA GA102 Ampere block diagram
The leaker kopite7kimi shared details of the next-generation architecture's block diagram, comparing the top AD102 GPU with several other NVIDIA GPUs.
These include the gaming-oriented Ampere GA102 and Turing TU102, along with the Hopper GH100 and Ampere GA100, which target high-performance computing (HPC). Here, the AD102 is compared only with its gaming predecessors, since the HPC-oriented chips are very different from the consumer offerings.
NVIDIA AD102 Ada Lovelace Block Diagram
The NVIDIA Ada Lovelace AD102 GPU will have up to 12 Graphics Processing Clusters (GPCs). This is an increase of about 71% over GA102, which has only 7 GPCs. Each GPC will consist of 6 TPCs and 2 SMs, the same configuration as on the existing chip. Each streaming multiprocessor (SM) will contain four sub-cores, which is also the same as on the GA102 GPU.
What has changed is the FP32 and INT32 core configuration. Each sub-core will include 128 FP32 units, but the combined FP32 + INT32 unit count rises to 192. This is because the FP32 units are not shared with the INT32 units: the 128 FP32 cores are separate from the 64 INT32 cores.
So in total, each sub-core will consist of 128 FP32 units plus 64 INT32 units, for 192 units. Each SM will have 512 FP32 units plus 256 INT32 units, for 768 units. And since there are 24 SMs in total (2 per GPC), that works out to 12,288 FP32 units and 6,144 INT32 units, or 18,432 cores overall. Each SM will also include two warp schedulers (32 threads/clk) for 64 warps per SM. This is a 50% increase in cores (FP32 + INT32) and a 33% increase in warps/threads compared to GA102.
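The unit counts quoted above follow from simple multiplication, which the short Python sketch below reproduces. It only mirrors the leaked, unconfirmed figures from the article; the constant and function names are made up for illustration and do not come from any NVIDIA tool.

```python
# Per-unit figures as reported in the leak (not confirmed by NVIDIA).
SUB_CORES_PER_SM = 4
FP32_PER_SUB_CORE = 128
INT32_PER_SUB_CORE = 64
GPC_COUNT = 12
SMS_PER_GPC = 2  # the per-GPC SM count used in the article's arithmetic


def core_totals(gpcs=GPC_COUNT, sms_per_gpc=SMS_PER_GPC):
    """Return per-SM and chip-wide unit counts as a dict."""
    fp32_per_sm = SUB_CORES_PER_SM * FP32_PER_SUB_CORE
    int32_per_sm = SUB_CORES_PER_SM * INT32_PER_SUB_CORE
    sm_count = gpcs * sms_per_gpc
    fp32_total = sm_count * fp32_per_sm
    int32_total = sm_count * int32_per_sm
    return {
        "fp32_per_sm": fp32_per_sm,
        "int32_per_sm": int32_per_sm,
        "units_per_sm": fp32_per_sm + int32_per_sm,
        "sm_count": sm_count,
        "fp32_total": fp32_total,
        "int32_total": int32_total,
        "cores_total": fp32_total + int32_total,
    }


if __name__ == "__main__":
    for name, value in core_totals().items():
        print(f"{name}: {value:,}")
```

With the default inputs this gives 512 FP32 and 256 INT32 units per SM, 24 SMs, and 12,288 + 6,144 = 18,432 units in total, matching the numbers in the text.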
Preliminary specifications of NVIDIA Ada Lovelace
NVIDIA has made a big leap in cache capacity over existing Ampere GPUs. Ada Lovelace will carry 192 KB of L1 cache per SM, 50% more than Ampere. That adds up to 4.5 MB of L1 cache on the top-end AD102 GPU. The L2 cache will grow to 96 MB, as mentioned in earlier leaks. This is a 16x increase over the Ampere GPU, which has just 6 MB of L2 cache, and it will be shared across the whole GPU.
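The cache figures can be checked the same way. The sketch below is a rough check on leaked numbers only. It assumes the 24 SMs used elsewhere in the article and Ampere's 128 KB of L1 per SM, a figure the article does not state but which its "50% more" claim implies. All names are illustrative.

```python
# Leaked Ada Lovelace (AD102) figures from the article; unconfirmed.
ADA_L1_PER_SM_KB = 192
ADA_SM_COUNT = 24  # SM count used in the article's arithmetic
ADA_L2_MB = 96

# Ampere (GA102) reference figures. The 128 KB L1 value is an
# assumption implied by the article's "50% more" claim.
AMPERE_L1_PER_SM_KB = 128
AMPERE_L2_MB = 6


def cache_summary():
    """Return total L1 size and the generation-over-generation ratios."""
    l1_total_mb = ADA_L1_PER_SM_KB * ADA_SM_COUNT / 1024
    l1_gain_pct = (ADA_L1_PER_SM_KB / AMPERE_L1_PER_SM_KB - 1) * 100
    l2_ratio = ADA_L2_MB / AMPERE_L2_MB
    return l1_total_mb, l1_gain_pct, l2_ratio


if __name__ == "__main__":
    l1_total_mb, l1_gain_pct, l2_ratio = cache_summary()
    print(f"Total L1: {l1_total_mb} MB")
    print(f"L1 per SM vs Ampere: +{l1_gain_pct:.0f}%")
    print(f"L2 vs Ampere: {l2_ratio:.0f}x")
```

Under these assumptions the totals come out to 4.5 MB of L1, a 50% per-SM increase, and a 16x larger L2.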
There is also information on the raster operation units (ROPs), which write pixels to memory: they increase to 32 per GPC, twice as many as on Ampere. The next-generation flagship will have up to 384 ROPs versus just 112 on the fastest Ampere GPU, found in the GeForce RTX 3090 Ti.
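As a quick sanity check, the ROP totals follow from the per-GPC counts. The sketch below uses only figures quoted in the article (12 GPCs with 32 ROPs each for AD102, 112 ROPs across 7 GPCs for GA102). The function name is hypothetical.

```python
# Figures quoted in the article; the AD102 values are leaked, not official.
ADA_GPCS = 12
ADA_ROPS_PER_GPC = 32
AMPERE_GPCS = 7
AMPERE_ROPS_TOTAL = 112


def rop_comparison():
    """Return (AD102 total ROPs, GA102 ROPs per GPC, per-GPC ratio)."""
    ada_total = ADA_GPCS * ADA_ROPS_PER_GPC
    ampere_per_gpc = AMPERE_ROPS_TOTAL // AMPERE_GPCS
    per_gpc_ratio = ADA_ROPS_PER_GPC / ampere_per_gpc
    return ada_total, ampere_per_gpc, per_gpc_ratio


if __name__ == "__main__":
    ada_total, ampere_per_gpc, ratio = rop_comparison()
    print(f"AD102 ROPs: {ada_total}")
    print(f"GA102 ROPs per GPC: {ampere_per_gpc}")
    print(f"Per-GPC ratio: {ratio:.0f}x")
```

The doubling applies per GPC (32 vs. 16). Because AD102 also has more GPCs, the chip-wide total of 384 is more than double GA102's 112.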
In addition, the latest 4th Gen Tensor and 3rd Gen RT (Raytracing) cores will be installed on Ada Lovelace GPUs to help push DLSS and Raytracing performance to the next level. Overall, the Ada Lovelace AD102 GPU will offer:
12 GPCs versus 7 on Ampere
50% more cores than Ampere
50% more L1 cache than Ampere
16x more L2 cache than Ampere
Twice the ROPs per GPC compared to Ampere
4th generation Tensor and 3rd generation Raytracing cores
It is worth noting that clock speeds, which are said to be in the 2-3 GHz range, are not factored into these comparisons, so they will also play an important role in improving per-core performance over Ampere.
NVIDIA GeForce RTX 40 series graphics cards with next-generation Ada Lovelace gaming GPUs are likely to launch in the second half of 2022, built on the same TSMC 4N manufacturing process as the Hopper H100.
Preliminary specifications of the NVIDIA graphics core