Apple recently announced a suite of AI-powered Apple Intelligence features for its iPads, Macs, and iPhone 15 Pro devices. Newly released Apple Intelligence documentation reveals that the company bypassed Nvidia’s cutting-edge H100 compute accelerators and the servers built on them, training its AI models on systems with Google’s specialized TPUv4 and TPUv5p chips instead.
Google TPU (Tensor Processing Unit)-based systems were used to train the Apple Foundation Models (AFM), which come in AFM-server and AFM-on-device variants for online and offline features. AFM-server is Apple’s largest large language model, trained on 8,192 TPUv4 chips in an 8×1024 chip configuration. Pre-training was a three-stage process covering a total of more than seven trillion tokens. The training data came from Apple’s Applebot web crawler, supplemented by licensed “high-quality” datasets along with curated code, mathematics, and publicly available datasets. The AFM-on-device models were trained on a cluster of 2,048 Google TPUv5p chips.
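The documentation does not detail Apple’s training stack, but large-scale TPU training is commonly expressed in JAX, where a pod is treated as a 2-D mesh of devices with data parallelism along one axis and model parallelism along the other. Below is a minimal, hypothetical sketch of that pattern; the mesh shape mirrors the reported 8×1024-chip topology, while the axis names, array shapes, and `forward` function are illustrative assumptions, not Apple’s code.

```python
# Illustrative sketch only: shards a toy weight matrix and batch across
# a 2-D TPU device mesh. The (8, N) mesh shape echoes the 8 x 1024-chip
# configuration reported for AFM-server; nothing here is Apple's code.
import numpy as np
import jax
import jax.numpy as jnp
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

devices = np.array(jax.devices())
n = devices.size
assert n % 8 == 0, "sketch assumes a device count divisible by 8"

# One mesh axis for data parallelism, one for model parallelism.
# On a full pod slice this reshape would yield (8, 1024).
mesh = Mesh(devices.reshape(8, n // 8), axis_names=("data", "model"))

# Split the weight matrix column-wise across the "model" axis and the
# batch row-wise across the "data" axis.
weights = jax.device_put(jnp.zeros((4096, 4096)),
                         NamedSharding(mesh, P(None, "model")))
batch = jax.device_put(jnp.zeros((32, 4096)),
                       NamedSharding(mesh, P("data", None)))

@jax.jit
def forward(x, w):
    # With sharded inputs, the XLA compiler inserts the collective
    # communication needed to run this matmul across the whole mesh.
    return x @ w

out = forward(batch, weights)
```

In practice each chip holds only its shard of the weights and batch, and the compiler handles cross-device communication automatically, which is the property that makes thousands-of-chip TPU clusters like those described here practical to program.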
Apple’s internal benchmarks show its models performing competitively, so the company hopes to secure a strong position in the AI race despite being a late entrant to the market.
Source: Tom’s Hardware