Arm KleidiAI enables ExecuTorch 0.7 on-device GenAI with SDOT across Android devices and edge hardware
AI Impact Summary
Arm's KleidiAI kernels are enabled by default in ExecuTorch 0.7, so edge GenAI workloads are accelerated automatically. KleidiAI is also integrated into other edge stacks, including XNNPACK, MediaPipe, MNN, ONNX Runtime, and llama.cpp. Its int8/int4 matrix-multiplication kernels use the SDOT dot-product instruction (Armv8.2+) and, where available, the I8MM int8 matrix-multiply extension (Armv8.6) to speed up LLM workloads. This makes Llama 3.2 1B practical on Android devices and on edge hardware such as the Raspberry Pi 5. SDOT is supported on about 3 billion Arm-based devices (roughly 72% of devices), so these gains reach a broad installed base. Reported results include >20% higher prefill performance on the Galaxy S24+ and prefill/decode throughput in the hundreds of tokens per second.
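The arithmetic SDOT performs can be illustrated with a small emulation. Each 32-bit accumulator lane adds the sum of four int8×int8 products. An int8 matmul can therefore be built by walking the shared dimension K in groups of four and rescaling the int32 result to float. The sketch below is purely illustrative and is not KleidiAI's API or kernel layout. The helper names (`sdot_lane`, `unpack_int4`, `int8_matmul`), the per-tensor scales, and the low-nibble-first int4 packing are all hypothetical assumptions.

```python
from typing import List

def sdot_lane(acc: int, a4: List[int], b4: List[int]) -> int:
    """Emulate one 32-bit lane of SDOT: acc += sum of four int8*int8 products.
    Note: hardware accumulates in int32; Python ints do not wrap on overflow."""
    assert len(a4) == 4 and len(b4) == 4
    for x in a4 + b4:
        assert -128 <= x <= 127, "operands must be int8"
    return acc + sum(x * y for x, y in zip(a4, b4))

def unpack_int4(packed: List[int]) -> List[int]:
    """Hypothetical int4 layout: each byte holds two signed 4-bit values,
    low nibble first. Returns them widened to int8 range for SDOT."""
    out = []
    for byte in packed:
        for nibble in (byte & 0x0F, (byte >> 4) & 0x0F):
            out.append(nibble - 16 if nibble >= 8 else nibble)  # sign-extend
    return out

def int8_matmul(a: List[List[int]], b: List[List[int]],
                scale_a: float, scale_b: float) -> List[List[float]]:
    """Quantized matmul: a is M x K int8, b is K x N int8, K a multiple of 4.
    Integer products are accumulated with sdot_lane, then dequantized
    with assumed per-tensor scales."""
    m, k, n = len(a), len(a[0]), len(b[0])
    assert k % 4 == 0, "SDOT consumes K in groups of four"
    out = []
    for i in range(m):
        row = []
        for j in range(n):
            acc = 0
            for kk in range(0, k, 4):
                col = [b[kk + t][j] for t in range(4)]
                acc = sdot_lane(acc, a[i][kk:kk + 4], col)
            row.append(acc * scale_a * scale_b)  # int32 result -> float
        out.append(row)
    return out

# Usage: int4 weights are unpacked to int8 before the dot products.
weights = unpack_int4([0x21, 0x43])           # -> [1, 2, 3, 4]
result = int8_matmul([[1, 1, 1, 1]], [[w] for w in weights], 0.5, 2.0)
print(result)                                 # [[10.0]]
```

Grouping K by four mirrors the 4-element granularity of each SDOT lane. Real kernels additionally tile many lanes across SIMD registers, which this scalar sketch omits.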
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info