Swift Transformers enables on-device LLM inference on Apple devices via Core ML (Llama 2 7B, Falcon 7B)
AI Impact Summary
The post describes running LLMs on Apple devices using Core ML via the Swift Transformers ecosystem, demonstrating on-device inference for models like Llama 2 7B and Falcon 7B. It outlines a multi-toolchain approach (transformers-to-coreml, exporters, coremltools) and practical steps for conversion, tokenizer handling, and performance optimization to balance latency and memory. While on-device execution reduces cloud calls and improves privacy, teams must plan for conversion reliability, flexible input handling, and hardware constraints across CPU, GPU, and Neural Engine.
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info