Canonical announced the beta release of optimized inference snaps, a new way to deploy AI models on Ubuntu devices, with automatic selection of optimized engines, quantizations, and architectures based on the device's specific silicon. Canonical is working with a wide range of silicon providers to deliver their optimizations of well-known LLMs to developers and their devices.
A single well-known model like Qwen 2.5 VL or DeepSeek R1 comes in many sizes and configurations, each optimized for specific silicon. It can be difficult for an end user to know which model size and runtime to use on their device. Now a single command gets you the best combination, automatically. Canonical is working with silicon partners to integrate their optimizations, and as new partners publish theirs, the models will become more efficient on more devices.
sudo snap install deepseek-r1 --beta
This enables developers to integrate well-known AI capabilities seamlessly into their applications and have them run optimally across desktops, servers, and edge devices.
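For example, once the snap is installed, the model can be exercised directly from the terminal. The app name below is a hypothetical sketch; snap info (a standard snapd command) will show the exact commands the published snap exposes:

snap info deepseek-r1     # list the snap's description and exposed commands
deepseek-r1               # illustrative app name; starts an interactive session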
A snap package can dynamically load components: at install time, the snap fetches the build recommended for the host system, simplifying dependency management while reducing latency. This public beta includes Ampere®-optimized DeepSeek R1 and Qwen 2.5 VL as examples, and open-sources the framework by which they are built.
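This builds on snap components, the snapd mechanism for optional parts of a snap that install alongside it using the snap+component syntax. A minimal sketch, assuming a hypothetical component name for a silicon-specific engine:

# The component name "ampere-engine" is illustrative, not a published name;
# in the beta, the right component is selected automatically at install time
sudo snap install deepseek-r1+ampere-engine --beta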