Ampere Optimized ONNX Runtime
Ampere’s inference acceleration engine is fully integrated with the ONNX Runtime framework. ONNX models and software written against the ONNX Runtime API run as-is, without modification.
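To illustrate what "as-is" means in practice, here is a minimal sketch of an ordinary ONNX Runtime Python script with nothing Ampere-specific in it. It rests on several assumptions that do not come from this page: the helper names resolve_shape and run_once are illustrative, the model is assumed to take only float32 inputs, the default CPU execution provider is assumed, and "model.onnx" in the usage note is a placeholder path.

```python
def resolve_shape(shape, dynamic_dim=1):
    """Replace symbolic or unknown dimensions (str or None) with a concrete size."""
    return [d if isinstance(d, int) else dynamic_dim for d in shape]


def run_once(model_path):
    """Load an ONNX model and run one inference pass on zero-filled input.

    Assumption: every model input is a float32 tensor.
    """
    # Imported here so resolve_shape stays usable without these packages.
    import numpy as np
    import onnxruntime as ort

    # Standard ONNX Runtime session; the CPU provider is an assumption of this sketch.
    session = ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])

    # Build one placeholder tensor per declared model input.
    feed = {
        inp.name: np.zeros(tuple(resolve_shape(inp.shape)), dtype=np.float32)
        for inp in session.get_inputs()
    }

    # Passing None as the first argument requests every model output.
    return session.run(None, feed)
```

Calling run_once("model.onnx") returns a list with one array per model output. The same script is expected to run unchanged inside an Ampere optimized container, since only the standard ONNX Runtime API is used.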
Ampere® Processors, with high-performance Ampere Optimized Frameworks in Docker images, offer best-in-class Artificial Intelligence inference performance for standard frameworks including TensorFlow, PyTorch, ONNX Runtime, and llama.cpp. Ampere optimized containers come fully integrated with their respective frameworks.
Ampere instances on OCI are some of the most cost-effective instances available in the cloud today. OCI Ampere A1 started this with extremely high-performance shapes and the ability to use the OCI Flex Shapes feature to provision at single-core resolution, which makes this infrastructure very efficient.
Ampere-based Azure VMs are available in the B, D, and E series and offer some of the best price-performance for workloads on the Azure cloud. Ampere publishes 3 optimized frameworks on the Azure cloud marketplace for ease of access, each tested and proven for AI inference with any model compatible with the framework.
Ampere instances on Google Cloud (GCP) are available now in a variety of instance sizes and configurations and offer some of the best price-performance for workloads on GCP. Ampere publishes 3 optimized frameworks on the Google Cloud marketplace for ease of access, each tested and proven for AI inference with any model compatible with the framework.