AIM Engine

AIM (AMD Inference Microservice) Engine is a Kubernetes operator that simplifies the deployment and management of AI inference workloads on AMD GPUs. It provides a declarative, cloud-native approach to running ML models at scale.

Quick Example

Deploy an inference service with a single resource:

apiVersion: aim.eai.amd.com/v1alpha1
kind: AIMService
metadata:
  name: qwen-chat
spec:
  model:
    image: amdenterpriseai/aim-qwen-qwen3-32b:0.8.5

AIM images (like amdenterpriseai/aim-qwen-qwen3-32b) are container images that package open-source models optimized for AMD Instinct GPUs. Each image includes the model weights and a serving runtime tuned for specific GPU configurations and precision modes.

AIM Engine automatically resolves the model, selects an optimal runtime configuration for your hardware, deploys a KServe InferenceService, and optionally creates HTTP routing through Gateway API.
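In practice, that flow can be driven with standard kubectl commands. The sketch below is illustrative: the manifest filename is arbitrary, and the assumption that the child InferenceService shares the AIMService name is not guaranteed by AIM Engine.

```shell
# Apply the AIMService manifest shown above (filename is illustrative).
kubectl apply -f aimservice.yaml

# Watch the custom resource reconcile.
kubectl get aimservice qwen-chat

# Inspect the KServe InferenceService the operator creates
# (assuming it inherits the AIMService name).
kubectl get inferenceservice qwen-chat
```

Once the InferenceService reports Ready, the endpoint is reachable through whatever routing the operator configured.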

Where to Start

  • Cluster Administrators

    Install AIM Engine, configure KServe, manage GPU resources, and set up cluster-wide defaults.

    Installation

  • Developers & Integrators

    Deploy inference services, configure scaling, set up routing, and integrate with your applications.

    Quickstart

  • Data Scientists

    Browse the model catalog, deploy models for experimentation, and tune inference parameters.

    Model Catalog

Key Features

  • Simple Service Deployment -- Deploy inference endpoints with minimal configuration using AIMService resources
  • Automatic Optimization -- Smart template selection picks the best runtime profile based on GPU availability, precision, and optimization goals
  • Model Catalog -- Maintain a catalog of available models with automatic discovery from container registries
  • Model Caching -- Pre-download model artifacts to shared PVCs for faster startup and reduced bandwidth
  • HTTP Routing -- Expose services through Gateway API with customizable path templates
  • Autoscaling -- KEDA integration with OpenTelemetry metrics for demand-based scaling
  • Multi-tenancy -- Namespace-scoped and cluster-scoped resources for flexible team isolation
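To integrate a deployed service with an application, you can call it over HTTP. The sketch below assumes the serving runtime exposes an OpenAI-compatible chat completions API (common for LLM runtimes, but not stated by this document) and that the predictor service name, port, path, and model identifier are all placeholders you would replace with values from your cluster.

```shell
# Forward the predictor service locally (service name and port are assumptions).
kubectl port-forward svc/qwen-chat-predictor 8080:80 &

# Send a chat request, assuming an OpenAI-compatible endpoint.
curl -s http://localhost:8080/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{
        "model": "qwen/qwen3-32b",
        "messages": [{"role": "user", "content": "Hello!"}]
      }'
```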

Documentation

Getting Started

  • Installation -- Prerequisites and Helm chart installation
  • Quickstart -- Deploy your first model in 5 minutes
  • Architecture -- High-level architecture and component overview

Guides

Task-oriented walkthroughs for common workflows:

Administration

Concepts

Reference