AIM Engine

AIM (AMD Inference Microservice) Engine is a Kubernetes operator that simplifies the deployment and management of AI inference workloads on AMD GPUs. It provides a declarative, cloud-native approach to running ML models at scale.

Quick Example

Deploy an inference service with a single resource:

apiVersion: aim.eai.amd.com/v1alpha1
kind: AIMService
metadata:
  name: qwen-chat
spec:
  model:
    image: amdenterpriseai/aim-qwen-qwen3-32b:0.8.5

AIM images (like amdenterpriseai/aim-qwen-qwen3-32b) are container images that package open-source models optimized for AMD Instinct GPUs. Each image includes the model weights and a serving runtime tuned for specific GPU configurations and precision modes.

AIM Engine automatically resolves the model, selects an optimal runtime configuration for your hardware, deploys a KServe InferenceService, and optionally creates HTTP routing through Gateway API.
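In practice, that flow can be driven with standard kubectl commands. The sketch below is illustrative: the manifest filename is arbitrary, and the assumption that the child InferenceService shares the AIMService name is not guaranteed by AIM Engine.

```shell
# Apply the AIMService manifest shown above (filename is illustrative).
kubectl apply -f aimservice.yaml

# Watch the custom resource reconcile.
kubectl get aimservice qwen-chat

# Inspect the KServe InferenceService the operator creates
# (assuming it inherits the AIMService name).
kubectl get inferenceservice qwen-chat
```

Once the InferenceService reports Ready, the endpoint is reachable through whatever routing the operator configured.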

Where to Start

  • Cluster Administrators

    Install AIM Engine, configure KServe, manage GPU resources, and set up cluster-wide defaults.

    Installation

  • Developers & Integrators

    Deploy inference services, configure scaling, set up routing, and integrate with your applications.

    Quickstart

  • Data Scientists

    Browse the model catalog, deploy models for experimentation, and tune inference parameters.

    Model Catalog

Key Features

  • Simple Service Deployment -- Deploy inference endpoints with minimal configuration using AIMService resources
  • Automatic Optimization -- Smart template selection picks the best runtime profile based on GPU availability, precision, and optimization goals
  • Model Catalog -- Maintain a catalog of available models with automatic discovery from container registries
  • Model Caching -- Pre-download model artifacts to shared PVCs for faster startup and reduced bandwidth
  • HTTP Routing -- Expose services through Gateway API with customizable path templates
  • Autoscaling -- KEDA integration with OpenTelemetry metrics for demand-based scaling
  • Multi-tenancy -- Namespace-scoped and cluster-scoped resources for flexible team isolation
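To integrate a deployed service with an application, you can call it over HTTP. The sketch below assumes the serving runtime exposes an OpenAI-compatible chat completions API (common for LLM runtimes, but not stated by this document) and that the predictor service name, port, path, and model identifier are all placeholders you would replace with values from your cluster.

```shell
# Forward the predictor service locally (service name and port are assumptions).
kubectl port-forward svc/qwen-chat-predictor 8080:80 &

# Send a chat request, assuming an OpenAI-compatible endpoint.
curl -s http://localhost:8080/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{
        "model": "qwen/qwen3-32b",
        "messages": [{"role": "user", "content": "Hello!"}]
      }'
```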

Documentation

Getting Started

  • Installation -- Prerequisites and Helm chart installation
  • Quickstart -- Deploy your first model in 5 minutes
  • Architecture -- High-level architecture and component overview

Guides

Task-oriented walkthroughs for common workflows:

Administration

Concepts

Reference