# AIM Engine
AIM (AMD Inference Microservice) Engine is a Kubernetes operator that simplifies the deployment and management of AI inference workloads on AMD GPUs. It provides a declarative, cloud-native approach to running ML models at scale.
## Quick Example
Deploy an inference service with a single resource:
```yaml
apiVersion: aim.eai.amd.com/v1alpha1
kind: AIMService
metadata:
  name: qwen-chat
spec:
  model:
    image: amdenterpriseai/aim-qwen-qwen3-32b:0.8.5
```
AIM images (like amdenterpriseai/aim-qwen-qwen3-32b) are container images that package open-source models optimized for AMD Instinct GPUs. Each image includes the model weights and a serving runtime tuned for specific GPU configurations and precision modes.
AIM Engine automatically resolves the model, selects an optimal runtime configuration for your hardware, deploys a KServe InferenceService, and optionally creates HTTP routing through Gateway API.
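Assuming the manifest above is saved to a file, deploying and checking the service follows the usual `kubectl` workflow. This is a minimal sketch: the filename is hypothetical, and the `aimservices` resource name is inferred from the `AIMService` kind (the plural form in your installation may differ):

```shell
# Apply the AIMService manifest (filename is an assumption for illustration)
kubectl apply -f qwen-chat.yaml

# Watch the service reconcile; resource name inferred from the AIMService kind
kubectl get aimservices qwen-chat -w

# AIM Engine deploys a KServe InferenceService behind the scenes
kubectl get inferenceservices
```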
## Where to Start
- Cluster Administrators -- Install AIM Engine, configure KServe, manage GPU resources, and set up cluster-wide defaults.
- Developers & Integrators -- Deploy inference services, configure scaling, set up routing, and integrate with your applications.
- Data Scientists -- Browse the model catalog, deploy models for experimentation, and tune inference parameters.
## Key Features
- Simple Service Deployment -- Deploy inference endpoints with minimal configuration using `AIMService` resources
- Automatic Optimization -- Smart template selection picks the best runtime profile based on GPU availability, precision, and optimization goals
- Model Catalog -- Maintain a catalog of available models with automatic discovery from container registries
- Model Caching -- Pre-download model artifacts to shared PVCs for faster startup and reduced bandwidth
- HTTP Routing -- Expose services through Gateway API with customizable path templates
- Autoscaling -- KEDA integration with OpenTelemetry metrics for demand-based scaling
- Multi-tenancy -- Namespace-scoped and cluster-scoped resources for flexible team isolation
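Once a service is running, a quick smoke test can be sent from a local machine. This sketch assumes the serving runtime exposes an OpenAI-compatible API (common for vLLM-based stacks); the Kubernetes service name, port, and model identifier are placeholders for whatever your deployment actually creates:

```shell
# Forward the predictor service locally (service name and port are assumptions)
kubectl port-forward svc/qwen-chat-predictor 8080:80 &

# Send a chat completion; the path assumes an OpenAI-compatible endpoint
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "qwen-chat", "messages": [{"role": "user", "content": "Hello"}]}'
```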
## Documentation
### Getting Started
- Installation -- Prerequisites and Helm chart installation
- Quickstart -- Deploy your first model in 5 minutes
- Architecture -- High-level architecture and component overview
### Guides
Task-oriented walkthroughs for common workflows:
- Deploying Services -- Deploy and manage inference endpoints
- Model Catalog -- Browse and select models
- Scaling and Autoscaling -- Replicas, KEDA, and metrics
- Model Caching -- Pre-cache models for faster startup
- Routing and Ingress -- Gateway API patterns and path templates
- Private Registries -- Authentication for HuggingFace, S3, and OCI
- Multi-Tenancy -- Namespace isolation patterns
### Administration
- Installation Reference -- Full install reference with all Helm values
- KServe Configuration -- Install and configure KServe
- GPU Management -- GPU allocation, node selectors, topology
- Storage Configuration -- PVCs, shared storage for caching
- Upgrading -- Version migration and CRD upgrades
- Monitoring -- Metrics, observability, and log formats
- Troubleshooting -- Common issues and diagnostic steps
- Security -- RBAC, network policies, and secrets management
### Concepts
- AIM Services -- Service deployment lifecycle, template selection, and caching
- AIM Models -- Model catalog, discovery, and resolution
- Model Sources -- Automatic model discovery from container registries
- Service Templates -- Runtime profiles, derivation, and discovery cache
- Runtime Configuration -- Storage defaults, routing, and environment resolution
- Model Caching -- Cache hierarchy, ownership, and deletion behavior
- Resource Lifecycle -- Ownership, finalizers, and deletion behavior
### Reference
- CRD API Reference -- Complete API specification for all custom resources
- Helm Chart Values -- All configurable Helm chart values
- CLI and Operator Flags -- Operator binary flags and endpoints
- Environment Variables -- Operator and downloader configuration
- Naming and Labels -- Derived naming algorithm and label conventions
- Conditions -- Full catalog of conditions across all CRDs