AIM Models¶
AIM Model resources form a catalog that maps model identifiers to specific container images. This document explains the model resource types, discovery mechanism, and lifecycle.
Overview¶
Model resources serve two purposes:
- Registry: Translate abstract model references into concrete container images
- Version control: Update which container serves a model without changing service configurations
Cluster vs Namespace Scope¶
AIMClusterModel¶
Cluster-scoped models are typically installed by administrators through GitOps workflows or Helm charts. They represent curated model catalogs maintained by platform teams or model publishers.
Cluster models provide a consistent baseline across all namespaces. Any namespace can reference a cluster model unless it defines a namespace-scoped model with the same name, which takes precedence.
Discovery for cluster models runs in the operator namespace (default: aim-system). Auto-generated templates are created as cluster-scoped resources.
AIMModel¶
Namespace-scoped models allow teams to:
- Define team-specific model variants
- Override cluster-level definitions for testing
- Control model access at the namespace level
When both cluster and namespace models exist with the same metadata.name, the namespace resource takes precedence within that namespace.
Discovery for namespace models runs in the model's namespace. Auto-generated templates are created as namespace-scoped resources.
Model Specification¶
An AIM Model uses metadata.name as the canonical model identifier:
```yaml
apiVersion: aim.eai.amd.com/v1alpha1
kind: AIMClusterModel
metadata:
  name: qwen-qwen3-32b
spec:
  image: amdenterpriseai/aim-qwen-qwen3-32b:0.8.5
  discovery:
    extractMetadata: true
    createServiceTemplates: true
  resources:
    limits:
      cpu: "8"
      memory: 64Gi
    requests:
      cpu: "4"
      memory: 32Gi
```
Fields¶
| Field | Purpose |
|---|---|
| `image` | Container image URI implementing this model. The operator inspects this image during discovery. |
| `discovery` | Controls metadata extraction and automatic template generation. Discovery is attempted automatically. |
| `discovery.createServiceTemplates` | When `true` (default), creates ServiceTemplates from recommended deployments published by the image. |
| `defaultServiceTemplate` | Optional. Default template name to use when services reference this model without specifying a template. |
| `imagePullSecrets` | Secrets for pulling the container image during discovery and inference. Must exist in the same namespace as the model (or the operator namespace for cluster models). |
| `serviceAccountName` | Service account used for discovery jobs and metadata extraction. If empty, the default service account is used. |
| `resources` | Default resource requirements. These serve as baseline values that templates and services can override. |
Discovery Mechanism¶
Discovery is an automatic process that extracts metadata from container images and creates templates.
Discovery Process¶
When discovery is enabled:

1. **Registry Inspection**: The controller directly queries the container registry using the operator's network context and any configured `imagePullSecrets`.
2. **Image Metadata Fetch**: Using go-containerregistry, the controller pulls image metadata (labels) without downloading the full image.
3. **Metadata Storage**: Extracted metadata is written to `status.imageMetadata`.
4. **Template Generation**: If `createServiceTemplates: true`, the controller examines the image's recommended deployments and creates corresponding ServiceTemplate resources.
Expected Labels¶
AIM discovery looks for the following container image labels:

- `com.amd.aim.model.canonicalName`
- `com.amd.aim.model.deployments`

Images without these labels will have minimal metadata. If `createServiceTemplates: true` but no `recommendedDeployments` are found, no templates are created.
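For illustration, an image publisher could set these labels in a Dockerfile. The label values below are hypothetical; the actual value schema is defined by the image publisher:

```dockerfile
# Hypothetical example - label values are placeholders
LABEL com.amd.aim.model.canonicalName="qwen/qwen3-32b"
# deployments typically carries a JSON document describing recommended deployments
LABEL com.amd.aim.model.deployments='...'
```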
Lifecycle and Status¶
Status Field¶
The status field tracks discovery progress:
| Field | Description |
|---|---|
| `status` | Enum: `Pending`, `Progressing`, `Ready`, `Degraded`, `Failed` |
| `conditions` | Detailed conditions including `RuntimeConfigReady`, `ImageMetadataReady`, and `ServiceTemplatesReady` |
| `resolvedRuntimeConfig` | Metadata about the resolved runtime config (name, namespace, scope, UID) |
| `imageMetadata` | Metadata extracted from the container image, including model and OCI metadata |
Status Values¶
- Pending: Initial state, waiting for reconciliation
- Progressing: Discovery job running or templates being created
- Ready: Discovery succeeded and all auto-generated templates are healthy
- Degraded: Discovery succeeded but some templates have issues
- Failed: Discovery failed or required labels missing
Conditions¶
RuntimeConfigReady: Reports runtime config resolution status. Common reasons:
- `ConfigFound`: Runtime configuration was successfully resolved
- `DefaultConfigNotFound`: No default runtime config found (non-fatal)
- `ConfigNotFound`: Explicitly referenced runtime config not found
ImageMetadataReady: Reports image inspection status. Common reasons:
- `ImageMetadataFound`: Metadata extraction succeeded
- `ImageFound`: Image is reachable, but metadata labels are missing
- `MetadataExtractionFailed`: Failed to extract metadata from the image
Toggling Discovery¶
You can enable discovery after the model resource has been created:
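For example, a minimal sketch of the fields to set (apply with `kubectl patch` or by editing the resource):

```yaml
spec:
  discovery:
    extractMetadata: true
    createServiceTemplates: true
```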
The controller runs extraction on the next reconciliation and updates status accordingly.
Disabling discovery after templates exist leaves templates in place. Existing templates are not deleted automatically.
Resource Resolution¶
When services reference a model, the controller merges resources from multiple sources:
1. **Service-level**: `AIMService.spec.resources` (highest precedence)
2. **Template-level**: `AIMServiceTemplate.spec.resources`
3. **Model-level**: `AIMModel.spec.resources` (baseline)
If GPU quantities remain unset after merging, the controller copies them from discovery metadata recorded on the template (status.profile.metadata.gpu_count).
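As a hypothetical illustration of the precedence order, a service can override a single value while inheriting everything else from the template and model baselines:

```yaml
apiVersion: aim.eai.amd.com/v1alpha1
kind: AIMService
metadata:
  name: qwen-high-mem   # hypothetical service name
  namespace: ml-team
spec:
  model:
    name: qwen-qwen3-32b
  resources:
    limits:
      memory: 96Gi  # service-level value wins over template- and model-level limits
```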
Model Lookup¶
For namespace-scoped lookups (from templates or services in a namespace):
1. Check for an `AIMModel` in the same namespace
2. Fall back to an `AIMClusterModel` with the same name
This allows namespace models to override cluster baselines.
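For example, a team can shadow a cluster baseline by creating a namespace model with the same name (a sketch; the newer image tag here is hypothetical):

```yaml
# Cluster baseline (visible to all namespaces)
apiVersion: aim.eai.amd.com/v1alpha1
kind: AIMClusterModel
metadata:
  name: qwen-qwen3-32b
spec:
  image: amdenterpriseai/aim-qwen-qwen3-32b:0.8.5
---
# Namespace override - takes precedence within ml-team
apiVersion: aim.eai.amd.com/v1alpha1
kind: AIMModel
metadata:
  name: qwen-qwen3-32b
  namespace: ml-team
spec:
  image: amdenterpriseai/aim-qwen-qwen3-32b:0.8.6  # hypothetical tag under test
```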
Examples¶
Cluster Model with Discovery¶
```yaml
apiVersion: aim.eai.amd.com/v1alpha1
kind: AIMClusterModel
metadata:
  name: qwen-qwen3-32b
spec:
  image: amdenterpriseai/aim-qwen-qwen3-32b:0.8.5
  runtimeConfigName: platform-default
  discovery:
    extractMetadata: true
    createServiceTemplates: true
  resources:
    limits:
      cpu: "8"
      memory: 64Gi
    requests:
      cpu: "4"
      memory: 32Gi
```
Namespace Model Without Discovery¶
```yaml
apiVersion: aim.eai.amd.com/v1alpha1
kind: AIMModel
metadata:
  name: qwen-qwen3-32b-dev
  namespace: ml-team
spec:
  image: amdenterpriseai/aim-qwen-qwen3-32b:0.8.5
  runtimeConfigName: ml-team
  defaultServiceTemplate: custom-template-name
  discovery:
    extractMetadata: false  # skip image metadata extraction
    createServiceTemplates: false
  resources:
    limits:
      cpu: "6"
      memory: 48Gi
```
Enabling Discovery for Private Container Images¶
```yaml
# Secret in namespace
apiVersion: v1
kind: Secret
metadata:
  name: private-registry
  namespace: ml-team
type: kubernetes.io/dockerconfigjson
data:
  .dockerconfigjson: BASE64_CONFIG
---
# Runtime config in namespace
apiVersion: aim.eai.amd.com/v1alpha1
kind: AIMRuntimeConfig
metadata:
  name: default
  namespace: ml-team
spec:
  serviceAccountName: aim-runtime
  imagePullSecrets:
    - name: private-registry
---
# Model with discovery
apiVersion: aim.eai.amd.com/v1alpha1
kind: AIMModel
metadata:
  name: proprietary-model
  namespace: ml-team
spec:
  image: private.registry/models/proprietary:v1
  runtimeConfigName: default  # uses config above
  discovery:
    extractMetadata: true
    createServiceTemplates: true
```
Troubleshooting¶
Discovery Fails¶
Check the operator logs for registry access errors:
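A sketch, assuming the operator runs as a deployment in the default `aim-system` namespace (the deployment name below is an assumption; adjust it for your install):

```
kubectl logs -n aim-system deployment/aim-operator | grep -i <model-name>
```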
Common causes:

- Missing or invalid `imagePullSecrets` (secrets must exist in the operator namespace for cluster models)
- Image doesn't exist or tag is invalid
- Network connectivity issues to the registry
Templates Not Auto-Created¶
Check the model status:
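For example (use the namespaced or cluster-scoped kind depending on where the model lives):

```
kubectl get aimmodel <name> -n <namespace> -o yaml
```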
Look for:

- `discovery.extractMetadata: false` - metadata extraction is disabled
- `discovery.createServiceTemplates: false` - auto-template creation is disabled
- Model condition reasons such as `NoTemplatesExpected` or `CreatingTemplates`
ImageMetadataReady Condition False¶
The container image is missing required labels or the discovery job failed. Check:
```
kubectl get aimclustermodel <name> -o jsonpath='{.status.conditions[?(@.type=="ImageMetadataReady")]}'
```
Inspect the container image labels:
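For example, after pulling the image locally (any tool that reads OCI image config, such as `skopeo`, works equally well):

```
docker inspect amdenterpriseai/aim-qwen-qwen3-32b:0.8.5 --format '{{json .Config.Labels}}'
```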
Auto-Creation from Services¶
When a service uses spec.model.image directly (instead of spec.model.name), AIM automatically creates a model resource if one doesn't already exist with that image URI. Auto-created models are namespace-scoped.
Discovery for Auto-Created Models¶
The runtime config's spec.model.autoDiscovery field controls whether auto-created models run discovery:
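A sketch of the relevant runtime config fragment (only the `autoDiscovery` field is shown; the surrounding fields follow the runtime config examples elsewhere in this document):

```yaml
apiVersion: aim.eai.amd.com/v1alpha1
kind: AIMRuntimeConfig
metadata:
  name: default
  namespace: ml-team
spec:
  model:
    autoDiscovery: true  # auto-created models run discovery
```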
Example¶
Service using direct image reference:
```yaml
apiVersion: aim.eai.amd.com/v1alpha1
kind: AIMService
metadata:
  name: my-service
  namespace: ml-team
spec:
  model:
    image: amdenterpriseai/aim-qwen-qwen3-32b:0.8.5
  runtimeConfigName: default
```
If the runtime config has autoDiscovery: true, AIM creates a namespace-scoped model and discovery runs automatically:
```yaml
apiVersion: aim.eai.amd.com/v1alpha1
kind: AIMModel
metadata:
  name: auto-<hash-of-image>
  namespace: ml-team
spec:
  image: amdenterpriseai/aim-qwen-qwen3-32b:0.8.5
  discovery:
    extractMetadata: true
    createServiceTemplates: true
```
Custom Models¶
Custom models allow you to deploy models from external sources (S3, HuggingFace) without requiring a pre-built AIM container image. The AIM operator uses a generic base container that downloads model weights at runtime.
Overview¶
Unlike image-based models where model weights are embedded in the container image, custom models:
- Download weights from external sources (S3 or HuggingFace)
- Use the `amdenterpriseai/aim-base` container for inference
- Skip discovery (no image metadata extraction needed)
- Require explicit hardware specifications
Creating Custom Models¶
There are two ways to create custom models:
1. Direct AIMModel with modelSources¶
Create an AIMModel or AIMClusterModel with modelSources instead of relying on image discovery:
```yaml
apiVersion: aim.eai.amd.com/v1alpha1
kind: AIMModel
metadata:
  name: my-custom-qwen
  namespace: ml-team
spec:
  image: amdenterpriseai/aim-base:latest
  modelSources:
    - modelId: Qwen/Qwen3-32B
      sourceUri: s3://my-bucket/models/qwen3-32b
      # size: 16Gi  # Optional - auto-discovered by the download job if omitted
  custom:
    hardware:
      gpu:
        requests: 1
        models:
          - MI300X
```
2. Inline Custom Model in AIMService¶
Create an AIMService with spec.model.custom to auto-create a custom model:
```yaml
apiVersion: aim.eai.amd.com/v1alpha1
kind: AIMService
metadata:
  name: my-qwen-service
  namespace: ml-team
spec:
  model:
    custom:
      baseImage: amdenterpriseai/aim-base:latest
      modelSources:
        - modelId: Qwen/Qwen3-32B
          sourceUri: hf://Qwen/Qwen3-32B
          # size is optional - auto-discovered by the download job
      hardware:
        gpu:
          requests: 1
  template:
    allowUnoptimized: true  # Required - custom models default to unoptimized
```
The service automatically creates a namespace-scoped AIMModel. Custom models are shared resources that persist independently of the service, allowing them to be reused by other services or manually managed.
Model Sources¶
Each model source specifies:
| Field | Required | Description |
|---|---|---|
| `modelId` | Yes | Canonical identifier in `{org}/{name}` format. Determines the cache mount path. |
| `sourceUri` | Yes | Download location. Schemes: `hf://org/model` (HuggingFace) or `s3://bucket/key` (S3). For S3, use the bucket name directly without the service hostname (e.g., `s3://my-bucket/models/qwen3-32b`). |
| `size` | No | Storage size for PVC provisioning. If omitted, the download job discovers the size automatically. Can be set explicitly to pre-allocate storage. |
| `env` | No | Per-source credential overrides (e.g., `HF_TOKEN`, `AWS_ACCESS_KEY_ID`) |
Hardware Requirements¶
Custom models require explicit hardware specifications since discovery doesn't run.
These go under spec.custom.hardware for AIMModel, or spec.model.custom.hardware for inline AIMService:
```yaml
# For AIMModel:
spec:
  custom:
    hardware:
      gpu:
        requests: 2       # Number of GPUs required
        models:           # Optional: specific GPU models for node affinity
          - MI300X
          - MI250
        minVram: 64Gi     # Optional: minimum VRAM per GPU for capacity planning
      cpu:
        requests: "4"     # Required if the cpu field is specified: CPU requests
        limits: "8"       # Optional: CPU limits
```
If no models are specified, the workload can run on any available GPU. The minVram field is used for capacity planning when the model size is known.
Template Generation¶
When modelSources is specified:
- **Without `custom.templates`**: A single template is auto-generated using `custom.hardware`
- **With `custom.templates`**: Templates are created per entry, each inheriting from `custom.hardware` unless overridden
Templates also inherit the type field from spec.custom.type, which defaults to unoptimized. This can be overridden per-template via customTemplates[].type.
```yaml
spec:
  modelSources:
    - modelId: Qwen/Qwen3-32B
      sourceUri: s3://bucket/model
  custom:
    type: unoptimized  # Default - can be omitted
    hardware:
      gpu:
        requests: 1
    templates:
      - name: high-memory  # Generated as {modelName}-custom-[{name}][-{precision}][-{gpu}]-{hash}
        hardware:
          gpu:
            requests: 2  # Override
        env:
          - name: VLLM_GPU_MEMORY_UTILIZATION
            value: "0.95"
      - name: standard
        # Inherits hardware and type from custom.*
```
Unoptimized Templates and allowUnoptimized¶
Custom models generate templates with type: unoptimized by default because no discovery job runs to validate performance characteristics. This has an important implication:
Services will not auto-select unoptimized templates unless explicitly allowed.
When creating an AIMService that uses a custom model, you must either:
- Set `allowUnoptimized: true` on the service's template selector:
```yaml
apiVersion: aim.eai.amd.com/v1alpha1
kind: AIMService
metadata:
  name: my-service
spec:
  model:
    name: my-custom-model
  template:
    allowUnoptimized: true  # Required for custom model templates
```
- Explicitly specify the template name to bypass auto-selection:
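A sketch of the second option; both the `template.name` selector field and the generated template name shown here are assumptions based on the naming pattern above, so check the Template Resolution documentation for the exact field:

```yaml
apiVersion: aim.eai.amd.com/v1alpha1
kind: AIMService
metadata:
  name: my-service
spec:
  model:
    name: my-custom-model
  template:
    name: my-custom-model-custom-high-memory-abc123  # hypothetical generated name
```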
This safety mechanism prevents accidentally deploying unoptimized configurations in production. See Template Resolution for more details on how templates are selected and the role of optimization levels.
Authentication¶
Configure credentials for private sources:
HuggingFace¶
```yaml
spec:
  modelSources:
    - modelId: Qwen/Qwen3-32B
      sourceUri: hf://Qwen/Qwen3-32B
      size: 16Gi
      env:
        - name: HF_TOKEN
          valueFrom:
            secretKeyRef:
              name: hf-credentials
              key: token
```
S3-Compatible Storage¶
```yaml
spec:
  modelSources:
    - modelId: my-org/custom-model
      sourceUri: s3://my-bucket/models/custom
      size: 32Gi
      env:
        - name: AWS_ACCESS_KEY_ID
          valueFrom:
            secretKeyRef:
              name: s3-credentials
              key: access-key
        - name: AWS_SECRET_ACCESS_KEY
          valueFrom:
            secretKeyRef:
              name: s3-credentials
              key: secret-key
        - name: AWS_ENDPOINT_URL
          value: "https://s3.my-provider.com"
```
Lifecycle Differences¶
| Aspect | Image-Based Models | Custom Models |
|---|---|---|
| Model weights | Embedded in the container image | Downloaded from the source URI in spec |
| Discovery | Runs to extract metadata | Skipped |
| Hardware | Optional (from discovery) | Required |
| Templates | Auto-generated from image labels | Auto-generated from spec |
| Caching | Uses shared template cache | Uses dedicated template cache |
Status¶
Custom models report sourceType: Custom in their status:
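A sketch of what the status might contain; the layout of fields other than `sourceType` is illustrative, following the status table earlier in this document:

```yaml
status:
  sourceType: Custom
  status: Ready
```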
Example: Full Custom Model Deployment¶
```yaml
# Secret for HuggingFace access
apiVersion: v1
kind: Secret
metadata:
  name: hf-token
  namespace: ml-team
type: Opaque
stringData:
  token: hf_xxxxxxxxxxxxx
---
# Custom model service
apiVersion: aim.eai.amd.com/v1alpha1
kind: AIMService
metadata:
  name: qwen-custom
  namespace: ml-team
spec:
  model:
    custom:
      modelSources:
        - modelId: Qwen/Qwen3-32B
          sourceUri: hf://Qwen/Qwen3-32B
          # size is optional - auto-discovered by the download job
          env:
            - name: HF_TOKEN
              valueFrom:
                secretKeyRef:
                  name: hf-token
                  key: token
      hardware:
        gpu:
          requests: 1
          models:
            - MI300X
  template:
    allowUnoptimized: true  # Required - custom models default to unoptimized
  replicas: 1
```
Related Documentation¶
- Templates - Understanding ServiceTemplates and discovery
- Runtime Config Concepts - Resolution details including model creation
- Services Usage - Deploying services
- Caching - Model caching and download architecture
Note on Terminology¶
AIM Model resources (AIMModel and AIMClusterModel) define the mapping between model identifiers and container images. While we sometimes refer to the "model catalog" conceptually, the Kubernetes resources are always AIMModel and AIMClusterModel.