Model Caching¶
Model caching pre-downloads model artifacts to shared persistent volumes, reducing startup time and bandwidth usage across service replicas and restarts.
Caching Modes¶
Control caching behavior with spec.caching.mode:
| Mode | Behavior |
|---|---|
| Shared | Reuses shared cache assets across services that use the same template. This is the default. |
| Dedicated | Creates service-owned dedicated cache assets, isolated from other services. |
apiVersion: aim.eai.amd.com/v1alpha1
kind: AIMService
metadata:
  name: qwen-chat
spec:
  model:
    image: amdenterpriseai/aim-qwen-qwen3-32b:0.8.5
  caching:
    mode: Shared
Note
The caching mode is immutable after creation. Legacy values Always, Auto, and Never are accepted for backward compatibility (Always/Auto map to Shared, Never maps to Dedicated).
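The legacy mapping in the note can be written out as a small lookup. This is only an illustrative sketch: the function name normalize_caching_mode is hypothetical and not part of AIM Engine, and treating an empty value as Shared is an assumption based on Shared being the documented default.

```shell
# Hypothetical helper (not an AIM Engine command): map a spec.caching.mode
# value, including the legacy names, to the mode it is documented to select.
normalize_caching_mode() {
  case "$1" in
    Shared|Always|Auto) echo "Shared" ;;
    Dedicated|Never)    echo "Dedicated" ;;
    "")                 echo "Shared" ;;  # assumption: unset falls back to the default
    *)                  echo "unknown caching mode: $1" >&2; return 1 ;;
  esac
}

normalize_caching_mode Never   # prints: Dedicated
```

Because the mode cannot be changed after creation, a check like this is most useful before a manifest is applied.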
How Caching Works¶
When caching is active, AIM Engine creates a hierarchy of resources:
- AIMTemplateCache — Groups all model artifacts for a specific template on a shared PVC
- AIMArtifact — Manages the download of individual model sources
- PVC + Download Job — The actual storage and download execution
The template cache is owned by the template (not the service), so multiple services sharing the same template reuse the same cache.
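One way to see this hierarchy on a live cluster is to list each layer and look at who owns a template cache. The sketch below assumes that AIM Engine records ownership in the standard Kubernetes metadata.ownerReferences field, which this page does not confirm; the function names are hypothetical wrappers around ordinary kubectl calls.

```shell
# cache_owner_kinds NAME NAMESPACE
# Print the kind(s) of the object(s) that own a template cache.
# Assumption: ownership is expressed through metadata.ownerReferences.
cache_owner_kinds() {
  kubectl get aimtemplatecache "$1" -n "$2" \
    -o jsonpath='{.metadata.ownerReferences[*].kind}'
}

# list_cache_resources NAMESPACE
# List every layer of the hierarchy (template caches, artifacts, PVCs,
# download Jobs) in a single call.
list_cache_resources() {
  kubectl get aimtemplatecache,aimartifact,pvc,jobs -n "$1"
}
```

For example, `list_cache_resources ml-team` would show the caches, artifacts, PVCs, and Jobs in the ml-team namespace, if any exist there.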
Download Protocols¶
For HuggingFace models (hf:// sources), AIM Engine tries download protocols in sequence:
| Protocol | Description |
|---|---|
| XET | XetHub protocol (fastest for large models) |
| HF_TRANSFER | HuggingFace Transfer (optimized multi-part download) |
| HTTP | Standard HTTP download (slowest, most compatible) |
The default order is XET,HF_TRANSFER. If a protocol fails, the downloader cleans up incomplete files and tries the next one.
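The try, clean up, and fall through behavior described above can be sketched as a loop over the comma-separated order. This is not the actual downloader: try_download and cleanup_partial are hypothetical stand-ins, and the stub pretends that only HF_TRANSFER succeeds so that the fallback path is visible.

```shell
# Hypothetical stand-ins for the real downloader steps.
try_download() {      # $1 = protocol; in this demo only HF_TRANSFER succeeds
  [ "$1" = "HF_TRANSFER" ]
}
cleanup_partial() {   # $1 = protocol whose incomplete files are removed
  echo "cleaning up incomplete files after $1"
}

# download_with_fallback ORDER
# ORDER is a comma-separated protocol list such as "XET,HF_TRANSFER".
download_with_fallback() {
  old_ifs=$IFS
  IFS=,
  set -- $1           # split the list on commas into positional parameters
  IFS=$old_ifs
  for proto in "$@"; do
    if try_download "$proto"; then
      echo "downloaded via $proto"
      return 0
    fi
    cleanup_partial "$proto"   # remove partial files before the next attempt
  done
  echo "all protocols failed" >&2
  return 1
}

download_with_fallback "XET,HF_TRANSFER"
```

With this stub the XET attempt fails and is cleaned up, then the HF_TRANSFER attempt succeeds. If every listed protocol fails, the function returns a non-zero status.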
Override the protocol order via runtime configuration:
apiVersion: aim.eai.amd.com/v1alpha1
kind: AIMRuntimeConfig
metadata:
  name: default
  namespace: ml-team
spec:
  env:
    - name: AIM_DOWNLOADER_PROTOCOL
      value: "HF_TRANSFER,HTTP"
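Since AIM_DOWNLOADER_PROTOCOL is a free-form comma-separated string, a typo in a protocol name is easy to miss. The following pre-flight check is a hypothetical sketch: the accepted names come from the protocol table, while exact-case matching with no surrounding whitespace is an assumption about how the value is parsed.

```shell
# validate_protocol_order ORDER
# Hypothetical check (not part of AIM Engine): succeed only if every entry
# in the comma-separated ORDER is one of the documented protocol names.
validate_protocol_order() {
  [ -n "$1" ] || { echo "empty protocol list" >&2; return 1; }
  old_ifs=$IFS
  IFS=,
  set -- $1           # split the list on commas
  IFS=$old_ifs
  for proto in "$@"; do
    case "$proto" in
      XET|HF_TRANSFER|HTTP) ;;   # documented protocol names
      *) echo "unknown protocol: $proto" >&2; return 1 ;;
    esac
  done
}

validate_protocol_order "HF_TRANSFER,HTTP" && echo "ok"
```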
Storage Sizing¶
AIM Engine automatically sizes PVCs based on discovered model sizes plus a headroom percentage (default 10%). Configure the headroom via cluster runtime config:
apiVersion: aim.eai.amd.com/v1alpha1
kind: AIMClusterRuntimeConfig
metadata:
  name: default
spec:
  storage:
    pvcHeadroomPercent: 15
    defaultStorageClassName: longhorn
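As a rough guide to what the headroom setting means for capacity, the arithmetic can be sketched as model size times (100 + headroom) / 100. The helper below is hypothetical; rounding up to the next byte is an assumption, and AIM Engine may round or align sizes differently.

```shell
# pvc_size_bytes MODEL_BYTES [HEADROOM_PERCENT]
# Hypothetical estimate of the PVC size for a given model size.
pvc_size_bytes() {
  model_bytes=$1
  headroom=${2:-10}   # 10 matches the documented default headroom
  # Integer ceiling, so the result never falls short of the requested headroom.
  echo $(( (model_bytes * (100 + headroom) + 99) / 100 ))
}

pvc_size_bytes 64000000000 15   # prints: 73600000000
```

Under these assumptions, a 64 GB model with pvcHeadroomPercent: 15 would need about 73.6 GB.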
Monitoring Cache Status¶
Check the status of template caches and artifacts:
# List template caches
kubectl get aimtemplatecache -n <namespace>
# List artifacts
kubectl get aimartifact -n <namespace>
# Check artifact download progress
kubectl get aimartifact <name> -n <namespace> -o jsonpath='{.status}' | jq
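For longer downloads it can be more convenient to stream changes than to re-run the commands above. The wrappers below are a sketch that uses only standard kubectl features (--watch and describe); the function names are hypothetical, and the columns and events shown depend on how the AIMArtifact resource is defined.

```shell
# watch_artifacts NAMESPACE
# Stream artifact changes until interrupted with Ctrl-C.
watch_artifacts() {
  kubectl get aimartifact -n "$1" --watch
}

# artifact_events NAME NAMESPACE
# Show the full description of one artifact, including recent events if any.
artifact_events() {
  kubectl describe aimartifact "$1" -n "$2"
}
```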
Next Steps¶
- Model Caching Concepts — Cache hierarchy, ownership, and deletion behavior
- Storage Configuration — PVC and storage class setup
- Environment Variables — Downloader configuration