Runtime Configuration Architecture¶
Runtime configurations provide storage defaults and routing parameters. This document explains the resolution algorithm, inheritance model, and status tracking.
Resolution Model¶
The AIM operator resolves runtime settings from two Custom Resource Definitions:
- AIMClusterRuntimeConfig: Cluster-wide defaults that apply across namespaces, useful for single-tenant clusters
- AIMRuntimeConfig: Namespace-scoped configuration including authentication secrets, useful for multi-tenant clusters
Resolution Algorithm¶
When a workload references runtimeConfigName: my-config:
- The controller first looks for AIMRuntimeConfig named my-config in the workload's namespace
- If both namespace and cluster configs exist, they are merged (namespace values take precedence). Note also that any runtimeconfig embedded in AIMService takes precedence over namespaced runtimeconfig values.
- If not found, the controller falls back to AIMClusterRuntimeConfig named my-config
- The resolved configuration is published in the consumer's status.resolvedRuntimeConfig
When runtimeConfigName is omitted, the controller resolves a config named default. If no config named default exists, no error is raised. A config with any other name must exist when it is referenced; otherwise an error is raised.
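The lookup, merge, and default-handling rules above can be sketched in a few lines of Python. This is a minimal illustration and not the operator's implementation: `resolve_runtime_config`, `ConfigNotFound`, and the two dictionaries standing in for API lookups are hypothetical names, and the shallow key-by-key merge is an assumption (the real merge may be deeper or field-specific).

```python
from typing import Dict, Optional


class ConfigNotFound(Exception):
    """Raised when an explicitly named, non-default runtime config is missing."""


def resolve_runtime_config(
    name: Optional[str],
    namespace_configs: Dict[str, dict],
    cluster_configs: Dict[str, dict],
) -> Optional[dict]:
    """Resolve a runtime config spec by name.

    namespace_configs and cluster_configs are hypothetical stand-ins for API
    lookups: they map a config name to its spec, for the workload's namespace
    and for the cluster scope respectively.
    """
    # An omitted runtimeConfigName resolves the config named "default".
    effective_name = name or "default"

    ns_spec = namespace_configs.get(effective_name)
    cluster_spec = cluster_configs.get(effective_name)

    if ns_spec is None and cluster_spec is None:
        if effective_name == "default":
            # A missing default config is not an error.
            return None
        raise ConfigNotFound(effective_name)

    # Assumption: a shallow merge where namespace values win over cluster values.
    merged = dict(cluster_spec or {})
    merged.update(ns_spec or {})
    return merged
```

Calling `resolve_runtime_config(None, {}, {})` returns `None`, which mirrors the "no error is raised" behavior for a missing default, while an unknown explicit name raises `ConfigNotFound`.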
Resolved Runtime Config Tracking¶
The resolved configuration is published in status.resolvedRuntimeConfig with:
- Reference to the source object (namespace or cluster scope)
- UID of the resolved config for identity tracking
Namespace Config Status¶
status:
  resolvedRuntimeConfig:
    kind: AIMRuntimeConfig
    name: default
    namespace: ml-team
    scope: Namespace
    uid: abc123-def456-...
Cluster Config Status¶
status:
  resolvedRuntimeConfig:
    kind: AIMClusterRuntimeConfig
    name: default
    namespace: ""
    scope: Cluster
    uid: xyz123-uvw123-...
Only one ref (namespace or cluster) is present, never both.
Resources Supporting Runtime Config¶
The following AIM resources accept runtimeConfigName:
- AIMModel / AIMClusterModel
- AIMServiceTemplate / AIMClusterServiceTemplate
- AIMService
- AIMTemplateCache
Each resource independently resolves its runtime config and publishes the result in status.
Configuration Scoping¶
Cluster Runtime Configuration¶
AIMClusterRuntimeConfig captures non-secret defaults shared across namespaces:
apiVersion: aim.eai.amd.com/v1alpha1
kind: AIMClusterRuntimeConfig
metadata:
  name: default
spec:
  defaultStorageClassName: fast-nvme
Use cases:
- Platform-wide storage class defaults
- Shared routing configurations for clusters without multi-tenancy
Limitations:
- Cannot enforce namespace-specific policies
Namespace Runtime Configuration¶
AIMRuntimeConfig provides namespace-specific configuration including authentication:
apiVersion: aim.eai.amd.com/v1alpha1
kind: AIMRuntimeConfig
metadata:
  name: default
  namespace: ml-team
spec:
  defaultStorageClassName: team-ssd
  routing:
    enabled: true
    gatewayRef:
      name: kserve-gateway
      namespace: kgateway-system
    pathTemplate: "/{.metadata.namespace}/{.metadata.labels['team']}"
Use cases:
- Namespace-level routing policies
- Custom storage classes per team
Routing Templates¶
Runtime configs can supply a reusable HTTP route template via spec.routing.pathTemplate. The template is rendered against the AIMService object using JSONPath expressions.
Template Syntax¶
spec:
  routing:
    pathTemplate: "/{.metadata.namespace}/{.metadata.labels['team']}/{.spec.aimImageName}/"
Rendering Process¶
During reconciliation:
- Evaluation: Each placeholder (e.g., {.metadata.namespace}) is evaluated with JSONPath
- Validation: Missing fields, invalid expressions, or multi-value results fail the render
- Normalization: Each path segment is:
  - Lowercased
  - RFC 3986 encoded
  - De-duplicated (multiple slashes collapsed)
- Length Check: Final path must be ≤ 200 characters
- Trailing Slash: Removed
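The rendering steps above can be approximated with the following Python sketch. It is illustrative only: `render_path` and `PathTemplateInvalid` are hypothetical names, and the evaluator supports just a small subset of JSONPath (dotted fields and `['key']` lookups), not the full syntax the operator accepts. The sketch also drops the trailing slash before the length check, which may differ from the operator's exact ordering.

```python
import re
from urllib.parse import quote

MAX_PATH_LENGTH = 200

# One step of the simplified expression syntax: .field or ['key']
_STEP = re.compile(r"\.([A-Za-z_][A-Za-z0-9_]*)|\['([^']+)'\]")
_PLACEHOLDER = re.compile(r"\{([^{}]*)\}")


class PathTemplateInvalid(Exception):
    """Raised when a template cannot be rendered into a valid path."""


def _lookup(obj: dict, expr: str) -> str:
    """Evaluate a simplified JSONPath-like expression against a nested dict."""
    if not expr:
        raise PathTemplateInvalid("empty expression")
    current = obj
    pos = 0
    while pos < len(expr):
        step = _STEP.match(expr, pos)
        if step is None:
            raise PathTemplateInvalid(f"invalid expression: {expr!r}")
        key = step.group(1) or step.group(2)
        if not isinstance(current, dict) or key not in current:
            raise PathTemplateInvalid(f"missing field {key!r} in {expr!r}")
        current = current[key]
        pos = step.end()
    if isinstance(current, (dict, list)):
        # Only single scalar results are accepted.
        raise PathTemplateInvalid(f"{expr!r} does not resolve to a single value")
    return str(current)


def render_path(template: str, service: dict) -> str:
    """Render a path template against a service object (as a nested dict)."""
    rendered = _PLACEHOLDER.sub(lambda m: _lookup(service, m.group(1)), template)
    # Dropping empty segments collapses repeated slashes and the trailing slash.
    segments = [s for s in rendered.split("/") if s]
    # Lowercase each segment, then percent-encode everything outside the
    # RFC 3986 unreserved set.
    path = "/" + "/".join(quote(s.lower(), safe="") for s in segments)
    if len(path) > MAX_PATH_LENGTH:
        raise PathTemplateInvalid(f"path exceeds {MAX_PATH_LENGTH} characters")
    return path
```

A caller would catch `PathTemplateInvalid` and treat it as a render failure rather than letting it abort the whole reconcile.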
Rendering Failures¶
A path template that:
- Renders to more than 200 characters
- Contains invalid JSONPath
- References missing labels/fields
...degrades the AIMService with reason PathTemplateInvalid and skips HTTPRoute creation. The InferenceService remains intact.
Precedence¶
Services evaluate path templates in this order:
- AIMService.spec.routing.pathTemplate (highest precedence)
- Runtime config's spec.routing.pathTemplate
- Default: /<namespace>/<service-uid>
This allows:
- Runtime configs: Set namespace-wide path conventions
- Services: Override with specific paths when needed
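The precedence order amounts to a first-match selection. The helper below is a hypothetical sketch (`select_path_template` is not an operator function) that assumes an unset template is represented as `None` or an empty string.

```python
from typing import Optional


def select_path_template(
    service_template: Optional[str],
    runtime_config_template: Optional[str],
    namespace: str,
    service_uid: str,
) -> str:
    """Pick the path template to render, highest precedence first."""
    if service_template:
        # AIMService.spec.routing.pathTemplate wins when set.
        return service_template
    if runtime_config_template:
        # Otherwise fall back to the runtime config's template.
        return runtime_config_template
    # Default path: /<namespace>/<service-uid>
    return f"/{namespace}/{service_uid}"
```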
Example¶
Runtime config with path template:
apiVersion: aim.eai.amd.com/v1alpha1
kind: AIMRuntimeConfig
metadata:
  name: default
  namespace: ml-team
spec:
  routing:
    enabled: true
    gatewayRef:
      name: inference-gateway
      namespace: gateways
    pathTemplate: "/ml/{.metadata.namespace}/{.metadata.labels['project']}"
Service using template:
apiVersion: aim.eai.amd.com/v1alpha1
kind: AIMService
metadata:
  name: qwen-chat
  namespace: ml-team
  labels:
    project: conversational-ai
spec:
  model:
    name: qwen-qwen3-32b
  # routing.pathTemplate omitted - uses runtime config template
Rendered path: /ml/ml-team/conversational-ai
Service with override: when the AIMService sets its own spec.routing.pathTemplate, that template is rendered and the runtime config template is ignored.
Rendered path: /custom/qwen-chat
Error and Warning Behavior¶
Missing Explicit Config¶
When a workload explicitly references a config that exists in neither the namespace nor the cluster scope, the result is:
- Reconciliation fails
- Workload enters Failed or Degraded state with reason ConfigNotFound
- Reconciliation retries until the config appears
Missing Default Config¶
When the implicit default config doesn't exist:
- A RuntimeConfigReady condition is set to True with reason DefaultConfigNotFound
- A Normal event is emitted on the first reconcile with reason DefaultConfigNotFound
- Reconciliation continues without runtime config overrides
- Workloads relying on private registries may fail later unless a namespace config supplies credentials
This allows workloads without special requirements to proceed even when no default config exists.
Label Propagation¶
Runtime configurations support automatic label propagation from parent AIM resources to their child Kubernetes resources. This feature helps maintain consistent metadata across the resource hierarchy for tracking, cost allocation, and compliance purposes.
Configuration¶
Label propagation is configured in the runtime config's labelPropagation section:
apiVersion: aim.eai.amd.com/v1alpha1
kind: AIMRuntimeConfig
metadata:
  name: default
  namespace: ml-team
spec:
  labelPropagation:
    enabled: true
    match:
      - "org.example/cost-center"
      - "org.example/team"
      - "compliance.example/*"  # Wildcard matches any label with this prefix
Propagation Behavior¶
When enabled, labels matching the specified patterns are automatically copied from parent resources to child resources:
- AIMService → InferenceService, HTTPRoute, PVCs, auto-created AIMModel
- AIMTemplateCache → AIMArtifact resources
- AIMArtifact → PVCs, download Jobs
- AIMModel/AIMClusterModel → auto-created AIMServiceTemplates
- AIMServiceTemplate → AIMTemplateCache
- AIMClusterModelSource → auto-created AIMClusterModel resources
Pattern Matching¶
The match field accepts exact label keys or wildcard patterns:
- "org.example/team" - Matches exactly this label key
- "org.example/*" - Matches any label with the prefix org.example/
- "compliance.*/severity" - Matches labels like compliance.sec/severity, compliance.audit/severity
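The matching rules can be illustrated with Python's standard `fnmatch` module. This is a sketch under assumptions: `select_labels` is a hypothetical helper, and the operator's wildcard semantics may differ from shell-style globbing (for instance, `fnmatch` also treats `?` and `[...]` as special, which the source does not mention).

```python
from fnmatch import fnmatchcase
from typing import Dict, List


def select_labels(labels: Dict[str, str], patterns: List[str]) -> Dict[str, str]:
    """Return the labels whose keys match at least one pattern.

    Patterns are exact label keys or globs where * matches any run of
    characters, as in "org.example/*" or "compliance.*/severity".
    """
    return {
        key: value
        for key, value in labels.items()
        if any(fnmatchcase(key, pattern) for pattern in patterns)
    }
```

For example, with the patterns `"org.example/*"` and `"compliance.*/severity"`, a label keyed `compliance.audit/severity` is selected while an unrelated key such as `app` is not.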
Special Handling¶
For Job resources, propagated labels are applied to both:
1. The Job's metadata labels
2. The Job's PodTemplateSpec labels (enabling pod-level tracking)
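As a rough illustration of the Job handling, the sketch below writes the same labels to both places on a Job manifest held as a nested dictionary. `apply_labels_to_job` is a hypothetical helper, and letting propagated labels overwrite existing keys is an assumption; the source does not say how conflicts are resolved.

```python
import copy
from typing import Dict


def apply_labels_to_job(job: dict, propagated: Dict[str, str]) -> dict:
    """Return a copy of a Job manifest with labels set on the Job and its pod template."""
    result = copy.deepcopy(job)
    # 1. The Job's own metadata labels.
    result.setdefault("metadata", {}).setdefault("labels", {}).update(propagated)
    # 2. The pod template labels, so that pods created by the Job carry them too.
    template = result.setdefault("spec", {}).setdefault("template", {})
    template.setdefault("metadata", {}).setdefault("labels", {}).update(propagated)
    return result
```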
Example Use Case¶
A typical configuration for multi-tenant cost tracking:
apiVersion: aim.eai.amd.com/v1alpha1
kind: AIMClusterRuntimeConfig
metadata:
  name: default
spec:
  labelPropagation:
    enabled: true
    match:
      - "org.example/cost-center"
      - "org.example/department"
      - "org.example/project"
When users create an AIMService with these labels:
apiVersion: aim.eai.amd.com/v1alpha1
kind: AIMService
metadata:
  name: qwen-chat
  namespace: ml-team
  labels:
    org.example/cost-center: "eng-ml"
    org.example/department: "engineering"
    org.example/project: "chatbot-v2"
spec:
  model:
    ref: qwen-qwen3-32b
The operator propagates these labels to the InferenceService, HTTPRoute, and any PVCs created for the service, enabling cost tracking and chargeback at the infrastructure level.
Environment Variable Overrides¶
Runtime configurations can inject environment variables into managed workloads via spec.env. This is useful for setting defaults across an entire namespace or cluster, such as the download protocol strategy for model artifacts.
Download Protocol Strategy¶
The AIM_DOWNLOADER_PROTOCOL environment variable controls the sequence of protocols tried when downloading HuggingFace models. See Model Caching – Download Protocol Strategy for full details.
Example: Cluster default for environments where XET is unreliable¶
apiVersion: aim.eai.amd.com/v1alpha1
kind: AIMClusterRuntimeConfig
metadata:
  name: default
spec:
  env:
    - name: AIM_DOWNLOADER_PROTOCOL
      value: "HTTP,XET"
Example: Namespace override preferring plain HTTP¶
apiVersion: aim.eai.amd.com/v1alpha1
kind: AIMRuntimeConfig
metadata:
  name: default
  namespace: ml-team
spec:
  env:
    - name: AIM_DOWNLOADER_PROTOCOL
      value: "HTTP"
Merge Precedence¶
Environment variables are merged with the following precedence (highest first):
- AIMArtifact.spec.env (per-artifact)
- AIMRuntimeConfig.spec.env (namespace-scoped)
- AIMClusterRuntimeConfig.spec.env (cluster-scoped)
- Operator defaults (e.g., AIM_DOWNLOADER_PROTOCOL=XET,HF_TRANSFER)
This means an individual artifact can always override any runtime config setting when needed.
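One way to picture this precedence is a layered merge keyed by variable name, applied from lowest to highest precedence so that later layers overwrite earlier ones. The function below is a hypothetical sketch, not the operator's code; each layer is assumed to be a list of `{"name": ..., "value": ...}` entries like those in the examples above.

```python
from typing import Dict, List

EnvVar = Dict[str, str]


def merge_env(*layers: List[EnvVar]) -> List[EnvVar]:
    """Merge env var lists given from lowest to highest precedence."""
    merged: Dict[str, str] = {}
    for layer in layers:
        for var in layer:
            # A later (higher-precedence) layer replaces an earlier value.
            merged[var["name"]] = var["value"]
    return [{"name": name, "value": value} for name, value in merged.items()]


# Layers ordered: operator defaults, cluster config, namespace config, artifact.
operator_defaults = [{"name": "AIM_DOWNLOADER_PROTOCOL", "value": "XET,HF_TRANSFER"}]
cluster_env = [{"name": "AIM_DOWNLOADER_PROTOCOL", "value": "HTTP,XET"}]
namespace_env = [{"name": "AIM_DOWNLOADER_PROTOCOL", "value": "HTTP"}]
artifact_env: List[EnvVar] = []

effective = merge_env(operator_defaults, cluster_env, namespace_env, artifact_env)
```

With these inputs the namespace value wins because the artifact supplies no override; adding an entry to `artifact_env` would replace it.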
Operator Namespace¶
The AIM controllers determine the operator namespace from the AIM_SYSTEM_NAMESPACE environment variable (default: aim-system).
Cluster-scoped workflows such as:
- Cluster template discovery
- Cluster image inspection
- Auto-generated cluster templates
...run auxiliary pods in this namespace and resolve namespaced runtime configs there.
Related Documentation¶
- Models - How models use runtime configs for discovery and auto-creation
- Templates - Template discovery and runtime config resolution
- Services Usage - Practical service configuration
- Model Caching - Download protocol strategy and cache architecture