API Reference
Packages
aim.eai.amd.com/v1alpha1
Package v1alpha1 contains API Schema definitions for the aim v1alpha1 API group.
Resource Types
- AIMArtifact
- AIMArtifactList
- AIMClusterModel
- AIMClusterModelList
- AIMClusterModelSource
- AIMClusterModelSourceList
- AIMClusterRuntimeConfig
- AIMClusterRuntimeConfigList
- AIMClusterServiceTemplate
- AIMClusterServiceTemplateList
- AIMModel
- AIMModelList
- AIMRuntimeConfig
- AIMRuntimeConfigList
- AIMService
- AIMServiceList
- AIMServiceTemplate
- AIMServiceTemplateList
- AIMTemplateCache
- AIMTemplateCacheList
AIMArtifact
AIMArtifact is the Schema for the artifacts API.
Appears in:
- AIMArtifactList

| Field | Description | Default | Validation |
|---|---|---|---|
| `apiVersion` _string_ | `aim.eai.amd.com/v1alpha1` | | |
| `kind` _string_ | `AIMArtifact` | | |
| `metadata` _ObjectMeta_ | Refer to Kubernetes API documentation for fields of `metadata`. | | |
| `spec` _AIMArtifactSpec_ | | | |
| `status` _AIMArtifactStatus_ | | | |
AIMArtifactList
AIMArtifactList contains a list of AIMArtifact.

| Field | Description | Default | Validation |
|---|---|---|---|
| `apiVersion` _string_ | `aim.eai.amd.com/v1alpha1` | | |
| `kind` _string_ | `AIMArtifactList` | | |
| `metadata` _ListMeta_ | Refer to Kubernetes API documentation for fields of `metadata`. | | |
| `items` _AIMArtifact array_ | | | |
AIMArtifactMode
Underlying type: string
AIMArtifactMode indicates the ownership mode of an artifact, derived from owner references.
Validation:
- Enum: [Dedicated Shared]

Appears in:
- AIMArtifactStatus

| Field | Description |
|---|---|
| `Dedicated` | ArtifactModeDedicated indicates the cache has owner references and will be garbage collected when its owners are deleted. |
| `Shared` | ArtifactModeShared indicates the cache has no owner references and persists independently, available for sharing across services. |
AIMArtifactSpec
AIMArtifactSpec defines the desired state of AIMArtifact.
Appears in:
- AIMArtifact

| Field | Description | Default | Validation |
|---|---|---|---|
| `sourceUri` _string_ | SourceURI specifies the source location of the model to download. Supported protocols: hf:// (HuggingFace) and s3:// (S3-compatible storage). This field uniquely identifies the artifact and is immutable after creation. Example: hf://meta-llama/Llama-3-8B | | MinLength: 1 <br />Pattern: `^(hf\|s3)://[^ \t\r\n]+$` |
| `modelId` _string_ | ModelID is the canonical identifier in {org}/{name} format. Determines the cache download path: /workspace/cache/{modelId}. For HuggingFace sources, this is typically derived from the URI (e.g., "meta-llama/Llama-3-8B"). For S3 sources, this must be explicitly provided (e.g., "my-team/fine-tuned-llama"). When not specified, derived from SourceURI for HuggingFace sources. | | Pattern: `^[a-zA-Z0-9_-]+/[a-zA-Z0-9._-]+$` <br />Optional: {} |
| `storageClassName` _string_ | StorageClassName specifies the storage class for the cache volume. When not specified, uses the cluster default storage class. | | Optional: {} |
| `size` _Quantity_ | Size specifies the size of the cache volume. | | Optional: {} |
| `env` _EnvVar array_ | Env lists the environment variables to use for authentication when downloading models. These variables are used for authentication with model registries (e.g., HuggingFace tokens). | | Optional: {} |
| `modelDownloadImage` _string_ | ModelDownloadImage specifies the container image used to download and initialize the artifact. This image runs as a job to download model artifacts from the source URI to the cache volume. When not specified, defaults to "ghcr.io/silogen/aim-artifact-downloader:0.2.0". | ghcr.io/silogen/aim-artifact-downloader:0.2.0 | Optional: {} |
| `imagePullSecrets` _LocalObjectReference array_ | ImagePullSecrets references secrets for pulling AIM container images. | | Optional: {} |
| `runtimeConfigName` _string_ | Name is the name of the runtime config to use for this resource. If a runtime config with this name exists both as a namespace and a cluster runtime config, the values are merged together, the namespace config taking priority over the cluster config when there are conflicts. If this field is empty or set to `default`, the namespace or cluster runtime config named `default` is used, if it exists. | | Optional: {} |
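The `sourceUri` and `modelId` rules above can be checked client-side before a manifest is submitted. The following is a minimal sketch, not the operator's implementation: the two regular expressions are copied from the Validation column, while `derive_model_id` and `cache_path` are hypothetical helper names, and the assumption that a derived ID is simply the `hf://` URI path is labeled in the code.

```python
import re
from typing import Optional

# Patterns copied from the Validation column (the table escapes "|" as "\|").
SOURCE_URI_RE = re.compile(r"^(hf|s3)://[^ \t\r\n]+$")
MODEL_ID_RE = re.compile(r"^[a-zA-Z0-9_-]+/[a-zA-Z0-9._-]+$")


def derive_model_id(source_uri: str, model_id: Optional[str] = None) -> str:
    """Hypothetical helper: return the explicit modelId, or derive one from an hf:// URI."""
    if not SOURCE_URI_RE.match(source_uri):
        raise ValueError(f"invalid sourceUri: {source_uri!r}")
    if model_id is None:
        if not source_uri.startswith("hf://"):
            # The reference states that S3 sources must provide modelId explicitly.
            raise ValueError("modelId is required for s3:// sources")
        # Assumption: the derived ID is the URI path that follows the scheme.
        model_id = source_uri[len("hf://"):]
    if not MODEL_ID_RE.match(model_id):
        raise ValueError(f"invalid modelId: {model_id!r}")
    return model_id


def cache_path(model_id: str) -> str:
    """Cache download path documented for modelId."""
    return f"/workspace/cache/{model_id}"
```

For example, `derive_model_id("hf://meta-llama/Llama-3-8B")` yields `meta-llama/Llama-3-8B`, whereas an `s3://` URI without an explicit `modelId` is rejected.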
AIMArtifactStatus
AIMArtifactStatus defines the observed state of AIMArtifact.
Appears in:
- AIMArtifact

| Field | Description | Default | Validation |
|---|---|---|---|
| `observedGeneration` _integer_ | | | |
| `conditions` _Condition array_ | Conditions represent the latest available observations of the artifact's state. | | |
| `status` _AIMStatus_ | Status represents the current status of the artifact. | Pending | Enum: [Pending Progressing Ready Degraded Failed NotAvailable] |
| `progress` _DownloadProgress_ | Progress represents the download progress when Status is Progressing. | | Optional: {} |
| `download` _DownloadState_ | Download represents the current download attempt state, patched by the downloader pod. Shows which protocol is active, what attempt we're on, etc. | | Optional: {} |
| `displaySize` _string_ | DisplaySize is the human-readable effective size (spec or discovered). | | Optional: {} |
| `lastUsed` _Time_ | LastUsed represents the last time a model was deployed that used this cache. | | |
| `persistentVolumeClaim` _string_ | PersistentVolumeClaim represents the name of the created PVC. | | |
| `mode` _AIMArtifactMode_ | Mode indicates the ownership mode of this artifact, derived from owner references. <br />- Dedicated: Has owner references, will be garbage collected when owners are deleted. <br />- Shared: No owner references, persists independently and can be shared. | | Enum: [Dedicated Shared] <br />Optional: {} |
| `discoveredSizeBytes` _integer_ | DiscoveredSizeBytes is the model size discovered via check-size job. Populated when spec.size is not provided. | | Optional: {} |
| `allocatedSize` _Quantity_ | AllocatedSize is the actual PVC size requested (including headroom). | | Optional: {} |
| `headroomPercent` _integer_ | HeadroomPercent is the headroom percentage that was applied to the PVC size. | | Optional: {} |
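The relationship between `discoveredSizeBytes`, `headroomPercent`, and `allocatedSize` can be illustrated with a small calculation. This is only a sketch under an assumption: the reference does not state the exact formula or rounding, so the function below (a hypothetical `allocated_size_bytes`) simply adds the percentage on top of the discovered size and rounds up to a whole byte.

```python
def allocated_size_bytes(discovered_size_bytes: int, headroom_percent: int) -> int:
    """Discovered size plus headroom, rounded up (assumed formula, not from the source)."""
    if discovered_size_bytes < 0 or headroom_percent < 0:
        raise ValueError("size and headroom must be non-negative")
    # Integer ceiling of discovered * (100 + headroom) / 100, avoiding float error.
    return (discovered_size_bytes * (100 + headroom_percent) + 99) // 100
```

Under this assumption, a 1000-byte model with 10 percent headroom would request 1100 bytes.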
AIMCachingMode
Underlying type: string
AIMCachingMode controls caching behavior for a service. Canonical values are Dedicated and Shared. Legacy values are accepted for backward compatibility:
- Always maps to Shared
- Auto maps to Shared
- Never maps to Dedicated

Validation:
- Enum: [Dedicated Shared Auto Always Never]

Appears in:
- AIMServiceCachingConfig

| Field | Description |
|---|---|
| `Dedicated` | CachingModeDedicated always creates service-owned dedicated caches/artifacts. |
| `Shared` | CachingModeShared reuses and creates shared caches/artifacts. |
| `Auto` | CachingModeAuto is a deprecated legacy value that maps to Shared. |
| `Always` | CachingModeAlways is a deprecated legacy value that maps to Shared. |
| `Never` | CachingModeNever is a deprecated legacy value that maps to Dedicated. |
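The legacy-to-canonical mapping above is small enough to express as a lookup table. The sketch below mirrors the documented mapping; the function name `normalize_caching_mode` is hypothetical and is not part of the API.

```python
CANONICAL_MODES = {"Dedicated", "Shared"}

# Legacy values and the canonical mode each one maps to, as documented above.
LEGACY_MODES = {"Always": "Shared", "Auto": "Shared", "Never": "Dedicated"}


def normalize_caching_mode(mode: str) -> str:
    """Return the canonical caching mode for a canonical or legacy value."""
    if mode in CANONICAL_MODES:
        return mode
    if mode in LEGACY_MODES:
        return LEGACY_MODES[mode]
    raise ValueError(f"unsupported caching mode: {mode!r}")
```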
AIMClusterModel
AIMClusterModel is a cluster-scoped model catalog entry for AIM container images.
Cluster-scoped models can be referenced by AIMServices in any namespace, making them ideal for shared model deployments across teams and projects. Like namespace-scoped AIMModels, cluster models trigger discovery jobs to extract metadata and generate service templates.
When both cluster and namespace models exist for the same container image, services will preferentially use the namespace-scoped AIMModel when referenced by image URI.
Appears in:
- AIMClusterModelList

| Field | Description | Default | Validation |
|---|---|---|---|
| `apiVersion` _string_ | `aim.eai.amd.com/v1alpha1` | | |
| `kind` _string_ | `AIMClusterModel` | | |
| `metadata` _ObjectMeta_ | Refer to Kubernetes API documentation for fields of `metadata`. | | |
| `spec` _AIMModelSpec_ | | | |
| `status` _AIMModelStatus_ | | | |
AIMClusterModelList
AIMClusterModelList contains a list of AIMClusterModel.

| Field | Description | Default | Validation |
|---|---|---|---|
| `apiVersion` _string_ | `aim.eai.amd.com/v1alpha1` | | |
| `kind` _string_ | `AIMClusterModelList` | | |
| `metadata` _ListMeta_ | Refer to Kubernetes API documentation for fields of `metadata`. | | |
| `items` _AIMClusterModel array_ | | | |
AIMClusterModelSource
AIMClusterModelSource automatically discovers and syncs AI model images from container registries.
Appears in:
- AIMClusterModelSourceList

| Field | Description | Default | Validation |
|---|---|---|---|
| `apiVersion` _string_ | `aim.eai.amd.com/v1alpha1` | | |
| `kind` _string_ | `AIMClusterModelSource` | | |
| `metadata` _ObjectMeta_ | Refer to Kubernetes API documentation for fields of `metadata`. | | |
| `spec` _AIMClusterModelSourceSpec_ | | | |
| `status` _AIMClusterModelSourceStatus_ | | | |
AIMClusterModelSourceList
AIMClusterModelSourceList contains a list of AIMClusterModelSource.

| Field | Description | Default | Validation |
|---|---|---|---|
| `apiVersion` _string_ | `aim.eai.amd.com/v1alpha1` | | |
| `kind` _string_ | `AIMClusterModelSourceList` | | |
| `metadata` _ListMeta_ | Refer to Kubernetes API documentation for fields of `metadata`. | | |
| `items` _AIMClusterModelSource array_ | | | |
AIMClusterModelSourceSpec
AIMClusterModelSourceSpec defines the desired state of AIMClusterModelSource.
Appears in:
- AIMClusterModelSource

| Field | Description | Default | Validation |
|---|---|---|---|
| `registry` _string_ | Registry to sync from (e.g., docker.io, ghcr.io, gcr.io). Defaults to docker.io if not specified. | docker.io | Optional: {} |
| `imagePullSecrets` _LocalObjectReference array_ | ImagePullSecrets contains references to secrets for authenticating to private registries. Secrets must exist in the operator namespace (typically aim-system). Used for both registry catalog listing and image metadata extraction. | | Optional: {} |
| `filters` _ModelSourceFilter array_ | Filters define which images to discover and sync. Each filter specifies an image pattern with optional version constraints and exclusions. Multiple filters are combined with OR logic (any match includes the image). | | MaxItems: 100 <br />MinItems: 1 |
| `syncInterval` _Duration_ | SyncInterval defines how often to sync with the registry. Defaults to 1h. Minimum recommended interval is 15m to avoid rate limiting. Format: duration string (e.g., "30m", "1h", "2h30m"). | 1h | Optional: {} |
| `versions` _string array_ | Versions specifies global semantic version constraints applied to all filters. Individual filters can override this with their own version constraints. Constraints use semver syntax: `>=1.0.0`, `<2.0.0`, `~1.2.0`, `^1.0.0`, etc. Non-semver tags (e.g., "latest", "dev") are silently skipped. Version ranges work on all registries (including ghcr.io, gcr.io) when combined with exact repository names (no wildcards). The controller uses the Tags List API to fetch all tags for the repository and filters them by the semver constraint. Example: `registry=ghcr.io, filters=[{image: "silogen/aim-llama"}], versions=[">=1.0.0"]` will fetch all tags from ghcr.io/silogen/aim-llama and include only those >=1.0.0. | | Optional: {} |
| `maxModels` _integer_ | MaxModels is the maximum number of AIMClusterModel resources to create from this source. Once this limit is reached, no new models will be created, even if more matching images are discovered. Existing models are never deleted. This prevents runaway model creation from overly broad filters. | 100 | Maximum: 10000 <br />Minimum: 1 <br />Optional: {} |
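To make the `versions` behavior concrete, the sketch below filters a list of tags by a single comparison constraint and skips non-semver tags, as described above. It is deliberately simplified and is not the controller's code: it handles only the `>=`, `<=`, `>`, `<`, and `=` operators (not `~` or `^`), treats any tag that is not plain MAJOR.MINOR.PATCH as non-semver, and does not model how multiple entries in `versions` are combined, since the reference does not say. All function names are hypothetical.

```python
import re
from typing import List, Optional, Tuple

Version = Tuple[int, int, int]

_SEMVER_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)$")
_CONSTRAINT_RE = re.compile(r"^(>=|<=|>|<|=)?\s*(\d+\.\d+\.\d+)$")


def parse_version(tag: str) -> Optional[Version]:
    """Parse MAJOR.MINOR.PATCH; return None for anything else (e.g. "latest")."""
    match = _SEMVER_RE.match(tag)
    if match is None:
        return None
    return (int(match.group(1)), int(match.group(2)), int(match.group(3)))


def satisfies(version: Version, constraint: str) -> bool:
    """Evaluate one comparison constraint such as ">=1.0.0"."""
    match = _CONSTRAINT_RE.match(constraint.strip())
    if match is None:
        raise ValueError(f"unsupported constraint in this sketch: {constraint!r}")
    operator = match.group(1) or "="
    bound = parse_version(match.group(2))
    return {
        ">=": version >= bound,
        "<=": version <= bound,
        ">": version > bound,
        "<": version < bound,
        "=": version == bound,
    }[operator]


def filter_tags(tags: List[str], constraint: str) -> List[str]:
    """Keep tags that satisfy the constraint; non-semver tags are silently skipped."""
    kept = []
    for tag in tags:
        version = parse_version(tag)
        if version is not None and satisfies(version, constraint):
            kept.append(tag)
    return kept
```

With the tags `0.9.0`, `1.0.0`, `1.2.3`, `latest`, `dev`, and `2.0.0`, the constraint `>=1.0.0` keeps `1.0.0`, `1.2.3`, and `2.0.0`.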
AIMClusterModelSourceStatus
AIMClusterModelSourceStatus defines the observed state of AIMClusterModelSource.
Appears in:
- AIMClusterModelSource

| Field | Description | Default | Validation |
|---|---|---|---|
| `status` _string_ | Status represents the overall state of the model source. | | Enum: [Pending Starting Progressing Ready Running Degraded NotAvailable Failed] <br />Optional: {} |
| `lastSyncTime` _Time_ | LastSyncTime is the timestamp of the last successful registry sync. Updated after each successful sync operation. | | Optional: {} |
| `discoveredModels` _integer_ | DiscoveredModels is the count of AIMClusterModel resources managed by this source. Includes both existing and newly created models. | | Optional: {} |
| `availableModels` _integer_ | AvailableModels is the total count of images discovered in the registry that match the filters. This may be higher than DiscoveredModels if maxModels limit was reached. | | Optional: {} |
| `modelsLimitReached` _boolean_ | ModelsLimitReached indicates whether the maxModels limit has been reached. When true, no new models will be created even if more matching images are discovered. | | Optional: {} |
| `conditions` _Condition array_ | Conditions represent the latest available observations of the source's state. Standard conditions: Ready, Syncing, RegistryReachable. | | Optional: {} |
| `observedGeneration` _integer_ | ObservedGeneration reflects the generation of the most recently observed spec. | | Optional: {} |
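The interplay of `maxModels`, `discoveredModels`, `availableModels`, and `modelsLimitReached` can be sketched as follows. This is an illustrative model of the documented semantics, not the controller's logic: the function name `summarize_sync` is hypothetical, and identifying each model by its image string is an assumption made to keep the example short.

```python
from typing import Dict, List, Union


def summarize_sync(
    matching_images: List[str],
    existing_models: List[str],
    max_models: int = 100,
) -> Dict[str, Union[int, bool]]:
    """Model one sync pass: create models until max_models, never delete existing ones."""
    managed = list(existing_models)  # existing models are always kept
    known = set(existing_models)
    for image in matching_images:
        if image in known:
            continue
        if len(managed) >= max_models:
            break  # limit reached: no new models are created
        managed.append(image)
        known.add(image)
    return {
        "discoveredModels": len(managed),
        "availableModels": len(matching_images),
        "modelsLimitReached": len(managed) >= max_models,
    }
```

In this sketch, three matching images with one existing model and `max_models=2` yield two managed models, three available models, and a reached limit.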
AIMClusterRuntimeConfig
AIMClusterRuntimeConfig is a cluster-scoped runtime configuration for AIM services, models, and templates.
Cluster-scoped runtime configs provide platform-wide defaults that apply to all namespaces, making them ideal for organization-level policies such as storage classes, discovery behavior, model creation scope, and routing configuration.
When both cluster and namespace runtime configs exist with the same name, the configs are merged, and the namespace-scoped AIMRuntimeConfig takes precedence for any field that is set in both.
Appears in:
- AIMClusterRuntimeConfigList

| Field | Description | Default | Validation |
|---|---|---|---|
| `apiVersion` _string_ | `aim.eai.amd.com/v1alpha1` | | |
| `kind` _string_ | `AIMClusterRuntimeConfig` | | |
| `metadata` _ObjectMeta_ | Refer to Kubernetes API documentation for fields of `metadata`. | | |
| `spec` _AIMClusterRuntimeConfigSpec_ | | | |
| `status` _AIMRuntimeConfigStatus_ | | | |
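The merge rule for same-named cluster and namespace runtime configs (namespace wins for any field set in both) can be sketched with plain dictionaries. This is an illustration under assumptions rather than the operator's algorithm: the reference does not say how deep the merge goes, so the sketch merges nested objects recursively, and the nested key names under `storage` are inferred from the field names mentioned elsewhere in this reference.

```python
from typing import Any, Dict


def merge_runtime_config(cluster: Dict[str, Any], namespace: Dict[str, Any]) -> Dict[str, Any]:
    """Merge two config specs; namespace values take precedence when both are set."""
    merged = dict(cluster)
    for key, value in namespace.items():
        if value is None:
            continue  # an unset namespace field leaves the cluster value in place
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Assumption: nested objects are merged field by field.
            merged[key] = merge_runtime_config(merged[key], value)
        else:
            merged[key] = value
    return merged
```

For instance, a namespace config that sets only a storage class overrides the cluster's storage class while inheriting the cluster's remaining storage and model settings.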
AIMClusterRuntimeConfigList
AIMClusterRuntimeConfigList contains a list of AIMClusterRuntimeConfig.

| Field | Description | Default | Validation |
|---|---|---|---|
| `apiVersion` _string_ | `aim.eai.amd.com/v1alpha1` | | |
| `kind` _string_ | `AIMClusterRuntimeConfigList` | | |
| `metadata` _ListMeta_ | Refer to Kubernetes API documentation for fields of `metadata`. | | |
| `items` _AIMClusterRuntimeConfig array_ | | | |
AIMClusterRuntimeConfigSpec
AIMClusterRuntimeConfigSpec defines cluster-wide defaults for AIM resources.
Appears in:
- AIMClusterRuntimeConfig

| Field | Description | Default | Validation |
|---|---|---|---|
| `storage` _AIMStorageConfig_ | Storage configures storage defaults for this service's PVCs and caches. When set, these values override namespace/cluster runtime config defaults. | | Optional: {} |
| `routing` _AIMRuntimeRoutingConfig_ | Routing controls HTTP routing configuration for this service. When set, these values override namespace/cluster runtime config defaults. | | Optional: {} |
| `env` _EnvVar array_ | Env specifies environment variables for inference containers. When set on AIMService, these take highest precedence in the merge hierarchy. When set on RuntimeConfig, these provide namespace/cluster-level defaults. Merge order (highest to lowest): Service.Env > Template.Env > RuntimeConfig.Env > Profile.Env | | Optional: {} |
| `model` _AIMModelConfig_ | Model controls model creation and discovery defaults. This field only applies to RuntimeConfig/ClusterRuntimeConfig and is not available for services. | | Optional: {} |
| `labelPropagation` _AIMRuntimeConfigLabelPropagationSpec_ | LabelPropagation controls how labels from parent AIM resources are propagated to child resources. When enabled, labels matching the specified patterns are automatically copied from parent resources (e.g., AIMService, AIMTemplateCache) to their child resources (e.g., Deployments, Services, PVCs). This is useful for propagating organizational metadata like cost centers, team identifiers, or compliance labels through the resource hierarchy. | | Optional: {} |
| `defaultStorageClassName` _string_ | DEPRECATED: Use Storage.DefaultStorageClassName instead. This field will be removed in a future version. For backward compatibility, if this field is set and Storage.DefaultStorageClassName is not set, the value will be automatically migrated. | | Optional: {} |
| `pvcHeadroomPercent` _integer_ | DEPRECATED: Use Storage.PVCHeadroomPercent instead. This field will be removed in a future version. For backward compatibility, if this field is set and Storage.PVCHeadroomPercent is not set, the value will be automatically migrated. | | Optional: {} |
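The documented `env` merge order (Service.Env > Template.Env > RuntimeConfig.Env > Profile.Env) amounts to a layered override keyed by variable name. The sketch below assumes each entry is a dictionary shaped like a Kubernetes EnvVar with `name` and `value` keys; the function name `merge_env` is hypothetical, and the ordering of the returned list is a choice made for this example rather than documented behavior.

```python
from typing import Dict, List

EnvVar = Dict[str, str]


def merge_env(
    service: List[EnvVar],
    template: List[EnvVar],
    runtime_config: List[EnvVar],
    profile: List[EnvVar],
) -> List[EnvVar]:
    """Merge env layers by name; higher-precedence layers override lower ones."""
    merged: Dict[str, EnvVar] = {}
    # Apply layers from lowest to highest precedence so later layers win.
    for layer in (profile, runtime_config, template, service):
        for var in layer:
            merged[var["name"]] = var
    return list(merged.values())
```

A variable defined at every level therefore resolves to the service's value, while a variable defined only in the profile passes through unchanged.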
AIMClusterServiceTemplate
AIMClusterServiceTemplate is a cluster-scoped template that defines runtime profiles for AIM services.
Cluster-scoped templates can be used by AIMServices in any namespace, making them ideal for platform-wide model configurations that should be shared across teams and projects. Unlike namespace-scoped AIMServiceTemplates, cluster templates do not support caching configuration and must be managed by cluster administrators, since caches themselves are namespace-scoped.
When both cluster and namespace templates exist with the same name, the namespace-scoped template takes precedence for services in that namespace.
Appears in:
- AIMClusterServiceTemplateList

| Field | Description | Default | Validation |
|---|---|---|---|
| `apiVersion` _string_ | `aim.eai.amd.com/v1alpha1` | | |
| `kind` _string_ | `AIMClusterServiceTemplate` | | |
| `metadata` _ObjectMeta_ | Refer to Kubernetes API documentation for fields of `metadata`. | | |
| `spec` _AIMClusterServiceTemplateSpec_ | | | |
| `status` _AIMServiceTemplateStatus_ | | | |
AIMClusterServiceTemplateList
AIMClusterServiceTemplateList contains a list of AIMClusterServiceTemplate.

| Field | Description | Default | Validation |
|---|---|---|---|
| `apiVersion` _string_ | `aim.eai.amd.com/v1alpha1` | | |
| `kind` _string_ | `AIMClusterServiceTemplateList` | | |
| `metadata` _ListMeta_ | Refer to Kubernetes API documentation for fields of `metadata`. | | |
| `items` _AIMClusterServiceTemplate array_ | | | |
AIMClusterServiceTemplateSpec
AIMClusterServiceTemplateSpec defines the desired state of AIMClusterServiceTemplate (cluster-scoped).
A cluster-scoped template that selects a runtime profile for a given AIM model.
Appears in:
- AIMClusterServiceTemplate

| Field | Description | Default | Validation |
|---|---|---|---|
| `modelName` _string_ | ModelName is the model name. Matches metadata.name of an AIMModel or AIMClusterModel. Immutable. Example: meta/llama-3-8b:1.1+20240915 | | MinLength: 1 |
| `metric` _AIMMetric_ | Metric selects the optimization goal. <br />- latency: prioritize low end-to-end latency <br />- throughput: prioritize sustained requests/second | | Enum: [latency throughput] <br />Optional: {} |
| `precision` _AIMPrecision_ | Precision selects the numeric precision used by the runtime. | | Enum: [auto fp4 fp8 fp16 fp32 bf16 int4 int8] <br />Optional: {} |
| `hardware` _AIMHardwareRequirements_ | Hardware specifies GPU and CPU requirements for each replica. For GPU models, defines the GPU count and model types required for deployment. For CPU-only models, defines CPU resource requirements. This field is immutable after creation. | | Optional: {} |
| `runtimeConfigName` _string_ | Name is the name of the runtime config to use for this resource. If a runtime config with this name exists both as a namespace and a cluster runtime config, the values are merged together, the namespace config taking priority over the cluster config when there are conflicts. If this field is empty or set to `default`, the namespace or cluster runtime config named `default` is used, if it exists. | | Optional: {} |
| `imagePullSecrets` _LocalObjectReference array_ | ImagePullSecrets lists secrets containing credentials for pulling container images. These secrets are used for: <br />- Discovery dry-run jobs that inspect the model container <br />- Pulling the image for inference services <br />The secrets are merged with any model or runtime config defaults. For namespace-scoped templates, secrets must exist in the same namespace. For cluster-scoped templates, secrets must exist in the operator namespace. | | Optional: {} |
| `serviceAccountName` _string_ | ServiceAccountName specifies the Kubernetes service account to use for workloads related to this template. This includes discovery dry-run jobs and inference services created from this template. If empty, the default service account for the namespace is used. | | Optional: {} |
| `resources` _ResourceRequirements_ | Resources defines the default container resource requirements applied to services derived from this template. Service-specific values override the template defaults. | | Optional: {} |
| `modelSources` _AIMModelSource array_ | ModelSources specifies the model sources required to run this template. When provided, the discovery dry-run will be skipped and these sources will be used directly. This allows users to explicitly declare model dependencies without requiring a discovery job. If omitted, a discovery job will be run to automatically determine the required model sources. | | Optional: {} |
| `profileId` _string_ | ProfileId is the specific AIM profile ID that this template should use. When set, the discovery job will be instructed to use this specific profile. | | Optional: {} |
| `type` _AIMProfileType_ | Type indicates the optimization level of this template. <br />- optimized: Template has been tuned for performance <br />- preview: Template is experimental/pre-release <br />- unoptimized: Default, no specific optimizations applied <br />When nil, the type is determined by discovery. When set, overrides discovery. | | Enum: [optimized preview unoptimized] <br />Optional: {} |
| `env` _EnvVar array_ | Env specifies environment variables for inference containers. These variables are passed to the inference runtime and can be used to configure runtime behavior, authentication, or other settings. | | Optional: {} |
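As an illustration of the constraints in the table above, the sketch below checks a template spec (represented as a plain dictionary) against the documented enum values and the `modelName` minimum length. The API server enforces these rules itself; the hypothetical `validate_template_spec` helper only shows what a pre-submission check might look like and covers just these four fields.

```python
from typing import Any, Dict, List

# Allowed values copied from the Validation column above.
ENUMS = {
    "metric": {"latency", "throughput"},
    "precision": {"auto", "fp4", "fp8", "fp16", "fp32", "bf16", "int4", "int8"},
    "type": {"optimized", "preview", "unoptimized"},
}


def validate_template_spec(spec: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable validation errors (empty when valid)."""
    errors = []
    if len(spec.get("modelName") or "") < 1:
        errors.append("modelName: must be at least 1 character")
    for field, allowed in ENUMS.items():
        value = spec.get(field)
        # Enum fields are optional, so only validate values that are present.
        if value is not None and value not in allowed:
            errors.append(f"{field}: {value!r} is not one of {sorted(allowed)}")
    return errors
```

A spec such as `{"modelName": "meta/llama-3-8b:1.1+20240915", "metric": "latency", "precision": "fp8"}` passes, while a misspelled metric produces a single error entry.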
AIMCpuRequirements
AIMCpuRequirements specifies CPU resource requirements.
Appears in:
- AIMHardwareRequirements

| Field | Description | Default | Validation |
|---|---|---|---|
| `requests` _Quantity_ | Requests is the number of CPU cores to request. Required and must be > 0. | | Required: {} |
| `limits` _Quantity_ | Limits is the maximum number of CPU cores to allow. | | Optional: {} |
AIMCustomModelSpec
AIMCustomModelSpec contains configuration for custom models. These fields are only used when modelSources is specified (custom models). For image-based models, these settings come from discovery.
Appears in:
- AIMModelSpec

| Field | Description | Default | Validation |
|---|---|---|---|
| `hardware` _AIMHardwareRequirements_ | Hardware specifies default hardware requirements for all templates. Individual templates can override these defaults. Required when modelSources is set and customTemplates is empty. | | Optional: {} |
| `type` _AIMProfileType_ | Type specifies default type for all templates. Individual templates can override this default. When nil, templates default to "unoptimized". | | Enum: [optimized preview unoptimized] <br />Optional: {} |
AIMCustomTemplate
AIMCustomTemplate defines a custom template configuration for a model. When modelSources are specified directly on AIMModel, customTemplates allow defining explicit hardware requirements and profiles, skipping the discovery job.
Appears in:
- AIMModelSpec

| Field | Description | Default | Validation |
|---|---|---|---|
| `name` _string_ | Name is the template name. If not provided, auto-generated from model name + profile. | | MaxLength: 63 <br />Optional: {} |
| `type` _AIMProfileType_ | Type indicates the optimization status of this template. <br />- optimized: Template has been tuned for performance <br />- preview: Template is experimental/pre-release <br />- unoptimized: Default, no specific optimizations applied | unoptimized | Enum: [optimized preview unoptimized] <br />Optional: {} |
| `env` _EnvVar array_ | Env specifies environment variable overrides when this template is selected. | | MaxItems: 64 <br />Optional: {} |
| `hardware` _AIMHardwareRequirements_ | Hardware specifies GPU and CPU requirements for this template. Optional when spec.hardware is set (inherits from spec). When both are set, values are merged field-by-field with template taking precedence. | | Optional: {} |
| `profile` _AIMTemplateProfile_ | Profile declares runtime profile variables for template selection. Used when multiple templates exist to select based on metric/precision. | | Optional: {} |
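The rules that tie `modelSources`, `customTemplates`, and `custom.hardware` together can be sketched as a small check over a model spec dictionary. This is a non-authoritative illustration: the function name `validate_custom_model` is hypothetical, and it covers only the hardware requirement stated for AIMCustomModelSpec plus the `name` and `env` limits from the table above.

```python
from typing import Any, Dict, List


def validate_custom_model(spec: Dict[str, Any]) -> List[str]:
    """Return validation errors for the custom-model parts of a model spec."""
    errors = []
    sources = spec.get("modelSources") or []
    templates = spec.get("customTemplates") or []
    hardware = (spec.get("custom") or {}).get("hardware")

    # custom.hardware is required when modelSources is set and customTemplates is empty.
    if sources and not templates and not hardware:
        errors.append("custom.hardware: required when modelSources is set and customTemplates is empty")

    for index, template in enumerate(templates):
        if len(template.get("name") or "") > 63:
            errors.append(f"customTemplates[{index}].name: must be at most 63 characters")
        if len(template.get("env") or []) > 64:
            errors.append(f"customTemplates[{index}].env: must have at most 64 items")
    return errors
```

A spec with one model source, no custom templates, and no `custom.hardware` would fail this check with a single error.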
AIMDiscoveryProfileMetadata
AIMDiscoveryProfileMetadata describes the characteristics of a discovered deployment profile.
Appears in:
- AIMDiscoveryProfile

| Field | Description | Default | Validation |
|---|---|---|---|
| `engine` _string_ | Engine identifies the inference engine used for this profile (e.g., "vllm", "tgi"). | | Optional: {} |
| `gpu` _string_ | GPU specifies the GPU model this profile is optimized for (e.g., "MI300X", "MI325X"). | | Optional: {} |
| `gpu_count` _integer_ | GPUCount indicates how many GPUs are required per replica for this profile. | | Optional: {} |
| `metric` _AIMMetric_ | Metric indicates the optimization goal for this profile ("latency" or "throughput"). | | Enum: [latency throughput] <br />Optional: {} |
| `precision` _AIMPrecision_ | Precision specifies the numeric precision used in this profile (e.g., "fp16", "fp8"). | | Enum: [auto fp4 fp8 fp16 fp32 bf16 int4 int8] <br />Optional: {} |
| `type` _AIMProfileType_ | Type specifies the optimization level of this profile (optimized, unoptimized, preview). | | Enum: [optimized preview unoptimized] <br />Optional: {} |
AIMGpuRequirements
AIMGpuRequirements specifies GPU resource requirements.
Appears in:
- AIMHardwareRequirements

| Field | Description | Default | Validation |
|---|---|---|---|
| `requests` _integer_ | Requests is the number of GPUs to set as requests/limits. Set to 0 to target GPU nodes without consuming GPU resources (useful for testing). | | Minimum: 0 <br />Optional: {} |
| `model` _string_ | Model limits deployment to a specific GPU model. Example: "MI300X". Cannot be combined with minVram. | | MaxLength: 64 <br />Optional: {} |
| `minVram` _Quantity_ | MinVRAM limits deployment to GPUs having at least this much VRAM. Used for capacity planning when the model size is known but any GPU with sufficient VRAM is acceptable. Cannot be combined with model. | | Optional: {} |
| `resourceName` _string_ | ResourceName is the Kubernetes resource name for GPU resources. Defaults to "amd.com/gpu" if not specified. | amd.com/gpu | Optional: {} |
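The GPU constraints above (mutually exclusive `model` and `minVram`, non-negative `requests`, and the `resourceName` default) can be sketched as a normalization step over a dictionary. The helper name `normalize_gpu_requirements` is hypothetical, and this is an illustration of the documented rules rather than the operator's own validation.

```python
from typing import Any, Dict

DEFAULT_GPU_RESOURCE_NAME = "amd.com/gpu"


def normalize_gpu_requirements(gpu: Dict[str, Any]) -> Dict[str, Any]:
    """Validate GPU requirements and fill in the documented resourceName default."""
    if gpu.get("model") and gpu.get("minVram"):
        raise ValueError("model and minVram cannot be combined")
    if "requests" in gpu and gpu["requests"] < 0:
        raise ValueError("requests must be >= 0")
    if len(gpu.get("model") or "") > 64:
        raise ValueError("model must be at most 64 characters")
    normalized = dict(gpu)  # leave the caller's dictionary untouched
    normalized.setdefault("resourceName", DEFAULT_GPU_RESOURCE_NAME)
    return normalized
```

Passing `{"requests": 0, "model": "MI300X"}` is valid here, matching the documented use of zero requests to target GPU nodes without consuming GPU resources.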
AIMHardwareRequirements
AIMHardwareRequirements specifies compute resource requirements for custom models. Used in AIMModelSpec and AIMCustomTemplate to define GPU and CPU needs.
Appears in:
- AIMClusterServiceTemplateSpec
- AIMCustomModelSpec
- AIMCustomTemplate
- AIMRuntimeParameters
- AIMServiceModelCustom
- AIMServiceOverrides
- AIMServiceTemplateSpec
- AIMServiceTemplateSpecCommon
- AIMServiceTemplateStatus

| Field | Description | Default | Validation |
|---|---|---|---|
| `gpu` _AIMGpuRequirements_ | GPU specifies GPU requirements. If not set, no GPUs are requested (CPU-only model). | | Optional: {} |
| `cpu` _AIMCpuRequirements_ | CPU specifies CPU requirements. | | Optional: {} |
AIMMetric
Underlying type: string
AIMMetric enumerates the targeted service characteristic.
Validation:
- Enum: [latency throughput]

Appears in:
- AIMClusterServiceTemplateSpec
- AIMDiscoveryProfileMetadata
- AIMProfileMetadata
- AIMRuntimeParameters
- AIMServiceOverrides
- AIMServiceTemplateSpec
- AIMServiceTemplateSpecCommon
- AIMTemplateProfile

| Field | Description |
|---|---|
| `latency` | |
| `throughput` | |
AIMModel
AIMModel is the Schema for namespace-scoped AIM model catalog entries.
Appears in:
- AIMModelList

| Field | Description | Default | Validation |
|---|---|---|---|
| `apiVersion` _string_ | `aim.eai.amd.com/v1alpha1` | | |
| `kind` _string_ | `AIMModel` | | |
| `metadata` _ObjectMeta_ | Refer to Kubernetes API documentation for fields of `metadata`. | | |
| `spec` _AIMModelSpec_ | | | |
| `status` _AIMModelStatus_ | | | |
AIMModelConfig
Appears in:
- AIMClusterRuntimeConfigSpec
- AIMRuntimeConfigCommon
- AIMRuntimeConfigSpec

| Field | Description | Default | Validation |
|---|---|---|---|
| `autoDiscovery` _boolean_ | AutoDiscovery controls whether models run discovery by default. When true, models run discovery jobs to extract metadata and auto-create templates. When false, discovery is skipped. Discovery failures are non-fatal and reported via conditions. | | Optional: {} |
AIMModelDiscoveryConfig
AIMModelDiscoveryConfig controls discovery behavior for a model.
Appears in:
- AIMModelSpec

| Field | Description | Default | Validation |
|---|---|---|---|
| `extractMetadata` _boolean_ | ExtractMetadata controls whether metadata extraction runs for this model. During metadata extraction, the controller connects to the image registry and extracts the image's labels. | true | Optional: {} |
| `createServiceTemplates` _boolean_ | CreateServiceTemplates controls whether (cluster) service templates are auto-created from the image metadata. | true | Optional: {} |
AIMModelList
AIMModelList contains a list of AIMModel.

| Field | Description | Default | Validation |
|---|---|---|---|
| `apiVersion` _string_ | `aim.eai.amd.com/v1alpha1` | | |
| `kind` _string_ | `AIMModelList` | | |
| `metadata` _ListMeta_ | Refer to Kubernetes API documentation for fields of `metadata`. | | |
| `items` _AIMModel array_ | | | |
AIMModelSource
AIMModelSource describes a model artifact that must be downloaded for inference. Discovery extracts these from the container's configuration to enable caching and validation.
Appears in:
- AIMClusterServiceTemplateSpec
- AIMModelSpec
- AIMServiceModelCustom
- AIMServiceTemplateSpec
- AIMServiceTemplateSpecCommon
- AIMServiceTemplateStatus
- AIMTemplateCacheSpec

| Field | Description | Default | Validation |
|---|---|---|---|
| `modelId` _string_ | ModelID is the canonical identifier in {org}/{name} format. Determines the cache mount path: /workspace/cache/{modelId}. For HuggingFace sources, this typically mirrors the URI path (e.g., meta-llama/Llama-3-8B). For S3 sources, users define their own organizational structure. | | Pattern: `^[a-zA-Z0-9_-]+/[a-zA-Z0-9._-]+$` <br />Required: {} |
| `sourceUri` _string_ | SourceURI is the location from which the model should be downloaded. Supported schemes: <br />- `hf://org/model`: Hugging Face Hub model <br />- `s3://bucket/key`: S3-compatible storage | | Pattern: `^(hf\|s3)://[^ \t\r\n]+$` |
| `size` _Quantity_ | Size is the expected storage space required for this model artifact. Used for PVC sizing and capacity planning during cache creation. Optional: if not specified, the download job will discover the size automatically. Can be set explicitly to pre-allocate storage or override auto-discovery. | | Optional: {} |
| `env` _EnvVar array_ | Env specifies per-source credential overrides. These variables are used for authentication when downloading this specific source. Takes precedence over base-level env for the same variable name. | | Optional: {} |
AIMModelSourceType
Underlying type: string
AIMModelSourceType indicates how a model's artifacts are sourced.
Validation:
- Enum: [Image Custom]

Appears in:
- AIMModelStatus

| Field | Description |
|---|---|
| `Image` | AIMModelSourceTypeImage indicates the model is discovered from container image labels. |
| `Custom` | AIMModelSourceTypeCustom indicates the model uses explicit spec.modelSources. |
AIMModelSpec¶
AIMModelSpec defines the desired state of AIMModel.
Appears in: - AIMClusterModel - AIMModel
| Field | Description | Default | Validation |
|---|---|---|---|
image string |
Image is the container image URI for this AIM model. This image is inspected by the operator to select runtime profiles used by templates. Discovery behavior is controlled by the discovery field and runtime config's AutoDiscovery setting. |
MinLength: 1 |
|
discovery AIMModelDiscoveryConfig |
Discovery controls discovery behavior for this model. When unset, uses runtime config defaults. |
Optional: {} |
|
defaultServiceTemplate string |
DefaultServiceTemplate specifies the default AIMServiceTemplate to use when creating services for this model. When set, services that reference this model will use this template if no template is explicitly specified. If this is not set, a template will be automatically selected. |
Optional: {} |
|
custom AIMCustomModelSpec |
Custom contains configuration for custom models (models with inline modelSources). Only used when modelSources are specified; ignored for image-based models. |
Optional: {} |
|
customTemplates AIMCustomTemplate array |
CustomTemplates defines explicit template configurations for this model. These templates are created directly without running a discovery job. Can be used with or without modelSources to define custom deployment configurations. If omitted when modelSources is set, a single template is auto-generated using the custom.hardware requirements. |
MaxItems: 16 Optional: {} |
|
modelSources AIMModelSource array |
ModelSources specifies the model sources to use for this model. When specified, these sources are used instead of auto-discovery from the container image. This enables pre-creating custom models with explicit model sources. The size field is optional - if not specified, it will be discovered by the download job. AIM runtime currently supports only one model source. |
MaxItems: 1 Optional: {} |
|
runtimeConfigName string |
Name is the name of the runtime config to use for this resource. If a runtime config with this name exists both as a namespace-scoped and as a cluster-scoped runtime config, their values are merged, with the namespace config taking priority over the cluster config when there are conflicts. If this field is empty or set to default, the namespace or cluster runtime config named default is used, if it exists. |
Optional: {} |
|
imagePullSecrets LocalObjectReference array |
ImagePullSecrets lists secrets containing credentials for pulling the model container image. These secrets are used for OCI registry metadata extraction during discovery and for pulling the image for inference services. The secrets are merged with any runtime config defaults. For namespace-scoped models, secrets must exist in the same namespace. For cluster-scoped models, secrets must exist in the operator namespace. |
Optional: {} |
|
env EnvVar array |
Env specifies environment variables for authentication during model discovery and metadata extraction. These variables are used for authentication with model registries (e.g., HuggingFace tokens). |
Optional: {} |
|
serviceAccountName string |
ServiceAccountName specifies the Kubernetes service account to use for workloads related to this model. This includes metadata extraction jobs and any other model-related operations. If empty, the default service account for the namespace is used. |
Optional: {} |
|
resources ResourceRequirements |
Resources defines the default resource requirements for services using this model. Template- or service-level values override these defaults. |
Optional: {} |
|
imageMetadata ImageMetadata |
ImageMetadata is the metadata that is used to determine which recommended service templates to create, and to provide clients with richer metadata regarding this particular model. In most cases the user does not need to set this field manually: for images that have the supported labels embedded in them, the AIM(Cluster)Model.status.imageMetadata field is automatically filled from the container image labels. This field is intended to be used when there are network restrictions, or in other similar situations. If this field is set, the remote extraction will not be performed at all. |
AIMModelStatus¶
AIMModelStatus defines the observed state of AIMModel.
Appears in: - AIMClusterModel - AIMModel
| Field | Description | Default | Validation |
|---|---|---|---|
observedGeneration integer |
ObservedGeneration is the most recent generation observed by the controller | ||
status AIMStatus |
Status represents the overall status of the image based on its templates | Pending | Enum: [Pending Progressing Ready Degraded Failed NotAvailable] |
conditions Condition array |
Conditions represent the latest available observations of the model's state | ||
resolvedRuntimeConfig AIMResolvedReference |
ResolvedRuntimeConfig captures metadata about the runtime config that was resolved. | Optional: {} |
|
imageMetadata ImageMetadata |
ImageMetadata is the metadata extracted from an AIM image | Optional: {} |
|
sourceType AIMModelSourceType |
SourceType indicates how this model's artifacts are sourced. - "Image": Model discovered from container image labels - "Custom": Model uses explicit spec.modelSources Set by the controller based on whether spec.modelSources is populated. |
Enum: [Image Custom] Optional: {} |
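The rule for sourceType stated above can be sketched in a few lines. This is an illustration only, not the controller's code; the function name `source_type` and the use of a plain dict for the spec are assumptions made for the example.

```python
def source_type(model_spec: dict) -> str:
    """Return "Custom" when spec.modelSources is populated, otherwise "Image"."""
    # An absent key and an empty list are both treated as "not populated".
    return "Custom" if model_spec.get("modelSources") else "Image"
```

A spec that only sets image yields "Image"; a spec with at least one entry in modelSources yields "Custom".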
AIMPrecision¶
Underlying type: string
AIMPrecision enumerates supported numeric precisions
Validation: - Enum: [auto fp4 fp8 fp16 fp32 bf16 int4 int8]
Appears in: - AIMClusterServiceTemplateSpec - AIMDiscoveryProfileMetadata - AIMProfileMetadata - AIMRuntimeParameters - AIMServiceOverrides - AIMServiceTemplateSpec - AIMServiceTemplateSpecCommon - AIMTemplateProfile
| Field | Description |
|---|---|
auto |
|
fp4 |
|
fp8 |
|
fp16 |
|
fp32 |
|
bf16 |
|
int4 |
|
int8 |
AIMProfile¶
AIMProfile contains the cached discovery results for a template. This is the processed and validated version of AIMDiscoveryProfile that is stored in the template's status after successful discovery.
The profile serves as a cache of runtime configuration, eliminating the need to re-run discovery for each service that uses this template. Services and caching mechanisms reference this cached profile for deployment parameters and model sources.
See discovery.go for AIMDiscoveryProfile (the raw discovery output) and the relationship between these types.
Appears in: - AIMServiceTemplateStatus
| Field | Description | Default | Validation |
|---|---|---|---|
engine_args JSON |
EngineArgs contains runtime-specific engine configuration as a free-form JSON object. The structure depends on the inference engine being used (e.g., vLLM, TGI). These arguments are passed to the runtime container to configure model loading and inference. |
Schemaless: {} |
|
env_vars object (keys:string, values:string) |
EnvVars contains environment variables required by the runtime for this profile. These may include engine-specific settings, optimization flags, or hardware configuration. |
Optional: {} |
|
metadata AIMProfileMetadata |
Metadata describes the characteristics of this cached deployment profile. See AIMProfileMetadata for its fields. |
||
originalDiscoveryOutput JSON |
OriginalDiscoveryOutput contains the raw discovery job JSON output. This preserves the complete discovery result from the dry-run container, including all fields that may not be mapped to structured fields above. |
Schemaless: {} Optional: {} |
AIMProfileMetadata¶
AIMProfileMetadata describes the characteristics of a cached deployment profile. This is identical to AIMDiscoveryProfileMetadata but exists in the template status namespace.
Appears in: - AIMProfile
| Field | Description | Default | Validation |
|---|---|---|---|
engine string |
Engine identifies the inference engine used for this profile (e.g., "vllm", "tgi"). | Optional: {} |
|
gpu string |
GPU specifies the GPU model this profile is optimized for (e.g., "MI300X", "MI325X"). | Optional: {} |
|
gpuCount integer |
GPUCount indicates how many GPUs are required per replica for this profile. | Optional: {} |
|
metric AIMMetric |
Metric indicates the optimization goal for this profile ("latency" or "throughput"). | Enum: [latency throughput] Optional: {} |
|
precision AIMPrecision |
Precision specifies the numeric precision used in this profile (e.g., "fp16", "fp8"). | Enum: [auto fp4 fp8 fp16 fp32 bf16 int4 int8] Optional: {} |
|
type AIMProfileType |
Type indicates the optimization level of this profile (optimized, preview, unoptimized). | Enum: [optimized preview unoptimized] Optional: {} |
AIMProfileType¶
Underlying type: string
AIMProfileType indicates the optimization level of a deployment profile.
Validation: - Enum: [optimized preview unoptimized]
Appears in: - AIMClusterServiceTemplateSpec - AIMCustomModelSpec - AIMCustomTemplate - AIMDiscoveryProfileMetadata - AIMProfileMetadata - AIMServiceTemplateSpec - AIMServiceTemplateSpecCommon
| Field | Description |
|---|---|
optimized |
AIMProfileTypeOptimized indicates the profile has been fully optimized. |
preview |
AIMProfileTypePreview indicates the profile is in preview/beta state. |
unoptimized |
AIMProfileTypeUnoptimized indicates the profile has not been optimized. |
AIMResolutionScope¶
Underlying type: string
AIMResolutionScope describes the scope of a resolved reference.
Validation: - Enum: [Namespace Cluster Merged Unknown]
Appears in: - AIMResolvedReference
| Field | Description |
|---|---|
Namespace |
AIMResolutionScopeNamespace denotes a namespace-scoped resource. |
Cluster |
AIMResolutionScopeCluster denotes a cluster-scoped resource. |
Merged |
AIMResolutionScopeMerged denotes that both cluster and namespace configs were merged. |
Unknown |
AIMResolutionScopeUnknown denotes that the scope could not be determined. |
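The scope values can be related to the runtime config lookup described for runtimeConfigName, where a namespace config and a cluster config of the same name are merged with the namespace config winning on conflicts. The sketch below is a hedged illustration: the function name is hypothetical, the merge is assumed to be shallow (top-level keys only), and mapping "nothing found" to Unknown is an assumption, since this reference does not spell out either detail.

```python
from typing import Optional, Tuple


def resolve_runtime_config(
    namespace_cfg: Optional[dict], cluster_cfg: Optional[dict]
) -> Tuple[dict, str]:
    """Combine same-named namespace and cluster configs and report the scope."""
    if namespace_cfg is not None and cluster_cfg is not None:
        # Namespace values override cluster values on conflicting keys
        # (shallow merge, assumed for this example).
        return {**cluster_cfg, **namespace_cfg}, "Merged"
    if namespace_cfg is not None:
        return dict(namespace_cfg), "Namespace"
    if cluster_cfg is not None:
        return dict(cluster_cfg), "Cluster"
    # Assumption: no config found is reported as Unknown.
    return {}, "Unknown"
```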
AIMResolvedArtifact¶
Appears in: - AIMTemplateCacheStatus
| Field | Description | Default | Validation |
|---|---|---|---|
uid string |
UID of the AIMArtifact resource | ||
name string |
Name of the AIMArtifact resource | ||
model string |
Model is the name of the model that is cached | ||
status AIMStatus |
Status of the artifact | ||
persistentVolumeClaim string |
PersistentVolumeClaim name if available | ||
mountPoint string |
MountPoint is the mount point for the artifact |
AIMResolvedReference¶
AIMResolvedReference captures metadata about a resolved reference.
Appears in: - AIMModelStatus - AIMServiceCacheStatus - AIMServiceStatus - AIMServiceTemplateStatus - AIMTemplateCacheStatus
| Field | Description | Default | Validation |
|---|---|---|---|
name string |
Name is the resource name that satisfied the reference. | ||
namespace string |
Namespace identifies where the resource was found when namespace-scoped. Empty indicates a cluster-scoped resource. |
||
scope AIMResolutionScope |
Scope indicates whether the resolved resource was namespace or cluster scoped. | Enum: [Namespace Cluster Merged Unknown] |
|
kind string |
Kind is the fully-qualified kind of the resolved reference, when known. | Optional: {} |
|
uid UID |
UID captures the unique identifier of the resolved reference, when known. | Optional: {} |
AIMRuntimeConfig¶
AIMRuntimeConfig is the Schema for namespace-scoped AIM runtime configurations.
Appears in: - AIMRuntimeConfigList
| Field | Description | Default | Validation |
|---|---|---|---|
apiVersion string |
aim.eai.amd.com/v1alpha1 |
||
kind string |
AIMRuntimeConfig |
||
metadata ObjectMeta |
Refer to Kubernetes API documentation for fields of metadata. |
||
spec AIMRuntimeConfigSpec |
|||
status AIMRuntimeConfigStatus |
AIMRuntimeConfigCommon¶
AIMRuntimeConfigCommon captures configuration fields shared across cluster and namespace scopes. These settings apply to both AIMRuntimeConfig (namespace-scoped) and AIMClusterRuntimeConfig (cluster-scoped). It embeds AIMServiceRuntimeConfig which contains fields that can also be overridden at the service level.
Appears in: - AIMClusterRuntimeConfigSpec - AIMRuntimeConfigSpec
| Field | Description | Default | Validation |
|---|---|---|---|
storage AIMStorageConfig |
Storage configures storage defaults for this service's PVCs and caches. When set, these values override namespace/cluster runtime config defaults. |
Optional: {} |
|
routing AIMRuntimeRoutingConfig |
Routing controls HTTP routing configuration for this service. When set, these values override namespace/cluster runtime config defaults. |
Optional: {} |
|
env EnvVar array |
Env specifies environment variables for inference containers. When set on AIMService, these take highest precedence in the merge hierarchy. When set on RuntimeConfig, these provide namespace/cluster-level defaults. Merge order (highest to lowest): Service.Env > Template.Env > RuntimeConfig.Env > Profile.Env |
Optional: {} |
|
model AIMModelConfig |
Model controls model creation and discovery defaults. This field only applies to RuntimeConfig/ClusterRuntimeConfig and is not available for services. |
Optional: {} |
|
labelPropagation AIMRuntimeConfigLabelPropagationSpec |
LabelPropagation controls how labels from parent AIM resources are propagated to child resources. When enabled, labels matching the specified patterns are automatically copied from parent resources (e.g., AIMService, AIMTemplateCache) to their child resources (e.g., Deployments, Services, PVCs). This is useful for propagating organizational metadata like cost centers, team identifiers, or compliance labels through the resource hierarchy. |
Optional: {} |
|
defaultStorageClassName string |
DEPRECATED: Use Storage.DefaultStorageClassName instead. This field will be removed in a future version. For backward compatibility, if this field is set and Storage.DefaultStorageClassName is not set, the value will be automatically migrated. |
Optional: {} |
|
pvcHeadroomPercent integer |
DEPRECATED: Use Storage.PVCHeadroomPercent instead. This field will be removed in a future version. For backward compatibility, if this field is set and Storage.PVCHeadroomPercent is not set, the value will be automatically migrated. |
Optional: {} |
AIMRuntimeConfigLabelPropagationSpec¶
Appears in: - AIMClusterRuntimeConfigSpec - AIMRuntimeConfigCommon - AIMRuntimeConfigSpec
| Field | Description | Default | Validation |
|---|---|---|---|
enabled boolean |
Enabled, if true, allows propagating parent labels to all child resources it creates directly. Only label keys that match the ones in Match are propagated. |
false | Optional: {} |
match string array |
Match is a list of label keys that will be propagated to any child resources created. Wildcards are supported, so for example org.my/my-key-* would match any label with that prefix. |
Optional: {} |
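The interaction of enabled and match can be sketched as a filter over the parent's labels. This is a minimal illustration that assumes shell-style glob semantics for the wildcard (via Python's `fnmatch`); the operator's exact matching rules are not specified here, and the function name is hypothetical.

```python
from fnmatch import fnmatchcase


def propagated_labels(parent_labels: dict, enabled: bool, match: list) -> dict:
    """Return the subset of parent labels that would be copied to children."""
    if not enabled:
        # Propagation is off by default (enabled defaults to false).
        return {}
    return {
        key: value
        for key, value in parent_labels.items()
        # A label is kept if its key matches at least one pattern.
        if any(fnmatchcase(key, pattern) for pattern in match)
    }
```

With the pattern org.my/my-key-* from the description, a label key org.my/my-key-team is kept while org.my/other is dropped.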
AIMRuntimeConfigList¶
AIMRuntimeConfigList contains a list of AIMRuntimeConfig.
| Field | Description | Default | Validation |
|---|---|---|---|
apiVersion string |
aim.eai.amd.com/v1alpha1 |
||
kind string |
AIMRuntimeConfigList |
||
metadata ListMeta |
Refer to Kubernetes API documentation for fields of metadata. |
||
items AIMRuntimeConfig array |
AIMRuntimeConfigSpec¶
AIMRuntimeConfigSpec defines namespace-scoped overrides for AIM resources.
Appears in: - AIMRuntimeConfig
| Field | Description | Default | Validation |
|---|---|---|---|
storage AIMStorageConfig |
Storage configures storage defaults for this service's PVCs and caches. When set, these values override namespace/cluster runtime config defaults. |
Optional: {} |
|
routing AIMRuntimeRoutingConfig |
Routing controls HTTP routing configuration for this service. When set, these values override namespace/cluster runtime config defaults. |
Optional: {} |
|
env EnvVar array |
Env specifies environment variables for inference containers. When set on AIMService, these take highest precedence in the merge hierarchy. When set on RuntimeConfig, these provide namespace/cluster-level defaults. Merge order (highest to lowest): Service.Env > Template.Env > RuntimeConfig.Env > Profile.Env |
Optional: {} |
|
model AIMModelConfig |
Model controls model creation and discovery defaults. This field only applies to RuntimeConfig/ClusterRuntimeConfig and is not available for services. |
Optional: {} |
|
labelPropagation AIMRuntimeConfigLabelPropagationSpec |
LabelPropagation controls how labels from parent AIM resources are propagated to child resources. When enabled, labels matching the specified patterns are automatically copied from parent resources (e.g., AIMService, AIMTemplateCache) to their child resources (e.g., Deployments, Services, PVCs). This is useful for propagating organizational metadata like cost centers, team identifiers, or compliance labels through the resource hierarchy. |
Optional: {} |
|
defaultStorageClassName string |
DEPRECATED: Use Storage.DefaultStorageClassName instead. This field will be removed in a future version. For backward compatibility, if this field is set and Storage.DefaultStorageClassName is not set, the value will be automatically migrated. |
Optional: {} |
|
pvcHeadroomPercent integer |
DEPRECATED: Use Storage.PVCHeadroomPercent instead. This field will be removed in a future version. For backward compatibility, if this field is set and Storage.PVCHeadroomPercent is not set, the value will be automatically migrated. |
Optional: {} |
AIMRuntimeConfigStatus¶
AIMRuntimeConfigStatus records the resolved config reference surfaced to consumers.
Appears in: - AIMClusterRuntimeConfig - AIMRuntimeConfig
| Field | Description | Default | Validation |
|---|---|---|---|
observedGeneration integer |
ObservedGeneration is the last reconciled generation. | ||
conditions Condition array |
Conditions communicate reconciliation progress. |
AIMRuntimeParameters¶
AIMRuntimeParameters contains the runtime configuration parameters shared across templates and services. Fields use pointers to allow optional usage in different contexts (required in templates, optional in service overrides).
Appears in: - AIMClusterServiceTemplateSpec - AIMServiceOverrides - AIMServiceTemplateSpec - AIMServiceTemplateSpecCommon
| Field | Description | Default | Validation |
|---|---|---|---|
metric AIMMetric |
Metric selects the optimization goal. - latency: prioritize low end-to-end latency - throughput: prioritize sustained requests/second |
Enum: [latency throughput] Optional: {} |
|
precision AIMPrecision |
Precision selects the numeric precision used by the runtime. | Enum: [auto fp4 fp8 fp16 fp32 bf16 int4 int8] Optional: {} |
|
hardware AIMHardwareRequirements |
Hardware specifies GPU and CPU requirements for each replica. For GPU models, defines the GPU count and model types required for deployment. For CPU-only models, defines CPU resource requirements. This field is immutable after creation. |
Optional: {} |
AIMRuntimeRoutingConfig¶
AIMRuntimeRoutingConfig configures HTTP routing defaults for inference services. These settings control how Gateway API HTTPRoutes are created and configured.
Appears in: - AIMClusterRuntimeConfigSpec - AIMRuntimeConfigCommon - AIMRuntimeConfigSpec - AIMServiceRuntimeConfig - AIMServiceSpec
| Field | Description | Default | Validation |
|---|---|---|---|
enabled boolean |
Enabled controls whether HTTP routing is managed for inference services using this config. When true, the operator creates HTTPRoute resources for services that reference this config. When false or unset, routing must be explicitly enabled on each service. This provides a namespace or cluster-wide default that individual services can override. |
Optional: {} |
|
gatewayRef ParentReference |
GatewayRef specifies the Gateway API Gateway resource that should receive HTTPRoutes. This identifies the parent gateway for routing traffic to inference services. The gateway can be in any namespace (cross-namespace references are supported). If routing is enabled but GatewayRef is not specified, service reconciliation will fail with a validation error. |
Optional: {} |
|
pathTemplate string |
PathTemplate defines the HTTP path template for routes, evaluated using JSONPath expressions. The template is rendered against the AIMService object to generate unique paths. Example templates: - /\{.metadata.namespace\}/\{.metadata.name\} - namespace and service name - /\{.metadata.namespace\}/\{.metadata.labels['team']\}/inference - with label - /models/\{.metadata.name\} - based on service name. The template must: - Use valid JSONPath expressions wrapped in {...} - Reference fields that exist on the service - Produce a path ≤ 200 characters after rendering - Result in valid URL path segments (lowercase, RFC 1123 compliant). If evaluation fails, the service enters Degraded state with PathTemplateInvalid reason. Individual services can override this template via spec.routing.pathTemplate. |
Optional: {} |
|
requestTimeout Duration |
RequestTimeout defines the HTTP request timeout for routes. This sets the maximum duration for a request to complete before timing out. The timeout applies to the entire request/response cycle. If not specified, no timeout is set on the route. Individual services can override this value via spec.routing.requestTimeout. |
Optional: {} |
|
annotations object (keys:string, values:string) |
Annotations defines default annotations to add to all HTTPRoute resources. Services can add additional annotations or override these via spec.routingAnnotations. When both are specified, service annotations take precedence for conflicting keys. Common use cases include ingress controller settings, rate limiting, monitoring labels, and security policies that should apply to all services using this config. |
Optional: {} |
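To make the pathTemplate rules concrete, here is a rough sketch of rendering a template against a service object. It supports only a small subset of JSONPath (dotted fields and ['key'] lookups), which is an assumption made to keep the example short; it checks the 200-character limit but does not validate RFC 1123 path segments. The function names are hypothetical and this is not the operator's implementation.

```python
import re

# One "{...}" expression in the template.
_EXPR = re.compile(r"\{([^{}]+)\}")
# One step inside an expression: ".field" or "['key']".
_STEP = re.compile(r"\.([A-Za-z_][A-Za-z0-9_]*)|\['([^']+)'\]")


def _lookup(obj, expr: str):
    """Walk obj along the steps of a simplified JSONPath expression."""
    pos = 0
    for step in _STEP.finditer(expr):
        if step.start() != pos:
            break  # unparseable text between steps
        key = step.group(1) or step.group(2)
        try:
            obj = obj[key]
        except (KeyError, TypeError) as exc:
            raise ValueError(f"{expr!r} does not exist on the service") from exc
        pos = step.end()
    if pos != len(expr):
        raise ValueError(f"unsupported expression {expr!r}")
    return obj


def render_path(template: str, service: dict, max_len: int = 200) -> str:
    """Substitute every {...} expression and enforce the length limit."""
    path = _EXPR.sub(lambda m: str(_lookup(service, m.group(1))), template)
    if len(path) > max_len:
        raise ValueError(f"rendered path exceeds {max_len} characters")
    return path
```

For a service named llama in namespace ml-team, the template /{.metadata.namespace}/{.metadata.name} renders to /ml-team/llama. A failed lookup raises ValueError here, which loosely corresponds to the PathTemplateInvalid condition described above.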
AIMService¶
AIMService manages a KServe-based AIM inference service for the selected model and template. Note: KServe uses {name}-{namespace} format which must not exceed 63 characters. This constraint is validated at runtime since CEL cannot access metadata.namespace.
Appears in: - AIMServiceList
| Field | Description | Default | Validation |
|---|---|---|---|
apiVersion string |
aim.eai.amd.com/v1alpha1 |
||
kind string |
AIMService |
||
metadata ObjectMeta |
Refer to Kubernetes API documentation for fields of metadata. |
||
spec AIMServiceSpec |
|||
status AIMServiceStatus |
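Because the {name}-{namespace} length limit is only validated at runtime, it can be useful to check it before applying a manifest. The helper below is a hypothetical client-side pre-check, not part of the operator.

```python
def kserve_name_fits(name: str, namespace: str, limit: int = 63) -> bool:
    """Check that the KServe-style "{name}-{namespace}" string is within the limit."""
    return len(f"{name}-{namespace}") <= limit
```

For example, a 40-character service name in a 30-character namespace produces a 71-character string and would be rejected.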
AIMServiceAutoScaling¶
AIMServiceAutoScaling configures KEDA-based autoscaling with custom metrics. This enables automatic scaling based on metrics collected from OpenTelemetry.
Appears in: - AIMServiceSpec
| Field | Description | Default | Validation |
|---|---|---|---|
metrics AIMServiceMetricsSpec array |
Metrics is a list of metrics to be used for autoscaling. Each metric defines a source (PodMetric) and target values. |
Optional: {} |
AIMServiceCacheStatus¶
AIMServiceCacheStatus captures cache-related status for an AIMService.
Appears in: - AIMServiceStatus
| Field | Description | Default | Validation |
|---|---|---|---|
templateCacheRef AIMResolvedReference |
TemplateCacheRef references the TemplateCache being used, if any. | Optional: {} |
|
retryAttempts integer |
RetryAttempts tracks how many times this service has attempted to retry a failed cache. Each service gets exactly one retry attempt. When a TemplateCache enters Failed state, this counter is incremented from 0 to 1 after deleting failed Artifacts. If the retry fails (cache enters Failed again with attempts == 1), the service degrades. |
Optional: {} |
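The single-retry behaviour described for retryAttempts can be summarised as a small decision function. This is a simplified sketch under the assumption that only the cache state and the counter matter; the action names are hypothetical labels, not values used by the controller.

```python
from typing import Tuple


def next_cache_action(cache_failed: bool, retry_attempts: int) -> Tuple[str, int]:
    """Return (action, new retry_attempts) for a service's template cache."""
    if not cache_failed:
        return "none", retry_attempts
    if retry_attempts == 0:
        # First failure: failed Artifacts are deleted and the counter goes 0 -> 1.
        return "retry", 1
    # The cache failed again after the one allowed retry: the service degrades.
    return "degrade", retry_attempts
```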
AIMServiceCachingConfig¶
AIMServiceCachingConfig controls caching behavior for a service.
Appears in: - AIMServiceSpec
| Field | Description | Default | Validation |
|---|---|---|---|
mode AIMCachingMode |
Mode controls when to use caching. Canonical values: - Shared (default): reuse/create shared cache assets - Dedicated: create service-owned dedicated cache assets Legacy values are accepted and normalized: - Always -> Shared - Auto -> Shared - Never -> Dedicated |
Shared | Enum: [Dedicated Shared Auto Always Never] Optional: {} |
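The normalization of legacy caching modes listed above amounts to a lookup table. The following sketch mirrors that mapping; the function name is hypothetical and the error handling for unknown values is an assumption.

```python
from typing import Optional

# Legacy values and the canonical mode each one is normalized to.
_LEGACY = {"Always": "Shared", "Auto": "Shared", "Never": "Dedicated"}
_CANONICAL = {"Shared", "Dedicated"}


def normalize_caching_mode(mode: Optional[str] = None) -> str:
    """Map a configured caching mode to its canonical value."""
    if mode is None:
        return "Shared"  # documented default
    mode = _LEGACY.get(mode, mode)
    if mode not in _CANONICAL:
        raise ValueError(f"unsupported caching mode: {mode!r}")
    return mode
```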
AIMServiceList¶
AIMServiceList contains a list of AIMService.
| Field | Description | Default | Validation |
|---|---|---|---|
apiVersion string |
aim.eai.amd.com/v1alpha1 |
||
kind string |
AIMServiceList |
||
metadata ListMeta |
Refer to Kubernetes API documentation for fields of metadata. |
||
items AIMService array |
AIMServiceMetricTarget¶
AIMServiceMetricTarget defines the target value for a metric. Specifies how the metric value should be interpreted and what target to maintain.
Appears in: - AIMServicePodMetricSource
| Field | Description | Default | Validation |
|---|---|---|---|
type string |
Type specifies how to interpret the metric value. - "Value": absolute value target (use the Value field) - "AverageValue": average value across all pods (use the AverageValue field) - "Utilization": percentage utilization for resource metrics (use the AverageUtilization field) |
Enum: [Value AverageValue Utilization] |
|
value string |
Value is the target value of the metric (as a quantity). Used when Type is "Value". Example: "1" for 1 request, "100m" for 100 millicores |
Optional: {} |
|
averageValue string |
AverageValue is the target value of the average of the metric across all relevant pods (as a quantity). Used when Type is "AverageValue". Example: "100m" for 100 millicores per pod |
Optional: {} |
|
averageUtilization integer |
AverageUtilization is the target value of the average of the resource metric across all relevant pods, represented as a percentage of the requested value of the resource for the pods. Used when Type is "Utilization". Only valid for Resource metric source type. Example: 80 for 80% utilization |
Optional: {} |
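Each target type is paired with exactly one value field. The sketch below shows how a client might check that pairing before submitting a manifest; the function name is hypothetical and the check is an illustration, not the API server's validation.

```python
# Which field carries the target for each documented type.
_FIELD_FOR_TYPE = {
    "Value": "value",
    "AverageValue": "averageValue",
    "Utilization": "averageUtilization",
}


def target_value(target: dict):
    """Return the target value that corresponds to target["type"]."""
    field = _FIELD_FOR_TYPE.get(target.get("type"))
    if field is None:
        raise ValueError(f"unknown target type: {target.get('type')!r}")
    if target.get(field) is None:
        raise ValueError(f"type {target['type']!r} requires the {field!r} field")
    return target[field]
```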
AIMServiceMetricsSpec¶
AIMServiceMetricsSpec defines a single metric for autoscaling. Specifies the metric source type and configuration.
Appears in: - AIMServiceAutoScaling
| Field | Description | Default | Validation |
|---|---|---|---|
type string |
Type is the type of metric source. Valid values: "PodMetric" (per-pod custom metrics). |
Enum: [PodMetric] |
|
podmetric AIMServicePodMetricSource |
PodMetric refers to a metric describing each pod in the current scale target. Used when Type is "PodMetric". Supports backends like OpenTelemetry for custom metrics. |
Optional: {} |
AIMServiceModel¶
AIMServiceModel specifies which model to deploy. Exactly one field must be set.
Appears in: - AIMServiceSpec
| Field | Description | Default | Validation |
|---|---|---|---|
name string |
Name references an existing AIMModel or AIMClusterModel by metadata.name. The controller looks for a namespace-scoped AIMModel first, then falls back to cluster-scoped AIMClusterModel. Example: meta-llama-3-8b |
Optional: {} |
|
image string |
Image specifies a container image URI directly. The controller searches for an existing model with this image, or creates one if none exists. Auto-created models are namespace-scoped and can be reused by other services. Example: ghcr.io/silogen/llama-3-8b:v1.2.0 |
Optional: {} |
|
custom AIMServiceModelCustom |
Custom specifies a custom model configuration with explicit base image, model sources, and hardware requirements. The controller will search for an existing matching AIMModel or auto-create one if not found. |
Optional: {} |
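The "exactly one field must be set" rule can be checked on the client side before a manifest is applied. The helper below is a hypothetical sketch that treats empty values as unset.

```python
def selected_model_field(model: dict) -> str:
    """Return the single field set on an AIMServiceModel-shaped dict."""
    set_fields = [f for f in ("name", "image", "custom") if model.get(f)]
    if len(set_fields) != 1:
        raise ValueError(
            f"exactly one of name, image, custom must be set; got {set_fields}"
        )
    return set_fields[0]
```

For instance, a model that sets only name: meta-llama-3-8b passes, while one that sets both name and image is rejected.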
AIMServiceModelCustom¶
AIMServiceModelCustom specifies a custom model configuration with explicit base image, model sources, and hardware requirements. Used for ad-hoc custom model deployments.
Appears in: - AIMServiceModel
| Field | Description | Default | Validation |
|---|---|---|---|
baseImage string |
BaseImage is the container image URI for the AIM base image. This will be used as the image for the auto-created AIMModel. Example: ghcr.io/silogen/aim-base:0.7.0 |
Required: {} |
|
modelSources AIMModelSource array |
ModelSources specifies the model sources to use. The controller will search for or create an AIMModel with these sources. The size field is optional - if not specified, it will be discovered by the download job. AIM runtime currently supports only one model source. |
MaxItems: 1 MinItems: 1 Required: {} |
|
hardware AIMHardwareRequirements |
Hardware specifies the GPU and CPU requirements for this custom model. GPU is optional - if not set, no GPUs are requested (CPU-only model). |
Required: {} |
AIMServiceOverrides¶
AIMServiceOverrides allows overriding template parameters at the service level. All fields are optional. When specified, they override the corresponding values from the referenced AIMServiceTemplate.
Appears in: - AIMServiceSpec
| Field | Description | Default | Validation |
|---|---|---|---|
metric AIMMetric |
Metric selects the optimization goal. - latency: prioritize low end-to-end latency - throughput: prioritize sustained requests/second |
Enum: [latency throughput] Optional: {} |
|
precision AIMPrecision |
Precision selects the numeric precision used by the runtime. | Enum: [auto fp4 fp8 fp16 fp32 bf16 int4 int8] Optional: {} |
|
hardware AIMHardwareRequirements |
Hardware specifies GPU and CPU requirements for each replica. For GPU models, defines the GPU count and model types required for deployment. For CPU-only models, defines CPU resource requirements. This field is immutable after creation. |
Optional: {} |
AIMServicePodMetric¶
AIMServicePodMetric identifies the pod metric and its backend. Supports multiple metrics backends including OpenTelemetry.
Appears in: - AIMServicePodMetricSource
| Field | Description | Default | Validation |
|---|---|---|---|
backend string |
Backend defines the metrics backend to use. If not specified, defaults to "opentelemetry". |
opentelemetry | Enum: [opentelemetry] Optional: {} |
serverAddress string |
ServerAddress specifies the address of the metrics backend server. If not specified, defaults to "keda-otel-scaler.keda.svc:4317" for OpenTelemetry backend. |
Optional: {} |
|
metricNames string array |
MetricNames specifies which metrics to collect from pods and send to ServerAddress. Example: ["vllm:num_requests_running"] |
Optional: {} |
|
query string |
Query specifies the query to run to retrieve metrics from the backend. The query syntax depends on the backend being used. Example: "vllm:num_requests_running" for OpenTelemetry. |
Optional: {} |
|
operationOverTime string |
OperationOverTime specifies the operation to aggregate metrics over time. Valid values: "last_one", "avg", "max", "min", "rate", "count". Default: "last_one". |
Optional: {} |
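The documented defaults for a pod metric can be summarised by filling in whatever the user left unset. This is a sketch for illustration; the function name is hypothetical, and the real defaulting happens in the operator and API server.

```python
# Defaults taken from the field descriptions above.
_POD_METRIC_DEFAULTS = {
    "backend": "opentelemetry",
    "serverAddress": "keda-otel-scaler.keda.svc:4317",
    "operationOverTime": "last_one",
}


def apply_pod_metric_defaults(metric: dict) -> dict:
    """Return a copy of metric with unset fields replaced by documented defaults."""
    explicit = {key: value for key, value in metric.items() if value is not None}
    return {**_POD_METRIC_DEFAULTS, **explicit}
```

A metric that only sets query: vllm:num_requests_running ends up with the OpenTelemetry backend, the default server address, and last_one aggregation.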
AIMServicePodMetricSource¶
AIMServicePodMetricSource defines pod-level metrics configuration. Specifies the metric identification and target values for pod-based autoscaling.
Appears in: - AIMServiceMetricsSpec
| Field | Description | Default | Validation |
|---|---|---|---|
metric AIMServicePodMetric |
Metric contains the metric identification and backend configuration. Defines which metrics to collect and how to query them. |
||
target AIMServiceMetricTarget |
Target specifies the target value for the metric. The autoscaler will scale to maintain this target value. |
AIMServiceRoutingStatus¶
AIMServiceRoutingStatus captures observed routing details.
Appears in: - AIMServiceStatus
| Field | Description | Default | Validation |
|---|---|---|---|
path string |
Path is the HTTP path prefix used when routing is enabled. Example: /tenant/svc-uuid |
Optional: {} |
AIMServiceRuntimeConfig¶
AIMServiceRuntimeConfig contains runtime configuration fields that apply to services. This struct is shared between AIMService.spec (inlined) and AIMRuntimeConfigCommon, allowing services to override these specific runtime settings while inheriting defaults from namespace/cluster RuntimeConfigs.
Appears in: - AIMClusterRuntimeConfigSpec - AIMRuntimeConfigCommon - AIMRuntimeConfigSpec - AIMServiceSpec
| Field | Description | Default | Validation |
|---|---|---|---|
storage AIMStorageConfig |
Storage configures storage defaults for this service's PVCs and caches. When set, these values override namespace/cluster runtime config defaults. |
Optional: {} |
|
routing AIMRuntimeRoutingConfig |
Routing controls HTTP routing configuration for this service. When set, these values override namespace/cluster runtime config defaults. |
Optional: {} |
|
env EnvVar array |
Env specifies environment variables for inference containers. When set on AIMService, these take highest precedence in the merge hierarchy. When set on RuntimeConfig, these provide namespace/cluster-level defaults. Merge order (highest to lowest): Service.Env > Template.Env > RuntimeConfig.Env > Profile.Env |
Optional: {} |
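The env merge order (Service.Env > Template.Env > RuntimeConfig.Env > Profile.Env) can be illustrated by applying the layers from lowest to highest precedence. The sketch assumes every layer is given as a list of Kubernetes-style entries with name and value keys and that merging is keyed by variable name; the function name is hypothetical.

```python
def merge_env(profile: list, runtime_config: list, template: list, service: list) -> list:
    """Merge env layers; later (higher-precedence) layers override by name."""
    merged = {}
    # Iterate from lowest to highest precedence so later layers win.
    for layer in (profile, runtime_config, template, service):
        for var in layer:
            merged[var["name"]] = var
    return list(merged.values())
```

If the profile and the runtime config both define a variable A, the runtime config's value is used unless the template or the service also sets A.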
AIMServiceRuntimeStatus¶
AIMServiceRuntimeStatus captures runtime status including replica counts from HPA.
Appears in: - AIMServiceStatus
| Field | Description | Default | Validation |
|---|---|---|---|
currentReplicas integer |
CurrentReplicas is the current number of replicas as reported by the HPA. | Optional: {} |
|
desiredReplicas integer |
DesiredReplicas is the desired number of replicas as determined by the HPA. | Optional: {} |
|
minReplicas integer |
MinReplicas is the minimum number of replicas configured for autoscaling. | Optional: {} |
|
maxReplicas integer |
MaxReplicas is the maximum number of replicas configured for autoscaling. | Optional: {} |
|
replicas string |
Replicas is a formatted display string for kubectl output. Shows "current" for fixed replicas or "current/desired (min-max)" for autoscaling. |
Optional: {} |
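The two display formats for the replicas string can be reproduced with a short formatter. Treating "both minReplicas and maxReplicas are set" as the signal for autoscaling is an assumption made for this sketch, as is the function name.

```python
from typing import Optional


def format_replicas(
    current: int,
    desired: Optional[int] = None,
    min_replicas: Optional[int] = None,
    max_replicas: Optional[int] = None,
) -> str:
    """Format replica counts as "current" or "current/desired (min-max)"."""
    if min_replicas is None or max_replicas is None:
        return str(current)  # fixed replica count
    return f"{current}/{desired} ({min_replicas}-{max_replicas})"
```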
AIMServiceSpec¶
AIMServiceSpec defines the desired state of AIMService.
Binds a canonical model to an AIMServiceTemplate and configures replicas, caching behavior, and optional overrides. The template governs the base runtime selection knobs, while the overrides field allows service-specific customization.
Appears in: - AIMService
| Field | Description | Default | Validation |
|---|---|---|---|
model AIMServiceModel |
Model specifies which model to deploy using one of the available reference methods. Use name to reference an existing AIMModel/AIMClusterModel by name, or use image to specify a container image URI directly (which will auto-create a model if needed). |
||
template AIMServiceTemplateConfig |
Template contains template selection and configuration. Use Template.Name to specify an explicit template, or omit to auto-select. |
Optional: {} |
|
caching AIMServiceCachingConfig |
Caching controls caching behavior for this service. When nil, defaults to Shared mode. |
Optional: {} |
|
cacheModel boolean |
DEPRECATED: Use Caching.Mode instead. This field will be removed in a future version. This field is no longer honored by the controller. |
Optional: {} |
|
replicas integer |
Replicas specifies the number of replicas for this service. When not specified, defaults to 1 replica. This value overrides any replica settings from the template. For autoscaling, use MinReplicas and MaxReplicas instead. |
1 | Optional: {} |
minReplicas integer |
MinReplicas specifies the minimum number of replicas for autoscaling. Defaults to 1. Scale to zero is not supported. When specified with MaxReplicas, enables autoscaling for the service. |
Minimum: 1 Optional: {} |
|
maxReplicas integer |
MaxReplicas specifies the maximum number of replicas for autoscaling. Required when MinReplicas is set or when AutoScaling configuration is provided. |
Minimum: 1 Optional: {} |
|
autoScaling AIMServiceAutoScaling |
AutoScaling configures advanced autoscaling behavior using KEDA. Supports custom metrics from OpenTelemetry backend. When specified, MinReplicas and MaxReplicas should also be set. |
Optional: {} |
|
runtimeConfigName string |
Name is the name of the runtime config to use for this resource. If a runtime config with this name exists as both a namespace and a cluster runtime config, the values are merged, with the namespace config taking priority over the cluster config when there are conflicts. If this field is empty or set to default, the namespace/cluster runtime config named default is used, if it exists. |
Optional: {} |
|
storage AIMStorageConfig |
Storage configures storage defaults for this service's PVCs and caches. When set, these values override namespace/cluster runtime config defaults. |
Optional: {} |
|
routing AIMRuntimeRoutingConfig |
Routing controls HTTP routing configuration for this service. When set, these values override namespace/cluster runtime config defaults. |
Optional: {} |
|
env EnvVar array |
Env specifies environment variables for inference containers. When set on AIMService, these take highest precedence in the merge hierarchy. When set on RuntimeConfig, these provide namespace/cluster-level defaults. Merge order (highest to lowest): Service.Env > Template.Env > RuntimeConfig.Env > Profile.Env |
Optional: {} |
|
resources ResourceRequirements |
Resources overrides the container resource requirements for this service. When specified, these values take precedence over the template and image defaults. |
Optional: {} |
|
overrides AIMServiceOverrides |
Overrides allows overriding specific template parameters for this service. When specified, these values take precedence over the template values. |
Optional: {} |
|
imagePullSecrets LocalObjectReference array |
ImagePullSecrets references secrets for pulling AIM container images. | Optional: {} |
|
serviceAccountName string |
ServiceAccountName specifies the Kubernetes service account to use for the inference workload. This service account is used by the deployed inference pods. If empty, the default service account for the namespace is used. |
Optional: {} |
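A minimal sketch of an AIMService spec with autoscaling bounds, together with a check of the replica rules listed in the table. The resource names, the env variable, and the `validate_scaling` helper are hypothetical. The check that minReplicas must not exceed maxReplicas is an assumption that the table does not state.

```python
from typing import Any, Dict, List


def validate_scaling(spec: Dict[str, Any]) -> List[str]:
    """Check the documented minReplicas/maxReplicas rules and return error messages."""
    errors: List[str] = []
    min_r = spec.get("minReplicas")
    max_r = spec.get("maxReplicas")
    # Both fields carry "Minimum: 1"; scale to zero is not supported.
    for name, value in (("minReplicas", min_r), ("maxReplicas", max_r)):
        if value is not None and value < 1:
            errors.append(f"{name} must be at least 1")
    # maxReplicas is required when minReplicas or autoScaling is set.
    if max_r is None and (min_r is not None or "autoScaling" in spec):
        errors.append("maxReplicas is required when minReplicas or autoScaling is set")
    # Assumption: the bounds must be ordered.
    if min_r is not None and max_r is not None and min_r > max_r:
        errors.append("minReplicas must not exceed maxReplicas")
    return errors


# Hypothetical manifest; all names below are placeholders.
service = {
    "apiVersion": "aim.eai.amd.com/v1alpha1",
    "kind": "AIMService",
    "metadata": {"name": "example-service", "namespace": "example-ns"},
    "spec": {
        "model": {"name": "example-model"},
        "template": {"allowUnoptimized": False},  # name omitted: auto-select a template
        "minReplicas": 1,
        "maxReplicas": 4,
        "env": [{"name": "EXAMPLE_FLAG", "value": "1"}],
    },
}
```

Calling `validate_scaling(service["spec"])` on the manifest above returns an empty list.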
AIMServiceStatus¶
AIMServiceStatus defines the observed state of AIMService.
Appears in: - AIMService
| Field | Description | Default | Validation |
|---|---|---|---|
observedGeneration integer |
ObservedGeneration is the most recent generation observed by the controller. | ||
conditions Condition array |
Conditions represent the latest observations of the service state. | ||
resolvedRuntimeConfig AIMResolvedReference |
ResolvedRuntimeConfig captures metadata about the runtime config that was resolved. | Optional: {} |
|
resolvedModel AIMResolvedReference |
ResolvedModel captures metadata about the image that was resolved. | Optional: {} |
|
status AIMStatus |
Status represents the current high‑level status of the service lifecycle. Values: Pending, Starting, Running, Degraded, Failed. |
Pending | Enum: [Pending Starting Running Degraded Failed] |
routing AIMServiceRoutingStatus |
Routing surfaces information about the configured HTTP routing, when enabled. | Optional: {} |
|
resolvedTemplate AIMResolvedReference |
ResolvedTemplate captures metadata about the template that satisfied the reference. | ||
cache AIMServiceCacheStatus |
Cache captures cache-related status for this service. | Optional: {} |
|
runtime AIMServiceRuntimeStatus |
Runtime captures runtime status including replica counts. | Optional: {} |
AIMServiceTemplate¶
AIMServiceTemplate is the Schema for namespace-scoped AIM service templates.
Appears in: - AIMServiceTemplateList
| Field | Description | Default | Validation |
|---|---|---|---|
apiVersion string |
aim.eai.amd.com/v1alpha1 |
||
kind string |
AIMServiceTemplate |
||
metadata ObjectMeta |
Refer to Kubernetes API documentation for fields of metadata. |
||
spec AIMServiceTemplateSpec |
|||
status AIMServiceTemplateStatus |
AIMServiceTemplateConfig¶
AIMServiceTemplateConfig contains template selection configuration for AIMService.
Appears in: - AIMServiceSpec
| Field | Description | Default | Validation |
|---|---|---|---|
name string |
Name is the name of the AIMServiceTemplate or AIMClusterServiceTemplate to use. The template selects the runtime profile and GPU parameters. When not specified, a template will be automatically selected based on the model. |
Optional: {} |
|
allowUnoptimized boolean |
AllowUnoptimized, if true, will allow automatic selection of templates that resolve to an unoptimized profile. |
Optional: {} |
AIMServiceTemplateList¶
AIMServiceTemplateList contains a list of AIMServiceTemplate.
| Field | Description | Default | Validation |
|---|---|---|---|
apiVersion string |
aim.eai.amd.com/v1alpha1 |
||
kind string |
AIMServiceTemplateList |
||
metadata ListMeta |
Refer to Kubernetes API documentation for fields of metadata. |
||
items AIMServiceTemplate array |
AIMServiceTemplateScope¶
Underlying type: string
AIMServiceTemplateScope is retained for backwards compatibility with existing consumers.
Validation: - Enum: [Namespace Cluster Unknown]
Appears in: - AIMTemplateCacheSpec
AIMServiceTemplateSpec¶
AIMServiceTemplateSpec defines the desired state of AIMServiceTemplate (namespace-scoped).
A namespaced and versioned template that selects a runtime profile for a given AIM model (by canonical name). Templates are intentionally narrow: they describe runtime selection knobs for the AIM container and do not redefine the full Kubernetes deployment shape.
Appears in: - AIMServiceTemplate
| Field | Description | Default | Validation |
|---|---|---|---|
modelName string |
ModelName is the model name. Matches metadata.name of an AIMModel or AIMClusterModel. Immutable. Example: meta/llama-3-8b:1.1+20240915 |
MinLength: 1 |
|
metric AIMMetric |
Metric selects the optimization goal. - latency: prioritize low end‑to‑end latency - throughput: prioritize sustained requests/second |
Enum: [latency throughput] Optional: {} |
|
precision AIMPrecision |
Precision selects the numeric precision used by the runtime. | Enum: [auto fp4 fp8 fp16 fp32 bf16 int4 int8] Optional: {} |
|
hardware AIMHardwareRequirements |
Hardware specifies GPU and CPU requirements for each replica. For GPU models, defines the GPU count and model types required for deployment. For CPU-only models, defines CPU resource requirements. This field is immutable after creation. |
Optional: {} |
|
runtimeConfigName string |
Name is the name of the runtime config to use for this resource. If a runtime config with this name exists as both a namespace and a cluster runtime config, the values are merged, with the namespace config taking priority over the cluster config when there are conflicts. If this field is empty or set to default, the namespace/cluster runtime config named default is used, if it exists. |
Optional: {} |
|
imagePullSecrets LocalObjectReference array |
ImagePullSecrets lists secrets containing credentials for pulling container images. These secrets are used for: - Discovery dry-run jobs that inspect the model container - Pulling the image for inference services. The secrets are merged with any model or runtime config defaults. For namespace-scoped templates, secrets must exist in the same namespace. For cluster-scoped templates, secrets must exist in the operator namespace. |
Optional: {} |
|
serviceAccountName string |
ServiceAccountName specifies the Kubernetes service account to use for workloads related to this template. This includes discovery dry-run jobs and inference services created from this template. If empty, the default service account for the namespace is used. |
Optional: {} |
|
resources ResourceRequirements |
Resources defines the default container resource requirements applied to services derived from this template. Service-specific values override the template defaults. |
Optional: {} |
|
modelSources AIMModelSource array |
ModelSources specifies the model sources required to run this template. When provided, the discovery dry-run will be skipped and these sources will be used directly. This allows users to explicitly declare model dependencies without requiring a discovery job. If omitted, a discovery job will be run to automatically determine the required model sources. |
Optional: {} |
|
profileId string |
ProfileId is the specific AIM profile ID that this template should use. When set, the discovery job will be instructed to use this specific profile. |
Optional: {} |
|
type AIMProfileType |
Type indicates the optimization level of this template. - optimized: Template has been tuned for performance - preview: Template is experimental/pre-release - unoptimized: Default, no specific optimizations applied. When nil, the type is determined by discovery. When set, overrides discovery. |
Enum: [optimized preview unoptimized] Optional: {} |
|
env EnvVar array |
Env specifies environment variables for inference containers. These variables are passed to the inference runtime and can be used to configure runtime behavior, authentication, or other settings. |
Optional: {} |
|
caching AIMTemplateCachingConfig |
Caching configures model caching behavior for this namespace-scoped template. When enabled, models will be cached using the specified environment variables during download. |
Optional: {} |
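The sketch below assembles a namespace-scoped template manifest and checks the enum constraints from the table before returning it. The helper `build_template` and the template name are hypothetical. The model name reuses the example value from the modelName field.

```python
from typing import Any, Dict, Optional

# Enum values copied from the Validation column above.
METRICS = {"latency", "throughput"}
PRECISIONS = {"auto", "fp4", "fp8", "fp16", "fp32", "bf16", "int4", "int8"}
PROFILE_TYPES = {"optimized", "preview", "unoptimized"}


def build_template(name: str, model_name: str,
                   metric: Optional[str] = None,
                   precision: Optional[str] = None,
                   profile_type: Optional[str] = None,
                   cache: bool = False) -> Dict[str, Any]:
    """Build an AIMServiceTemplate manifest, rejecting values outside the documented enums."""
    if not model_name:
        raise ValueError("modelName must be non-empty (MinLength: 1)")
    spec: Dict[str, Any] = {"modelName": model_name, "caching": {"enabled": cache}}
    for field, value, allowed in (("metric", metric, METRICS),
                                  ("precision", precision, PRECISIONS),
                                  ("type", profile_type, PROFILE_TYPES)):
        if value is None:
            continue  # optional fields are simply omitted
        if value not in allowed:
            raise ValueError(f"{field} must be one of {sorted(allowed)}")
        spec[field] = value
    return {
        "apiVersion": "aim.eai.amd.com/v1alpha1",
        "kind": "AIMServiceTemplate",
        "metadata": {"name": name},
        "spec": spec,
    }


template = build_template("example-template", "meta/llama-3-8b:1.1+20240915",
                          metric="latency", precision="fp8")
```

Leaving `profile_type` unset mirrors the documented behavior in which discovery determines the type.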
AIMServiceTemplateSpecCommon¶
Appears in: - AIMClusterServiceTemplateSpec - AIMServiceTemplateSpec
| Field | Description | Default | Validation |
|---|---|---|---|
modelName string |
ModelName is the model name. Matches metadata.name of an AIMModel or AIMClusterModel. Immutable. Example: meta/llama-3-8b:1.1+20240915 |
MinLength: 1 |
|
metric AIMMetric |
Metric selects the optimization goal. - latency: prioritize low end‑to‑end latency - throughput: prioritize sustained requests/second |
Enum: [latency throughput] Optional: {} |
|
precision AIMPrecision |
Precision selects the numeric precision used by the runtime. | Enum: [auto fp4 fp8 fp16 fp32 bf16 int4 int8] Optional: {} |
|
hardware AIMHardwareRequirements |
Hardware specifies GPU and CPU requirements for each replica. For GPU models, defines the GPU count and model types required for deployment. For CPU-only models, defines CPU resource requirements. This field is immutable after creation. |
Optional: {} |
|
runtimeConfigName string |
Name is the name of the runtime config to use for this resource. If a runtime config with this name exists as both a namespace and a cluster runtime config, the values are merged, with the namespace config taking priority over the cluster config when there are conflicts. If this field is empty or set to default, the namespace/cluster runtime config named default is used, if it exists. |
Optional: {} |
|
imagePullSecrets LocalObjectReference array |
ImagePullSecrets lists secrets containing credentials for pulling container images. These secrets are used for: - Discovery dry-run jobs that inspect the model container - Pulling the image for inference services. The secrets are merged with any model or runtime config defaults. For namespace-scoped templates, secrets must exist in the same namespace. For cluster-scoped templates, secrets must exist in the operator namespace. |
Optional: {} |
|
serviceAccountName string |
ServiceAccountName specifies the Kubernetes service account to use for workloads related to this template. This includes discovery dry-run jobs and inference services created from this template. If empty, the default service account for the namespace is used. |
Optional: {} |
|
resources ResourceRequirements |
Resources defines the default container resource requirements applied to services derived from this template. Service-specific values override the template defaults. |
Optional: {} |
|
modelSources AIMModelSource array |
ModelSources specifies the model sources required to run this template. When provided, the discovery dry-run will be skipped and these sources will be used directly. This allows users to explicitly declare model dependencies without requiring a discovery job. If omitted, a discovery job will be run to automatically determine the required model sources. |
Optional: {} |
|
profileId string |
ProfileId is the specific AIM profile ID that this template should use. When set, the discovery job will be instructed to use this specific profile. |
Optional: {} |
|
type AIMProfileType |
Type indicates the optimization level of this template. - optimized: Template has been tuned for performance - preview: Template is experimental/pre-release - unoptimized: Default, no specific optimizations applied. When nil, the type is determined by discovery. When set, overrides discovery. |
Enum: [optimized preview unoptimized] Optional: {} |
|
env EnvVar array |
Env specifies environment variables for inference containers. These variables are passed to the inference runtime and can be used to configure runtime behavior, authentication, or other settings. |
Optional: {} |
AIMServiceTemplateStatus¶
AIMServiceTemplateStatus defines the observed state of AIMServiceTemplate.
Appears in: - AIMClusterServiceTemplate - AIMServiceTemplate
| Field | Description | Default | Validation |
|---|---|---|---|
observedGeneration integer |
ObservedGeneration is the most recent generation observed by the controller. | ||
conditions Condition array |
Conditions represent the latest observations of template state. | ||
resolvedRuntimeConfig AIMResolvedReference |
ResolvedRuntimeConfig captures metadata about the runtime config that was resolved. | Optional: {} |
|
resolvedModel AIMResolvedReference |
ResolvedModel captures metadata about the image that was resolved. | Optional: {} |
|
resolvedCache AIMResolvedReference |
ResolvedCache captures metadata about which cache is used for this template. | Optional: {} |
|
resolvedHardware AIMHardwareRequirements |
ResolvedHardware contains the resolved hardware requirements for this template. These values are computed from discovery results and spec defaults, and represent what will actually be used when creating InferenceServices. Resolution order: discovery output > spec values > defaults. |
Optional: {} |
|
resolvedNodeAffinity NodeAffinity |
ResolvedNodeAffinity contains the computed node affinity rules for GPU scheduling. This is derived from GPU model and minVRAM requirements, merged with any user-specified affinity from the spec. The service controller uses this directly when creating InferenceServices. |
Optional: {} |
|
hardwareSummary string |
HardwareSummary is a human-readable display string for the hardware requirements. Format: "{count} x {model}" for GPU (e.g., "2 x MI300X") or "CPU" for CPU-only. This is a computed field for display purposes only. |
Optional: {} |
|
status AIMStatus |
Status represents the current high‑level status of the template lifecycle. Values: Pending, Progressing, Ready, Degraded, Failed, NotAvailable. |
Pending | Enum: [Pending Progressing Ready Degraded Failed NotAvailable] |
modelSources AIMModelSource array |
ModelSources lists the models that this template requires to run. These are the models that will be cached if this template is cached. |
||
profile AIMProfile |
Profile contains the full discovery result profile as a free-form JSON object. This includes metadata, engine args, environment variables, and model details. |
||
discoveryJob AIMResolvedReference |
DiscoveryJob is a reference to the job that was run for discovery | ||
discovery DiscoveryState |
Discovery contains state tracking for the discovery process, including retry attempts and backoff timing for the circuit breaker pattern. |
Optional: {} |
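The resolution order for resolvedHardware (discovery output > spec values > defaults) and the hardwareSummary format can be sketched as below. The keys `gpuCount` and `gpuModel` are hypothetical placeholders, since the fields of AIMHardwareRequirements are not listed in this section. Both function names are also illustrative.

```python
from typing import Any, Dict


def resolve_hardware(discovery: Dict[str, Any], spec: Dict[str, Any],
                     defaults: Dict[str, Any]) -> Dict[str, Any]:
    """Pick each value from the first source that defines it: discovery, then spec, then defaults."""
    resolved: Dict[str, Any] = {}
    for key in set(discovery) | set(spec) | set(defaults):
        for source in (discovery, spec, defaults):
            if source.get(key) is not None:
                resolved[key] = source[key]
                break
    return resolved


def hardware_summary(resolved: Dict[str, Any]) -> str:
    """Format "{count} x {model}" for GPU requirements, or "CPU" when no GPUs are requested."""
    count = resolved.get("gpuCount") or 0  # hypothetical key
    if count > 0:
        return f"{count} x {resolved.get('gpuModel', 'unknown')}"  # hypothetical key
    return "CPU"
```

With discovery reporting two GPUs and the spec naming the model MI300X, the summary reads "2 x MI300X", matching the example in the table.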
AIMStorageConfig¶
AIMStorageConfig configures storage defaults for artifacts and PVCs.
Appears in: - AIMClusterRuntimeConfigSpec - AIMRuntimeConfigCommon - AIMRuntimeConfigSpec - AIMServiceRuntimeConfig - AIMServiceSpec
| Field | Description | Default | Validation |
|---|---|---|---|
defaultStorageClassName string |
DefaultStorageClassName specifies the storage class to use for artifacts and PVCs when the consuming resource (AIMArtifact, AIMTemplateCache, AIMServiceTemplate) does not specify a storage class. If this field is empty, the cluster's default storage class is used. |
Optional: {} |
|
pvcHeadroomPercent integer |
PVCHeadroomPercent specifies the percentage of extra space to add to PVCs for model storage. This accounts for filesystem overhead and temporary files during model loading. The value represents a percentage (e.g., 10 means 10% extra space). If not specified, defaults to 10%. |
10 | Minimum: 0 Optional: {} |
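To make the headroom percentage concrete, the sketch below computes a PVC size from a model size. Rounding up to a whole byte is an assumption; the controller's actual rounding and unit handling are not described here. The function name is hypothetical.

```python
def pvc_size_with_headroom(model_bytes: int, headroom_percent: int = 10) -> int:
    """Return the model size plus headroom_percent extra space, rounded up to a whole byte."""
    if headroom_percent < 0:
        raise ValueError("headroom_percent must be >= 0 (Minimum: 0)")
    # Integer ceiling of model_bytes * (100 + headroom_percent) / 100.
    return (model_bytes * (100 + headroom_percent) + 99) // 100
```

For example, a 100-byte model with the default 10% headroom yields a 110-byte request, and a headroom of 0 leaves the size unchanged.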
AIMTemplateCache¶
AIMTemplateCache pre-warms artifacts for a specified template.
Appears in: - AIMTemplateCacheList
| Field | Description | Default | Validation |
|---|---|---|---|
apiVersion string |
aim.eai.amd.com/v1alpha1 |
||
kind string |
AIMTemplateCache |
||
metadata ObjectMeta |
Refer to Kubernetes API documentation for fields of metadata. |
||
spec AIMTemplateCacheSpec |
|||
status AIMTemplateCacheStatus |
AIMTemplateCacheList¶
AIMTemplateCacheList contains a list of AIMTemplateCache.
| Field | Description | Default | Validation |
|---|---|---|---|
apiVersion string |
aim.eai.amd.com/v1alpha1 |
||
kind string |
AIMTemplateCacheList |
||
metadata ListMeta |
Refer to Kubernetes API documentation for fields of metadata. |
||
items AIMTemplateCache array |
AIMTemplateCacheMode¶
Underlying type: string
AIMTemplateCacheMode controls the ownership behavior of artifacts created by a template cache.
Validation: - Enum: [Dedicated Shared]
Appears in: - AIMTemplateCacheSpec
| Field | Description |
|---|---|
Dedicated |
TemplateCacheModeDedicated means artifacts have owner references to the template cache. When the template cache is deleted, all its artifacts are garbage collected. Use this mode for service-specific caches that should be cleaned up with the service. |
Shared |
TemplateCacheModeShared means artifacts have no owner references. Artifacts persist independently of the template cache lifecycle and can be shared. This is the default mode for long-lived, reusable caches. |
AIMTemplateCacheSpec¶
AIMTemplateCacheSpec defines the desired state of AIMTemplateCache
Appears in: - AIMTemplateCache
| Field | Description | Default | Validation |
|---|---|---|---|
templateName string |
TemplateName is the name of the AIMServiceTemplate or AIMClusterServiceTemplate to cache. The controller will first look for a namespace-scoped AIMServiceTemplate in the same namespace. If not found, it will look for a cluster-scoped AIMClusterServiceTemplate with the same name. Namespace-scoped templates take priority over cluster-scoped templates. |
MinLength: 1 |
|
templateScope AIMServiceTemplateScope |
TemplateScope indicates whether the template is namespace-scoped or cluster-scoped. This field is set by the controller during template resolution. |
Enum: [Namespace Cluster Unknown] Required: {} |
|
env EnvVar array |
Env specifies environment variables to use for authentication when downloading models. These variables are used for authentication with model registries (e.g., HuggingFace tokens). |
Optional: {} |
|
imagePullSecrets LocalObjectReference array |
ImagePullSecrets references secrets for pulling AIM container images. | Optional: {} |
|
storageClassName string |
StorageClassName specifies the storage class for cache volumes. When not specified, uses the cluster default storage class. |
Optional: {} |
|
downloadImage string |
DownloadImage specifies the container image used to download and initialize artifacts. When not specified, the controller uses the default model download image. |
Optional: {} |
|
modelSources AIMModelSource array |
ModelSources specifies the model sources to cache for this template. These sources are typically copied from the resolved template's model sources. |
Optional: {} |
|
runtimeConfigName string |
Name is the name of the runtime config to use for this resource. If a runtime config with this name exists as both a namespace and a cluster runtime config, the values are merged, with the namespace config taking priority over the cluster config when there are conflicts. If this field is empty or set to default, the namespace/cluster runtime config named default is used, if it exists. |
Optional: {} |
|
mode AIMTemplateCacheMode |
Mode controls the ownership behavior of artifacts created by this template cache. - Dedicated: artifacts are owned by this template cache and garbage collected when it's deleted. - Shared (default): artifacts have no owner references and persist independently. When a Shared template cache encounters artifacts with owner references, it promotes them to shared by removing the owner references, ensuring they persist for long-term use. |
Shared | Enum: [Dedicated Shared] Optional: {} |
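The lookup order described for templateName (namespace-scoped first, then cluster-scoped) can be sketched as follows. The function and the use of plain name sets are illustrative assumptions; a real controller would query the Kubernetes API instead.

```python
from typing import Optional, Set


def resolve_template_kind(name: str,
                          namespace_templates: Set[str],
                          cluster_templates: Set[str]) -> Optional[str]:
    """Return the kind of the template that satisfies the reference, or None if not found."""
    # Namespace-scoped templates take priority over cluster-scoped ones.
    if name in namespace_templates:
        return "AIMServiceTemplate"
    if name in cluster_templates:
        return "AIMClusterServiceTemplate"
    return None
```

The returned strings match the values documented for resolvedTemplateKind in AIMTemplateCacheStatus.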
AIMTemplateCacheStatus¶
AIMTemplateCacheStatus defines the observed state of AIMTemplateCache
Appears in: - AIMTemplateCache
| Field | Description | Default | Validation |
|---|---|---|---|
observedGeneration integer |
ObservedGeneration is the most recent generation observed by the controller. | ||
conditions Condition array |
Conditions represent the latest observations of the template cache state. | ||
resolvedRuntimeConfig AIMResolvedReference |
ResolvedRuntimeConfig captures metadata about the runtime config that was resolved. | Optional: {} |
|
status AIMStatus |
Status represents the current high-level status of the template cache. | Pending | Enum: [Pending Progressing Ready Failed Degraded NotAvailable] |
resolvedTemplateKind string |
ResolvedTemplateKind indicates whether the template resolved to a namespace-scoped AIMServiceTemplate or cluster-scoped AIMClusterServiceTemplate. Values: "AIMServiceTemplate", "AIMClusterServiceTemplate" |
||
artifacts object (keys:string, values:AIMResolvedArtifact) |
Artifacts maps model names to their resolved AIMArtifact resources. | Optional: {} |
AIMTemplateCachingConfig¶
AIMTemplateCachingConfig configures model caching behavior for namespace-scoped templates.
Appears in: - AIMServiceTemplateSpec
| Field | Description | Default | Validation |
|---|---|---|---|
enabled boolean |
Enabled controls whether caching is enabled for this template. Defaults to false. |
false | |
env EnvVar array |
Env specifies environment variables to use when downloading the model for caching. These variables are available to the model download process and can be used to configure download behavior, authentication, proxies, etc. If not set, falls back to the template's top-level Env field. |
Optional: {} |
AIMTemplateProfile¶
AIMTemplateProfile declares profile variables for template selection. Used in AIMCustomTemplate to specify optimization targets.
Appears in: - AIMCustomTemplate
| Field | Description | Default | Validation |
|---|---|---|---|
metric AIMMetric |
Metric specifies the optimization target (e.g., latency, throughput). | Enum: [latency throughput] Optional: {} |
|
precision AIMPrecision |
Precision specifies the numerical precision (e.g., fp8, fp16, bf16). | Enum: [auto fp4 fp8 fp16 fp32 bf16 int4 int8] Optional: {} |
DiscoveryState¶
DiscoveryState tracks the discovery process state for circuit breaker logic. This enables exponential backoff and prevents infinite retry loops when discovery jobs fail persistently.
Appears in: - AIMServiceTemplateStatus
| Field | Description | Default | Validation |
|---|---|---|---|
attempts integer |
Attempts is the number of discovery job attempts that have been made. This counter increments each time a new discovery job is created after a failure. |
Optional: {} |
|
lastAttemptTime Time |
LastAttemptTime is the timestamp of the most recent discovery job creation. Used to calculate exponential backoff before the next retry. |
Optional: {} |
|
lastFailureReason string |
LastFailureReason captures the reason for the most recent discovery failure. Used to classify failures as terminal vs transient. |
Optional: {} |
|
specHash string |
SpecHash is a hash of the template spec fields that affect discovery. When the spec changes, the circuit breaker resets to allow fresh attempts. |
Optional: {} |
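A rough sketch of how attempts and lastAttemptTime could drive exponential backoff. The base delay of 30 seconds, the 30-minute cap, and the function name are assumptions chosen for illustration; the controller's real values are not documented in this section.

```python
from datetime import datetime, timedelta, timezone


def next_attempt_time(attempts: int, last_attempt_time: datetime,
                      base: timedelta = timedelta(seconds=30),
                      cap: timedelta = timedelta(minutes=30)) -> datetime:
    """Return the earliest time a new discovery job may be created."""
    if attempts <= 0:
        return last_attempt_time  # nothing has failed yet, so no wait is needed
    # Double the delay per attempt; bound the exponent so the multiplication stays small.
    exponent = min(attempts - 1, 20)
    delay = min(base * (2 ** exponent), cap)
    return last_attempt_time + delay


t0 = datetime(2024, 1, 1, tzinfo=timezone.utc)
retry_at = next_attempt_time(3, t0)  # third attempt: 30s * 2^2 = 120s after t0
```

Resetting attempts to zero when specHash changes would restart the sequence from the base delay, which corresponds to the circuit breaker reset described for specHash.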
DownloadProgress¶
DownloadProgress represents the download progress for an artifact.
Appears in: - AIMArtifactStatus
| Field | Description | Default | Validation |
|---|---|---|---|
totalBytes integer |
TotalBytes is the expected total size of the download in bytes | Optional: {} |
|
downloadedBytes integer |
DownloadedBytes is the number of bytes downloaded so far | Optional: {} |
|
percentage integer |
Percentage is the download progress as a percentage (0-100) | Maximum: 100 Minimum: 0 Optional: {} |
|
displayPercentage string |
DisplayPercentage is a human-readable progress string (e.g., "45 %"). This field is automatically populated from Progress.Percentage. |
Optional: {} |
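The relationship between the byte counters, percentage, and displayPercentage can be illustrated as below. Rounding down, clamping to the 0-100 range, and reporting 0 when the total is unknown are assumptions of this sketch. The function name is hypothetical.

```python
from typing import Dict, Union


def download_progress(downloaded_bytes: int, total_bytes: int) -> Dict[str, Union[int, str]]:
    """Build a DownloadProgress-shaped dictionary from raw byte counts."""
    if total_bytes <= 0:
        percentage = 0  # assumption: unknown total is reported as 0%
    else:
        # Round down, then clamp to the documented 0-100 range.
        percentage = max(0, min(100, downloaded_bytes * 100 // total_bytes))
    return {
        "totalBytes": total_bytes,
        "downloadedBytes": downloaded_bytes,
        "percentage": percentage,
        "displayPercentage": f"{percentage} %",
    }
```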
DownloadState¶
DownloadState represents the current download attempt state, updated by the downloader pod
Appears in: - AIMArtifactStatus
| Field | Description | Default | Validation |
|---|---|---|---|
protocol string |
Protocol is the download protocol currently in use (e.g., "XET", "HF_TRANSFER", "HTTP") | Optional: {} |
|
attempt integer |
Attempt is the current attempt number (1-based) | Optional: {} |
|
totalAttempts integer |
TotalAttempts is the total number of attempts configured via AIM_DOWNLOADER_PROTOCOL | Optional: {} |
|
protocolSequence string |
ProtocolSequence is the configured protocol sequence (e.g., "HF_TRANSFER,XET") | Optional: {} |
|
message string |
Message is a human-readable status message from the downloader | Optional: {} |
ImageMetadata¶
ImageMetadata contains metadata extracted from or provided for a container image.
Appears in: - AIMModelSpec - AIMModelStatus
| Field | Description | Default | Validation |
|---|---|---|---|
model ModelMetadata |
Model contains AMD Silogen model-specific metadata. | Optional: {} |
|
oci OCIMetadata |
OCI contains standard OCI image metadata. | Optional: {} |
|
originalLabels object (keys:string, values:string) |
OriginalLabels contains the raw OCI image labels as a JSON object. This preserves all labels from the image, including those not mapped to structured fields. |
Optional: {} |
ModelMetadata¶
ModelMetadata contains AMD Silogen model-specific metadata extracted from image labels.
Appears in: - ImageMetadata
| Field | Description | Default | Validation |
|---|---|---|---|
canonicalName string |
CanonicalName is the canonical model identifier (e.g., mistralai/Mixtral-8x22B-Instruct-v0.1). Extracted from: org.amd.silogen.model.canonicalName |
Optional: {} |
|
source string |
Source is the URL where the model can be found. Extracted from: org.amd.silogen.model.source |
Optional: {} |
|
tags string array |
Tags are descriptive tags (e.g., ["text-generation", "chat", "instruction"]). Extracted from: org.amd.silogen.model.tags (comma-separated) |
Optional: {} |
|
versions string array |
Versions lists available versions. Extracted from: org.amd.silogen.model.versions (comma-separated) |
Optional: {} |
|
variants string array |
Variants lists model variants. Extracted from: org.amd.silogen.model.variants (comma-separated) |
Optional: {} |
|
hfTokenRequired boolean |
HFTokenRequired indicates if a HuggingFace token is required. Extracted from: org.amd.silogen.hfToken.required |
Optional: {} |
|
title string |
Title is the Silogen-specific title for the model. Extracted from: org.amd.silogen.title |
Optional: {} |
|
descriptionFull string |
DescriptionFull is the full description. Extracted from: org.amd.silogen.description.full |
Optional: {} |
|
releaseNotes string |
ReleaseNotes contains release notes for this version. Extracted from: org.amd.silogen.release.notes |
Optional: {} |
|
recommendedDeployments RecommendedDeployment array |
RecommendedDeployments contains recommended deployment configurations. Extracted from: org.amd.silogen.model.recommendedDeployments (parsed from JSON array) |
Optional: {} |
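A sketch of how the labels listed above could be mapped onto ModelMetadata fields. The label keys come from the table. Treating the string "true" (case-insensitive) as the truthy value for hfTokenRequired is an assumption, and the function name is hypothetical.

```python
import json
from typing import Any, Dict

STRING_LABELS = {
    "canonicalName": "org.amd.silogen.model.canonicalName",
    "source": "org.amd.silogen.model.source",
    "title": "org.amd.silogen.title",
    "descriptionFull": "org.amd.silogen.description.full",
    "releaseNotes": "org.amd.silogen.release.notes",
}
CSV_LABELS = {
    "tags": "org.amd.silogen.model.tags",
    "versions": "org.amd.silogen.model.versions",
    "variants": "org.amd.silogen.model.variants",
}
HF_TOKEN_LABEL = "org.amd.silogen.hfToken.required"
DEPLOYMENTS_LABEL = "org.amd.silogen.model.recommendedDeployments"


def parse_model_metadata(labels: Dict[str, str]) -> Dict[str, Any]:
    """Extract ModelMetadata fields from raw image labels; absent labels are skipped."""
    meta: Dict[str, Any] = {}
    for field, key in STRING_LABELS.items():
        if key in labels:
            meta[field] = labels[key]
    for field, key in CSV_LABELS.items():
        if key in labels:
            # Comma-separated label values become string arrays.
            meta[field] = [item.strip() for item in labels[key].split(",") if item.strip()]
    if HF_TOKEN_LABEL in labels:
        # Assumption: only the literal "true" enables the flag.
        meta["hfTokenRequired"] = labels[HF_TOKEN_LABEL].strip().lower() == "true"
    if DEPLOYMENTS_LABEL in labels:
        # The label holds a JSON array of RecommendedDeployment objects.
        meta["recommendedDeployments"] = json.loads(labels[DEPLOYMENTS_LABEL])
    return meta
```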
ModelSourceFilter¶
ModelSourceFilter defines a pattern for discovering images. Supports multiple formats:
- Repository pattern: "org/repo*" (matches repositories with wildcards)
- Repository with tag: "org/repo:1.0.0" (exact tag match)
- Full URI: "ghcr.io/org/repo:1.0.0" (overrides registry and tag)
- Full URI with wildcard: "ghcr.io/org/repo*" (overrides registry, matches pattern)
Appears in: - AIMClusterModelSourceSpec
| Field | Description | Default | Validation |
|---|---|---|---|
image string |
Image pattern with wildcard and full URI support. Supported formats: - Repository pattern: "amdenterpriseai/aim-*" - Repository with tag: "silogen/aim-llama:1.0.0" (overrides versions field) - Full URI: "ghcr.io/silogen/aim-google-gemma-3-1b-it:0.8.1-rc1" (overrides spec.registry and versions) - Full URI with wildcard: "ghcr.io/silogen/aim-*" (overrides spec.registry). When a full URI is specified (including a registry such as ghcr.io), only images from that registry will match. When a tag is included, it takes precedence over the versions field. Wildcard: * matches any sequence of characters. |
MaxLength: 512 |
|
exclude string array |
Exclude lists specific repository names to skip (exact match on repository name only, not registry). Useful for excluding base images or experimental versions. Examples: - ["amdenterpriseai/aim-base", "amdenterpriseai/aim-experimental"] - ["silogen/aim-base"] - works with "ghcr.io/silogen/aim-*" (registry is not checked in exclusion) Note: Exclusions match against repository names (e.g., "silogen/aim-base"), not full URIs. |
Optional: {} |
|
versions string array |
Versions specifies semantic version constraints for this filter. If specified, overrides the global Versions field. Only tags that parse as valid semver are considered (including prereleases like 0.8.1-rc1). Ignored if the Image field includes an explicit tag (e.g., "repo:1.0.0"). Examples: ">=1.0.0", "<2.0.0", "~1.2.0" (patch updates), "^1.0.0" (minor updates) Prerelease versions (e.g., 0.8.1-rc1) are supported and follow semver rules: - 0.8.1-rc1 matches ">=0.8.0" (prerelease is part of version 0.8.1) - Use ">=0.8.1-rc1" to match only that prerelease or higher - Leave empty to match all tags (including prereleases and non-semver tags) |
Optional: {} |
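The wildcard and exclusion rules can be sketched as follows for the simplest case, a repository pattern without a registry or tag. Registry overrides, tag handling, and semver constraints are deliberately left out. Both function names are hypothetical.

```python
import re
from typing import Iterable


def wildcard_match(pattern: str, value: str) -> bool:
    """Match value against pattern, where * matches any sequence of characters."""
    # Escape everything except *, which becomes ".*"; no other character is special.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.fullmatch(regex, value) is not None


def repository_selected(repository: str, image_pattern: str,
                        exclude: Iterable[str] = ()) -> bool:
    """Return True if the repository matches the pattern and is not excluded."""
    # Exclusions are exact matches on the repository name; the registry is not checked.
    if repository in set(exclude):
        return False
    return wildcard_match(image_pattern, repository)
```

With the pattern "amdenterpriseai/aim-*" and an exclusion of "amdenterpriseai/aim-base", the repository "amdenterpriseai/aim-llama" is selected while "amdenterpriseai/aim-base" is skipped.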
OCIMetadata¶
OCIMetadata contains standard OCI image metadata extracted from image labels.
Appears in: - ImageMetadata
| Field | Description | Default | Validation |
|---|---|---|---|
| title string | Title is the human-readable title. Extracted from: org.opencontainers.image.title | | Optional: {} |
| description string | Description is a brief description. Extracted from: org.opencontainers.image.description | | Optional: {} |
| licenses string | Licenses is the SPDX license identifier(s). Extracted from: org.opencontainers.image.licenses | | Optional: {} |
| vendor string | Vendor is the organization that produced the image. Extracted from: org.opencontainers.image.vendor | | Optional: {} |
| authors string | Authors is the contact details of the authors. Extracted from: org.opencontainers.image.authors | | Optional: {} |
| source string | Source is the URL to the source code repository. Extracted from: org.opencontainers.image.source | | Optional: {} |
| documentation string | Documentation is the URL to documentation. Extracted from: org.opencontainers.image.documentation | | Optional: {} |
| created string | Created is the creation timestamp. Extracted from: org.opencontainers.image.created | | Optional: {} |
| revision string | Revision is the source control revision. Extracted from: org.opencontainers.image.revision | | Optional: {} |
| version string | Version is the image version. Extracted from: org.opencontainers.image.version | | Optional: {} |
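As a rough illustration of the label-to-field mapping in this table, the following Python sketch collects the listed labels from a plain label dictionary. The function name `extract_oci_metadata` and the choice to skip empty values are assumptions made for the example; only the label keys come from the table.

```python
from typing import Dict, Mapping

OCI_LABEL_PREFIX = "org.opencontainers.image."
# Field names as listed in the OCIMetadata table.
OCI_FIELDS = ("title", "description", "licenses", "vendor", "authors",
              "source", "documentation", "created", "revision", "version")


def extract_oci_metadata(labels: Mapping[str, str]) -> Dict[str, str]:
    """Collect the standard OCI labels into a dict keyed by OCIMetadata field name."""
    metadata: Dict[str, str] = {}
    for field in OCI_FIELDS:
        value = labels.get(OCI_LABEL_PREFIX + field)
        if value:  # every field is optional, so missing or empty labels are skipped
            metadata[field] = value
    return metadata
```

Labels outside the `org.opencontainers.image.` namespace are ignored by this sketch.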
RecommendedDeployment¶
RecommendedDeployment describes a recommended deployment configuration for a model.
Appears in: - ModelMetadata
| Field | Description | Default | Validation |
|---|---|---|---|
| gpuModel string | GPUModel is the GPU model name (e.g., MI300X, MI325X). | | Optional: {} |
| gpuCount integer | GPUCount is the number of GPUs required. | | Optional: {} |
| precision string | Precision is the recommended precision (e.g., fp8, fp16, bf16). | | Optional: {} |
| metric string | Metric is the optimization target (e.g., latency, throughput). | | Optional: {} |
| profileId string | ProfileId is the unique identifier of the AIM profile for this deployment. When set, templates created from this deployment will use this profile ID to deterministically select the correct runtime profile in the AIM container. | | Optional: {} |
| description string | Description provides additional context about this deployment configuration. | | Optional: {} |
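To make the shape of this type concrete, here is a minimal Python sketch that mirrors the fields in the table. The dataclass and its `to_dict` helper are illustrative only; dropping unset fields is an assumption based on every field being optional, and the sample profile ID is a made-up placeholder.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict, Optional


@dataclass
class RecommendedDeployment:
    # Attribute names follow the serialized field names from the table.
    gpuModel: Optional[str] = None
    gpuCount: Optional[int] = None
    precision: Optional[str] = None
    metric: Optional[str] = None
    profileId: Optional[str] = None
    description: Optional[str] = None

    def to_dict(self) -> Dict[str, Any]:
        """Return only the fields that are set (assumed behavior for optional fields)."""
        return {f.name: getattr(self, f.name)
                for f in fields(self)
                if getattr(self, f.name) is not None}


# Hypothetical values; "example-profile-id" is a placeholder, not a real profile.
example = RecommendedDeployment(gpuModel="MI300X", gpuCount=8, precision="fp8",
                                metric="latency", profileId="example-profile-id")
```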
RuntimeConfigRef¶
Appears in: - AIMArtifactSpec - AIMClusterServiceTemplateSpec - AIMModelSpec - AIMServiceSpec - AIMServiceTemplateSpec - AIMServiceTemplateSpecCommon - AIMTemplateCacheSpec
| Field | Description | Default | Validation |
|---|---|---|---|
| runtimeConfigName string | Name is the name of the runtime config to use for this resource. If a runtime config with this name exists as both a namespace-scoped and a cluster-scoped runtime config, the two are merged, with the namespace config taking priority where they conflict. If this field is empty or set to default, the namespace or cluster runtime config named default is used, if it exists. | | Optional: {} |
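The lookup and merge behavior described for runtimeConfigName can be sketched as follows. This is a simplified model under stated assumptions: configs are plain dictionaries, the merge is shallow (the source does not say whether nested values are merged deeply), and `resolve_runtime_config` and the config keys in the usage note are hypothetical.

```python
from typing import Any, Dict, Mapping, Optional

Config = Dict[str, Any]


def resolve_runtime_config(name: str,
                           namespace_configs: Mapping[str, Config],
                           cluster_configs: Mapping[str, Config]) -> Optional[Config]:
    """Resolve a runtime config by name, merging namespace over cluster values."""
    # An empty name falls back to the config named "default".
    effective_name = name or "default"
    namespaced = namespace_configs.get(effective_name)
    cluster = cluster_configs.get(effective_name)
    if namespaced is None and cluster is None:
        return None  # no config with this name exists at either scope
    merged: Config = dict(cluster or {})
    merged.update(namespaced or {})  # namespace values win on conflicts
    return merged
```

For example, with a cluster config `{"default": {"a": 1, "b": 2}}` and a namespace config `{"default": {"b": 3}}`, resolving an empty name yields `{"a": 1, "b": 3}`.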