Services¶

An AIMService is the primary resource for deploying AI/ML models as inference endpoints on Kubernetes. It brings together a Model and a ServiceTemplate to create a running KServe InferenceService.

Overview¶

When you create an AIMService, the operator:

1. Resolves the model (by name reference or image URI), creating an AIMModel if needed
2. Resolves the template (explicit reference or auto-selection)
3. Configures caching if enabled (creates AIMTemplateCache and AIMArtifact resources)
4. Creates a KServe InferenceService with the appropriate configuration
5. Optionally configures routing via Gateway API

Basic Examples¶

Deploy by Model Name¶

Reference an existing model and let AIM Engine auto-select the best template:

apiVersion: aim.eai.amd.com/v1alpha1
kind: AIMService
metadata:
  name: qwen-service
  namespace: ml-team
spec:
  model:
    name: qwen-qwen3-32b

Deploy by Image URI¶

Specify a container image directly. AIM Engine will find or create a matching AIMModel:

apiVersion: aim.eai.amd.com/v1alpha1
kind: AIMService
metadata:
  name: qwen-service
  namespace: ml-team
spec:
  model:
    image: amdenterpriseai/aim-qwen-qwen3-32b:0.8.5

When using model.image, AIM Engine searches for existing models with that image URI. If none exist, it creates an AIMModel automatically (without owner references, so it persists for reuse by other services).

Deploy with Explicit Template¶

Specify both the model and template explicitly:

apiVersion: aim.eai.amd.com/v1alpha1
kind: AIMService
metadata:
  name: qwen-service
  namespace: ml-team
spec:
  model:
    name: qwen-qwen3-32b
  template:
    name: qwen3-32b-mi300x-fp16-latency

Model Resolution¶

AIMService supports three ways to specify the model:

| Mode | Spec Field | Behavior |
| --- | --- | --- |
| Reference | `model.name` | Looks up existing AIMModel or AIMClusterModel by name |
| Image | `model.image` | Finds or creates a model matching the image URI |
| Custom | `model.custom` | Creates or reuses a namespace-scoped custom AIMModel from inline model sources and hardware requirements |

Resolution Order¶

When resolving by name, AIM Engine checks:

1. Namespace-scoped AIMModel with that name
2. Cluster-scoped AIMClusterModel with that name

Namespace-scoped resources take precedence.
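The lookup order above can be illustrated with a minimal Python sketch. It is not the operator's implementation: the dictionaries stand in for the API server, and the function name and data shapes are assumptions made for illustration.

```python
from typing import Dict, Optional, Tuple

# Hypothetical in-memory stand-ins for what the API server holds.
NamespacedModels = Dict[Tuple[str, str], dict]  # (namespace, name) -> AIMModel
ClusterModels = Dict[str, dict]                 # name -> AIMClusterModel


def resolve_model_by_name(
    name: str,
    namespace: str,
    namespaced: NamespacedModels,
    cluster: ClusterModels,
) -> Optional[Tuple[str, dict]]:
    """Return (kind, model) for the first match, or None if nothing matches."""
    # 1. A namespace-scoped AIMModel wins when present.
    model = namespaced.get((namespace, name))
    if model is not None:
        return "AIMModel", model
    # 2. Otherwise fall back to the cluster-scoped AIMClusterModel.
    model = cluster.get(name)
    if model is not None:
        return "AIMClusterModel", model
    return None


namespaced = {("ml-team", "qwen-qwen3-32b"): {"image": "team-override"}}
cluster = {"qwen-qwen3-32b": {"image": "cluster-default"}}

print(resolve_model_by_name("qwen-qwen3-32b", "ml-team", namespaced, cluster))
print(resolve_model_by_name("qwen-qwen3-32b", "other-team", namespaced, cluster))
```

In this sketch a service in ml-team gets its own namespace's model, while a service in any other namespace falls through to the cluster-scoped one.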

When resolving by image URI, AIM Engine searches both namespace and cluster-scoped models for a matching spec.image. If no match exists, AIM Engine creates an AIMModel automatically. If multiple matches exist, resolution fails with an error to prevent ambiguity.
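The find-or-create behavior for image URIs can be sketched as follows. This is a simplified model under stated assumptions: the dict shape, the `AmbiguousModelError` name, and the way the created model's name is derived from the image are all hypothetical, and the sketch assumes that "namespace-scoped" means the service's own namespace.

```python
from typing import List


class AmbiguousModelError(Exception):
    """More than one existing model matches the requested image URI."""


def resolve_model_by_image(image: str, namespace: str, models: List[dict]) -> dict:
    """Find the single model whose image matches, or create one.

    Each model is a dict with "name", "namespace" (None means cluster-scoped),
    "image", and "owner_references". This shape is illustrative only.
    """
    matches = [
        m for m in models
        if m["image"] == image and m["namespace"] in (namespace, None)
    ]
    if len(matches) > 1:
        names = ", ".join(m["name"] for m in matches)
        raise AmbiguousModelError(f"image {image!r} matches multiple models: {names}")
    if matches:
        return matches[0]
    # No match: create a namespace-scoped model with no owner references,
    # so it outlives the service that triggered its creation.
    created = {
        # Hypothetical naming scheme, used only to give the model a name.
        "name": image.rsplit("/", 1)[-1].replace(":", "-").replace(".", "-"),
        "namespace": namespace,
        "image": image,
        "owner_references": [],
    }
    models.append(created)
    return created


store: List[dict] = []
image = "amdenterpriseai/aim-qwen-qwen3-32b:0.8.5"
first = resolve_model_by_image(image, "ml-team", store)   # creates the model
second = resolve_model_by_image(image, "ml-team", store)  # reuses it
print(first is second, len(store))
```

Raising on multiple matches, rather than picking one arbitrarily, mirrors the documented choice to fail instead of guessing.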

Template Resolution¶

Templates define how to run a model: GPU requirements, precision, optimization metric, environment variables, and more.

Explicit Template¶

When you specify template.name, AIM Engine looks up that template directly:

spec:
  template:
    name: my-template

Resolution order:

1. Namespace-scoped AIMServiceTemplate
2. Cluster-scoped AIMClusterServiceTemplate

Auto-Selection¶

When no template name is specified, AIM Engine automatically selects the best template for the model. This is the recommended approach for most deployments.

Auto-selection uses a multi-stage filtering and scoring algorithm:

Stage 1: Availability Filter¶

Only templates with status: Ready are considered. Templates that are Pending, Progressing, Failed, or NotAvailable are excluded.

Stage 2: Optimization Filter¶

By default, only optimized templates are considered. Templates with profile type unoptimized or preview are excluded unless you explicitly allow them:

spec:
  template:
    allowUnoptimized: true

This prevents accidentally deploying unoptimized configurations in production. Set allowUnoptimized: true during development or when optimized templates aren't available for your hardware.

Stage 3: GPU Availability¶

Templates are filtered to only those whose required GPU is available in the cluster. GPU availability is detected via node labels (based on GPU product ID).

If a template requires MI300X GPUs but none are available in the cluster, that template is excluded.
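Stages 1 through 3 are plain filters, which the following Python sketch makes concrete. The `Template` dataclass and its field names are assumptions for illustration. In a real cluster the set of available GPUs would come from node labels, which this sketch takes as an input rather than discovering.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class Template:
    name: str
    status: str        # "Ready", "Pending", "Progressing", "Failed", "NotAvailable"
    profile_type: str  # "optimized", "preview", "unoptimized"
    gpu: str           # GPU model the template requires, e.g. "MI300X"


def filter_candidates(
    templates: Iterable[Template],
    cluster_gpus: Set[str],
    allow_unoptimized: bool = False,
) -> List[Template]:
    # Stage 1: keep only templates that are Ready.
    remaining = [t for t in templates if t.status == "Ready"]
    # Stage 2: drop preview/unoptimized profiles unless explicitly allowed.
    if not allow_unoptimized:
        remaining = [t for t in remaining if t.profile_type == "optimized"]
    # Stage 3: keep only templates whose GPU exists in the cluster.
    return [t for t in remaining if t.gpu in cluster_gpus]


templates = [
    Template("opt-mi300x", "Ready", "optimized", "MI300X"),
    Template("opt-mi325x", "Ready", "optimized", "MI325X"),
    Template("unopt-mi300x", "Ready", "unoptimized", "MI300X"),
    Template("pending-mi300x", "Pending", "optimized", "MI300X"),
]

print([t.name for t in filter_candidates(templates, {"MI300X"})])
print([t.name for t in filter_candidates(templates, {"MI300X"}, allow_unoptimized=True)])
```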

Stage 4: Scope Preference¶

When both namespace-scoped and cluster-scoped templates match, namespace-scoped templates take precedence. This allows teams to customize model deployments without affecting other namespaces.
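One way to read this rule is that cluster-scoped candidates are dropped whenever at least one namespace-scoped candidate survived the earlier filters. The sketch below encodes that reading. It is an assumption about the behavior, not a statement of how the operator implements it, and the `scope` key is hypothetical.

```python
from typing import List


def prefer_namespace_scope(candidates: List[dict]) -> List[dict]:
    """Keep only namespace-scoped candidates if any exist, else keep all."""
    namespaced = [c for c in candidates if c["scope"] == "namespace"]
    return namespaced if namespaced else list(candidates)


candidates = [
    {"name": "team-tuned", "scope": "namespace"},
    {"name": "platform-default", "scope": "cluster"},
]
print([c["name"] for c in prefer_namespace_scope(candidates)])
```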

Stage 5: Preference Scoring¶

If multiple templates remain after filtering, AIM Engine scores them using this preference hierarchy (highest to lowest priority):

1. Profile Type: optimized > preview > unoptimized
2. GPU Tier: MI325X > MI300X > MI250X > MI210
3. Metric: latency > throughput
4. Precision: Primary ordering by bit-width (smaller preferred). Secondary ordering by type: fp > bf > int. Full order: fp4 > int4 > fp8 > int8 > fp16 > bf16 > fp32

The template with the best score is selected.
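A preference hierarchy like this maps naturally onto a lexicographic sort key, where earlier tuple elements dominate later ones. The Python sketch below shows that idea. The `Template` type, the rank tables, and the function names are illustrative assumptions, and the real scoring code may be structured differently.

```python
import re
from typing import Iterable, NamedTuple, Tuple

# Lower rank means more preferred.
PROFILE_RANK = {"optimized": 0, "preview": 1, "unoptimized": 2}
GPU_RANK = {"MI325X": 0, "MI300X": 1, "MI250X": 2, "MI210": 3}
METRIC_RANK = {"latency": 0, "throughput": 1}
PRECISION_TYPE_RANK = {"fp": 0, "bf": 1, "int": 2}


class Template(NamedTuple):
    name: str
    profile_type: str
    gpu: str
    metric: str
    precision: str


def precision_key(precision: str) -> Tuple[int, int]:
    """Order by bit-width first (smaller preferred), then by numeric type."""
    match = re.fullmatch(r"(fp|bf|int)(\d+)", precision)
    if match is None:
        raise ValueError(f"unrecognized precision: {precision!r}")
    kind, bits = match.groups()
    return int(bits), PRECISION_TYPE_RANK[kind]


def preference_key(t: Template) -> Tuple[int, ...]:
    """Tuple compared left to right, so profile type outranks everything else."""
    return (
        PROFILE_RANK[t.profile_type],
        GPU_RANK[t.gpu],
        METRIC_RANK[t.metric],
        *precision_key(t.precision),
    )


def select_best(templates: Iterable[Template]) -> Template:
    return min(templates, key=preference_key)


candidates = [
    Template("throughput-fp8", "optimized", "MI300X", "throughput", "fp8"),
    Template("qwen3-32b-mi300x-fp16-latency", "optimized", "MI300X", "latency", "fp16"),
    Template("preview-fp4", "preview", "MI325X", "latency", "fp4"),
]
print(select_best(candidates).name)
```

In this example the fp16 latency template wins: metric is compared before precision, so latency outranks the lower-precision throughput template, and the preview template loses on profile type despite its newer GPU.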

Ambiguous Selection¶

If multiple templates have identical scores after all filtering and scoring, AIM Engine reports an ambiguous selection error. Resolve this by:

- Specifying template.name explicitly
- Removing duplicate templates
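Tie detection can be sketched in a few lines of Python. The sketch assumes each candidate has already been reduced to a comparable score tuple where lower is better. The error class name and message are hypothetical, not the operator's actual error.

```python
from typing import List, Tuple


class AmbiguousTemplateError(Exception):
    """Two or more templates share the best preference score."""


def pick_unique_best(scored: List[Tuple[str, tuple]]) -> str:
    """Return the name of the single best-scoring template.

    `scored` pairs a template name with its score tuple (lower is better).
    """
    if not scored:
        raise ValueError("no candidate templates")
    best = min(score for _, score in scored)
    winners = sorted(name for name, score in scored if score == best)
    if len(winners) > 1:
        raise AmbiguousTemplateError(
            "templates with identical scores: " + ", ".join(winners)
        )
    return winners[0]


print(pick_unique_best([("a", (0, 1, 0)), ("b", (0, 1, 1))]))
```

Listing every tied template in the error message makes it easier to decide which duplicate to remove or which name to pin.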

Caching¶

AIMService supports model caching to avoid downloading model weights on every pod startup. Caching is configured via spec.caching.mode.

Caching Modes¶

| Mode | Behavior |
| --- | --- |
| `Shared` (default) | Reuses or creates shared cache assets. The template cache and artifacts persist independently of the service and can be reused by other services referencing the same template. |
| `Dedicated` | Creates service-owned cache assets. The template cache and artifacts are owned by the service and garbage-collected when the service is deleted. |

spec:
  caching:
    mode: Shared  # default; use Dedicated for service-owned caches

When spec.caching is omitted, the mode defaults to Shared.

How Caching Works¶

- Template Cache: An AIMTemplateCache pre-downloads all model sources for a template to PVCs
- Model Caches: Individual AIMArtifact resources manage per-model downloads
- Cache ownership: In Shared mode, the template cache has no owner references and persists after the service is deleted, available for reuse. In Dedicated mode, the cache is owned by the service and deleted with it.
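The ownership difference between the two modes comes down to whether the cache resource carries an owner reference to the service. Kubernetes garbage collection then removes owned objects when their owner is deleted. The Python sketch below shows that decision under the assumption that a Dedicated cache points back at its AIMService. The function name and the exact owner reference contents are illustrative.

```python
from typing import List, Optional


def cache_owner_references(
    mode: Optional[str], service_name: str, service_uid: str
) -> List[dict]:
    """Return the ownerReferences a cache resource would carry for a mode."""
    mode = mode or "Shared"  # an omitted mode defaults to Shared
    if mode == "Shared":
        # No owner: the cache survives service deletion and can be reused.
        return []
    if mode == "Dedicated":
        # Owned by the service: garbage-collected together with it.
        return [
            {
                "apiVersion": "aim.eai.amd.com/v1alpha1",
                "kind": "AIMService",
                "name": service_name,
                "uid": service_uid,
            }
        ]
    raise ValueError(f"unknown caching mode: {mode!r}")


print(cache_owner_references(None, "qwen-service", "uid-123"))
print(cache_owner_references("Dedicated", "qwen-service", "uid-123"))
```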

Resource Configuration¶

Configure compute resources for the inference container:

spec:
  resources:
    requests:
      memory: "64Gi"
      cpu: "16"
    limits:
      memory: "128Gi"
      amd.com/gpu: "4"

Image Pull Secrets¶

For private registries:

spec:
  imagePullSecrets:
    - name: registry-credentials

Status¶

Service status reflects the health of all components:

| Status | Meaning |
| --- | --- |
| `Pending` | Waiting for upstream dependencies (model, template) |
| `Starting` | Creating downstream resources (InferenceService, cache) |
| `Running` | InferenceService is ready and serving traffic |
| `Degraded` | Partially functional (e.g., cache failed but service running) |
| `Failed` | Critical failure preventing deployment |

Component Health¶

The status includes health for each component:

- Model: Resolution and readiness of the AIMModel
- Template: Resolution and readiness of the AIMServiceTemplate
- InferenceService: KServe InferenceService status
- Cache: Template cache or service PVC status

Check conditions for detailed diagnostics:

kubectl get aimservice <name> -o jsonpath='{.status.conditions}' | jq
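When the conditions array is long, a small script can pull out only the entries that need attention. The Python sketch below assumes the conditions follow the usual Kubernetes shape (`type`, `status`, `reason`, `message`). Apart from `Ready`, the condition types and messages in the sample are hypothetical and only illustrate the format.

```python
import json
from typing import List, Tuple


def failing_conditions(conditions_json: str) -> List[Tuple[str, str, str]]:
    """Return (type, reason, message) for every condition whose status is not "True".

    This assumes positive-polarity conditions, where "True" means healthy.
    """
    conditions = json.loads(conditions_json)
    return [
        (c.get("type", ""), c.get("reason", ""), c.get("message", ""))
        for c in conditions
        if c.get("status") != "True"
    ]


# Hypothetical sample of what `.status.conditions` might contain.
sample = json.dumps([
    {"type": "ModelResolved", "status": "True", "reason": "Found", "message": ""},
    {"type": "Ready", "status": "False", "reason": "TemplateNotReady",
     "message": "waiting for template"},
])

for cond_type, reason, message in failing_conditions(sample):
    print(f"{cond_type}: {reason} ({message})")
```

To use it against a live cluster, pass the output of the kubectl command above to `failing_conditions` as a string.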

Troubleshooting¶

Service stuck in "Pending"¶

The service is waiting for upstream dependencies:

# Check which component is blocking
kubectl get aimservice <name> -o jsonpath='{.status.conditions}' | jq

Common causes:

- Model not found: Check model.name spelling, or ensure model.image is accessible
- Template not found: Check template.name or verify templates exist for the model
- Template not ready: The template's model sources may still be resolving

Service stuck in "Starting"¶

Downstream resources are being created:

# Check InferenceService status
kubectl get inferenceservice -l aim.eai.amd.com/service.name=<name> -n <namespace>

# Check pod status
kubectl get pods -l serving.kserve.io/inferenceservice=<isvc-name> -n <namespace>

Use the InferenceService name returned by the first command as <isvc-name>.

Common causes:

- Image pull errors: Check imagePullSecrets
- Resource constraints: Insufficient GPU, memory, or CPU
- PVC not binding: Check storage class availability

Template selection fails with "no templates found"¶

# List templates for the model
kubectl get aimservicetemplates -l aim.eai.amd.com/model=<model-name>

# Check if templates are Ready
kubectl get aimservicetemplates -o custom-columns=NAME:.metadata.name,STATUS:.status.status

If templates exist but aren't selected:

- Templates may be NotAvailable (GPU not in cluster)
- Templates may be unoptimized (set allowUnoptimized: true)

Template selection is ambiguous¶

Multiple templates have identical preference scores:

kubectl get aimservice <name> -o jsonpath='{.status.conditions[?(@.type=="Ready")].message}'

Resolution:

- Specify template.name explicitly
- Remove duplicate templates

Cache errors¶

# Check template cache status
kubectl get aimtemplatecache -l aim.eai.amd.com/service.name=<name>

# Check artifact status
kubectl get aimartifact -l aim.eai.amd.com/template=<template-name>

If cache is failing:

- Check storage class supports ReadWriteMany
- Verify PVC headroom is sufficient for model size
- Check model source URLs are accessible

Storage size error¶

If you see StorageSizeError in the cache health, the template's model sources don't have size information yet. This typically resolves automatically as the template controller discovers model sizes. If it persists, check the template's model source configuration.