Model Sources¶

AIMClusterModelSource automatically discovers and syncs AI model images from container registries, creating AIMClusterModel resources for matched images.

Overview¶

Model sources eliminate the need to manually create model resources for every image version. They continuously sync with container registries, automatically creating models when new images are published.

Key features:

Automatic discovery: Continuously monitors registries for images matching your filters
Flexible filtering: Use wildcards, version constraints, and exclusions
Multi-registry support: Works with Docker Hub, GitHub Container Registry (ghcr.io), and more
Periodic sync: Configurable sync intervals to keep models up to date
Private registries: Supports authentication via imagePullSecrets

Basic Example¶

apiVersion: aim.eai.amd.com/v1alpha1
kind: AIMClusterModelSource
metadata:
  name: amd-models
spec:
  filters:
    - image: amdenterpriseai/aim-*
  syncInterval: 1h

This source discovers all images matching amdenterpriseai/aim-* from Docker Hub and creates an AIMClusterModel for each.

Configuration¶

Registry¶

The registry field specifies which container registry to query. It defaults to docker.io if not specified.

spec:
  registry: ghcr.io  # or docker.io, gcr.io, etc.

Filters¶

Filters define which images to discover. Each filter specifies a pattern with optional version constraints and exclusions. Multiple filters are combined with OR logic.
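
The OR semantics can be pictured with a short Python sketch. It is illustrative only and is not the operator's implementation: the helper name matches_any_filter is hypothetical, and shell-style matching from the standard library's fnmatch module is an assumption about how wildcards behave.

```python
from fnmatch import fnmatchcase


def matches_any_filter(repository, patterns):
    """Return True if the repository matches at least one pattern (OR logic)."""
    return any(fnmatchcase(repository, pattern) for pattern in patterns)


patterns = ["amdenterpriseai/aim-qwen-*", "amdenterpriseai/aim-deepseek-*"]
for repo in ["amdenterpriseai/aim-qwen-qwen3-32b", "amdenterpriseai/aim-base"]:
    # The first repository matches the qwen pattern; the second matches neither.
    print(repo, matches_any_filter(repo, patterns))
```

A repository is selected as soon as one pattern matches, so adding a filter can only widen the set of discovered images.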

Repository Patterns¶

Match repositories using wildcards:

spec:
  filters:
    - image: amdenterpriseai/aim-*

Repository with Specific Tag¶

Match a specific tag:

spec:
  filters:
    - image: amdenterpriseai/aim-qwen-qwen3-32b:0.8.5

Full URI¶

Include the registry hostname in the image value to override spec.registry for that filter:

spec:
  registry: docker.io
  filters:
    - image: ghcr.io/amdenterpriseai/aim-qwen-qwen3-32b:0.8.5  # queried on ghcr.io, not docker.io

Full URI with Wildcard¶

Override the registry and use a wildcard in the same filter:

spec:
  registry: docker.io
  filters:
    - image: ghcr.io/amdenterpriseai/aim-*  # wildcard resolved against ghcr.io
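
As a rough illustration of how an image value can carry its own registry, the Python sketch below splits a filter string into registry, repository pattern, and tag. The function split_image_filter is hypothetical, and the host detection follows the common Docker convention (a leading component containing "." or ":" is a hostname); the operator's actual parsing may differ.

```python
def split_image_filter(image, default_registry="docker.io"):
    """Split a filter's image value into (registry, repository pattern, tag or None)."""
    first, _, rest = image.partition("/")
    # Docker-style heuristic: a leading path component that contains "." or ":"
    # (or is "localhost") names a registry host rather than an organization.
    if rest and ("." in first or ":" in first or first == "localhost"):
        registry, remainder = first, rest
    else:
        registry, remainder = default_registry, image
    repository, separator, tag = remainder.rpartition(":")
    if not separator:
        return registry, remainder, None  # no tag given
    return registry, repository, tag


print(split_image_filter("amdenterpriseai/aim-*"))
print(split_image_filter("ghcr.io/amdenterpriseai/aim-qwen-qwen3-32b:0.8.5"))
```

In this sketch the default_registry argument plays the role of spec.registry: it is used only when the image value does not name a registry itself.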

Version Constraints¶

Use semantic version constraints to filter tags. Constraints can be set globally for all filters or individually per filter.

Global Version Constraints¶

Apply to all filters:

spec:
  registry: ghcr.io
  filters:
    - image: amdenterpriseai/aim-qwen-*
    - image: amdenterpriseai/aim-deepseek-*
  versions:
    - ">=0.8.0"
    - "<1.0.0"

Per-Filter Version Constraints¶

Override global constraints for specific filters:

spec:
  registry: ghcr.io
  versions:
    - ">=0.8.0"  # global default
  filters:
    - image: amdenterpriseai/aim-qwen-*
      versions:
        - ">=0.8.5"  # overrides global for this filter
    - image: amdenterpriseai/aim-deepseek-*
      # uses global constraint
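
The precedence rule can be summarized in a few lines of Python. This is a sketch under assumptions: the spec is modeled as a plain dictionary, effective_versions is a hypothetical helper, and treating an empty per-filter list the same as a missing one is a guess rather than documented behavior.

```python
def effective_versions(spec, filter_spec):
    """Return the constraints that apply to one filter."""
    # Per-filter constraints win; otherwise fall back to the global list.
    return filter_spec.get("versions") or spec.get("versions", [])


spec = {
    "versions": [">=0.8.0"],
    "filters": [
        {"image": "amdenterpriseai/aim-qwen-*", "versions": [">=0.8.5"]},
        {"image": "amdenterpriseai/aim-deepseek-*"},
    ],
}
for image_filter in spec["filters"]:
    print(image_filter["image"], effective_versions(spec, image_filter))
```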

Version Syntax¶

Constraints use standard semver syntax:

>=1.0.0 - Version 1.0.0 or higher
<2.0.0 - Below version 2.0.0
~1.2.0 - Patch updates only (1.2.x)
^1.0.0 - Minor and patch updates allowed (1.x.x)

Prerelease versions (e.g., 0.8.1-rc1) are supported:

versions:
  - ">=0.8.1-rc1"  # includes prereleases

Non-semver tags (e.g., latest, dev) are silently skipped when version constraints are specified.
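
To make the constraint syntax concrete, here is a small, self-contained Python sketch of a constraint check. It is not the operator's code. Assumptions: all listed constraints must hold together (AND), caret ranges follow the usual npm-style rule for 0.x versions, and prerelease identifiers are compared as plain strings, which is simpler than full semver precedence. The names parse_version and satisfies are hypothetical.

```python
import re

_SEMVER = re.compile(r"^(\d+)\.(\d+)\.(\d+)(?:-([0-9A-Za-z.-]+))?$")
_CONSTRAINT = re.compile(r"^(>=|<=|>|<|=|~|\^)?\s*(.*)$")


def parse_version(tag):
    """Return a sortable tuple for a semver tag, or None for tags such as 'latest'."""
    match = _SEMVER.match(tag)
    if match is None:
        return None
    major, minor, patch, prerelease = match.groups()
    # Releases sort after prereleases of the same version: (..., 1) > (..., 0, "rc1").
    suffix = (1,) if prerelease is None else (0, prerelease)
    return (int(major), int(minor), int(patch)) + suffix


def satisfies(tag, constraints):
    """Return True if tag is semver and meets every constraint (AND is assumed)."""
    version = parse_version(tag)
    if version is None:
        return False  # non-semver tags are skipped
    for constraint in constraints:
        operator, text = _CONSTRAINT.match(constraint.strip()).groups()
        bound = parse_version(text)
        if bound is None:
            raise ValueError(f"unsupported constraint: {constraint!r}")
        major, minor, patch = bound[:3]
        if operator == "~":  # patch updates only
            ok = bound <= version < (major, minor + 1, 0)
        elif operator == "^":  # stay below the next breaking version
            if major > 0:
                upper = (major + 1, 0, 0)
            elif minor > 0:
                upper = (0, minor + 1, 0)
            else:
                upper = (0, 0, patch + 1)
            ok = bound <= version < upper
        else:
            ok = {
                ">=": version >= bound,
                "<=": version <= bound,
                ">": version > bound,
                "<": version < bound,
            }.get(operator, version == bound)  # "=" or no operator: exact match
        if not ok:
            return False
    return True


for tag in ["0.8.5", "1.0.0", "latest"]:
    print(tag, satisfies(tag, [">=0.8.0", "<1.0.0"]))
```

With the range ">=0.8.0" and "<1.0.0", the tag 0.8.5 is accepted, 1.0.0 is rejected by the upper bound, and latest is skipped because it does not parse as a version.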

Exclusions¶

Exclude specific repositories from matching:

spec:
  filters:
    - image: amdenterpriseai/aim-*
      exclude:
        - amdenterpriseai/aim-base
        - amdenterpriseai/aim-experimental

Exclusions match repository names exactly (not including the registry).
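
The difference between pattern matching and exact exclusion is easy to miss, so the following Python sketch spells it out. It is an approximation, not the operator's logic: select_repositories is a hypothetical helper, fnmatch-style wildcards are assumed, and the repository amdenterpriseai/aim-base-tools is invented for the example.

```python
from fnmatch import fnmatchcase


def select_repositories(repositories, pattern, exclude=()):
    """Keep repositories that match the pattern and are not excluded by exact name."""
    excluded = set(exclude)
    return [
        repo for repo in repositories
        if fnmatchcase(repo, pattern) and repo not in excluded
    ]


found = [
    "amdenterpriseai/aim-base",
    "amdenterpriseai/aim-base-tools",  # hypothetical repository name
    "amdenterpriseai/aim-qwen-qwen3-32b",
]
selected = select_repositories(
    found, "amdenterpriseai/aim-*", exclude=["amdenterpriseai/aim-base"]
)
print(selected)
```

Only amdenterpriseai/aim-base is dropped. The similarly named aim-base-tools stays because an exclusion entry is compared as a whole name, not as a prefix or pattern.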

Sync Interval¶

Control how often the source syncs with the registry:

spec:
  syncInterval: 30m  # supports: 15m, 1h, 2h30m, etc.

The default is 1h. Intervals shorter than 15m are not recommended because they can trigger registry rate limiting.
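
For tooling that validates manifests before applying them, a check along these lines may be useful. This Python sketch assumes the interval uses only the h, m, and s units shown above; parse_interval and below_recommended_minimum are hypothetical helpers, not part of the operator.

```python
import re
from datetime import timedelta

_UNIT_SECONDS = {"h": 3600, "m": 60, "s": 1}


def parse_interval(text):
    """Parse strings such as '30m' or '2h30m' into a timedelta."""
    if not re.fullmatch(r"(\d+[hms])+", text):
        raise ValueError(f"unsupported interval: {text!r}")
    seconds = sum(
        int(number) * _UNIT_SECONDS[unit]
        for number, unit in re.findall(r"(\d+)([hms])", text)
    )
    return timedelta(seconds=seconds)


def below_recommended_minimum(text, minimum="15m"):
    """Return True if the interval is shorter than the recommended minimum."""
    return parse_interval(text) < parse_interval(minimum)


print(parse_interval("2h30m"))
print(below_recommended_minimum("5m"))
```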

Private Registries¶

Authenticate to private registries using imagePullSecrets:

apiVersion: v1
kind: Secret
metadata:
  name: ghcr-secret
  namespace: aim-system  # operator namespace
type: kubernetes.io/dockerconfigjson
data:
  .dockerconfigjson: BASE64_CONFIG
---
apiVersion: aim.eai.amd.com/v1alpha1
kind: AIMClusterModelSource
metadata:
  name: private-models
spec:
  registry: ghcr.io
  imagePullSecrets:
    - name: ghcr-secret
  filters:
    - image: myorg/private-model-*

Secrets must exist in the operator namespace (typically aim-system).
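
The BASE64_CONFIG placeholder in the Secret stands for a base64-encoded Docker config JSON. The Python sketch below shows one way to produce that value with the standard library. The function name dockerconfigjson_value is hypothetical and the credentials are placeholders; the "auths" layout is the standard Docker config format used by kubernetes.io/dockerconfigjson Secrets.

```python
import base64
import json


def dockerconfigjson_value(server, username, password):
    """Return the base64 string to place under data['.dockerconfigjson']."""
    # The "auth" entry is base64("username:password").
    auth = base64.b64encode(f"{username}:{password}".encode()).decode()
    config = {
        "auths": {
            server: {"username": username, "password": password, "auth": auth}
        }
    }
    return base64.b64encode(json.dumps(config).encode()).decode()


# Placeholder credentials; never commit real tokens.
value = dockerconfigjson_value("ghcr.io", "YOUR_GITHUB_USERNAME", "YOUR_GITHUB_PAT")
print(value)
```

The kubectl create secret docker-registry command builds the same structure, so this is only needed when the Secret manifest is generated by other means.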

GitHub Container Registry (GHCR) Authentication¶

For GitHub Container Registry, use a GitHub Personal Access Token (PAT) with the minimal required scope:

Required Scope:

read:packages - Read access to container packages

Recommended: Use Fine-Grained Personal Access Tokens

Create a fine-grained PAT at: https://github.com/settings/tokens
Set repository access or organization permissions
Grant only read:packages permission
Set expiration date
Create the secret:

kubectl create secret docker-registry ghcr-secret \
  --docker-server=ghcr.io \
  --docker-username=YOUR_GITHUB_USERNAME \
  --docker-password=YOUR_GITHUB_PAT \
  --namespace=aim-system

Security Best Practices:

Use fine-grained PATs instead of classic PATs when possible
Grant minimal permissions (read:packages only)
Set expiration dates on tokens
Rotate tokens regularly
Use separate tokens for different environments (dev/staging/prod)
Enable encryption at rest for Kubernetes Secrets in production
Limit Secret access via RBAC to only the operator namespace

Token Scopes to Avoid:

❌ repo - Grants read/write access to repositories (too broad)
❌ write:packages - Write access not needed for discovery
❌ admin:org - Organization admin access (unnecessary)
❌ delete:packages - Delete permission (unnecessary risk)

Max Models Limit¶

Control the maximum number of models created to prevent runaway resource creation:

spec:
  maxModels: 100  # CRD default: 100, range: 1-10000
  filters:
    - image: org/very-broad-pattern-*

When using the Helm chart's optional clusterModelSource, the chart default is maxModels: 500 unless overridden.

When the limit is reached:

No new models are created, even if more matching images exist
Existing models are never deleted
Status shows modelsLimitReached: true
availableModels reports the total number of matching images found, while discoveredModels reports how many models were actually created

Use Cases:

Prevent accidental model explosion from overly broad filters
Enforce resource quotas in multi-tenant environments
Limit cluster resource consumption during initial sync

Example Status:

status:
  status: Ready
  discoveredModels: 100
  availableModels: 250
  modelsLimitReached: true
  conditions:
    - type: MaxModelsLimitReached
      status: "True"
      message: "Model creation limit reached (100 models created). 150 available images not created as models."
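
The numbers in the example status can be reproduced with a short Python sketch of the capping logic. This is an assumption-laden illustration, not the operator's algorithm: plan_sync is a hypothetical function, the image names are invented, and which images are chosen when the limit is hit is not specified by this page (the sketch simply takes them in input order).

```python
def plan_sync(matching_images, existing_models, max_models=100):
    """Decide which models to create without exceeding max_models."""
    existing = set(existing_models)
    candidates = [image for image in matching_images if image not in existing]
    # Existing models count against the limit and are never removed.
    budget = max(max_models - len(existing), 0)
    to_create = candidates[:budget]
    return {
        "create": to_create,
        "discoveredModels": len(existing) + len(to_create),
        "availableModels": len(matching_images),
        "modelsLimitReached": len(candidates) > budget,
    }


images = [f"org/model-{n}" for n in range(250)]  # hypothetical image names
result = plan_sync(images, existing_models=[], max_models=100)
print(result["discoveredModels"], result["availableModels"], result["modelsLimitReached"])
# prints: 100 250 True
```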

Status¶

The status field tracks sync progress and discovered models:

kubectl get aimclustermodelsource

NAME         STATUS   MODELS   LASTSYNC              AGE
amd-models   Ready    12       2025-01-15T10:30:00   2d

Status Values¶

Pending: Waiting for initial sync
Progressing: Sync in progress
Ready: All filters succeeded
Degraded: Some filters failed, but others succeeded
Failed: All filters failed
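
The three terminal states follow directly from the per-filter outcomes of a finished sync. The Python sketch below encodes that mapping as described in the list above; overall_status is a hypothetical helper, and Pending and Progressing are left out because they describe a sync that has not completed.

```python
def overall_status(filter_results):
    """Map per-filter outcomes (True = succeeded) of a finished sync to a status."""
    if not filter_results:
        raise ValueError("at least one filter result is required")
    if all(filter_results):
        return "Ready"
    if any(filter_results):
        return "Degraded"
    return "Failed"


print(overall_status([True, True]))
print(overall_status([True, False]))
print(overall_status([False, False]))
```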

Detailed Status¶

kubectl get aimclustermodelsource amd-models -o yaml

Key status fields:

status: Overall state (Ready, Degraded, Failed, etc.)
discoveredModels: Count of AIMClusterModel resources created
availableModels: Total count of images matching filters in registry
modelsLimitReached: Boolean indicating if maxModels limit was reached
lastSyncTime: Timestamp of last successful sync
conditions: Detailed conditions including Ready, Degraded, and MaxModelsLimitReached

Examples¶

Docker Hub with Wildcards¶

apiVersion: aim.eai.amd.com/v1alpha1
kind: AIMClusterModelSource
metadata:
  name: dockerhub-models
spec:
  registry: docker.io
  filters:
    - image: amdenterpriseai/aim-*
      exclude:
        - amdenterpriseai/aim-base
  syncInterval: 2h

GitHub Container Registry with Version Constraints¶

apiVersion: aim.eai.amd.com/v1alpha1
kind: AIMClusterModelSource
metadata:
  name: ghcr-stable-models
spec:
  registry: ghcr.io
  filters:
    - image: amdenterpriseai/aim-qwen-*
    - image: amdenterpriseai/aim-deepseek-*
  versions:
    - ">=0.8.0"
    - "<1.0.0"
  syncInterval: 1h

Multiple Registries¶

apiVersion: aim.eai.amd.com/v1alpha1
kind: AIMClusterModelSource
metadata:
  name: multi-registry-models
spec:
  registry: docker.io  # default
  filters:
    - image: amdenterpriseai/aim-*  # uses docker.io
    - image: ghcr.io/amdenterpriseai/aim-*  # overrides to ghcr.io
  syncInterval: 1h

Private Registry with Authentication¶

apiVersion: v1
kind: Secret
metadata:
  name: private-registry-creds
  namespace: aim-system
type: kubernetes.io/dockerconfigjson
data:
  .dockerconfigjson: BASE64_ENCODED_CONFIG
---
apiVersion: aim.eai.amd.com/v1alpha1
kind: AIMClusterModelSource
metadata:
  name: private-models
spec:
  registry: private.registry.io
  imagePullSecrets:
    - name: private-registry-creds
  filters:
    - image: myorg/model-*
      versions:
        - ">=1.0.0"
  syncInterval: 1h

Specific Versions Only¶

apiVersion: aim.eai.amd.com/v1alpha1
kind: AIMClusterModelSource
metadata:
  name: specific-versions
spec:
  registry: ghcr.io
  filters:
    - image: amdenterpriseai/aim-qwen-qwen3-32b:0.8.5
    - image: amdenterpriseai/aim-qwen-qwen3-32b:0.8.4
    - image: amdenterpriseai/aim-deepseek-deepseek-r1:0.8.5
  syncInterval: 6h

Lifecycle¶

Created Models¶

Model sources create AIMClusterModel resources with auto-generated names based on the image URI. These models are owned by the source via an owner reference.

Created models have discovery enabled by default and will automatically create service templates if the image includes recommended deployment metadata.

Append-Only¶

Model sources follow an append-only lifecycle during normal operation. Once created, models are never deleted by the source, even if:

The image is removed from the registry
The filter is changed or removed

This ensures running services aren't disrupted when registry contents change.

Ownership and Deletion¶

Created models have an owner reference to the source. When you delete the source, Kubernetes will automatically delete all models that were created by it.

Kubernetes garbage collection performs this cascading deletion. Because removing models can disrupt running services, check which services depend on them before deleting a model source.

If you need to stop tracking specific models:

Update the source filters to exclude those models
Delete the unwanted models manually:

kubectl delete aimclustermodel <model-name>

Note: You cannot selectively clean up models while keeping the source unchanged. Any model that still matches the active filters is recreated on the next sync.
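
Both lifecycle rules, append-only syncing and recreation of manually deleted models, can be seen in a toy Python model of the sync loop. This is a deliberate simplification: sync is a hypothetical function, models are represented by their image names rather than the auto-generated resource names, and the maxModels limit is ignored.

```python
def sync(existing_models, matching_images):
    """One append-only sync: keep every existing model, add missing matches."""
    return set(existing_models) | set(matching_images)


models = sync(set(), {"org/model-a", "org/model-b"})

# model-a disappears from the registry: the existing model is kept.
models = sync(models, {"org/model-b"})
print(sorted(models))

# model-b is deleted by hand but still matches the filters: it comes back.
models.discard("org/model-b")
models = sync(models, {"org/model-b"})
print(sorted(models))
```

Both print statements show the same two models. To make a manual deletion stick, the filters have to stop matching the image first, which is why updating the source comes before deleting the model.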

Troubleshooting¶

No Models Discovered¶

Check the source status:

kubectl get aimclustermodelsource <name> -o yaml

Common causes:

No images match the filters
Registry is unreachable
Authentication failed (check imagePullSecrets)
Version constraints too restrictive

Degraded Status¶

Some filters failed while others succeeded. Check conditions:

kubectl get aimclustermodelsource <name> -o jsonpath='{.status.conditions}'

Look for error messages indicating which filters failed and why.

Failed Status¶

All filters failed. Common causes:

Invalid registry hostname
Missing or invalid imagePullSecrets
Network connectivity issues
Registry catalog API not supported (for wildcard filters)

Wildcard Filters Not Working¶

Wildcard filters require the registry to support the catalog API. For GitHub Container Registry (ghcr.io), wildcard discovery is supported via GHCR's REST API.

See also:

Models - Understanding AIMClusterModel and AIMModel resources
Templates - Auto-generated service templates
Runtime Config - Authentication and discovery configuration