Multi-Tenancy

AIM Engine supports multi-tenant deployments through a combination of cluster-scoped and namespace-scoped resources.

Resource Scoping

Resource         Cluster-Scoped               Namespace-Scoped
Models           AIMClusterModel              AIMModel
Templates        AIMClusterServiceTemplate    AIMServiceTemplate
Runtime Config   AIMClusterRuntimeConfig      AIMRuntimeConfig
Model Sources    AIMClusterModelSource        -
Services         -                            AIMService
Artifacts        -                            AIMArtifact

Namespace-scoped resources always take precedence over cluster-scoped ones during resolution.

Team Isolation

A typical multi-tenant setup:

  1. Cluster admin creates cluster-scoped resources shared by all teams:
     - AIMClusterModelSource for model discovery
     - AIMClusterRuntimeConfig for default routing, storage, and policies
     - AIMClusterServiceTemplate for validated runtime profiles

  2. Teams work in their own namespaces with:
     - AIMService resources for their inference endpoints
     - AIMRuntimeConfig for team-specific credentials and overrides
     - AIMModel for custom models only their team needs

Example: Namespace Configuration

apiVersion: aim.eai.amd.com/v1alpha1
kind: AIMRuntimeConfig
metadata:
  name: default
  namespace: ml-team-a
spec:
  env:
    - name: HF_TOKEN
      valueFrom:
        secretKeyRef:
          name: team-a-hf-token
          key: token
  routing:
    pathTemplate: "/team-a/{.metadata.name}"
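
The secretKeyRef in this configuration points at a Kubernetes Secret in the same namespace. A minimal sketch of that Secret follows, assuming the token is supplied as plain string data; the token value is a placeholder, not a real credential.

```yaml
# Secret referenced by the secretKeyRef above; name and key must match.
apiVersion: v1
kind: Secret
metadata:
  name: team-a-hf-token
  namespace: ml-team-a
type: Opaque
stringData:
  token: "<your-hugging-face-token>"  # placeholder; replace before applying
```

Because Secrets are namespaced, other teams' namespaces cannot reference this token through their own runtime configs.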

Override Hierarchy

Configuration is resolved with the most specific scope winning:

Setting            Resolution Order
Model              AIMModel (namespace) → AIMClusterModel (cluster)
Template           AIMServiceTemplate (namespace) → AIMClusterServiceTemplate (cluster)
Runtime config     AIMRuntimeConfig (namespace) → AIMClusterRuntimeConfig (cluster)
Environment vars   Service → RuntimeConfig (namespace) → ClusterRuntimeConfig
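
As a rough illustration of the environment-variable order, the sketch below sets the same variable at cluster and namespace scope. It assumes AIMClusterRuntimeConfig accepts the same spec.env field shown for AIMRuntimeConfig earlier, and LOG_LEVEL is a hypothetical variable chosen only for the example.

```yaml
# Cluster-wide default (assumption: spec.env is accepted at cluster scope).
apiVersion: aim.eai.amd.com/v1alpha1
kind: AIMClusterRuntimeConfig
metadata:
  name: default
spec:
  env:
    - name: LOG_LEVEL        # hypothetical variable
      value: "info"
---
# Namespace-scoped override for ml-team-a.
apiVersion: aim.eai.amd.com/v1alpha1
kind: AIMRuntimeConfig
metadata:
  name: default
  namespace: ml-team-a
spec:
  env:
    - name: LOG_LEVEL
      value: "debug"         # expected to win for services in ml-team-a
```

Under the resolution order above, services in ml-team-a would see LOG_LEVEL=debug, other namespaces would see info, and a value set on an AIMService itself would take precedence over both.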

Label Propagation

AIM Engine can propagate labels from AIMService resources to managed child resources (InferenceService, HTTPRoute, PVCs). This is useful for cost tracking, team attribution, and policy enforcement.

Enable via runtime configuration:

apiVersion: aim.eai.amd.com/v1alpha1
kind: AIMClusterRuntimeConfig
metadata:
  name: default
spec:
  labelPropagation:
    enabled: true
    match:
      - "aim.eai.amd.com/*"
      - "team-*"
      - "cost-center"

Labels on the AIMService matching these patterns are automatically applied to all child resources. Labels matching aim.eai.amd.com/* are always propagated regardless of this setting.
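
To make the matching concrete, here is a sketch of AIMService metadata under the configuration above. It assumes the patterns behave like simple globs where * matches any suffix, and that AIMService uses the same API group and version as the other resources; the service name and label values are hypothetical.

```yaml
apiVersion: aim.eai.amd.com/v1alpha1   # assumed to match the other AIM resources
kind: AIMService
metadata:
  name: chat-endpoint          # hypothetical name
  namespace: ml-team-a
  labels:
    team-owner: team-a         # matches "team-*": propagated
    cost-center: ml-research   # matches "cost-center": propagated
    experiment: baseline       # matches no pattern: not propagated
# spec omitted: AIMService fields are outside the scope of this sketch
```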

RBAC

AIM Engine creates helper ClusterRoles for each CRD when rbacHelpers.enable is true (default):

Role                Permissions
aimservice-admin    Full access to AIMService resources
aimservice-editor   Create, update, delete AIMService resources
aimservice-viewer   Read-only access to AIMService resources

Similar roles exist for all CRDs. Bind these to team groups or service accounts:

kubectl create rolebinding team-a-aimservice \
  --clusterrole=aimservice-editor \
  --group=team-a \
  --namespace=ml-team-a
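
For teams that manage access declaratively, the same binding can be written as a manifest. This is a sketch of the standard Kubernetes RoleBinding equivalent to the command above, using the same names.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: team-a-aimservice
  namespace: ml-team-a
subjects:
  - kind: Group
    name: team-a
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: aimservice-editor
  apiGroup: rbac.authorization.k8s.io
```

Because this is a RoleBinding rather than a ClusterRoleBinding, the ClusterRole's permissions apply only within the ml-team-a namespace.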

Next Steps