Using fGPUs with OKS

You can use NVIDIA GPUs in your OKS clusters by allocating flexible GPUs (fGPUs). With the appropriate node pool configuration, worker nodes can allocate, attach, and use fGPUs.

For more information, see About Flexible GPUs and Node Pool Manifest Reference.

Currently, OKS supports attaching only one fGPU per node.

Enabling GPU Support Through Your Node Pool Manifest

To use fGPUs with your worker nodes, you need to apply a node pool manifest that includes the fgpu field. The manifest must follow this structure:

Manifest Sample

apiVersion: oks.dev/v1beta2
kind: NodePool
metadata:
  name: application-pool2-a
spec:
  desiredNodes: 2
  nodeType: tinav5.c2r4p1
  fgpu:
    model: "nvidia-p6"
    k8s-operator: true
  zones:
    - eu-west-2a
  upgradeStrategy:
    maxUnavailable: 1
    maxSurge: 0
    autoUpgradeEnabled: false
  autoHealing: true

You can configure GPU support by specifying the following fields under the spec section of your node pool manifest.

fgpu spec Sample

spec:
  fgpu:
    model: "nvidia-p6"
    k8s-operator: true

This sample contains the following fields that you need to specify:

model: The fGPU model to allocate. For the accepted values, see Supported fGPU Models.
k8s-operator: Whether to install the official NVIDIA GPU operator on the cluster, in the gpu-operator namespace (true | false).
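If you generate or edit node pool manifests with scripts, you might want to catch a malformed fgpu block before applying it. The following Python sketch assumes the manifest has already been parsed into a dictionary (for example, with a YAML library) and checks only the two fields described above: that model is a string and that k8s-operator is a boolean. The function name check_fgpu_fields is hypothetical, and this is not the validation that OKS itself performs. Treating a quoted "true" as an error is a deliberately strict assumption of this sketch.

```python
from typing import Any


def check_fgpu_fields(manifest: dict[str, Any]) -> list[str]:
    """Return the problems found in spec.fgpu (an empty list if none).

    Hypothetical helper: it only checks the fields described on this page.
    """
    spec = manifest.get("spec")
    fgpu = spec.get("fgpu") if isinstance(spec, dict) else None
    if not isinstance(fgpu, dict):
        return ["spec.fgpu is missing or is not a mapping"]

    problems = []
    # model must be a string, such as "nvidia-p6".
    if not isinstance(fgpu.get("model"), str):
        problems.append("spec.fgpu.model must be a string")
    # k8s-operator must be a boolean; a quoted "true" in YAML parses as a string.
    if not isinstance(fgpu.get("k8s-operator"), bool):
        problems.append("spec.fgpu.k8s-operator must be true or false")
    return problems


# The fgpu spec sample from this page, as a parsed dictionary.
sample = {"spec": {"fgpu": {"model": "nvidia-p6", "k8s-operator": True}}}
print(check_fgpu_fields(sample))  # prints []

# The same spec with a quoted "true" is flagged.
quoted = {"spec": {"fgpu": {"model": "nvidia-p6", "k8s-operator": "true"}}}
print(check_fgpu_fields(quoted))  # prints ['spec.fgpu.k8s-operator must be true or false']
```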

Deleting the node pool does not uninstall the NVIDIA GPU operator.

Supported fGPU Models

OKS supports the following fGPU models provided by 3DS OUTSCALE:

nvidia-a100
nvidia-a100-80
nvidia-h100
nvidia-l40
nvidia-m60
nvidia-p6
nvidia-p100
nvidia-v100

For more information about these models, see About Flexible GPUs > Models of fGPUs.
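A misspelled model name is an easy mistake to make in a manifest. As a minimal sketch, the following Python snippet checks a model value against the list above before you apply the manifest. The set SUPPORTED_FGPU_MODELS and the function is_supported_model are hypothetical names; the set is copied from this page and must be kept in sync with it manually. The comparison is an exact string match, which is an assumption of this sketch rather than documented OKS behavior.

```python
# Supported fGPU models, copied from the list on this page.
SUPPORTED_FGPU_MODELS = frozenset({
    "nvidia-a100",
    "nvidia-a100-80",
    "nvidia-h100",
    "nvidia-l40",
    "nvidia-m60",
    "nvidia-p6",
    "nvidia-p100",
    "nvidia-v100",
})


def is_supported_model(model: str) -> bool:
    """Return True if model exactly matches one of the listed fGPU models."""
    return model in SUPPORTED_FGPU_MODELS


print(is_supported_model("nvidia-p6"))   # prints True
print(is_supported_model("nvidia-a10"))  # prints False
```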

Make sure that your chosen fGPU model is compatible with the VM type set in the nodeType field of your node pool. If the fGPU model and VM type are incompatible, the allocated fGPUs may fail to attach, and after 3 unsuccessful attempts, the VM may also fail to start.
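To catch an incompatible pairing before applying the manifest, you could compare the processor generation encoded in the nodeType value with the generations documented for each fGPU model in About Flexible GPUs > Models of fGPUs. The following Python sketch assumes the OUTSCALE tinavW.cXrYpZ naming convention, where W is the processor generation, and assumes compatibility can be expressed per generation. It takes the compatibility table as a parameter instead of hard-coding it. The function names are hypothetical, and the only entry in the example table is the pairing from the manifest sample on this page (nvidia-p6 with tinav5.c2r4p1), which is assumed to be valid; fill in the rest from the fGPU documentation.

```python
import re

# Assumed naming convention: tinavW.cXrYpZ, where W is the processor
# generation, X the number of vCores, Y the memory in GiB, Z the performance.
VM_TYPE_PATTERN = re.compile(r"^tinav(\d+)\.c(\d+)r(\d+)p(\d+)$")


def vm_generation(node_type: str) -> int:
    """Extract the processor generation from a nodeType value such as tinav5.c2r4p1."""
    match = VM_TYPE_PATTERN.match(node_type)
    if match is None:
        raise ValueError(f"unrecognized VM type: {node_type!r}")
    return int(match.group(1))


def is_compatible(model: str, node_type: str,
                  compatibility: dict[str, set[int]]) -> bool:
    """Return True if node_type's generation is listed for model in compatibility.

    compatibility maps each fGPU model to the processor generations it
    supports; models missing from the table are treated as incompatible.
    """
    return vm_generation(node_type) in compatibility.get(model, set())


# Hypothetical table containing only the pairing from this page's manifest sample.
compat = {"nvidia-p6": {5}}
print(is_compatible("nvidia-p6", "tinav5.c2r4p1", compat))  # prints True
# Generation 4 is simply not listed in this hypothetical table.
print(is_compatible("nvidia-p6", "tinav4.c2r4p1", compat))  # prints False
```

Treating unknown models as incompatible errs on the side of caution, since an incompatible pairing can leave the VM unable to start.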
