InferX platform k8s deployment

System Requirements

OS: Linux kernel > 5.8.0; tested on Ubuntu 20.04 and 22.04
Processor: x86-64 (AMD64)
Docker: > 17.09.0
KVM: bare metal, or a VM with nested virtualization. Enable virtualization technology in the BIOS (usually under the Security tab).
Memory: >= 64GB
CUDA: >= 12.5
Kubernetes: K8s or K3s

The deployment uses the following container images:
1. inferx/inferx_dashboard:v0.1.3: The InferX web UI dashboard.
2. inferx/inferx_one:v0.1.3: The InferX platform services, such as the REST API gateway and the scheduler.
3. inferx/spdk-container:v0.1.3: Optional. A simple wrapper around SPDK (https://spdk.io/), needed only when using the InferX blob store.
4. quay.io/keycloak/keycloak:latest: The Keycloak image, used for authentication.
5. postgres:14.5: The standard Postgres container. It stores the InferX audit log, secrets, and the Keycloak configuration.
6. quay.io/coreos/etcd:v3.5.13: Stores InferX configuration such as tenants, namespaces, and model functions.
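
Before setting up the cluster, the host-level requirements listed above (kernel, KVM, memory, Docker) can be checked with a short script. This is a minimal sketch, not part of InferX: the helper names `version_gt` and `report` are made up for illustration, and the 60 GiB memory threshold is an assumption chosen because a host sold as 64GB usually reports slightly less usable memory. CUDA and Kubernetes are not checked here.

```shell
#!/usr/bin/env bash
# Pre-flight sketch: compares the host against the requirement list above.

# version_gt A B: succeeds when A is strictly newer than B (uses GNU `sort -V`).
version_gt() {
  [ "$1" != "$2" ] &&
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | tail -n 1)" = "$1" ]
}

# report <item> <ok|warn> <detail>: print one aligned status line per check.
report() { printf '%-7s %-5s %s\n' "$1" "$2" "$3"; }

# Kernel: strip the distro suffix, e.g. 5.15.0-91-generic -> 5.15.0
kernel="$(uname -r | sed 's/[^0-9.].*$//')"
if version_gt "$kernel" "5.8.0"; then report kernel ok "$kernel"
else report kernel warn "$kernel (need > 5.8.0)"; fi

# KVM: the device node exists only when virtualization is enabled and exposed.
if [ -e /dev/kvm ]; then report kvm ok "/dev/kvm present"
else report kvm warn "/dev/kvm missing"; fi

# Memory: MemTotal is in kB; 60 GiB is an assumed tolerance for a 64GB host.
if [ -r /proc/meminfo ]; then
  mem_gib="$(awk '/^MemTotal:/ {printf "%d", $2 / 1048576}' /proc/meminfo)"
  if [ "$mem_gib" -ge 60 ]; then report memory ok "${mem_gib} GiB"
  else report memory warn "${mem_gib} GiB (need >= 64GB)"; fi
fi

# Docker: checked only when the CLI is installed and the daemon answers.
if command -v docker >/dev/null 2>&1; then
  docker_v="$(docker version --format '{{.Server.Version}}' 2>/dev/null || true)"
  if [ -n "$docker_v" ] && version_gt "$docker_v" "17.09.0"; then report docker ok "$docker_v"
  else report docker warn "${docker_v:-unreachable} (need > 17.09.0)"; fi
else
  report docker warn "docker CLI not found"
fi
```

The script only prints warnings and never aborts, so it is safe to run on any node before labeling it.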

Kube cluster setup steps

Label the GPU node

kubectl label node <NodeName> inferx_storage=data --overwrite
kubectl label node <NodeName> inferx_nodeType=inferx_file --overwrite
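
To confirm that both labels landed on the node, the two commands above can be wrapped in a small helper that also prints the labels back. This is a sketch: `label_gpu_node` is a hypothetical function name, while the label keys and values are the ones used above.

```shell
# label_gpu_node <NodeName>: apply both InferX labels, then show them as columns.
label_gpu_node() {
  node="$1"
  [ -n "$node" ] || { echo "usage: label_gpu_node <NodeName>" >&2; return 1; }
  kubectl label node "$node" inferx_storage=data --overwrite &&
    kubectl label node "$node" inferx_nodeType=inferx_file --overwrite &&
    # -L adds one output column per listed label key
    kubectl get node "$node" -L inferx_storage,inferx_nodeType
}
```

Calling `label_gpu_node <NodeName>` should end with a node listing whose last two columns read `data` and `inferx_file`.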

Clone the inferx repo:

git clone https://github.com/inferx-net/inferx.git
cd inferx

Adjust the nodeagent parameters by updating k8s/nodeagent.yaml:

3.1 Update the memory size: model container resources are allocated from the nodeagent pod, so set the memory to a size suitable for the node's available memory:

resources:
  requests:
    cpu: "20"
    memory: "180Gi" # Regular memory request (RAM)
  limits:
    cpu: "20"
    memory: "180Gi" # Regular memory request (RAM)
    nvidia.com/gpu: 1

3.2 Update the cache size: the cache size is the amount of CPU memory used to cache models. It is taken out of the memory size set in 3.1, so keep it at no more than 50% of that value:

- name: CACHE_MEMORY
  value: "90Gi"
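
The relationship between the two values in 3.1 and 3.2 can be checked before editing the YAML. The sketch below uses hypothetical helper names (`strip_gi`, `check_cache_size`), handles only whole `Gi` values, and treats exactly 50% as acceptable, which is an assumption based on the example values (90Gi out of 180Gi).

```shell
# strip_gi "180Gi" -> 180 (only whole-Gi quantities are handled in this sketch)
strip_gi() { printf '%s\n' "${1%Gi}"; }

# check_cache_size <pod memory> <CACHE_MEMORY>
# Succeeds when the cache is at most half of the nodeagent pod memory.
check_cache_size() {
  pod_gi="$(strip_gi "$1")"
  cache_gi="$(strip_gi "$2")"
  if [ $((cache_gi * 2)) -le "$pod_gi" ]; then
    echo "ok: cache ${cache_gi}Gi fits within half of ${pod_gi}Gi"
  else
    echo "too large: cache ${cache_gi}Gi exceeds half of ${pod_gi}Gi" >&2
    return 1
  fi
}
```

For example, `check_cache_size 180Gi 90Gi` succeeds, while `check_cache_size 180Gi 100Gi` reports that the cache is too large.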

Start the pods: in the inferx folder, run

make runkblob
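
After `make runkblob` returns, the pods may still be pulling images or starting. A minimal sketch for waiting until they are ready follows; `wait_for_pods` is a hypothetical helper, and the `default` namespace and 600s timeout are assumptions, so adjust them to match your manifests.

```shell
# wait_for_pods [namespace] [timeout]: block until every pod is Ready, then list them.
wait_for_pods() {
  ns="${1:-default}"       # assumption: the InferX pods run in this namespace
  limit="${2:-600s}"       # assumption: ten minutes is enough for image pulls
  kubectl wait --namespace "$ns" --for=condition=Ready pod --all --timeout="$limit" &&
    kubectl get pods --namespace "$ns" -o wide
}
```

Note that `kubectl wait ... --all` also waits on pods that have already completed, so in a namespace shared with finished jobs it may time out even though the InferX pods are healthy.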

Check the dashboard at http://<NodeIP>:31250/demo/, where <NodeIP> is the address of a cluster node.
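
The same check can be done from a terminal. The sketch below assumes the dashboard is reachable on port 31250 of a cluster node; `dashboard_status` is a hypothetical helper name and the host argument is whatever address your node answers on.

```shell
# dashboard_status <host>: print the HTTP status code returned by the demo page.
dashboard_status() {
  host="$1"
  [ -n "$host" ] || { echo "usage: dashboard_status <host>" >&2; return 1; }
  curl --silent --output /dev/null --max-time 10 \
    --write-out '%{http_code}\n' "http://$host:31250/demo/"
}
```

A 2xx or 3xx code suggests the dashboard is serving; `000` means curl could not connect at all, which usually points to a pod that is not ready yet or a blocked node port.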

Submit models

Set up the environment variables

export INFX_GATEWAY_URL="http://localhost:31501" # inferx_one exposes 31501 as the node port
export IFERX_APIKEY="87831cdb-d07a-4dc1-9de0-fb232c9bf286" # the admin API key, configured in the nodeagent YAML
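
Because the variables must be exported (not just set) for child processes to see them, a quick guard can catch a missing or misspelled name before any model is submitted. This is a sketch with a hypothetical helper, `require_env`; the variable names are copied verbatim from the exports above.

```shell
# require_env NAME...: fail if any named variable is missing from the exported environment.
require_env() {
  for name in "$@"; do
    # printenv sees only exported variables, which is what child processes inherit
    if [ -z "$(printenv "$name")" ]; then
      echo "missing or empty environment variable: $name" >&2
      return 1
    fi
  done
}
```

Running `require_env INFX_GATEWAY_URL IFERX_APIKEY` in the same shell returns silently when both are exported and prints the first missing name otherwise.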

Submit the first model

cd inferx/config
/opt/inferx/bin/ixctl create public.json # create tenant
/opt/inferx/bin/ixctl create Qwen_namespace.json # create namespace
/opt/inferx/bin/ixctl create Qwen2.5-Coder-1.5B-Instruct.json # create first model
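
The three objects depend on each other (tenant, then namespace, then model), so it can help to stop at the first failure rather than continue with a missing parent. The sketch below wraps the `ixctl create` calls shown above; `create_in_order` and the `IXCTL` override variable are hypothetical names introduced for illustration, and it assumes `ixctl` exits with a non-zero status on failure.

```shell
# Path to the CLI; override IXCTL if it is installed elsewhere.
IXCTL="${IXCTL:-/opt/inferx/bin/ixctl}"

# create_in_order FILE...: run `ixctl create` on each spec, stopping at the first error.
create_in_order() {
  for spec in "$@"; do
    [ -f "$spec" ] || { echo "missing spec file: $spec" >&2; return 1; }
    echo "creating $spec"
    "$IXCTL" create "$spec" || { echo "ixctl failed on $spec" >&2; return 1; }
  done
}
```

From the config directory, `create_in_order public.json Qwen_namespace.json Qwen2.5-Coder-1.5B-Instruct.json` performs the same three steps in the same order.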
