InferX platform k8s deployment
inferx-net edited this page Aug 11, 2025
·
12 revisions
- OS: Linux kernel > 5.8.0; tested on Ubuntu 20.04 and 22.04
- Processor: x86-64 (AMD64)
- Docker: > 17.09.0
- KVM: bare metal, or a VM with nested virtualization. Enable virtualization technology in the BIOS (usually under the Security tab).
- Memory: >= 64 GB
- CUDA: >= 12.5
- Kubernetes: K8s or K3s
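The prerequisites above can be checked on a candidate node before deploying. The following is a minimal sketch, not part of the InferX tooling: the function names are made up for illustration, the thresholds are copied from the list, the memory check compares against 64 GB in decimal units as an approximation, and the CUDA check reads the version that `nvidia-smi` reports for the installed driver.

```shell
#!/bin/sh
# Sketch: report whether this host meets the prerequisites listed above.
# Thresholds come from the list; function names are illustrative only.

# version_ge A B: true when dotted version A >= B (relies on `sort -V`).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# version_gt A B: true when A is strictly newer than B.
version_gt() {
  [ "$1" != "$2" ] && version_ge "$1" "$2"
}

# report LABEL STATUS DETAIL: print one aligned result line.
report() {
  printf '%-10s %-4s %s\n' "$1" "$2" "$3"
}

check_prereqs() {
  kernel="$(uname -r | cut -d- -f1)"   # 5.15.0-91-generic -> 5.15.0
  if version_gt "$kernel" 5.8.0; then report kernel ok "$kernel"
  else report kernel FAIL "$kernel"; fi

  arch="$(uname -m)"
  if [ "$arch" = x86_64 ]; then report arch ok "$arch"
  else report arch FAIL "$arch"; fi

  # Needs access to the Docker daemon; empty when docker is missing.
  docker_ver="$(docker version --format '{{.Server.Version}}' 2>/dev/null)"
  if [ -n "$docker_ver" ] && version_gt "$docker_ver" 17.09.0; then
    report docker ok "$docker_ver"
  else report docker FAIL "${docker_ver:-not found or daemon unreachable}"; fi

  if [ -e /dev/kvm ]; then report kvm ok /dev/kvm
  else report kvm FAIL "/dev/kvm missing"; fi

  # MemTotal is a little below installed RAM, so compare against
  # 64 GB in decimal kB (an approximation of the ">= 64 GB" rule).
  mem_kb="$(awk '/^MemTotal:/ {print $2}' /proc/meminfo 2>/dev/null)"
  if [ "${mem_kb:-0}" -ge 64000000 ]; then report memory ok "${mem_kb} kB"
  else report memory FAIL "${mem_kb:-unknown} kB"; fi

  # nvidia-smi prints the driver's supported CUDA version in its header.
  cuda="$(nvidia-smi 2>/dev/null | sed -n 's/.*CUDA Version: *\([0-9.]*\).*/\1/p' | head -n 1)"
  if [ -n "$cuda" ] && version_ge "$cuda" 12.5; then report cuda ok "$cuda"
  else report cuda FAIL "${cuda:-not found}"; fi

  if command -v kubectl >/dev/null 2>&1 || command -v k3s >/dev/null 2>&1; then
    report k8s ok "kubectl or k3s found"
  else report k8s FAIL "kubectl/k3s not found"; fi
}

check_prereqs
```

Each line of output names one prerequisite with `ok` or `FAIL`; the script only reports and never changes the system.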
The deployment uses the following container images:
- inferx/inferx_dashboard:v0.1.3: The InferX web UI dashboard.
- inferx/inferx_one:v0.1.3: The InferX platform services, such as the REST API gateway and the scheduler.
- inferx/spdk-container:v0.1.3: Optional. A simple wrapper around SPDK (https://spdk.io/). It is only needed when using the InferX blob store.
- quay.io/keycloak/keycloak:latest: The Keycloak image, used for authentication.
- postgres:14.5: Standard PostgreSQL container. It stores the InferX audit log, secrets, and Keycloak configuration.
- quay.io/coreos/etcd:v3.5.13: Stores InferX configuration such as tenants, namespaces, and model functions.
- Label the GPU node:
kubectl label node <NodeName> inferx_storage=data --overwrite
kubectl label node <NodeName>-precision-7960-tower inferx_nodeType=inferx_file --overwrite
- Clone the inferx repo:
git clone https://github.com/inferx-net/inferx.git
cd inferx
- Adjust the nodeagent parameters in k8s/nodeagent.yaml:
3.1 Update the memory size: model container resources are allocated from the nodeagent pod, so set the memory request and limit to a size that suits the node's memory:
    resources:
      requests:
        cpu: "20"
        memory: "180Gi" # Regular memory request (RAM)
      limits:
        cpu: "20"
        memory: "180Gi" # Regular memory limit (RAM)
        nvidia.com/gpu: 1
3.2 Update the cache size: the cache size is the amount of CPU memory used to cache models. It is taken out of the memory configured in 3.1, so make sure it is less than 50% of that size:
    - name: CACHE_MEMORY
      value: "90Gi"
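The relationship between the two values in 3.1 and 3.2 is easy to check before applying the YAML. The sketch below uses hypothetical helper names and assumes both values are whole numbers with a `Gi` suffix. It also assumes the 50% limit is inclusive, because the sample values (180Gi of memory, 90Gi of cache) sit exactly on that boundary.

```shell
#!/bin/sh
# Sketch: sanity-check CACHE_MEMORY against the nodeagent memory size.
# Assumes whole-number values with a "Gi" suffix, as in the sample YAML.

# gi_value "180Gi" -> 180
gi_value() {
  printf '%s\n' "${1%Gi}"
}

# cache_fits MEMORY CACHE: true when CACHE is at most half of MEMORY.
# The inclusive comparison is an assumption (see the note above).
cache_fits() {
  mem="$(gi_value "$1")"
  cache="$(gi_value "$2")"
  [ $((cache * 2)) -le "$mem" ]
}

# Values taken from the sample configuration in 3.1 and 3.2.
if cache_fits "180Gi" "90Gi"; then
  echo "CACHE_MEMORY fits within 50% of the pod memory"
else
  echo "CACHE_MEMORY is too large for the pod memory" >&2
fi
```

On a node with less RAM, lower both numbers together, for example `cache_fits "64Gi" "32Gi"`.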
- Start the pods: in the inferx folder, run
make runkblob
- Check the website http://:31250/demo/
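The website check can also be done from a terminal. This is a rough sketch: `NODE_HOST` is a placeholder for the address of the node, which this page does not specify, and the helper names are hypothetical. Only port 31250 and the `/demo/` path come from the step above.

```shell
#!/bin/sh
# Sketch: probe the dashboard from the command line.
# NODE_HOST is a placeholder; the node address is not given in this guide.

# dashboard_url HOST: build the dashboard URL for a given node address.
dashboard_url() {
  printf 'http://%s:31250/demo/\n' "$1"
}

# check_dashboard HOST: succeeds when the page answers without an HTTP error.
check_dashboard() {
  curl --silent --fail --max-time 10 --output /dev/null "$(dashboard_url "$1")"
}

# Usage, from a machine that can reach the node:
#   kubectl get pods --all-namespaces     # wait until the pods are Running
#   check_dashboard "$NODE_HOST" && echo "dashboard is up"
```

The pods may need some time to pull images and start, so a failed probe right after `make runkblob` does not necessarily indicate a problem.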
- Set up the environment variables:
export INFX_GATEWAY_URL="http://localhost:31501" # inferx_one exposes 31501 as the node port
export IFERX_APIKEY="87831cdb-d07a-4dc1-9de0-fb232c9bf286" # the admin API key, configured in the nodeagent YAML
- Submit the first model:
cd inferx/config
/opt/inferx/bin/ixctl create public.json # create tenant
/opt/inferx/bin/ixctl create Qwen_namespace.json # create namespace
/opt/inferx/bin/ixctl create Qwen2.5-Coder-1.5B-Instruct.json # create first model
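If an `ixctl create` call fails, a first thing to rule out is a missing variable from the environment step. The sketch below assumes that ixctl reads the gateway URL and API key from the exported variables, as that step suggests. `require_env` is a hypothetical helper, not part of InferX.

```shell
#!/bin/sh
# Sketch: fail early if a required variable is not exported or is empty.

# require_env NAME [NAME...]: returns non-zero if any variable is missing.
require_env() {
  missing=0
  for name in "$@"; do
    # printenv only sees exported variables, which is what a child
    # process such as ixctl would see as well.
    if [ -z "$(printenv "$name")" ]; then
      echo "missing environment variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage, with the variable names from the environment step:
#   require_env INFX_GATEWAY_URL IFERX_APIKEY \
#     && /opt/inferx/bin/ixctl create public.json
```

The check covers only presence; it does not verify that the gateway is reachable or that the key is accepted.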