Triton Inference Server

TWSC provides a pay-as-you-go working environment based on NGC's Triton Inference Server (formerly the TensorRT Inference Server). The server exposes an inference service over an HTTP endpoint, allowing remote clients to request inferencing for any model the server manages. The inference server itself is included in the container image; external to the container, additional C++ and Python client libraries and further documentation are available at GitHub: Inference Server.
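
As a minimal sketch of how a remote client talks to that HTTP endpoint, the example below uses the Python client library (`tritonclient`, installable with `pip install tritonclient[http]`). The server URL, the model name `densenet_onnx`, and the input/output tensor names and shapes are illustrative assumptions and must match the model actually loaded on your server.

```python
# Minimal Triton HTTP client sketch (assumes a server on localhost:8000
# serving a model named "densenet_onnx"; adjust names/shapes to your model).
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# Confirm the server and the example model are ready before sending requests.
assert client.is_server_ready()
assert client.is_model_ready("densenet_onnx")

# Build a dummy input tensor matching the (assumed) expected input shape.
data = np.random.rand(3, 224, 224).astype(np.float32)
inputs = [httpclient.InferInput("data_0", list(data.shape), "FP32")]
inputs[0].set_data_from_numpy(data)
outputs = [httpclient.InferRequestedOutput("fc6_1")]

# Send the inference request and read the named output back as a NumPy array.
result = client.infer(model_name="densenet_onnx", inputs=inputs, outputs=outputs)
print(result.as_numpy("fc6_1").shape)
```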

Image versions

| Container Version | Ubuntu | CUDA Toolkit | Triton Inference Server | TensorRT | cuDNN | TWCC Release Date |
|---|---|---|---|---|---|---|
| tritonserver-24.12-trtllm-python-py3 | 24.04 | NVIDIA CUDA 12.6.3 | 2.53.0 | TensorRT 10.7.0.23 | - | 16JUN25 |
| tritonserver-24.05-trtllm-python-py3 | 22.04 | NVIDIA CUDA 12.4.1 | 2.46.0 | TensorRT 10.0.1.6 | - | 19JUL24 |
| tritonserver-22.11-py3 | 20.04 | NVIDIA CUDA 11.8.0 | 2.28.0 | TensorRT 8.5.1 | - | 19JUL24 |
| tritonserver-22.08-py3 | 20.04 | NVIDIA CUDA 11.7.1 | 2.25.0 | TensorRT 8.4.2.4 | - | 30SEP22 |
| tritonserver-22.05-py3 | 20.04 | NVIDIA CUDA 11.7.0 | 2.22.0 | TensorRT 8.2.5.1 | - | 21JUN22 |
| tritonserver-22.02-py3 | 20.04 | NVIDIA CUDA 11.6.0 | 2.19.0 | TensorRT 8.2.3 | - | 18MAY22 |
| tritonserver-21.11-py3 | 20.04 | NVIDIA CUDA 11.5.0 | 2.16.0 | TensorRT 8.0.3.4 | 8.3.0.96 | 18MAY22 |
| tritonserver-21.08-py3 | 20.04 | NVIDIA CUDA 11.4.1 | 2.13.0 | TensorRT 8.0.1.6 | 8.2.2.6 | 16SEP21 |
| tritonserver-21.06-py3 | 20.04 | NVIDIA CUDA 11.3.1 | 2.11.0 | TensorRT 7.2.3.4 | 8.2.1 | 16SEP21 |
| tritonserver-21.02-py3 | 20.04 | NVIDIA CUDA 11.2.0 | 2.7.0 | TensorRT 7.2.2.3 + cuda11.1.0.024 | 8.0.5 | 12MAY21 |
| tensorrtserver-20.02-py3 | 18.04 | NVIDIA CUDA 10.2.89 | 1.12.0 | TensorRT 7.0.0 | 7.6.5 | - |
| tensorrtserver-19.02-py3-v1 | 16.04 | NVIDIA CUDA 10.0.130 | 0.11.0 beta | TensorRT 5.0.2 | 7.4.2 | - |
| tensorrtserver-18.12-py3-v1 | 16.04 | NVIDIA CUDA 10.0.130 | 0.9.0 beta | TensorRT 5.0.2 | 7.4.1 | - |
| tensorrtserver-18.10-py3-v1 | 16.04 | NVIDIA CUDA 10.0.130 | 0.7.0 beta | TensorRT 5.0.0 RC | 7.3.0 | - |
| tensorrtserver-18.10-py2-v1 | 16.04 | NVIDIA CUDA 9.0.176 | 0.5.0 beta | TensorRT 4.0.1 | 7.2.1 | - |
info

The py3 and py2 suffixes indicate the Python version (Python 3 or Python 2) included in the image.

Detailed package versions