
Customize Python Backend for NVIDIA Triton Inference Server

This project demonstrates how to customize the Python backend for the NVIDIA Triton Inference Server, enabling efficient deep learning inference workflows.

GitHub repository: T5 Python backend Triton server

Overview

This repository provides a working example of a custom Python backend for the Triton Inference Server, built around a T5 model. It shows how to adapt the server to support specific, efficient deep learning inference workflows.

Features

  • Custom Python model for Triton inference
  • Preprocessing and postprocessing pipelines
  • Optimized request handling
  • Support for multiple model versions

Getting Started

Prerequisites

Ensure you have the following dependencies installed:

  • Docker
  • NVIDIA Triton Inference Server (>=2.x)
  • Python 3.8+

Also add your model artifacts to the model directory (model_repository) before starting the server.

Installation

Clone the repository:

git clone https://github.com/nvicuong/triton-test-t5.git
cd triton-test-t5

Running the Triton Server

Build the custom image, then start the Triton server with the custom model:

docker build -t tritonserver-custom .

docker run --gpus=all -it --shm-size=256m --rm -p8000:8000 -p8001:8001 -p8002:8002 -v ${PWD}:/workspace/  -v ${PWD}/model_repository:/models tritonserver-custom
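The build step above assumes a Dockerfile in the repository root. As a rough, hypothetical sketch (the base image tag and pip packages are assumptions, not taken from the repository), it might look like:

```dockerfile
# Hypothetical sketch: start from an official Triton image that ships the
# Python backend, add the Python packages the T5 model needs, and point the
# server at the mounted model repository.
FROM nvcr.io/nvidia/tritonserver:24.01-py3

# Packages assumed for a T5 model; adjust to match the actual model.py.
RUN pip install --no-cache-dir torch transformers sentencepiece

CMD ["tritonserver", "--model-repository=/models"]
```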

Testing the Inference

You can send a health-check request using curl:

curl --location 'http://localhost:8000/v2/health/ready'

And send inference requests:

curl --location 'http://localhost:8000/v2/models/ensemble_model/infer?ab=sd' \
--header 'Content-Type: application/json' \
--data '{
        "inputs": [
            {
                "name": "input_text",
                "shape": [1],  
                "datatype": "BYTES",
                "data": ["abc"]
            }
        ]
}'
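The same request body can be built in Python instead of raw JSON. A minimal sketch using only the standard library (the model, input name, shape, and datatype are taken from the curl example above):

```python
import json


def build_infer_payload(texts):
    """Build a KServe v2 inference request body for a BYTES input
    named input_text, matching the curl example above."""
    return {
        "inputs": [
            {
                "name": "input_text",
                "shape": [len(texts)],
                "datatype": "BYTES",
                "data": texts,
            }
        ]
    }


# Serialize and POST this body to
# http://localhost:8000/v2/models/ensemble_model/infer
body = json.dumps(build_infer_payload(["abc"]))
```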

Modifying the Custom Model

You can edit model.py in the model repository to modify the inference logic. Ensure that your script follows the Triton Python backend model structure.
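The Python backend expects model.py to define a class named TritonPythonModel with initialize, execute, and finalize methods. A minimal sketch of that structure (the tensor names and the echo logic are illustrative assumptions, not the repository's actual inference code):

```python
import json


class TritonPythonModel:
    """Skeleton of the structure Triton's Python backend expects."""

    def initialize(self, args):
        # Triton passes the model configuration (config.pbtxt) as a
        # JSON string under the "model_config" key.
        self.model_config = json.loads(args["model_config"])

    def execute(self, requests):
        # Imported here because these modules are only available inside
        # the Triton container.
        import numpy as np
        import triton_python_backend_utils as pb_utils

        responses = []
        for request in requests:
            # "input_text" matches the input name used in the curl example.
            input_tensor = pb_utils.get_input_tensor_by_name(
                request, "input_text"
            )
            texts = [t.decode("utf-8") for t in input_tensor.as_numpy()]

            # Placeholder inference logic: echo each input back.
            # Replace this with the real model call.
            out = pb_utils.Tensor(
                "output_text",
                np.array([t.encode("utf-8") for t in texts], dtype=object),
            )
            responses.append(
                pb_utils.InferenceResponse(output_tensors=[out])
            )
        return responses

    def finalize(self):
        # Release model resources here if needed.
        pass
```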

I hope you find it helpful!
