TensorRT
Follow the official TensorRT-LLM documentation to build the engine.
- For Mistral-7B, you can use the LLaMA example
- For Mixtral-8X7B, official documentation coming soon...
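For Mistral-7B, the build via the LLaMA example typically has two steps: convert the Hugging Face checkpoint, then build the engine. A minimal sketch follows; the local paths are assumptions and the exact flags depend on your TensorRT-LLM version, so defer to the official documentation.

```shell
# Hedged sketch: building a Mistral-7B engine with the TensorRT-LLM
# LLaMA example. Paths and flag availability vary across TensorRT-LLM
# versions -- treat this as illustrative, not authoritative.

# 1. Convert the Hugging Face checkpoint
#    (convert_checkpoint.py lives under examples/llama).
#    ./Mistral-7B-v0.1 is an assumed local checkpoint path.
python convert_checkpoint.py \
    --model_dir ./Mistral-7B-v0.1 \
    --output_dir ./ckpt_mistral_7b \
    --dtype float16

# 2. Build the serialized engine from the converted checkpoint.
trtllm-build \
    --checkpoint_dir ./ckpt_mistral_7b \
    --output_dir ./engine_mistral_7b \
    --gemm_plugin float16
```

The resulting `engine_mistral_7b` directory holds the engine files that the deployment step below expects.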
Deploying the engine
Once the engine is built, it can be deployed using the Triton Inference Server and its TensorRT-LLM backend.
Follow the official documentation.
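As a rough sketch of that deployment, the engine is placed into a Triton model repository prepared from the `tensorrtllm_backend` templates, and the server is launched against it. The repository layout and container image tag below are assumptions; check the official `tensorrtllm_backend` documentation for the versions matching your build.

```shell
# Hedged sketch: serving the built engine with Triton's TensorRT-LLM
# backend. The repository layout and image tag are assumptions.

# Copy the built engine into a model repository prepared from the
# tensorrtllm_backend templates (e.g. all_models/inflight_batcher_llm).
cp ./engine_mistral_7b/* triton_model_repo/tensorrt_llm/1/

# Launch Triton with the model repository mounted; ports 8000/8001/8002
# expose the HTTP, gRPC, and metrics endpoints respectively.
docker run --rm --gpus all \
    -v "$(pwd)/triton_model_repo:/models" \
    -p 8000:8000 -p 8001:8001 -p 8002:8002 \
    nvcr.io/nvidia/tritonserver:24.02-trtllm-python-py3 \
    tritonserver --model-repository=/models
```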