
TensorRT

Building the engine

Follow the official TensorRT-LLM documentation to build the engine.

  • For Mistral-7B, you can use the LLaMA example
  • For Mixtral-8x7B, official documentation coming soon...
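The LLaMA-example workflow for Mistral-7B can be sketched from Python as the two-step sequence below: convert the weights into a TensorRT-LLM checkpoint, then compile that checkpoint into an engine. This is a minimal sketch, not the official procedure. It assumes a TensorRT-LLM release that ships `examples/llama/convert_checkpoint.py` and the `trtllm-build` CLI (older releases used a different build script) and that it is run from the root of a TensorRT-LLM checkout. The model and output paths and the helper names `build_commands` and `run` are hypothetical. Check every flag against the documentation for your installed version.

```python
import shlex
import subprocess
from pathlib import Path


def build_commands(model_dir: str, work_dir: str, dtype: str = "float16") -> list[list[str]]:
    """Return the two commands of the (assumed) LLaMA-example build workflow."""
    ckpt_dir = str(Path(work_dir) / "ckpt")
    engine_dir = str(Path(work_dir) / "engine")
    return [
        # Step 1: convert Hugging Face weights into a TensorRT-LLM checkpoint.
        ["python", "examples/llama/convert_checkpoint.py",
         "--model_dir", model_dir,
         "--output_dir", ckpt_dir,
         "--dtype", dtype],
        # Step 2: compile the checkpoint into a TensorRT engine.
        ["trtllm-build",
         "--checkpoint_dir", ckpt_dir,
         "--output_dir", engine_dir,
         "--gemm_plugin", dtype],
    ]


def run(commands: list[list[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            # check=True stops the sequence if a step fails.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical local paths; dry run only prints the commands.
    run(build_commands("./Mistral-7B-v0.1", "./trt_mistral"))
```

The script defaults to a dry run so the generated commands can be reviewed before anything is executed. Pass `dry_run=False` to `run` once the paths and flags match your environment.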

Deploying the engine

Once the engine is built, it can be deployed using the Triton Inference Server and its TensorRT-LLM backend.

Follow the official Triton TensorRT-LLM backend documentation.
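Once Triton is serving the engine, it can be queried over HTTP. The sketch below uses only the Python standard library to call Triton's `generate` endpoint. It assumes the server listens on the default HTTP port 8000 on localhost and that the engine is exposed through a model named `ensemble` accepting `text_input`, `max_tokens`, `bad_words` and `stop_words` and returning `text_output`, as in the backend's example model repository. Your model name and fields may differ. The helper names `build_request` and `generate` are hypothetical.

```python
import json
import urllib.request

TRITON_URL = "http://localhost:8000"  # assumed default Triton HTTP address
MODEL_NAME = "ensemble"               # assumed model name from the backend's example repository


def build_request(prompt: str, max_tokens: int = 64,
                  url: str = TRITON_URL, model: str = MODEL_NAME) -> urllib.request.Request:
    """Build a POST request for Triton's generate endpoint (assumed input fields)."""
    payload = {
        "text_input": prompt,
        "max_tokens": max_tokens,
        "bad_words": "",
        "stop_words": "",
    }
    return urllib.request.Request(
        f"{url}/v2/models/{model}/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, max_tokens: int = 64) -> str:
    """Send the request and return the generated text (assumed output field)."""
    with urllib.request.urlopen(build_request(prompt, max_tokens)) as resp:
        return json.loads(resp.read())["text_output"]
```

With a server running, `generate("What is TensorRT-LLM?")` returns the completion as a string. Keeping request construction separate from the network call makes the payload easy to inspect without a live server.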