Deployment

TensorRT

Follow the official TensorRT-LLM documentation to build the engine.

  • For Mistral-7B, you can use the LLaMA example, since Mistral-7B shares the LLaMA architecture (a build sketch follows this list).
  • For Mixtral-8x7B, official documentation is coming soon.
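
As a rough sketch of that LLaMA-example workflow, the build is typically two steps: convert the Hugging Face checkpoint into the TensorRT-LLM checkpoint format, then compile it into an engine. The paths below are placeholders, and script locations and flags may vary between TensorRT-LLM versions, so treat this as an outline rather than a definitive recipe:

```python
import subprocess

# Assumed layout: a TensorRT-LLM checkout containing the LLaMA example
# scripts, and a local Hugging Face copy of Mistral-7B. All paths below
# are hypothetical; adjust them for your setup.
MODEL_DIR = "mistral-7b-instruct"     # local HF checkpoint
CKPT_DIR = "ckpt/mistral-7b"          # converted TensorRT-LLM checkpoint
ENGINE_DIR = "engines/mistral-7b"     # compiled engine output

# Step 1: convert the HF weights to the TensorRT-LLM checkpoint format,
# reusing the LLaMA example script (Mistral-7B shares the LLaMA architecture).
subprocess.run(
    [
        "python", "TensorRT-LLM/examples/llama/convert_checkpoint.py",
        "--model_dir", MODEL_DIR,
        "--output_dir", CKPT_DIR,
        "--dtype", "float16",
    ],
    check=True,
)

# Step 2: compile the converted checkpoint into a serving engine.
subprocess.run(
    [
        "trtllm-build",
        "--checkpoint_dir", CKPT_DIR,
        "--output_dir", ENGINE_DIR,
        "--gemm_plugin", "float16",
    ],
    check=True,
)
```
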
Deploying the engine

Once the engine is built, it can be deployed using the Triton Inference Server and its TensorRT-LLM backend.

Follow the official TensorRT-LLM backend documentation.
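
Once the server is running, you can send a quick smoke test over HTTP. The sketch below assumes the default "ensemble" model from the TensorRT-LLM backend examples and Triton's standard HTTP port 8000; the host, port, model name, and request fields may differ in your deployment:

```python
import requests  # third-party HTTP client: pip install requests

# Assumed endpoint: Triton's generate extension for the default "ensemble"
# model shipped with the TensorRT-LLM backend examples, on HTTP port 8000.
# Adjust the host, port, and model name for your deployment.
TRITON_URL = "http://localhost:8000/v2/models/ensemble/generate"

payload = {
    "text_input": "What is the capital of France?",
    "max_tokens": 64,
    "bad_words": "",
    "stop_words": "",
}

response = requests.post(TRITON_URL, json=payload, timeout=60)
response.raise_for_status()
print(response.json()["text_output"])  # the generated completion
```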