TensorRT-LLM // Triton
Building the engine
Follow the official TensorRT-LLM documentation to build the engine.
- For Mistral-7B, you can use the LLaMA example, since the model shares the LLaMA architecture
- For Mixtral-8X7B, official documentation is coming soon
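As a rough sketch of the LLaMA-example flow for Mistral-7B, the build usually has two steps: convert the Hugging Face checkpoint into TensorRT-LLM's checkpoint format, then compile the engine. All paths below are illustrative, and script locations and flags vary between TensorRT-LLM releases, so treat this as a starting point and defer to the official documentation.

```shell
# Hypothetical build flow based on the TensorRT-LLM LLaMA example.
# Paths are placeholders; exact scripts and flags depend on your
# TensorRT-LLM version -- check the official docs.

# 1. Convert the Hugging Face checkpoint to TensorRT-LLM format
python examples/llama/convert_checkpoint.py \
    --model_dir ./Mistral-7B-v0.1 \
    --output_dir ./mistral_ckpt \
    --dtype float16

# 2. Compile the TensorRT engine from the converted checkpoint
trtllm-build \
    --checkpoint_dir ./mistral_ckpt \
    --output_dir ./mistral_engine \
    --gemm_plugin float16
```

The compiled engine files land in the `--output_dir`, which is what you point the Triton backend at in the next step.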
Deploying the engine
Once the engine is built, it can be deployed using the Triton Inference Server and its TensorRT-LLM backend.
Follow the official documentation.
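A minimal deployment sketch, assuming the engine directory from the build step above: place the engine inside a Triton model repository and start the server. The repository layout and model configuration come from the templates in the `tensorrtllm_backend` repository; the directory and model names here are illustrative.

```shell
# Illustrative only: the tensorrtllm_backend repo provides the actual
# model templates and config.pbtxt files Triton needs.

# Copy the built engine into a Triton model repository
mkdir -p model_repo/tensorrt_llm/1
cp ./mistral_engine/* model_repo/tensorrt_llm/1/

# Launch Triton pointing at that repository
tritonserver --model-repository=$(pwd)/model_repo
```

Once the server is up, requests go through Triton's standard HTTP/gRPC endpoints; the official backend documentation covers the request schema and the per-model configuration options.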