
Mistral AI Inference Server (0.1.0)

Download OpenAPI specification

This is an inference server for Mistral AI's large language models, powered by vLLM.

Show Available Models

Show available models. Right now, only one model is served at a time.

Responses

Response samples

Content type
application/json
{
  "object": "list",
  "data": []
}
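
For illustration, this endpoint can be queried from Python. The route /v1/models, the base URL, and the "id" field on each model object follow the OpenAI convention this server mimics; they are assumptions, not values stated on this page.

import requests

BASE_URL = "http://localhost:8000"  # assumption: adjust to wherever the server is running

# List the models currently served (only one model is served at a time).
response = requests.get(f"{BASE_URL}/v1/models", timeout=10)
response.raise_for_status()
for model in response.json()["data"]:
    print(model["id"])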

Create Chat Completion

Completion API similar to OpenAI's API.

See https://platform.openai.com/docs/api-reference/chat/create for the full API specification. This API mimics the OpenAI ChatCompletion API.

NOTE: The following features are currently not supported:
- function_call (users should implement this themselves)
- logit_bias (to be supported by the vLLM engine)

Request Body schema: application/json
model: string, required
messages: string or array of message objects, required
temperature: number, default 0.7
top_p: number, default 1
n: integer, default 1
max_tokens: integer, default 8192
stop: string or array of strings
stream: boolean, default false
presence_penalty: number, default 0
frequency_penalty: number, default 0
logit_bias: object
user: string
best_of: integer
top_k: integer, default -1
ignore_eos: boolean, default false
use_beam_search: boolean, default false

Responses

Request samples

Content type
application/json
{
  "model": "string",
  "messages": "string",
  "temperature": 0.7,
  "top_p": 1,
  "n": 1,
  "max_tokens": 8192,
  "stop": [],
  "stream": false,
  "presence_penalty": 0,
  "frequency_penalty": 0,
  "logit_bias": {},
  "user": "string",
  "best_of": 0,
  "top_k": -1,
  "ignore_eos": false,
  "use_beam_search": false
}
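
A request like the sample above can be sent from Python as sketched below. The route /v1/chat/completions, the base URL, the {"role": ..., "content": ...} message format, and the shape of the returned choices follow the OpenAI convention this API mimics; they are assumptions rather than values taken from this page.

import requests

BASE_URL = "http://localhost:8000"  # assumption: adjust to wherever the server is running

payload = {
    "model": "mistralai/Mistral-7B-v0.1",
    "messages": [{"role": "user", "content": "Say hello in one sentence."}],
    "temperature": 0.7,
    "max_tokens": 128,
}
response = requests.post(f"{BASE_URL}/v1/chat/completions", json=payload, timeout=60)
response.raise_for_status()
completion = response.json()
# Assuming the OpenAI response shape, the generated reply lives in the first choice.
print(completion["choices"][0]["message"]["content"])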

Response samples

Content type
application/json
{
  "id": "cmpl-2759a099e3c9429ca88b66b8ab9a9965",
  "object": "chat.completion",
  "created": 1695318445,
  "model": "mistralai/Mistral-7B-v0.1",
  "choices": [],
  "usage": {}
}
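
When stream is set to true, the server returns incremental chunks instead of a single JSON body. A minimal sketch of consuming such a stream, assuming OpenAI-style server-sent events (lines prefixed with "data: " and terminated by "data: [DONE]"), which this page does not show explicitly:

import json
import requests

BASE_URL = "http://localhost:8000"  # assumption: adjust to wherever the server is running

payload = {
    "model": "mistralai/Mistral-7B-v0.1",
    "messages": [{"role": "user", "content": "Write a haiku about the sea."}],
    "stream": True,
}
with requests.post(f"{BASE_URL}/v1/chat/completions", json=payload, stream=True, timeout=60) as response:
    response.raise_for_status()
    for line in response.iter_lines():
        # Assumed event format: `data: {...}` per chunk, `data: [DONE]` at the end.
        if not line.startswith(b"data: "):
            continue
        chunk = line[len(b"data: "):]
        if chunk == b"[DONE]":
            break
        delta = json.loads(chunk)["choices"][0].get("delta", {})
        print(delta.get("content", ""), end="", flush=True)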

Create Completion

Completion API similar to OpenAI's API.

See https://platform.openai.com/docs/api-reference/completions/create for the API specification. This API mimics the OpenAI Completion API.

NOTE: The following features are currently not supported:
- suffix (the language models we currently support do not support suffix)
- logit_bias (to be supported by the vLLM engine)

Request Body schema: application/json
model: string, required
prompt: string, array of strings, array of token IDs (integers), or array of arrays of token IDs, required
suffix: string
max_tokens: integer, default 16
temperature: number, default 1
top_p: number, default 1
n: integer, default 1
stream: boolean, default false
logprobs: integer
echo: boolean, default false
stop: string or array of strings
presence_penalty: number, default 0
frequency_penalty: number, default 0
best_of: integer
logit_bias: object
user: string
top_k: integer, default -1
ignore_eos: boolean, default false
use_beam_search: boolean, default false

Responses

Request samples

Content type
application/json
{
  "model": "string",
  "prompt": [],
  "suffix": "string",
  "max_tokens": 16,
  "temperature": 1,
  "top_p": 1,
  "n": 1,
  "stream": false,
  "logprobs": 0,
  "echo": false,
  "stop": [],
  "presence_penalty": 0,
  "frequency_penalty": 0,
  "best_of": 0,
  "logit_bias": {},
  "user": "string",
  "top_k": -1,
  "ignore_eos": false,
  "use_beam_search": false
}

Response samples

Content type
application/json
{
  "id": "cmpl-605c15936dd441aeb08f765035c9b88e",
  "object": "text_completion",
  "created": 1695660046,
  "model": "mistralai/Mistral-7B-v0.1",
  "choices": [],
  "usage": {}
}
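
A comparable sketch for the plain completion endpoint, assuming the OpenAI-compatible route /v1/completions, a local server, and the OpenAI response shape in which each choice carries a "text" field; the vLLM-specific parameters (top_k, best_of, use_beam_search, ignore_eos) go in the same JSON body as the standard ones.

import requests

BASE_URL = "http://localhost:8000"  # assumption: adjust to wherever the server is running

payload = {
    "model": "mistralai/Mistral-7B-v0.1",
    "prompt": "The capital of France is",
    "max_tokens": 16,
    "temperature": 0.0,
    "top_k": 40,              # vLLM-specific sampling parameter
    "use_beam_search": False,  # vLLM-specific; leave false for regular sampling
}
response = requests.post(f"{BASE_URL}/v1/completions", json=payload, timeout=60)
response.raise_for_status()
print(response.json()["choices"][0]["text"])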

Health Liveness Check

Responses

Response samples

Content type
application/json
null

Health Readiness Check

Responses

Response samples

Content type
application/json
null
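
This page does not show the paths for the liveness and readiness endpoints; both simply return a JSON null body on success. A generic probe like the one below can be used once the actual paths are taken from the downloaded OpenAPI specification (the "/health" path used here is a placeholder assumption, not a value from this page).

import requests

BASE_URL = "http://localhost:8000"  # assumption: adjust to wherever the server is running

def probe(path: str) -> bool:
    """Return True if the given health endpoint answers with HTTP 200.

    The path is a placeholder; take the real liveness/readiness paths
    from the downloaded OpenAPI specification.
    """
    try:
        return requests.get(f"{BASE_URL}{path}", timeout=5).status_code == 200
    except requests.RequestException:
        return False

print("live:", probe("/health"))  # placeholder path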