Querying a Mistral Model

This page explains how to query a Mistral model via API once you have deployed your LLM (large language model) service from OUTSCALE Marketplace. For more information about deploying a Mistral LLM service, see Deploying a Mistral LLM Service.

Deployed models expose a REST API that you can query using plain HTTP calls. To run your queries, you must set the following environment variables:

- OUTSCALE_SERVER_URL, which is the URL of the virtual machine (VM) hosting your Mistral model. Your server’s URL follows this pattern: http://${server}:5000.
- OUTSCALE_MODEL_NAME, which is the name of the model to query:
  - For the Small model, the model name to specify is small-2409.
  - For the Codestral model, the model name to specify is codestral-2405.
  - For the Ministral 8B model, the model name to specify is ministral-8b-2410.
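As a minimal sketch, the two variables can be read and checked in Python before any query is sent. The `load_settings` helper below is hypothetical (it is not part of any Mistral or OUTSCALE library) and only assumes that both variables are set as described above.

```python
import os


def load_settings(env=None):
    """Return (server_url, model_name) read from the environment.

    Hypothetical helper for illustration; `env` defaults to os.environ.
    """
    env = os.environ if env is None else env
    settings = {}
    for name in ("OUTSCALE_SERVER_URL", "OUTSCALE_MODEL_NAME"):
        value = env.get(name, "").strip()
        if not value:
            raise RuntimeError(f"Environment variable {name} is not set")
        settings[name] = value
    # Drop any trailing slash so that API paths can be appended safely
    server_url = settings["OUTSCALE_SERVER_URL"].rstrip("/")
    return server_url, settings["OUTSCALE_MODEL_NAME"]
```

Failing early with an explicit message is a design choice here: it avoids sending a request to an empty URL or with an empty model name.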

The following examples use cURL and Python commands to query a model. For more information about the types of queries that a Mistral model supports, see the official Mistral documentation.

Querying a Model in Chat Completion Mode

You can use the following commands to query your model for text generation tasks:

cURL (Completion)

$ echo $OUTSCALE_SERVER_URL/v1/chat/completions
$ echo $OUTSCALE_MODEL_NAME
$ curl --location $OUTSCALE_SERVER_URL/v1/chat/completions \
  --header "Content-Type: application/json" \
  --header "Accept: application/json" \
  --data '{
      "model": "'"$OUTSCALE_MODEL_NAME"'",
      "temperature": 0,
      "messages": [
        {"role": "user", "content": "Who is the best French painter? Answer in one short sentence."}
      ],
      "stream": false
    }'

Python (Completion)

import os
from mistralai import Mistral

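# The server URL and model name are read from the environment variables set above
client = Mistral(server_url=os.environ["OUTSCALE_SERVER_URL"])

resp = client.chat.complete(
    model=os.environ["OUTSCALE_MODEL_NAME"],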
    messages=[
        {
            "role": "user",
            "content": "Who is the best French painter? Answer in one short sentence.",
        }
    ],
    temperature=0
)

print(resp.choices[0].message.content)
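Because the deployed model exposes a REST API, the same chat completion query can also be sketched with the Python standard library alone, without the mistralai package. The `build_chat_request` and `send` helpers below are hypothetical names introduced for illustration; the endpoint path and payload fields mirror the cURL example above.

```python
import json
import urllib.request


def build_chat_request(server_url, model_name, prompt):
    """Build (without sending) a POST request for the chat completion endpoint."""
    payload = {
        "model": model_name,
        "temperature": 0,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }
    return urllib.request.Request(
        url=server_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )


def send(request, timeout=60):
    """Send the request and return the decoded JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, `send(build_chat_request(os.environ["OUTSCALE_SERVER_URL"], os.environ["OUTSCALE_MODEL_NAME"], "Who is the best French painter?"))` would return the parsed response. Assuming the JSON body mirrors the object returned by the Python client, the generated text should be found under `choices[0]["message"]["content"]`.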

Querying a Codestral Model in FIM Mode

You can use the following commands to query your Codestral model in FIM (fill-in-the-middle) mode, for code generation tasks:

cURL (FIM)

$ curl --location $OUTSCALE_SERVER_URL/v1/fim/completions \
  --header "Content-Type: application/json" \
  --header "Accept: application/json" \
  --data '{
      "model": "'"$OUTSCALE_MODEL_NAME"'",
      "prompt": "def count_words_in_file(file_path: str) -> int:",
      "suffix": "return n_words",
      "stream": false
    }'

Python (FIM)

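import os
from mistralai import Mistral

# The server URL and model name are read from the environment variables set above
client = Mistral(server_url=os.environ["OUTSCALE_SERVER_URL"])

resp = client.fim.complete(
    model=os.environ["OUTSCALE_MODEL_NAME"],
    prompt="def count_words_in_file(file_path: str) -> int:",
    suffix="return n_words"
)

# The generated text is the code that belongs between the prompt and the suffix
print(resp.choices[0].message.content)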
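In FIM mode, the model generates the code that belongs between `prompt` and `suffix`. The sketch below shows how the three pieces might be stitched back together. `assemble_fim` is a hypothetical helper, and the `completion` string is a made-up stand-in for a model response, not actual model output.

```python
def assemble_fim(prompt, completion, suffix):
    """Join the prompt, the generated middle part, and the suffix into one snippet."""
    return prompt + completion + suffix


prompt = "def count_words_in_file(file_path: str) -> int:"
suffix = "return n_words"
# Made-up middle section standing in for the text returned by the model
completion = (
    "\n    with open(file_path) as f:\n"
    "        n_words = len(f.read().split())\n    "
)

print(assemble_fim(prompt, completion, suffix))
```

Whether the returned text includes the leading newline and indentation depends on the model, so a real integration may need to adjust whitespace before joining the pieces.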

Related Page

Official Mistral documentation - Clients