POST /v1/embeddings
Create embeddings
curl --request POST \
  --url https://api.studio.nebius.com/v1/embeddings \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '{
  "model": "BAAI/bge-en-icl",
  "input": "What'\''s a nice vector, Victor?",
  "encoding_format": "float",
  "user": "<string>",
  "service_tier": "auto"
}'
{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "embedding": [
        0.0023064255,
        -0.009327292,
        -0.0028842222
      ],
      "index": 0
    }
  ],
  "model": "BAAI/bge-en-icl",
  "usage": {
    "prompt_tokens": 8,
    "total_tokens": 8
  }
}

Authorizations

Authorization
string
header
required

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
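
As a minimal sketch of the header described above, the following Python builds (but does not send) the same request as the curl sample, using only the standard library. The names `build_request` and `API_URL` are illustrative and not part of the API; `<token>` is a placeholder for your own auth token.

```python
import json
import urllib.request

# Endpoint taken from the curl sample above.
API_URL = "https://api.studio.nebius.com/v1/embeddings"


def build_request(token: str, payload: dict) -> urllib.request.Request:
    """Build a POST request carrying a Bearer token (hypothetical helper)."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_request(
    "<token>",
    {"model": "BAAI/bge-en-icl", "input": "What's a nice vector, Victor?"},
)
# Sending is left to the caller, e.g. urllib.request.urlopen(req).
```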

Query Parameters

ai_project_id
string | null

ID of the current project.
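
Because `ai_project_id` is a query parameter rather than a body field, it goes on the URL. A small sketch of how that might look; the helper name `embeddings_url` and the example project ID are made up for illustration.

```python
from urllib.parse import urlencode

# Endpoint taken from the curl sample above.
BASE_URL = "https://api.studio.nebius.com/v1/embeddings"


def embeddings_url(ai_project_id=None) -> str:
    """Return the endpoint URL, adding ai_project_id only when one is given."""
    if ai_project_id is None:
        return BASE_URL
    return f"{BASE_URL}?{urlencode({'ai_project_id': ai_project_id})}"


url = embeddings_url("my-project")  # "my-project" is a placeholder value
```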

Body

application/json
model
string
required

ID of the model to use.

Examples:

"BAAI/bge-en-icl"

input
required

Input text to embed, encoded as a string or array of tokens.

Examples:

"What's a nice vector, Victor?"

encoding_format
string | null
default:float

The format to return the embeddings in. Can be either float or base64.
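
With `base64`, the embedding arrives as a string rather than a list of numbers. This page does not specify the byte layout, so the sketch below assumes little-endian 32-bit floats, a common convention in embedding APIs; verify this against a `float` response before relying on it. The helper name `decode_embedding` is illustrative.

```python
import base64
import struct


def decode_embedding(b64: str) -> list[float]:
    """Decode a base64 embedding, ASSUMING packed little-endian float32 values."""
    raw = base64.b64decode(b64)
    if len(raw) % 4 != 0:
        raise ValueError("payload length is not a multiple of 4 bytes")
    count = len(raw) // 4
    return list(struct.unpack(f"<{count}f", raw))


# Round-trip check with values that float32 represents exactly.
sample = base64.b64encode(struct.pack("<3f", 1.0, -0.5, 0.25)).decode("ascii")
decoded = decode_embedding(sample)
```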

user
string | null

A unique identifier representing your end-user.

service_tier
enum<string> | null
default:auto

The service tier to use for the request.

Attributes:
Auto: Automatically choose the best available tier for the request (Default or OverLimit). Inspect the response to determine which tier was used.
Default: Return 429 errors when the rate limit is hit; never spill over into the OverLimit tier.
OverLimit: Indicates that the request exceeded the user's limit. This tier cannot be set in the request; it only appears in the response when the request used Auto.
Flex: Do not consume rate-limit credits, but run at lower priority. May still return 429 errors if no resources are available to process the request.

Available options: auto, default, over-limit, flex
Examples:

"auto"

"flex"
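
Since `over-limit` is listed as an option but cannot be set in a request, a client may want to reject it before sending. A minimal sketch, assuming that omitting the field lets the server apply the documented default (`auto`); `with_service_tier` and `REQUEST_TIERS` are hypothetical names.

```python
# Tiers a caller may set; "over-limit" only ever appears in responses.
REQUEST_TIERS = ("auto", "default", "flex")


def with_service_tier(payload: dict, tier=None) -> dict:
    """Return a copy of payload with service_tier set, rejecting invalid tiers."""
    if tier is None:
        # Leave the field out so the documented default applies.
        return dict(payload)
    if tier not in REQUEST_TIERS:
        raise ValueError(f"service_tier must be one of {REQUEST_TIERS}, got {tier!r}")
    return {**payload, "service_tier": tier}


body = with_service_tier(
    {"model": "BAAI/bge-en-icl", "input": "What's a nice vector, Victor?"},
    "flex",
)
```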

Response

OK

object
string
required

Always 'list'.

model
string
required

The model used for the embedding.

usage
object
required

Token usage stats.

data
Embedding · object[]
required

List of Embedding objects.
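
To pull the vectors out of `data`, something like the following may be enough. It assumes, as is conventional but not stated on this page, that each item's `index` gives the position of the corresponding input; the helper name `vectors_in_order` is illustrative. The sample dictionary mirrors the response shown at the top of the page.

```python
def vectors_in_order(response: dict) -> list:
    """Return the embedding vectors sorted by their index field."""
    items = sorted(response["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


sample_response = {
    "object": "list",
    "data": [
        {
            "object": "embedding",
            "embedding": [0.0023064255, -0.009327292, -0.0028842222],
            "index": 0,
        }
    ],
    "model": "BAAI/bge-en-icl",
    "usage": {"prompt_tokens": 8, "total_tokens": 8},
}

vectors = vectors_in_order(sample_response)
```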

service_tier
enum<string>
required

The service tier used for the request.

Attributes:
Auto: Automatically choose the best available tier for the request (Default or OverLimit). Inspect the response to determine which tier was used.
Default: Return 429 errors when the rate limit is hit; never spill over into the OverLimit tier.
OverLimit: Indicates that the request exceeded the user's limit. This tier cannot be set in the request; it only appears in the response when the request used Auto.
Flex: Do not consume rate-limit credits, but run at lower priority. May still return 429 errors if no resources are available to process the request.

Available options: auto, default, over-limit, flex
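
When a request is sent with `auto`, the response's `service_tier` is the only way to tell which tier actually served it. A small sketch of that check on an already-parsed response; `ran_over_limit` is a hypothetical helper, and it uses `.get` defensively in case the field is absent.

```python
def ran_over_limit(response: dict) -> bool:
    """True when the response reports that the OverLimit tier was used."""
    return response.get("service_tier") == "over-limit"


# Hand-written dictionaries for illustration, not captured API output.
over = ran_over_limit({"object": "list", "service_tier": "over-limit"})
normal = ran_over_limit({"object": "list", "service_tier": "default"})
```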