You can deploy serverless LoRA adapter models in Nebius AI Studio with per-token billing. These models include:
  • Your own custom models. In this case, prepare an archive with the model files or a link to these files hosted on Hugging Face.
  • Models fine-tuned in Nebius AI Studio. In this case, you deploy the results of a fine-tuning job.
For more information, see the list of supported base models. Nebius AI Studio supports only text-to-text models for deployment.

Prerequisites

If you want to use a Python script or cURL commands, complete the steps below. If you want to work in the web interface, no prerequisites are needed.
  1. Create an API key for authentication.
  2. Save the API key to an environment variable:
    export NEBIUS_API_KEY=<API_key>
    
  3. Install the openai package:
    pip3 install openai
    
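Optionally, you can verify that the key and package work before you continue. A minimal sketch, assuming the OpenAI-compatible /v1/models endpoint is available; it lists the models visible to your account:

import os
from openai import OpenAI

# Sanity check: list the models visible to your account.
# Assumes the OpenAI-compatible /v1/models endpoint is available.
client = OpenAI(
    base_url="https://api.studio.nebius.com/v1",
    api_key=os.environ.get("NEBIUS_API_KEY"),
)
print([model.id for model in client.models.list().data])
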

How to deploy a custom model

If you have your own custom model that is not hosted yet, you can deploy it in Nebius AI Studio:
Run the following Python script:
import requests
import os
from openai import OpenAI
import time

api_key=os.environ.get('NEBIUS_API_KEY')
api_url="https://api.studio.nebius.com"

client = OpenAI(
    base_url=api_url+"/v1",
    api_key=api_key,
)

# Upload a model to the Nebius AI Studio server
def upload_file(file_name):
    with open(file_name, "rb") as file_data:
        files = {"file": (os.path.basename(file_name), file_data)}
        upload_response = requests.post(
            f"{api_url}/v0/models/upload",
            files=files,
            headers={"Authorization": f"Bearer {api_key}"}
        )

    # Stop if the upload failed
    if upload_response.status_code != 200:
        raise RuntimeError(f"Error uploading file: {upload_response.text}")

    # Get and return the uploaded file ID
    file_info = upload_response.json()
    file_id = file_info["id"]
    print(f"File uploaded successfully with ID: {file_id}")
    return file_id

# Deploy a LoRA adapter model by using the uploaded file
def create_lora_from_file(name, file_id, base_model):
    # Set up the request
    lora_creation_request = {
        "source": file_id,
        "base_model": base_model,
        "name": name,
        "description": "description"
    }

    # Send the request
    response = requests.post(
        f"{api_url}/v0/models",
        json=lora_creation_request,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}"
        }
    )

    return response.json()

# Get the custom model information, check its status and wait for the validation to complete
def wait_for_validation(name, delay=5):
    while True:
        time.sleep(delay)
        lora_info=requests.get(
            f"{api_url}/v0/models/{name}",
            headers={
                "Content-Type": "application/json",
                "Authorization": f"Bearer {api_key}"
            }
        ).json()
        if lora_info.get("status") in {"active", "error"}:
            return lora_info

# Create a multi-message request
def get_completion(model):
    completion = client.chat.completions.create(
        model=model,
        messages=[{"role":"user","content":"Hello"}],
    )
    return completion.choices[0].message.content

# Upload an archive with adapter_config.json and adapter_model.safetensors
zip_file_name="<path/to/archive.zip>"
file_id = upload_file(zip_file_name)

# Deploy a LoRA adapter model
lora_name=create_lora_from_file("<LoRA_adapter_name>", file_id, "<base_model_name>").get("name")

# Check the custom model status
lora_info = wait_for_validation(lora_name)

# If the custom model validation is successful, create a multi-message request with this model
if lora_info.get("status") == "active":
    print(get_completion(lora_name))
# If there is an error, display the reason
elif lora_info.get("status") == "error":
    print(f"An error occurred: {lora_info["status_reason"]}")

The script does the following:
  1. Uploads an archive with the LoRA adapter weights and configuration (adapter_model.safetensors and adapter_config.json). Alternatively, you can use a link to a Hugging Face repository with the LoRA adapter model files; in this case, you don’t need to call the upload_file method (see the sketch after this list).
  2. Deploys the custom model. In the create_lora_from_file method, specify the following:
    • LoRA adapter name, for example, test-adapter.
    • Hugging Face link or the ID of the uploaded archive.
    • Base model name. Select it from the list of available models.
  3. Waits for the custom model to be validated. The model first receives the validating status. When the model is validated, the status changes to active. If the uploaded data is invalid, the status changes to error and the status_reason contains the error message.
  4. Gets the custom model name. The name of a deployed model is composed of the following components:
    • Base model name.
    • LoRA adapter name.
    • Random suffix that is used to make the model name unique.
    For example, if you use the meta-llama/Llama-3.1-8B-Instruct base model and the test-adapter name for a LoRA adapter, the name of the custom model is meta-llama/Llama-3.1-8B-Instruct-LoRa:test-adapter-AbCd.
  5. Creates a multi-message request by using the custom model name.
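If the LoRA adapter model files are hosted on Hugging Face, you can pass the link as the source instead of an uploaded file ID and skip the upload_file call. A minimal sketch that reuses create_lora_from_file and wait_for_validation from the script above; the placeholder stands for the link to the Hugging Face repository with the adapter files:

# Deploy a LoRA adapter directly from a Hugging Face repository (placeholder link).
lora_name = create_lora_from_file(
    "<LoRA_adapter_name>",
    "<link_to_Hugging_Face_repository>",
    "<base_model_name>",
).get("name")
lora_info = wait_for_validation(lora_name)
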
Alternatively, you can deploy a custom model in the web interface:
  1. Go to the Models section and switch to the Custom tab.
  2. Click Deploy your LoRA.
  3. In the window that opens, specify the deployment settings:
    1. Select a base model.
    2. Enter a LoRA adapter name.
    3. If you have uploaded the LoRA adapter model to Hugging Face, select Add by link and enter the link to the model.
    4. If you have an archive with the LoRA adapter model weights and configuration (adapter_model.safetensors and adapter_config.json), select Upload. Next, drag and drop the archive to the window. The archive size should not exceed 500 MB.
  4. Click Start deployment.
Once the model is validated and deployed, it becomes available on the Custom tab. The name of a deployed model is composed of the following components:
  • Base model name.
  • LoRA adapter name.
  • Random suffix that is used to make the model name unique.
For example, if you use the meta-llama/Llama-3.1-8B-Instruct base model and the test-adapter name for a LoRA adapter, the name of the custom model is meta-llama/Llama-3.1-8B-Instruct-LoRa:test-adapter-AbCd.
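Once deployed, you can call the model through the OpenAI-compatible API like any other model. A minimal sketch; the model name below follows the example above, so replace it with the name shown on the Custom tab:

import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.studio.nebius.com/v1",
    api_key=os.environ.get("NEBIUS_API_KEY"),
)

# Replace the model name with the one shown on the Custom tab.
completion = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct-LoRa:test-adapter-AbCd",
    messages=[{"role": "user", "content": "Hello"}],
)
print(completion.choices[0].message.content)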

How to deploy a model fine-tuned in Nebius AI Studio

After you fine-tune a model in Nebius AI Studio, you can deploy the resulting model.
Run the following Python script:
import requests
import os
from openai import OpenAI
import time

api_key=os.environ.get('NEBIUS_API_KEY')
api_url="https://api.studio.nebius.com"

client = OpenAI(
    base_url=api_url+"/v1",
    api_key=api_key,
)

# Deploy a LoRA adapter model by using a fine-tuning job
def create_lora_from_job(name, ft_job, ft_checkpoint, base_model):
    # Set up the request
    fine_tuning_result = ft_job + ":" + ft_checkpoint
    lora_creation_request = {
        "source": fine_tuning_result,
        "base_model": base_model,
        "name": name,
        "description": "description"
    }

    # Send the request
    response = requests.post(
        f"{api_url}/v0/models",
        json=lora_creation_request,
        headers={"Content-Type": "application/json", "Authorization": f"Bearer {api_key}"}
    )

    return response.json()

# Get the custom model information, check its status and wait for the validation to complete
def wait_for_validation(name, delay=5):
    while True:
        time.sleep(delay)
        lora_info=requests.get(
            f"{api_url}/v0/models/{name}",
            headers={"Content-Type": "application/json","Authorization": f"Bearer {api_key}"}
        ).json()
        if lora_info.get("status") in {"active", "error"}:
            return lora_info

# Create a multi-message request
def get_completion(model):
    completion = client.chat.completions.create(
        model=model,
        messages=[{"role":"user","content":"Hello"}],
    )
    return completion.choices[0].message.content

# Deploy a LoRA adapter model by using IDs of a fine-tuning job and its checkpoint
lora_name=create_lora_from_job("<LoRA_adapter_name>", "<ftjob-***>", "<ftckpt_***>", "<base_model_name>").get("name")

# Check the custom model status
lora_info = wait_for_validation(lora_name)

# If the custom model validation is successful, create a multi-message request with this model
if lora_info.get("status") == "active":
    print(get_completion(lora_name))
# If there is an error, display the reason
elif lora_info.get("status") == "error":
    print(f"An error occurred: {lora_info["status_reason"]}")

The script does the following:
  1. Deploys a fine-tuned model from a specific fine-tuning job. In the create_lora_from_job method, specify the following:
    • Name for the LoRA adapter, for example, test-adapter.
    • Fine-tuning job ID. You can find this ID in the response when you create a fine-tuning job:
      job = client.fine_tuning.jobs.create(**job_request)
      job_id = job.id
      
      
    • Checkpoint ID. You can get this ID from a successful fine-tuning job (see the sketch after this list):
      checkpoint_id = client.fine_tuning.jobs.checkpoints.list(job_id).data[0].id
      
      
    • Base model name that you set when you created the fine-tuning job.
  2. Waits for the custom model to be validated. The model first receives the validating status. When the model is validated, the status changes to active. If the uploaded data is invalid, the status changes to error and status_reason contains the error message.
  3. Gets the custom model name. The name of a deployed model is composed of the following components:
    • Base model name.
    • LoRA adapter name.
    • Random suffix that is used to make the model name unique.
    For example, if you use the meta-llama/Llama-3.1-8B-Instruct base model and the test-adapter name for a LoRA adapter, the name of the custom model is meta-llama/Llama-3.1-8B-Instruct-LoRa:test-adapter-AbCd.
  4. Creates a multi-message request by using the custom model name.
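Putting these pieces together, here is a minimal sketch of waiting for a fine-tuning job to finish and then deploying a resulting checkpoint. It reuses client and create_lora_from_job from the script above; the job ID is a placeholder and the polling interval is arbitrary:

import time

job_id = "<ftjob-***>"

# Wait until the fine-tuning job reaches a terminal status.
while True:
    job = client.fine_tuning.jobs.retrieve(job_id)
    if job.status in {"succeeded", "failed", "cancelled"}:
        break
    time.sleep(15)

if job.status == "succeeded":
    # Take the ID of a checkpoint produced by the job, for example the first one.
    checkpoint_id = client.fine_tuning.jobs.checkpoints.list(job_id).data[0].id
    lora_name = create_lora_from_job(
        "<LoRA_adapter_name>", job_id, checkpoint_id, "<base_model_name>"
    ).get("name")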

How to delete a deployed custom model

requests.delete(f"https://api.studio.nebius.com/v0/models/<custom_model_name>",
    headers={
        "Content-Type": "application/json",
        "Authorization": f"Bearer {os.environ.get('NEBIUS_API_KEY')}"
    })
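The same request with cURL, a sketch based on the Python call above:

curl -X DELETE "https://api.studio.nebius.com/v0/models/<custom_model_name>" \
  -H "Authorization: Bearer $NEBIUS_API_KEY"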