- Your own custom models. In this case, prepare an archive with the model files or a link to these model files hosted on Hugging Face.
- Models fine-tuned in Nebius AI Studio. In this case, you deploy the results of fine-tuning.
## Prerequisites
If you want to use a Python script or cURL commands, meet the requirements below. If you want to work in the web interface, no prerequisites are needed.

- Create an API key for authentication.
- Save the API key to an environment variable (see the commands after this list).
- Install the `openai` package.
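For example, assuming the scripts below read the key from a variable named `NEBIUS_API_KEY` (the variable name is an assumption; use whichever name your scripts expect):

```bash
# Save the API key to an environment variable (the name is an example)
export NEBIUS_API_KEY="<your API key>"

# Install the OpenAI Python SDK used in the scripts below
pip install openai
```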
## How to deploy a custom model
If you have your own custom model that is not hosted yet, you can deploy it in Nebius AI Studio. Run the following Python script.
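A minimal sketch of such a script is shown below. It assumes the OpenAI SDK pointed at the Nebius AI Studio endpoint; the `upload_file` and `create_lora_from_file` calls follow the method names described in the stages below, but their exact attribute paths, parameter names, and status fields are assumptions, so check the SDK reference before running it:

```python
import os
import time

from openai import OpenAI

# The base URL is an assumption; check the Nebius AI Studio docs for the exact endpoint.
client = OpenAI(
    base_url="https://api.studio.nebius.com/v1/",
    api_key=os.environ["NEBIUS_API_KEY"],
)

# 1. Upload an archive with adapter_model.safetensors and adapter_config.json.
#    Skip this call if you pass a Hugging Face link in the next step instead.
#    (upload_file is the method named in this article; its attribute path
#    and signature here are assumptions.)
with open("lora-adapter.zip", "rb") as archive:
    uploaded = client.models.upload_file(file=archive)

# 2. Deploy the custom model.
#    (create_lora_from_file is the method named in this article; the
#    parameter names here are assumptions.)
model = client.models.create_lora_from_file(
    name="test-adapter",                            # LoRA adapter name
    source=uploaded.id,                             # or a Hugging Face link
    base_model="meta-llama/Llama-3.1-8B-Instruct",  # a model from the list of available models
)

# 3. Wait for validation: the status goes from validating to active,
#    or to error if the uploaded data is invalid.
while model.status == "validating":
    time.sleep(10)
    model = client.models.retrieve(model.id)
if model.status == "error":
    raise RuntimeError(f"Deployment failed: {model.status_reason}")

# 4. The custom model name, for example:
#    meta-llama/Llama-3.1-8B-Instruct-LoRa:test-adapter-AbCd
print(model.id)
```

The script stages are the following: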
- Uploads an archive with the LoRA adapter weights and configuration (`adapter_model.safetensors` and `adapter_config.json`). Alternatively, you can use a link to a Hugging Face repository with the LoRA adapter model files; in this case, you don't need to call the `upload_file` method.
- Deploys the custom model. In the `create_lora_from_file` method, specify the following:
  - LoRA adapter name, for example, `test-adapter`.
  - Hugging Face link or the ID of the uploaded archive.
  - Base model name. Select it from the list of available models.
- Waits for the custom model to be validated. The model first receives the `validating` status. When the model is validated, the status changes to `active`. If the uploaded data is invalid, the status changes to `error`, and `status_reason` contains the error message.
- Gets the custom model name. The name of a deployed model is composed of the following components:
  - Base model name.
  - LoRA adapter name.
  - Random suffix that is used to make the model name unique.

  For example, for the `meta-llama/Llama-3.1-8B-Instruct` base model and the `test-adapter` name for a LoRA adapter, the name of the custom model is `meta-llama/Llama-3.1-8B-Instruct-LoRa:test-adapter-AbCd`.
- Creates a multi-message request by using the custom model name (see the example after this list).
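For the final stage, a multi-message request with the deployed model might look like this (continuing with the `client` from the sketch above; the model name is the example composed above):

```python
# Multi-message request that uses the custom model name
response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct-LoRa:test-adapter-AbCd",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain what a LoRA adapter is in one sentence."},
    ],
)
print(response.choices[0].message.content)
```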
Alternatively, you can deploy the model in the web interface:

- Go to the Models section and switch to the Custom tab.
- Click Deploy your LoRA.
- In the window that opens, specify the deployment settings:
  - Select a base model.
  - Enter a LoRA adapter name.
  - If you have uploaded the LoRA adapter model to Hugging Face, select Add by link and enter the link to the model.
  - If you have an archive with the LoRA adapter model weights and configuration (`adapter_model.safetensors` and `adapter_config.json`), select Upload. Next, drag and drop the archive to the window. The archive size should not exceed 500 MB.
- Click Start deployment.
After the deployment, the model name is composed of the same components as above: the base model name, the LoRA adapter name, and a random suffix, for example, `meta-llama/Llama-3.1-8B-Instruct-LoRa:test-adapter-AbCd`.
## How to deploy a model fine-tuned in Nebius AI Studio
After you fine-tune a model in Nebius AI Studio, you can deploy the resulting model. Run the following Python script.
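A minimal sketch of such a script is shown below. It assumes the same OpenAI SDK client as in the previous section; the `create_lora_from_job` call follows the method name described in the stages below, but its exact attribute path, parameter names, and status fields are assumptions, so check the SDK reference before running it:

```python
import os
import time

from openai import OpenAI

# The base URL is an assumption; check the Nebius AI Studio docs for the exact endpoint.
client = OpenAI(
    base_url="https://api.studio.nebius.com/v1/",
    api_key=os.environ["NEBIUS_API_KEY"],
)

# 1. Deploy a fine-tuned model from a specific fine-tuning job.
#    (create_lora_from_job is the method named in this article; the
#    parameter names and placeholder IDs here are assumptions.)
model = client.models.create_lora_from_job(
    name="test-adapter",                            # name for the LoRA adapter
    job_id="<your fine-tuning job ID>",
    checkpoint_id="<checkpoint ID from the successful job>",
    base_model="meta-llama/Llama-3.1-8B-Instruct",  # base model set when creating the job
)

# 2. Wait for validation: the status goes from validating to active,
#    or to error if the uploaded data is invalid.
while model.status == "validating":
    time.sleep(10)
    model = client.models.retrieve(model.id)
if model.status == "error":
    raise RuntimeError(f"Deployment failed: {model.status_reason}")

# 3. The custom model name, for example:
#    meta-llama/Llama-3.1-8B-Instruct-LoRa:test-adapter-AbCd
print(model.id)

# 4. Multi-message request that uses the custom model name
response = client.chat.completions.create(
    model=model.id,
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
)
print(response.choices[0].message.content)
```

The script stages are the following: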
- Deploys a fine-tuned model from a specific fine-tuning job. In the `create_lora_from_job` method, specify the following:
  - Name for the LoRA adapter, for example, `test-adapter`.
  - Fine-tuning job ID. You can find this ID in the response when you create a fine-tuning job (see the snippet after this list).
  - Checkpoint ID. You can get this ID from a successful fine-tuning job (see the snippet after this list).
  - Base model name that you set when you created the fine-tuning job.
- Waits for the custom model to be validated. The model first receives the `validating` status. When the model is validated, the status changes to `active`. If the uploaded data is invalid, the status changes to `error`, and `status_reason` contains the error message.
- Gets the custom model name. The name of a deployed model is composed of the following components:
  - Base model name.
  - LoRA adapter name.
  - Random suffix that is used to make the model name unique.

  For example, for the `meta-llama/Llama-3.1-8B-Instruct` base model and the `test-adapter` name for a LoRA adapter, the name of the custom model is `meta-llama/Llama-3.1-8B-Instruct-LoRa:test-adapter-AbCd`.
- Creates a multi-message request by using the custom model name.
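To get the fine-tuning job ID and checkpoint ID used above, you can query the fine-tuning API. The snippet below uses the standard OpenAI SDK fine-tuning calls; whether Nebius AI Studio exposes them in exactly this form is an assumption:

```python
# The job ID is returned when you create a fine-tuning job
job = client.fine_tuning.jobs.create(
    model="meta-llama/Llama-3.1-8B-Instruct",  # example base model
    training_file="<ID of an uploaded training file>",
)
print(job.id)  # fine-tuning job ID

# After the job succeeds, list its checkpoints to get a checkpoint ID
checkpoints = client.fine_tuning.jobs.checkpoints.list(job.id)
for checkpoint in checkpoints.data:
    print(checkpoint.id)
```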