You can create datasets for training a model and validating the results of the training. Nebius AI Studio supports the JSON Lines format (.jsonl) for dataset files. You can use one of the following dataset types: The size limit for dataset files is 5 gigabytes.

Conversational data

You can train a model by using chats. Pass along every chat as a single messages line. The following is an example of how a single messages parameter can look:
{
  "messages": [
    {"role": "system", "content": "This is a system prompt."},
    {"role": "user", "content": "What's for dinner tonight?"},
    {"role": "assistant", "content": "What cuisine do you prefer?"},
    {"role": "user", "content": "I wouldn't mind Italian."},
    {"role": "assistant", "content": "Then let's try a new pasta!"}
  ]
}
Examples of conversational datasets:

Instruction data

You can specify prompts and the expected answers to them:
{"prompt": "Capital of Australia", "completion": "Canberra"}
{"prompt": "Distance between Rome and Kuala Lumpur", "completion": "9703 km"}
Example of an instruction dataset: Andzej-75/German_RisingWorld_prompt-text-rejected_Jsonl.

Text data

If you have unstructured data, you can put each piece in a text line:
{"text": "assistant: How can I help you?\n user: I need my current balance statement.\n assistant: Please find your statement in the attachment."}
{"text": "user: Can I open a bank account?\n assistant: Sure! Please fill out the following form."}
Example of a text dataset: FranciscoMacaya/train_model.jsonl.