LlamaIndex supports several features for text-to-text models. For instance, you can send a request with a chat history or get a streaming response.

Prerequisites

  1. Create an API key for authentication.
  2. Save the API key to an environment variable:
    export NEBIUS_API_KEY=<API_key>
    
  3. Install LlamaIndex packages:
    pip3 install llama-index-llms-nebius llama-index
    
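Optionally, you can check that the variable is visible to Python before running the scripts below. This one-liner is only a convenience check, not part of the official setup:
    python3 -c "import os; print(os.getenv('NEBIUS_API_KEY') is not None)"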

Prepare a script

  1. Copy the following part of the script:
    from llama_index.llms.nebius import NebiusLLM
    import os
    
    # Take the API key from the environment variable
    NEBIUS_API_KEY = os.getenv("NEBIUS_API_KEY")
    
    # Load an LLM
    llm = NebiusLLM(
        api_key=NEBIUS_API_KEY, model="meta-llama/Meta-Llama-3.1-70B-Instruct-fast"
    )
    
  2. Add one of the following methods, depending on your use case:
    Prompt
      Ask a question as a prompt. To implement it, call the llm.complete("<prompt>") method; a minimal sketch follows this list.
    Multi-message request
      Include system prompts and a chat history in your request so that Nebius AI Studio returns more precise output. To implement it, call the llm.chat(messages) method and pass a list of messages to it. To prepare the messages, create ChatMessage objects. For more details, see Examples.
    Streaming output
      The output is printed word by word. This can be helpful for chats: the user watches the answer appear gradually. To implement it, call the llm.stream_complete("<prompt>") method, then print out the response:
        for r in response:
            print(r.delta, end="")
    Multi-message request with streaming output
      Add system prompts and a chat history, and receive streaming output. To implement it, call the llm.stream_chat(messages) method and pass a list of messages to it. For an example, see the sketch at the end of Examples.
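
A minimal sketch of the prompt use case, appended to the script from step 1 (the question text is only an illustration):

# Send a single prompt and print the complete response
response = llm.complete("What is the capital of the Netherlands?")
print(response)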

Examples

Multi-message request

The script below adds a chat history to the request; the last message asks the model to write a poem:
from llama_index.llms.nebius import NebiusLLM
from llama_index.core.llms import ChatMessage

import os

# Take the API key from the environment variable
NEBIUS_API_KEY = os.getenv("NEBIUS_API_KEY")

# Load an LLM
llm = NebiusLLM(
    api_key=NEBIUS_API_KEY, model="meta-llama/Meta-Llama-3.1-70B-Instruct-fast"
)

# Prepare a list of messages, put the request into the last one and get a response
messages = [
    ChatMessage(role="system", content="You are a helpful AI assistant."),
    ChatMessage(
        role="user",
        content="Write a poem about a smart AI robot named Wall-e.",
    ),
]
response = llm.chat(messages)
print(response)
The output is the following:
assistant: In a world of wires and circuitry bright,
A small robot shone with a heart so light,
Wall-e, the wanderer, with a soul so true,
A smart AI mind, with a spirit anew.

...
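
If you need only the text of the reply, without the assistant: prefix, read it from the response object. In LlamaIndex, the chat response exposes the message content as follows (shown as a convenience; attribute names may change between versions):

# Print only the message text, without the role prefix
print(response.message.content)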

Prompt request with streaming output

The script below sends the prompt Amsterdam is the capital of and prints the output word by word:
from llama_index.llms.nebius import NebiusLLM

import os

# Take the API key from the environment variable
NEBIUS_API_KEY = os.getenv("NEBIUS_API_KEY")

# Load an LLM
llm = NebiusLLM(
    api_key=NEBIUS_API_KEY, model="meta-llama/Meta-Llama-3.1-70B-Instruct-fast"
)

# Send a request as a prompt and stream the response
response = llm.stream_complete("Amsterdam is the capital of ")
for r in response:
    print(r.delta, end="")
The output is the following:
The Netherlands!
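
Multi-message request with streaming output

The table in Prepare a script also mentions the llm.stream_chat(messages) method. The sketch below combines the two examples above, reusing the same model and messages, and assumes that, like stream_complete(), stream_chat() yields chunks with a delta attribute:

from llama_index.llms.nebius import NebiusLLM
from llama_index.core.llms import ChatMessage

import os

# Take the API key from the environment variable
NEBIUS_API_KEY = os.getenv("NEBIUS_API_KEY")

# Load an LLM
llm = NebiusLLM(
    api_key=NEBIUS_API_KEY, model="meta-llama/Meta-Llama-3.1-70B-Instruct-fast"
)

# Prepare a list of messages and stream the response word by word
messages = [
    ChatMessage(role="system", content="You are a helpful AI assistant."),
    ChatMessage(
        role="user",
        content="Write a poem about a smart AI robot named Wall-e.",
    ),
]
response = llm.stream_chat(messages)
for r in response:
    print(r.delta, end="")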