The stream() method makes streaming chat completion requests to the Edgee AI Gateway. It returns a generator that yields StreamChunk objects as they arrive from the API.

Arguments

| Parameter | Type | Description |
| --- | --- | --- |
| `model` | `str` | The model identifier to use (e.g., `"gpt-4o"`) |
| `input` | `str \| InputObject \| dict` | The input for the completion. Can be a simple string or a structured InputObject or dictionary |

Input Types

String Input

When input is a string, it’s automatically converted to a user message:
for chunk in edgee.stream("gpt-4o", "Tell me a story"):
    if chunk.text:
        print(chunk.text, end="", flush=True)
    
    if chunk.finish_reason:
        print(f"\nFinished: {chunk.finish_reason}")
# Equivalent to: input={"messages": [{"role": "user", "content": "Tell me a story"}]}

InputObject or Dictionary

When input is an InputObject or dictionary, you have full control over the conversation:
| Property | Type | Description |
| --- | --- | --- |
| `messages` | `list[dict]` | Array of conversation messages |
| `tools` | `list[dict] \| None` | Array of function tools available to the model |
| `tool_choice` | `str \| dict \| None` | Controls which tool (if any) the model should call. See the Tools documentation for details |
For details about the Message type, see the Send Method documentation. For details about the Tool and ToolChoice types, see the Tools documentation.

Example - Streaming with Messages:
for chunk in edgee.stream("gpt-4o", {
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Write a poem about coding"}
    ]
}):
    if chunk.text:
        print(chunk.text, end="", flush=True)
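
Because you control the full messages array, you can stream a multi-turn conversation by appending each completed assistant reply to the history before the next request. A minimal sketch using only the documented stream() API (the two hard-coded user turns are illustrative):
messages = [{"role": "system", "content": "You are a helpful assistant."}]

for user_input in ["What is a generator?", "Show me an example."]:
    messages.append({"role": "user", "content": user_input})

    # Stream the reply and accumulate it so it can be added to the history
    reply = ""
    for chunk in edgee.stream("gpt-4o", {"messages": messages}):
        if chunk.text:
            reply += chunk.text
            print(chunk.text, end="", flush=True)
    print()

    messages.append({"role": "assistant", "content": reply})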

Return Value

The stream() method returns a generator that yields StreamChunk objects. Each chunk contains incremental updates to the response.
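
Because the return value is an ordinary Python generator, you can stop consuming it at any point, for example to cap output length. Note that whether abandoning the generator early closes the underlying connection depends on the client implementation:
received = ""

for chunk in edgee.stream("gpt-4o", "Tell me a story"):
    if chunk.text:
        received += chunk.text
    if len(received) > 200:
        break  # stop consuming; remaining chunks are never fetched

print(received)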

StreamChunk Object

Each chunk yielded by the generator has the following structure:
| Property | Type | Description |
| --- | --- | --- |
| `choices` | `list[StreamChoice]` | Array of streaming choices (typically one) |

StreamChoice Object

Each choice in the choices array contains:
| Property | Type | Description |
| --- | --- | --- |
| `index` | `int` | The index of this choice in the array |
| `delta` | `StreamDelta` | The incremental update to the message |
| `finish_reason` | `str \| None` | Reason why the generation stopped. Only present in the final chunk. Possible values: `"stop"`, `"length"`, `"tool_calls"`, `"content_filter"`, or `None` |
Example - Handling Multiple Choices:
for chunk in edgee.stream("gpt-4o", "Give me creative ideas"):
    for choice in chunk.choices:
        if choice.delta.content:
            print(f"Choice {choice.index}: {choice.delta.content}")

StreamDelta Object

The delta object contains incremental updates:
| Property | Type | Description |
| --- | --- | --- |
| `role` | `str \| None` | The role of the message (typically `"assistant"`). Only present in the first chunk |
| `content` | `str \| None` | Incremental text content. Each chunk contains a portion of the full response |
| `tool_calls` | `list[dict] \| None` | Array of tool calls (if any). See the Tools documentation for details |
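
When the model calls a tool, the function arguments typically arrive spread across several chunks. The sketch below accumulates them by index; it assumes an OpenAI-style tool schema and tool-call delta shape (dicts with "index", "function.name", and "function.arguments" keys), so check the Tools documentation for the exact format used by the gateway:
import json

# Hypothetical tool definition, assuming an OpenAI-style function schema
weather_tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

tool_calls = {}  # index -> {"name": ..., "arguments": ...}

for chunk in edgee.stream("gpt-4o", {
    "messages": [{"role": "user", "content": "What's the weather in Paris?"}],
    "tools": weather_tools,
}):
    for choice in chunk.choices:
        for call in choice.delta.tool_calls or []:
            # Assumed OpenAI-style delta: index, function.name, function.arguments
            entry = tool_calls.setdefault(call["index"], {"name": "", "arguments": ""})
            fn = call.get("function", {})
            entry["name"] += fn.get("name") or ""
            entry["arguments"] += fn.get("arguments") or ""

for idx, call in tool_calls.items():
    print(idx, call["name"], json.loads(call["arguments"]))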

Convenience Properties

The StreamChunk class provides convenience properties for easier access:
| Property | Type | Description |
| --- | --- | --- |
| `text` | `str \| None` | Shortcut to `choices[0].delta.content` - the incremental text content |
| `role` | `str \| None` | Shortcut to `choices[0].delta.role` - the message role (first chunk only) |
| `finish_reason` | `str \| None` | Shortcut to `choices[0].finish_reason` - the finish reason (final chunk only) |
Example - Using Convenience Properties:
for chunk in edgee.stream("gpt-4o", "Explain quantum computing"):
    # Content chunks
    if chunk.text:
        print(chunk.text, end="", flush=True)

    # First chunk contains the role
    if chunk.role:
        print(f"\nRole: {chunk.role}")

    # Last chunk contains finish reason
    if chunk.finish_reason:
        print(f"\nFinish reason: {chunk.finish_reason}")

Understanding Streaming Behavior

Chunk Structure

  1. First chunk: Contains role (typically "assistant") and may contain initial content
  2. Content chunks: Contain incremental content updates
  3. Final chunk: Contains finish_reason indicating why generation stopped
Example - Collecting Full Response:
full_text = ""

for chunk in edgee.stream("gpt-4o", "Tell me a story"):
    if chunk.text:
        full_text += chunk.text
        print(chunk.text, end="", flush=True)  # Also display as it streams

print(f"\n\nFull response ({len(full_text)} characters):")
print(full_text)

Finish Reasons

| Value | Description |
| --- | --- |
| `"stop"` | Model generated a complete response and stopped naturally |
| `"length"` | Response was cut off due to the token limit |
| `"tool_calls"` | Model requested tool/function calls |
| `"content_filter"` | Content was filtered by safety systems |
| `None` | Generation is still in progress (not the final chunk) |
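
A common pattern is to branch on the finish reason once the stream ends, for example to detect truncated responses:
text = ""

for chunk in edgee.stream("gpt-4o", "Summarize the history of computing"):
    if chunk.text:
        text += chunk.text
    if chunk.finish_reason == "length":
        print("\nWarning: response was truncated by the token limit")
    elif chunk.finish_reason == "tool_calls":
        print("\nModel requested tool calls; see the Tools documentation")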

Empty Chunks

Some chunks may not contain content. This is normal and can happen when:
  • The chunk only contains metadata (role, finish_reason)
  • The chunk is part of tool call processing
  • Network buffering creates empty chunks
Always check for chunk.text before using it:
for chunk in edgee.stream("gpt-4o", "Hello"):
    if chunk.text:  # ✅ Good: Check before using
        print(chunk.text)
    # ❌ Bad: print(chunk.text) - may print None

Alternative: Using send() with stream=True

You can also use the send() method with stream=True to get streaming responses:
for chunk in edgee.send("gpt-4o", "Tell me a story", stream=True):
    if chunk.text:
        print(chunk.text, end="", flush=True)
The stream() method is a convenience wrapper that calls send() with stream=True.

Error Handling

The stream() method can raise exceptions:
try:
    for chunk in edgee.stream("gpt-4o", "Hello!"):
        if chunk.text:
            print(chunk.text, end="", flush=True)
except RuntimeError as error:
    # API errors: "API error {status}: {message}"
    # Network errors: Standard HTTP errors
    print(f"Stream failed: {error}")