The `stream()` method makes streaming chat completion requests to the Edgee AI Gateway. It returns a generator that yields `StreamChunk` objects as they arrive from the API.
Arguments
| Parameter | Type | Description |
|---|---|---|
| model | str | The model identifier to use (e.g., "gpt-4o") |
| input | str \| InputObject \| dict | The input for the completion. Can be a simple string, a structured InputObject, or a dictionary |
Input Types
String Input
When `input` is a string, it's automatically converted to a user message:
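For illustration, a minimal sketch of this equivalence, assuming the SDK exposes a client class named Edgee (the import path and constructor arguments here are assumptions, not part of this reference):

```python
from edgee import Edgee  # assumed import path

client = Edgee(api_key="...")  # constructor arguments are an assumption

# Passing a plain string...
stream = client.stream(model="gpt-4o", input="Tell me a joke")

# ...behaves the same as passing a single user message explicitly:
stream = client.stream(
    model="gpt-4o",
    input={"messages": [{"role": "user", "content": "Tell me a joke"}]},
)

for chunk in stream:
    if chunk.text:
        print(chunk.text, end="", flush=True)
```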
InputObject or Dictionary
When `input` is an InputObject or dictionary, you have full control over the conversation:
| Property | Type | Description |
|---|---|---|
| messages | list[dict] | Array of conversation messages |
| tools | list[dict] \| None | Array of function tools available to the model |
| tool_choice | str \| dict \| None | Controls which tool (if any) the model should call. See the Tools documentation for details |
For details about the Message type, see the Send Method documentation.
For details about Tool and ToolChoice types, see the Tools documentation.
Example - Streaming with Messages:
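A sketch under the same client assumptions as above, combining a system prompt with a user message:

```python
from edgee import Edgee  # assumed import path

client = Edgee(api_key="...")

stream = client.stream(
    model="gpt-4o",
    input={
        "messages": [
            {"role": "system", "content": "You are a concise assistant."},
            {"role": "user", "content": "Explain streaming in one paragraph."},
        ],
    },
)

for chunk in stream:
    if chunk.text:
        print(chunk.text, end="", flush=True)
```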
Return Value
The `stream()` method returns a generator that yields `StreamChunk` objects. Each chunk contains incremental updates to the response.
StreamChunk Object
Each chunk yielded by the generator has the following structure:

| Property | Type | Description |
|---|---|---|
| choices | list[StreamChoice] | Array of streaming choices (typically one) |
StreamChoice Object
Each choice in the `choices` array contains:
| Property | Type | Description |
|---|---|---|
| index | int | The index of this choice in the array |
| delta | StreamDelta | The incremental update to the message |
| finish_reason | str \| None | Reason why the generation stopped. Only present in the final chunk. Possible values: "stop", "length", "tool_calls", "content_filter", or None |
StreamDelta Object
The `delta` object contains incremental updates:
| Property | Type | Description |
|---|---|---|
| role | str \| None | The role of the message (typically "assistant"). Only present in the first chunk |
| content | str \| None | Incremental text content. Each chunk contains a portion of the full response |
| tool_calls | list[dict] \| None | Array of tool calls (if any). See the Tools documentation for details |
Convenience Properties
The `StreamChunk` class provides convenience properties for easier access:
| Property | Type | Description |
|---|---|---|
| text | str \| None | Shortcut to choices[0].delta.content - the incremental text content |
| role | str \| None | Shortcut to choices[0].delta.role - the message role (first chunk only) |
| finish_reason | str \| None | Shortcut to choices[0].finish_reason - the finish reason (final chunk only) |
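For example, the convenience properties let a consumer loop avoid indexing into choices directly (client setup assumed as in the earlier sketches):

```python
from edgee import Edgee  # assumed import path

client = Edgee(api_key="...")

for chunk in client.stream(model="gpt-4o", input="Hello!"):
    if chunk.role:           # first chunk only
        print(f"[role: {chunk.role}]")
    if chunk.text:           # incremental content
        print(chunk.text, end="", flush=True)
    if chunk.finish_reason:  # final chunk only
        print(f"\n[finished: {chunk.finish_reason}]")
```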
Understanding Streaming Behavior
Chunk Structure
- First chunk: Contains `role` (typically `"assistant"`) and may contain initial `content`
- Content chunks: Contain incremental `content` updates
- Final chunk: Contains `finish_reason` indicating why generation stopped
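Putting the three phases together, a sketch that accumulates the full response while recording the metadata (same client assumptions as above):

```python
from edgee import Edgee  # assumed import path

client = Edgee(api_key="...")

parts = []
role = None
finish_reason = None

for chunk in client.stream(model="gpt-4o", input="Write a haiku about rivers."):
    if chunk.role is not None:           # first chunk carries the role
        role = chunk.role
    if chunk.text is not None:           # content chunks carry text fragments
        parts.append(chunk.text)
    if chunk.finish_reason is not None:  # final chunk carries the finish reason
        finish_reason = chunk.finish_reason

print("".join(parts))
print(role, finish_reason)
```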
Finish Reasons
| Value | Description |
|---|---|
| "stop" | Model generated a complete response and stopped naturally |
| "length" | Response was cut off due to the token limit |
| "tool_calls" | Model requested tool/function calls |
| "content_filter" | Content was filtered by safety systems |
| None | Generation is still in progress (not the final chunk) |
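A sketch of reacting to the finish reason as the stream completes, again with the client assumed as in the earlier sketches:

```python
from edgee import Edgee  # assumed import path

client = Edgee(api_key="...")

for chunk in client.stream(model="gpt-4o", input="Summarize this article."):
    if chunk.text:
        print(chunk.text, end="", flush=True)
    if chunk.finish_reason == "length":
        print("\n[response truncated by the token limit]")
    elif chunk.finish_reason == "content_filter":
        print("\n[content was filtered]")
    elif chunk.finish_reason == "tool_calls":
        print("\n[model requested tool calls]")
```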
Empty Chunks
Some chunks may not contain `content`. This is normal and can happen when:
- The chunk only contains metadata (role, finish_reason)
- The chunk is part of tool call processing
- Network buffering creates empty chunks
Always check `chunk.text` before using it.
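A minimal sketch of the check, with the client assumed as in the earlier examples:

```python
from edgee import Edgee  # assumed import path

client = Edgee(api_key="...")

for chunk in client.stream(model="gpt-4o", input="Hello!"):
    # chunk.text is None for metadata-only chunks; skip those.
    if chunk.text is not None:
        print(chunk.text, end="", flush=True)
```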
Alternative: Using send() with stream=True
You can also use the `send()` method with `stream=True` to get streaming responses.
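A sketch, assuming `send()` accepts the same model and input arguments plus a stream flag (the exact signature is covered in the Send Method documentation):

```python
from edgee import Edgee  # assumed import path

client = Edgee(api_key="...")

stream = client.send(
    model="gpt-4o",
    input="Tell me a story.",
    stream=True,  # yields StreamChunk objects instead of returning a full response
)

for chunk in stream:
    if chunk.text:
        print(chunk.text, end="", flush=True)
```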
The `stream()` method is a convenience wrapper that calls `send()` with `stream=True`.
Error Handling
The `stream()` method can raise exceptions.
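The specific exception classes aren't reproduced here; a defensive sketch using a broad handler (narrow the except clause to the SDK's actual exception types):

```python
from edgee import Edgee  # assumed import path

client = Edgee(api_key="...")

try:
    for chunk in client.stream(model="gpt-4o", input="Hello!"):
        if chunk.text:
            print(chunk.text, end="", flush=True)
except Exception as exc:  # replace with the SDK's specific exception types
    print(f"Streaming failed: {exc}")
```

Because the return value is a generator, errors can surface while iterating over the stream, not only when the call is first made.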