AI prompts glossary

Streaming Response

A streaming response is a mode in which the language model sends generated tokens incrementally, as they are produced, rather than waiting for the full message before returning it. Users see text appear in real time, which reduces perceived latency and creates a more responsive, interactive feel. For conversational interfaces and live tools, streaming helps users see progress quickly and allows early interruption, while still relying on strong prompts and guardrails to guide overall message quality.
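The incremental delivery described above can be sketched as a generator that yields chunks of a response as they become available. This is a minimal, self-contained illustration, not tied to any particular API: `stream_tokens` and its chunking behavior are hypothetical stand-ins for a real model's token stream.

```python
def stream_tokens(message, chunk_size=4):
    """Simulate a model emitting a response a few characters at a time.

    A real streaming API would yield tokens as the model generates them;
    here we slice a finished string to mimic that incremental delivery.
    """
    for i in range(0, len(message), chunk_size):
        yield message[i:i + chunk_size]

received = []
for chunk in stream_tokens("Streaming lets users see text as it is generated."):
    received.append(chunk)  # a UI would render each chunk immediately

# The concatenated chunks reconstruct the complete message.
full_message = "".join(received)
```

Because each chunk is rendered as it arrives, the interface can show partial output immediately and let the user interrupt generation early instead of waiting for the whole message.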