Minimal request

{
  "model": "gemini-2.5-flash",
  "contents": [
    {
      "role": "user",
      "parts": [{ "text": "Explain SSE streaming while streaming the answer." }]
    }
  ]
}

cURL example

curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:streamGenerateContent?alt=sse" \
  -H "x-goog-api-key: $GEMINI_API_KEY" \
  -H "Content-Type: application/json" \
  -N \
  -d '{
    "contents": [
      {
        "role": "user",
        "parts": [{ "text": "Explain SSE streaming while streaming the answer." }]
      }
    ]
  }'
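
With ?alt=sse, the response arrives as server-sent events, where each data: line carries one JSON response chunk. As a rough sketch of how a client might extract the text from that stream without an SDK, the helper below reads such lines and yields the parts[].text fragments of the first candidate. The name iter_sse_text and the sample payloads are illustrative only, and the sketch assumes one data: line per event, which the SSE format does not guarantee in general.

```python
import json
from typing import Iterable, Iterator


def iter_sse_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from SSE lines shaped like 'data: {json}'.

    Hypothetical helper: assumes each event fits on a single data: line.
    """
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # blank event separators, comments, other SSE fields
        payload = line[len("data:"):].strip()
        if not payload:
            continue
        event = json.loads(payload)
        # Only the first candidate is read, to keep the output a single answer.
        for candidate in event.get("candidates", [])[:1]:
            for part in candidate.get("content", {}).get("parts", []):
                text = part.get("text")
                if text:
                    yield text


# Made-up sample events in the shape of a streamed response.
sample = [
    'data: {"candidates": [{"content": {"role": "model", "parts": [{"text": "Server-sent "}]}}]}',
    "",
    'data: {"candidates": [{"content": {"role": "model", "parts": [{"text": "events."}]}}]}',
    "",
]
answer = "".join(iter_sse_text(sample))
print(answer)
```

In practice the lines would come from an HTTP client that exposes the response body line by line; the list above only stands in for that.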

Python example

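import os

from google import genai

# Read the key from the environment, as the cURL and Node.js examples do.
client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])

stream = client.models.generate_content_stream(
    model="gemini-2.5-flash",
    contents="Explain SSE streaming while streaming the answer.",
)

for chunk in stream:
    # chunk.text may be None when a chunk carries no text parts.
    if chunk.text:
        # flush=True so each fragment appears immediately instead of waiting for a newline.
        print(chunk.text, end="", flush=True)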

Node.js example

import { GoogleGenAI } from "@google/genai";

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

const stream = await ai.models.generateContentStream({
  model: "gemini-2.5-flash",
  contents: "Explain SSE streaming while streaming the answer."
});

for await (const chunk of stream) {
  if (chunk.text) process.stdout.write(chunk.text);
}

Best practices

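  • Accumulate parts[].text across chunks; a chunk can end mid-sentence or mid-word, so do not assume each one holds a complete sentence
  • Structured output can also be streamed, but each chunk carries only a fragment of the JSON document; buffer and parse it server-side rather than passing partial JSON to the client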
  • Flash-class models are usually the best fit for latency-sensitive streaming chat
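
The first two points can be illustrated with a small sketch: collect the text fragments as they arrive, and parse structured output only once the stream has ended. The helper name collect_text and the sample fragments are hypothetical; with the SDK, the fragments would be the chunk.text values from the stream.

```python
import json
from typing import Iterable, Optional


def collect_text(fragments: Iterable[Optional[str]]) -> str:
    """Join streamed text fragments, skipping empty or missing ones."""
    buffer = []
    for fragment in fragments:
        if fragment:
            buffer.append(fragment)
    return "".join(buffer)


# Made-up fragments: boundaries fall mid-key and mid-value,
# so no single fragment is valid JSON on its own.
fragments = ['{"topic": "SS', None, 'E", "ste', 'ps": 3}']

full_text = collect_text(fragments)
data = json.loads(full_text)  # parse only after the stream is complete
print(data["topic"], data["steps"])
```

Parsing once at the end keeps the logic simple; an incremental JSON parser is an alternative when partial results need to be shown before the stream finishes.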