
Minimal request

The model is named in the request URL, not in the body; the body carries only the conversation and the generation configuration.

{
  "contents": [
    {
      "role": "user",
      "parts": [{ "text": "Compare three cache architectures and recommend one." }]
    }
  ],
  "generationConfig": {
    "thinkingConfig": {
      "thinkingBudget": 1024
    }
  }
}
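
The same body can be assembled programmatically. A minimal Python sketch; the helper name `build_request` is illustrative, not part of any SDK:

```python
# Build the generateContent request body as a plain dict.
# `build_request` is an illustrative helper, not an official API.
def build_request(prompt: str, thinking_budget: int = 1024) -> dict:
    return {
        "contents": [
            {"role": "user", "parts": [{"text": prompt}]}
        ],
        "generationConfig": {
            "thinkingConfig": {"thinkingBudget": thinking_budget}
        },
    }

body = build_request("Compare three cache architectures and recommend one.")
```

Keeping construction in one place makes it easy to vary thinkingBudget per request without duplicating the rest of the payload.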

cURL example

curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-pro:generateContent" \
  -H "x-goog-api-key: $GEMINI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "contents": [
      {
        "role": "user",
        "parts": [{ "text": "Compare three cache architectures and recommend one." }]
      }
    ],
    "generationConfig": {
      "thinkingConfig": {
        "thinkingBudget": 1024
      }
    }
  }'

Python example

import os

import requests

# Read the API key from the environment, as in the Node.js example below.
response = requests.post(
    "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-pro:generateContent",
    headers={
        "x-goog-api-key": os.environ["GEMINI_API_KEY"],
        "Content-Type": "application/json",
    },
    json={
        "contents": [
            {
                "role": "user",
                "parts": [{"text": "Compare three cache architectures and recommend one."}],
            }
        ],
        "generationConfig": {
            "thinkingConfig": {
                "thinkingBudget": 1024,
            },
        },
    },
)

# Surface HTTP errors before trying to parse the body.
response.raise_for_status()
print(response.json())
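
The raw JSON response nests the answer under candidates. A hedged sketch of pulling out the text and the thinking-token count; the sample dict below is fabricated for illustration, and real responses carry additional fields:

```python
# Extract the answer text and thinking-token usage from a generateContent
# response dict. Field paths follow the public REST response shape;
# usageMetadata fields may be absent, so default to 0.
def extract_answer(resp: dict) -> tuple[str, int]:
    text = resp["candidates"][0]["content"]["parts"][0]["text"]
    thought_tokens = resp.get("usageMetadata", {}).get("thoughtsTokenCount", 0)
    return text, thought_tokens

# Fabricated sample response, for illustration only.
sample = {
    "candidates": [
        {"content": {"role": "model",
                     "parts": [{"text": "Write-through is simplest..."}]}}
    ],
    "usageMetadata": {"promptTokenCount": 12, "thoughtsTokenCount": 857},
}

text, thoughts = extract_answer(sample)
```

Logging thoughtsTokenCount alongside latency is a cheap way to see whether a given thinkingBudget is actually being consumed.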

Node.js example

const response = await fetch(
  "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-pro:generateContent",
  {
    method: "POST",
    headers: {
      "x-goog-api-key": process.env.GEMINI_API_KEY,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      contents: [
        {
          role: "user",
          parts: [{ text: "Compare three cache architectures and recommend one." }]
        }
      ],
      generationConfig: {
        thinkingConfig: {
          thinkingBudget: 1024
        }
      }
    }),
  }
);

console.log(await response.json());

Best practices

  • Use Pro-class models when reasoning quality matters more than latency
  • Increase thinkingBudget gradually instead of maxing it out immediately; larger budgets raise both latency and token cost
  • Keep a low-thinking fallback path for latency-sensitive requests; note that thinkingBudget: 0 disables thinking on Flash-class models, while 2.5 Pro cannot disable thinking, so a Pro deployment typically falls back to a Flash model instead