Recommended endpoint
POST https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-pro:generateContent
Minimal request
{
  "contents": [
    {
      "role": "user",
      "parts": [{ "text": "Compare the trade-offs of three caching architectures and give a recommendation." }]
    }
  ],
  "generationConfig": {
    "thinkingConfig": {
      "thinkingBudget": 1024
    }
  }
}
cURL example
curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-pro:generateContent" \
  -H "x-goog-api-key: $GEMINI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "contents": [
      {
        "role": "user",
        "parts": [{ "text": "Compare the trade-offs of three caching architectures and give a recommendation." }]
      }
    ],
    "generationConfig": {
      "thinkingConfig": {
        "thinkingBudget": 1024
      }
    }
  }'
Python example
import os

import requests

response = requests.post(
    "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-pro:generateContent",
    headers={
        # Read the key from the environment, matching the cURL and Node examples.
        "x-goog-api-key": os.environ["GEMINI_API_KEY"],
        "Content-Type": "application/json",
    },
    json={
        "contents": [
            {
                "role": "user",
                "parts": [{"text": "Compare the trade-offs of three caching architectures and give a recommendation."}],
            }
        ],
        "generationConfig": {
            "thinkingConfig": {
                "thinkingBudget": 1024,
            }
        },
    },
    timeout=60,
)
response.raise_for_status()
print(response.json()["candidates"][0]["content"]["parts"][0]["text"])
Node.js example
const response = await fetch(
  "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-pro:generateContent",
  {
    method: "POST",
    headers: {
      "x-goog-api-key": process.env.GEMINI_API_KEY,
      "Content-Type": "application/json"
    },
    body: JSON.stringify({
      contents: [
        {
          role: "user",
          parts: [{ text: "Compare the trade-offs of three caching architectures and give a recommendation." }]
        }
      ],
      generationConfig: {
        thinkingConfig: {
          thinkingBudget: 1024
        }
      }
    })
  }
);
// fetch does not reject on HTTP error statuses, so check explicitly.
if (!response.ok) {
  throw new Error(`generateContent failed: ${response.status}`);
}
const data = await response.json();
console.log(data.candidates[0].content.parts[0].text);
Best practices
- Prefer Pro-class models when reasoning quality matters most.
- Increase thinkingBudget step by step with task complexity; don't max it out from the start.
- Keep a fallback configuration with thinkingBudget: 0 for low-latency paths.
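The last two practices can be sketched as a small selection function. The tier names and budget values below are illustrative assumptions, not official recommendations; the point is to start small and raise the budget only when output quality demands it:

```python
def pick_thinking_budget(complexity: str, low_latency: bool = False) -> int:
    """Map a rough task-complexity label to a thinkingBudget value.

    low_latency=True forces the degraded no-thinking configuration.
    Unknown labels fall back to the moderate tier.
    """
    if low_latency:
        return 0  # degrade: skip thinking entirely on latency-critical paths
    tiers = {"simple": 256, "moderate": 1024, "complex": 4096}
    return tiers.get(complexity, 1024)
```

The returned value plugs directly into `generationConfig.thinkingConfig.thinkingBudget` in any of the examples above.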
