Minimal request

{
  "model": "o4-mini",
  "input": "Compare three cache architectures and recommend one.",
  "reasoning": {
    "effort": "medium"
  },
  "max_output_tokens": 1200
}
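
As in the standard Responses API, reasoning.effort accepts "low", "medium", or "high"; higher settings spend more reasoning tokens, raising both cost and latency. Note that max_output_tokens caps reasoning tokens and visible output together, so leave headroom for reasoning.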

cURL example

curl https://mass.apigo.ai/v1/responses \
  -H "Authorization: Bearer $TIDEMIND_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "o4-mini",
    "input": "Compare three cache architectures and recommend one.",
    "reasoning": {
      "effort": "medium"
    },
    "max_output_tokens": 1200
  }'

Python example

import os

from openai import OpenAI

client = OpenAI(
    base_url="https://mass.apigo.ai/v1",
    api_key=os.environ["TIDEMIND_API_KEY"],
)

response = client.responses.create(
    model="o4-mini",
    input="Compare three cache architectures and recommend one.",
    reasoning={"effort": "medium"},
    max_output_tokens=1200,
)

print(response.output_text)

Node.js example

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://mass.apigo.ai/v1",
  apiKey: process.env.TIDEMIND_API_KEY,
});

const response = await client.responses.create({
  model: "o4-mini",
  input: "Compare three cache architectures and recommend one.",
  reasoning: { effort: "medium" },
  max_output_tokens: 1200,
});

console.log(response.output_text);
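
In both SDKs, output_text is a convenience property that concatenates the text content of the response's output items, so it is the shortest path to the final answer; the model's internal reasoning is not included in it.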

Best practices

  • Set reasoning.effort explicitly (low, medium, or high) to balance answer quality against cost and latency; see the sketch after this list
  • Ask for decision-ready outputs (a recommendation plus a short rationale) rather than raw chain-of-thought
  • Track latency and token cost separately from normal chat traffic, since reasoning tokens add to output-token spend
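
A minimal sketch combining these practices, assuming this endpoint returns the standard Responses API usage shape (output_tokens_details.reasoning_tokens); adapt the field names if this deployment differs:

import os
import time

from openai import OpenAI

client = OpenAI(
    base_url="https://mass.apigo.ai/v1",
    api_key=os.environ["TIDEMIND_API_KEY"],
)

start = time.monotonic()
response = client.responses.create(
    model="o4-mini",
    # Lower effort cuts reasoning tokens and latency; raise to "high"
    # only when the task genuinely needs deeper deliberation.
    reasoning={"effort": "low"},
    # Decision-ready framing: ask for the recommendation, not the deliberation.
    input=(
        "Compare three cache architectures. Reply with one recommendation "
        "and a three-bullet rationale only."
    ),
    max_output_tokens=1200,
)
latency = time.monotonic() - start

# Log reasoning traffic separately from normal chat traffic.
usage = response.usage
print(f"latency_s={latency:.2f}")
print(f"input_tokens={usage.input_tokens}")
print(f"output_tokens={usage.output_tokens}")
# Reasoning tokens are counted inside output_tokens but broken out here
# (field assumed to match the standard Responses API usage object).
print(f"reasoning_tokens={usage.output_tokens_details.reasoning_tokens}")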