Quick Answer
- Best for high-complexity reasoning, planning, and code analysis workflows.
- Uses the OpenAI-compatible POST /v1/chat/completions endpoint, so SDK migration is low-friction.
- Supports stream=true SSE output for IDE copilots and real-time assistants.
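As a minimal sketch of the request shape, assuming a hypothetical base URL (`https://api.example.com`) and a placeholder API key, a non-streaming call to the endpoint could be assembled like this:

```python
import json

# Hypothetical values -- substitute your provider's base URL and real key.
BASE_URL = "https://api.example.com"
API_KEY = "YOUR_API_KEY"

# Bearer auth header required by the endpoint.
headers = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json",
}

# Minimal non-streaming chat completion payload; set "stream": True for SSE.
payload = {
    "model": "gpt-5-4-mini",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize SSE in one sentence."},
    ],
}

url = f"{BASE_URL}/v1/chat/completions"
print(url)
print(json.dumps(payload, indent=2))
```

Any HTTP client can then POST `payload` with `headers` to `url`; the JSON body and auth header are what the endpoint actually checks.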
Key Parameters

| Parameter | Type | Required | Default | Range | Description |
| --- | --- | --- | --- | --- | --- |
| model | string | required | - | - | Model ID for this page (for example gpt-5-4-mini). |
| messages | object[] | required | - | - | Conversation messages in chronological order, each with a system/user/assistant role. |
| max_tokens | integer | optional | - | >=1 | Maximum output tokens (the model default applies when omitted). |
| stream | boolean | optional | false | - | Enables SSE streaming output when true. |
| temperature | number | optional | 1 | 0-2 | Sampling temperature controlling randomness. |
| top_p | number | optional | 1 | 0-1 | Nucleus sampling threshold; avoid tuning it aggressively together with temperature. |
| stop | string \| string[] | optional | - | - | Stop sequence(s), up to 4 entries. |
| Authorization | HTTP header | required | - | - | Bearer auth: `Authorization: Bearer <YOUR_API_KEY>`. |
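The constraints in the table above can be sketched as a client-side validator. This is a hypothetical helper (`validate_chat_payload` is not part of any SDK), shown only to make the types and ranges concrete:

```python
def validate_chat_payload(payload: dict) -> list:
    """Return a list of problems found in an OpenAI-compatible chat payload.

    The checks mirror the parameter table: required fields, types, and ranges.
    """
    problems = []

    # model: required string.
    if not isinstance(payload.get("model"), str):
        problems.append("model must be a string")

    # messages: required non-empty array with known roles.
    messages = payload.get("messages")
    if not isinstance(messages, list) or not messages:
        problems.append("messages must be a non-empty array")
    else:
        for i, m in enumerate(messages):
            if m.get("role") not in ("system", "user", "assistant"):
                problems.append(f"messages[{i}].role must be system/user/assistant")

    # max_tokens: optional integer >= 1.
    mt = payload.get("max_tokens")
    if mt is not None and (not isinstance(mt, int) or mt < 1):
        problems.append("max_tokens must be an integer >= 1")

    # temperature: optional number in [0, 2].
    temp = payload.get("temperature")
    if temp is not None and not (0 <= temp <= 2):
        problems.append("temperature must be in [0, 2]")

    # top_p: optional number in [0, 1].
    tp = payload.get("top_p")
    if tp is not None and not (0 <= tp <= 1):
        problems.append("top_p must be in [0, 1]")

    # stop: at most 4 entries when given as an array.
    stop = payload.get("stop")
    if isinstance(stop, list) and len(stop) > 4:
        problems.append("stop allows at most 4 entries")

    return problems


# A well-formed payload yields no problems.
ok = validate_chat_payload({
    "model": "gpt-5-4-mini",
    "messages": [{"role": "user", "content": "hi"}],
    "temperature": 0.7,
})
```

Running such a check before sending the request turns most would-be 400 invalid_request_error responses into immediate local failures.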
Common Errors

| Code | Error type | Trigger | Fix | Retry guidance |
| --- | --- | --- | --- | --- |
| 400 | invalid_request_error | Missing required fields or invalid field types in the payload. | Validate model, messages, and parameter types. | Retry only after fixing the payload. |
| 401 | authentication_error | Missing or invalid auth header, or an invalid API key. | Verify the Authorization header format and key validity. | Retry after auth is fixed. |
| 429 | rate_limit_error | Request rate, concurrency, or quota triggers upstream rate limiting. | Apply exponential backoff first, then review request rate, concurrency, and quota usage. | Use 1s/2s/4s backoff with jitter; if it persists, reduce submission pressure. |
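The 1s/2s/4s-with-jitter schedule recommended for 429 responses can be sketched as follows (a minimal helper, not tied to any particular client library):

```python
import random


def backoff_delays(attempts: int = 3, base: float = 1.0, jitter: float = 0.1) -> list:
    """Exponential backoff schedule (1s, 2s, 4s, ...) with uniform jitter.

    Intended for retrying 429 rate_limit_error responses; 400/401 should be
    retried only after the payload or credentials are fixed.
    """
    delays = []
    for attempt in range(attempts):
        delay = base * (2 ** attempt)               # 1, 2, 4, ...
        delay += random.uniform(0, jitter * delay)  # jitter avoids synchronized retries
        delays.append(delay)
    return delays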

