Quick Answer
- Official capabilities include a 1M-token context window and up to 64K output tokens, plus function calling, structured output, search tools, and code execution.
- Compared with classic lightweight Flash tiers, it is better suited to complex Q&A, coding, long-document handling, and agent workflows.
- The recommended integration path remains OpenAI-compatible chat so existing SDKs, SSE streaming, and retry middleware can be reused.
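As a rough illustration of the OpenAI-compatible path, the sketch below builds a streaming chat request and parses one SSE line using only the Python standard library. The base URL, the model id, the `/chat/completions` path, and the helper names `build_chat_request` and `parse_sse_line` are assumptions for illustration, not values confirmed by this document; substitute whatever your provider documents.

```python
import json
import urllib.request


def build_chat_request(base_url, api_key, model, messages, stream=True, max_tokens=1024):
    """Build (but do not send) a POST to an OpenAI-style chat completions endpoint.

    The endpoint path and field names follow the common OpenAI-compatible
    convention; they are assumed here, not taken from the provider's docs.
    """
    payload = {
        "model": model,
        "messages": messages,
        "stream": stream,
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def parse_sse_line(line):
    """Return the text delta carried by one SSE line, or None if it has none."""
    line = line.strip()
    if not line.startswith("data:"):
        return None  # comments, keep-alives, blank separators
    data = line[len("data:"):].strip()
    if data == "[DONE]":
        return None  # end-of-stream sentinel used by OpenAI-style streams
    choices = json.loads(data).get("choices") or []
    if not choices:
        return None
    return (choices[0].get("delta") or {}).get("content")
```

To send the request, pass the result to `urllib.request.urlopen` and feed each decoded response line to `parse_sse_line`; an existing OpenAI SDK pointed at the same base URL should be able to replace both helpers.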
Common Errors
- 400 invalid_request_error
  - Trigger: Missing required fields, malformed messages, or invalid parameter types in the payload.
  - Fix: Validate model, messages, max_tokens, and any tools/schema JSON before retrying.
  - Retry: Only after fixing the payload; avoid blind retries.
- 401 authentication_error
  - Trigger: Missing Authorization header, malformed bearer token, or invalid API key.
  - Fix: Verify the Authorization: Bearer <YOUR_API_KEY> format and that the key is valid.
  - Retry: Only after authentication is fixed.
- 429 rate_limit_error
  - Trigger: Request rate, concurrency, or quota usage has hit the upstream rate limit.
  - Fix: Apply exponential backoff and inspect concurrency, context size, and current quota consumption.
  - Retry: Use 1s/2s/4s backoff with jitter; if the error persists, reduce concurrency or shrink the workload (smaller context, fewer parallel requests).
- 500 internal_error
  - Trigger: Transient upstream instability, tool execution failure, or internal processing issues.
  - Fix: Capture the request id and a compact context summary, then retry; escalate if failures persist.
  - Retry: 2-3 times with short delays.
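The retry guidance above can be sketched as a small wrapper: 429 and 500 are retried with exponential backoff and jitter, while 400 and 401 are returned immediately because the request itself has to change first. This is a minimal sketch, not a definitive client; `send` is a hypothetical zero-argument callable returning `(status, body)`, the jitter width is an assumed value, and the same 1s/2s/4s schedule is applied to 500s for simplicity.

```python
import random
import time

# Statuses worth retrying without changing the request.
# 400 and 401 are excluded: the payload or credentials must be fixed first.
RETRYABLE = {429, 500}


def backoff_delay(attempt, base=1.0, jitter=0.25):
    """Delay before retry number `attempt` (0-based): 1s, 2s, 4s, plus jitter.

    The jitter width of 0.25s is an assumption chosen for illustration.
    """
    return base * (2 ** attempt) + random.uniform(0, jitter)


def call_with_retry(send, max_retries=3, sleep=time.sleep):
    """Call `send()` until it returns a non-retryable status or retries run out.

    Returns the last (status, body) pair so the caller can log or escalate.
    """
    for attempt in range(max_retries + 1):
        status, body = send()
        if status not in RETRYABLE or attempt == max_retries:
            return status, body
        sleep(backoff_delay(attempt))
```

Passing `sleep` as a parameter keeps the wrapper easy to test without real delays; in production the default `time.sleep` applies.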