gemini-3.5-flash API Model Guide

Summary

Official capabilities include a 1M-token context window, up to 64K output tokens, function calling, structured output, search tools, and code execution.
Compared with classic lightweight Flash tiers, it is better suited to complex Q&A, coding, long-document handling, and agent workflows.
The recommended integration path remains the OpenAI-compatible Chat Completions endpoint, so existing SDKs, SSE streaming, and retry middleware can be reused.

Key Capabilities

High intelligence with Flash latency: It is not just a low-cost fallback tier; it aims to handle deeper understanding, planning, and generation while staying responsive.
1M long context: Well suited to long-document summarization, codebase analysis, knowledge synthesis, large prompts, and extended conversations.
Structured and tool-based output: Function calling, structured JSON, search tools, and code execution make it a stronger fit for agentic workflows.
Coding and technical generation: Useful for code explanation, unit test generation, refactor proposals, SDK wrappers, and technical drafts.
OpenAI-compatible migration path: Chat Completions-style payloads reduce switching friction from GPT- or Claude-oriented stacks.
Streaming interaction: SSE streaming supports chat UIs, terminal assistants, and IDE-style progressive rendering.

When to Use

When you need more reasoning and coding strength than a classic lightweight model without giving up too much responsiveness.
When handling long-context tasks such as document Q&A, codebase analysis, long-session memory, or retrieval-grounded answers.
When you need structured JSON, function calling, or search/tool integration for agents.

When Not to Use

For ultra-simple, ultra-cheap bulk templating tasks where higher intelligence is unnecessary.
For pure image or video generation workloads; use dedicated media models instead.

Operating Notes

The recommended path is POST /v1/chat/completions using an OpenAI-compatible request shape.
stream=true returns SSE chunks, while stream=false returns a standard full completion object.
If your workflow depends on structured output or tools, validate schemas, tool choice, and fallback behavior on a small slice first.
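
As a rough illustration of the stream=true path, the following Python sketch assembles text from SSE lines. It assumes the chunks follow the common OpenAI streaming convention (`data: {json}` lines carrying `choices[].delta.content`, terminated by `data: [DONE]`). This guide does not spell out the chunk shape, so check it against real responses before relying on it. `iter_sse_text` is a hypothetical helper name.

```python
import json
from typing import Iterable, Iterator


def iter_sse_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from OpenAI-style SSE lines (assumed format)."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":  # assumed end-of-stream sentinel
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = (choice.get("delta") or {}).get("content")
            if text:
                yield text


# Illustrative input only; real chunks carry more fields.
sample_lines = [
    'data: {"choices":[{"index":0,"delta":{"role":"assistant"}}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{"content":"Hel"}}]}',
    'data: {"choices":[{"index":0,"delta":{"content":"lo"}}]}',
    "data: [DONE]",
]
assembled = "".join(iter_sse_text(sample_lines))
```

In a chat UI or terminal assistant, each yielded fragment would typically be rendered as it arrives rather than joined at the end.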

Minimal Request

{
  "model": "gemini-3.5-flash",
  "messages": [
    {
      "role": "system",
      "content": "You are a senior full-stack engineer. Summarize the approach first, then provide the smallest runnable code."
    },
    {
      "role": "user",
      "content": "Write a Node.js function that reads a CSV, removes duplicate emails, and returns summary stats."
    }
  ],
  "temperature": 0.2,
  "max_tokens": 500,
  "stream": false
}
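
One way to send this payload from Python, using only the standard library, is sketched below. The base URL `https://api.example.com` and the helper name `build_chat_request` are placeholders that do not come from this guide; substitute your provider's actual host. The sketch only builds the request object, and the commented lines show how it might be sent.

```python
import json
import urllib.request

BASE_URL = "https://api.example.com"  # hypothetical host; replace with your provider's


def build_chat_request(api_key: str, payload: dict,
                       base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a POST /v1/chat/completions request with bearer authentication."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


payload = {
    "model": "gemini-3.5-flash",
    "messages": [
        {"role": "system", "content": "You are a senior full-stack engineer."},
        {"role": "user", "content": "Write a Node.js function that deduplicates emails in a CSV."},
    ],
    "temperature": 0.2,
    "max_tokens": 500,
    "stream": False,
}
request = build_chat_request("YOUR_API_KEY", payload)

# Sending is left commented out so the sketch runs without network access:
# with urllib.request.urlopen(request, timeout=60) as resp:
#     completion = json.load(resp)
```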

Minimal Response

{
  "id": "chatcmpl_xxxxxxxx",
  "object": "chat.completion",
  "created": 1747699200,
  "model": "gemini-3.5-flash",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 120,
    "completion_tokens": 260,
    "total_tokens": 380
  }
}
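
A small sketch of how a client might read a response of this shape follows. `summarize_completion` is a hypothetical helper. Treating `finish_reason == "length"` as a truncation signal is an assumption borrowed from the OpenAI convention and is not confirmed by this guide.

```python
def summarize_completion(response: dict) -> dict:
    """Pull the assistant text and token usage out of a full completion object."""
    choice = response["choices"][0]
    usage = response.get("usage", {})
    finish_reason = choice.get("finish_reason")
    return {
        "text": choice["message"]["content"],
        "finish_reason": finish_reason,
        # Assumption: "length" means the output hit max_tokens (OpenAI convention).
        "truncated": finish_reason == "length",
        "total_tokens": usage.get("total_tokens"),
    }


# Mirrors the sample response above.
sample_response = {
    "id": "chatcmpl_xxxxxxxx",
    "object": "chat.completion",
    "created": 1747699200,
    "model": "gemini-3.5-flash",
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "..."},
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 120, "completion_tokens": 260, "total_tokens": 380},
}
summary = summarize_completion(sample_response)
```

Logging `total_tokens` per request is a simple way to watch for the cost drift mentioned in the FAQ.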

Key Parameters

Parameter	Type	Required	Default	Range	Description
model	string	Yes	gemini-3.5-flash	-	Model name. Use `gemini-3.5-flash`.
messages	object[]	Yes	-	-	Conversation messages in chronological order, commonly using system, user, and assistant roles.
max_tokens	integer	No	-	>=1, practical cap should be set by your app	Maximum output tokens. The model can reach 64K output officially, but real applications should enforce scenario-specific limits.
stream	boolean	No	false	-	Whether to enable SSE streaming output.
temperature	number	No	1	0-2	Sampling temperature. Lower values are steadier; higher values are more diverse.
top_p	number	No	1	0-1	Nucleus sampling threshold; avoid aggressively tuning it together with temperature.
tools	object[]	No	-	-	Tool or function definitions for tool-calling and agent orchestration.
Authorization	HTTP Header	Yes	-	-	Bearer authentication: Authorization: Bearer <YOUR_API_KEY>.
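
The types and ranges in the table can be checked client-side before a request is sent. The sketch below is one possible pre-flight check, not an official validator. `validate_payload` is a hypothetical helper, and it covers only the body parameters listed above (not the Authorization header).

```python
from typing import Any, List


def _is_number(value: Any) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def validate_payload(payload: dict) -> List[str]:
    """Return a list of problems; an empty list means the payload passed these checks."""
    errors = []
    model = payload.get("model")
    if not isinstance(model, str) or not model:
        errors.append("model must be a non-empty string")
    messages = payload.get("messages")
    if not isinstance(messages, list) or not messages:
        errors.append("messages must be a non-empty list")
    if "max_tokens" in payload:
        value = payload["max_tokens"]
        if not isinstance(value, int) or isinstance(value, bool) or value < 1:
            errors.append("max_tokens must be an integer >= 1")
    if "stream" in payload and not isinstance(payload["stream"], bool):
        errors.append("stream must be a boolean")
    # Upper bounds taken from the Range column: temperature 0-2, top_p 0-1.
    for name, upper in (("temperature", 2), ("top_p", 1)):
        if name in payload:
            value = payload[name]
            if not _is_number(value) or not 0 <= value <= upper:
                errors.append(f"{name} must be a number between 0 and {upper}")
    if "tools" in payload and not isinstance(payload["tools"], list):
        errors.append("tools must be a list")
    return errors


good = {
    "model": "gemini-3.5-flash",
    "messages": [{"role": "user", "content": "hi"}],
    "temperature": 0.2,
    "max_tokens": 500,
    "stream": False,
}
bad = {"messages": [], "temperature": 3, "max_tokens": 0}
good_errors = validate_payload(good)
bad_errors = validate_payload(bad)
```

Catching these problems locally avoids spending a round trip on a request that would come back as a 400.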

Common Errors

HTTP	Code	Trigger	Fix	Retry
400	invalid_request_error	Missing required fields, malformed messages, or invalid parameter types in the payload.	Validate model, messages, max_tokens, and any tools/schema JSON before retrying.	Retry only after fixing the payload; avoid blind retries.
401	authentication_error	Missing Authorization header, malformed bearer token, or invalid API key.	Verify Authorization: Bearer <YOUR_API_KEY> format and key validity.	Retry after authentication is fixed.
429	rate_limit_error	Request rate, concurrency, or quota usage has hit upstream rate limiting.	Apply exponential backoff and inspect concurrency, context size, and current quota consumption.	Use 1s/2s/4s backoff with jitter; reduce concurrency or downgrade workload shape if it persists.
500	internal_error	Transient upstream instability, tool execution failure, or internal processing issues.	Capture request id and a compact context summary, then retry; escalate if failures persist.	Retry 2-3 times with short delays.
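
The retry guidance in the table might be implemented along these lines. This is a minimal sketch under stated assumptions: `call_with_retries` is a hypothetical helper, `send` stands for any callable returning an `(http_status, body)` pair, and the 25% jitter factor is an arbitrary illustrative choice. Only 429 and 500 are retried; 400 and 401 are returned immediately because the request itself must be fixed first.

```python
import random
import time
from typing import Any, Callable, Tuple

RETRYABLE_STATUSES = {429, 500}  # per the table: 400 and 401 need a fix, not a retry


def call_with_retries(
    send: Callable[[], Tuple[int, Any]],
    max_retries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    rng: Callable[[], float] = random.random,
) -> Tuple[int, Any]:
    """Call send() and retry retryable statuses with exponential backoff plus jitter."""
    attempt = 0
    while True:
        status, body = send()
        if status not in RETRYABLE_STATUSES or attempt >= max_retries:
            return status, body
        delay = base_delay * (2 ** attempt)  # 1s, 2s, 4s with the defaults
        sleep(delay + rng() * delay * 0.25)  # up to 25% jitter (illustrative choice)
        attempt += 1


# Demonstration with a fake sender and a recording sleep, so nothing actually waits.
fake_responses = iter([(429, "rate limited"), (500, "upstream error"), (200, "ok")])
recorded_sleeps = []
final = call_with_retries(
    lambda: next(fake_responses),
    sleep=recorded_sleeps.append,
    rng=lambda: 0.0,  # disable jitter for a deterministic demo
)
```

Injecting `sleep` and `rng` keeps the function easy to test; production code would leave both at their defaults.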

FAQ

What is gemini-3.5-flash best for?
It is best for text and coding assistant tasks that need strong intelligence without giving up throughput and response speed.
How is it different from a classic Flash model?
The main difference is a higher intelligence ceiling, making it more suitable for complex Q&A, long context, and tool-enabled workflows.
What is the fastest way to integrate it?
Use the OpenAI-compatible Chat Completions path first so you can reuse existing messages, streaming, and retry middleware.
When should max_tokens be constrained carefully?
Constrain max_tokens carefully for long-context, coding, and structured-output tasks to avoid bloated responses, cost drift, or timeouts.
