gemini-3.5-flash Full Breakdown (Markdown)

gemini-3.5-flash API Model Guide

Summary

  • Official capabilities include a 1M context window, up to 64K output, plus function calling, structured output, search tools, and code execution.
  • Compared with classic lightweight Flash tiers, it is better suited to complex Q&A, coding, long-document handling, and agent workflows.
  • The recommended integration path remains OpenAI-compatible chat so existing SDKs, SSE streaming, and retry middleware can be reused.

Key Capabilities

  • High intelligence with Flash latency: It is not just a low-cost fallback tier; it aims to handle deeper understanding, planning, and generation while staying responsive.
  • 1M long context: Well suited to long-document summarization, codebase analysis, knowledge synthesis, large prompts, and extended conversations.
  • Structured and tool-based output: Function calling, structured JSON, search tools, and code execution make it a stronger fit for agentic workflows.
  • Coding and technical generation: Useful for code explanation, unit test generation, refactor proposals, SDK wrappers, and technical drafts.
  • OpenAI-compatible migration path: Chat Completions-style payloads reduce switching friction from GPT- or Claude-oriented stacks.
  • Streaming interaction: SSE streaming supports chat UIs, terminal assistants, and IDE-style progressive rendering.

When to Use

  • When you need more reasoning and coding strength than a classic lightweight model without giving up too much responsiveness.
  • When handling long-context tasks such as document Q&A, codebase analysis, long-session memory, or retrieval-grounded answers.
  • When you need structured JSON, function calling, or search/tool integration for agents.

When Not to Use

  • For ultra-simple, ultra-cheap bulk templating tasks where higher intelligence is unnecessary.
  • For pure image or video generation workloads; use dedicated media models instead.

Operational Notes

  • The recommended path is POST /v1/chat/completions using an OpenAI-compatible request shape.
  • stream=true returns SSE chunks, while stream=false returns a standard full completion object.
  • If your workflow depends on structured output or tools, validate schemas, tool choice, and fallback behavior on a small slice first.
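
The stream=true behavior described above could be consumed along these lines. This is a minimal sketch, not the provider's client: it assumes the chunks follow the OpenAI streaming convention (`data: {json}` lines carrying `choices[0].delta.content`, terminated by `data: [DONE]`), which this guide does not spell out, so check the shape against real output first. `iter_sse_text` is a hypothetical helper name.

```python
import json
from typing import Iterable, Iterator


def iter_sse_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from OpenAI-style SSE lines (assumed chunk shape)."""
    for raw in lines:
        line = raw.strip()
        # Skip blank separator lines, SSE comments, and non-data fields.
        if not line.startswith("data:"):
            continue
        data = line[len("data:"):].strip()
        if data == "[DONE]":  # assumed end-of-stream sentinel
            break
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            text = (choice.get("delta") or {}).get("content")
            if text:
                yield text


# Canned lines stand in for a live HTTP response body.
sample = [
    'data: {"choices": [{"index": 0, "delta": {"content": "Hel"}}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {"content": "lo"}}]}',
    "data: [DONE]",
]
streamed = "".join(iter_sse_text(sample))
```

Because the helper takes any iterable of lines, the same function can be fed decoded lines from an HTTP response or from a recorded fixture in tests.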

Minimal Request

{
  "model": "gemini-3.5-flash",
  "messages": [
    {
      "role": "system",
      "content": "You are a senior full-stack engineer. Summarize the approach first, then provide the smallest runnable code."
    },
    {
      "role": "user",
      "content": "Write a Node.js function that reads a CSV, removes duplicate emails, and returns summary stats."
    }
  ],
  "temperature": 0.2,
  "max_tokens": 500,
  "stream": false
}
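
A request like the one above might be assembled with the Python standard library as sketched here. The base URL `https://api.example.com` is a placeholder assumption (this guide does not name a host), and `build_chat_request` is a hypothetical helper; only the path, the Bearer header, and the payload fields come from this guide.

```python
import json
import urllib.request

# Placeholder host (assumption); substitute your provider's base URL.
BASE_URL = "https://api.example.com"


def build_chat_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build, but do not send, a POST /v1/chat/completions request."""
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


payload = {
    "model": "gemini-3.5-flash",
    "messages": [{"role": "user", "content": "Say hello."}],
    "temperature": 0.2,
    "max_tokens": 500,
    "stream": False,
}
request = build_chat_request("YOUR_API_KEY", payload)
# To send it: urllib.request.urlopen(request, timeout=60), then json.load() the body.
```

Keeping construction separate from sending makes the payload and headers easy to inspect or log before any network call is made.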

Minimal Response

{
  "id": "chatcmpl_xxxxxxxx",
  "object": "chat.completion",
  "created": 1747699200,
  "model": "gemini-3.5-flash",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 120,
    "completion_tokens": 260,
    "total_tokens": 380
  }
}
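
Reading a response of this shape could look like the sketch below. `summarize_completion` is a hypothetical helper, and treating `finish_reason == "length"` as a truncation signal is an assumption borrowed from the OpenAI convention; the sample above only shows `"stop"`.

```python
from typing import Any


def summarize_completion(response: dict[str, Any]) -> dict[str, Any]:
    """Extract the first choice's text, finish reason, and token usage."""
    choices = response.get("choices") or []
    if not choices:
        raise ValueError("response contains no choices")
    first = choices[0]
    usage = response.get("usage") or {}
    return {
        "text": (first.get("message") or {}).get("content", ""),
        "finish_reason": first.get("finish_reason"),
        # Assumed OpenAI-style value: "length" means max_tokens cut the output off.
        "truncated": first.get("finish_reason") == "length",
        "total_tokens": usage.get("total_tokens"),
    }


example = {
    "choices": [
        {"index": 0, "message": {"role": "assistant", "content": "..."}, "finish_reason": "stop"}
    ],
    "usage": {"prompt_tokens": 120, "completion_tokens": 260, "total_tokens": 380},
}
summary = summarize_completion(example)
```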

Key Parameters

| Parameter | Type | Required | Default | Range | Description |
| --- | --- | --- | --- | --- | --- |
| model | string | Yes | gemini-3.5-flash | - | Model name. Use gemini-3.5-flash. |
| messages | object[] | Yes | - | - | Conversation messages in chronological order, commonly using system, user, and assistant roles. |
| max_tokens | integer | No | - | >=1, practical cap should be set by your app | Maximum output tokens. The model can reach 64K output officially, but real applications should enforce scenario-specific limits. |
| stream | boolean | No | false | - | Whether to enable SSE streaming output. |
| temperature | number | No | 1 | 0-2 | Sampling temperature. Lower values are steadier; higher values are more diverse. |
| top_p | number | No | 1 | 0-1 | Nucleus sampling threshold; avoid aggressively tuning it together with temperature. |
| tools | object[] | No | - | - | Tool or function definitions for tool-calling and agent orchestration. |
| Authorization | HTTP Header | Yes | - | - | Bearer authentication: Authorization: Bearer <YOUR_API_KEY>. |
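
The types and ranges listed above can be checked client-side before a request is sent. The following is a rough sketch with hypothetical helper names (`validate_payload`, `_in_range`); it covers only the body fields in the table and does not inspect the contents of `messages` or `tools`.

```python
from numbers import Real


def _in_range(value, low, high) -> bool:
    """True for a real number (not a bool) within [low, high]."""
    return isinstance(value, Real) and not isinstance(value, bool) and low <= value <= high


def validate_payload(payload: dict) -> list[str]:
    """Return human-readable problems; an empty list means the checks passed."""
    problems = []
    model = payload.get("model")
    if not isinstance(model, str) or not model:
        problems.append("model must be a non-empty string")
    messages = payload.get("messages")
    if not isinstance(messages, list) or not messages:
        problems.append("messages must be a non-empty list")
    max_tokens = payload.get("max_tokens")
    if max_tokens is not None:
        if isinstance(max_tokens, bool) or not isinstance(max_tokens, int) or max_tokens < 1:
            problems.append("max_tokens must be an integer >= 1")
    if "stream" in payload and not isinstance(payload["stream"], bool):
        problems.append("stream must be a boolean")
    if "temperature" in payload and not _in_range(payload["temperature"], 0, 2):
        problems.append("temperature must be a number in 0-2")
    if "top_p" in payload and not _in_range(payload["top_p"], 0, 1):
        problems.append("top_p must be a number in 0-1")
    if "tools" in payload and not isinstance(payload["tools"], list):
        problems.append("tools must be a list")
    return problems


good = {"model": "gemini-3.5-flash", "messages": [{"role": "user", "content": "Hi"}], "temperature": 0.2}
bad = {"model": "gemini-3.5-flash", "messages": [], "temperature": 3, "max_tokens": 0}
```

Returning a list of problems rather than raising on the first one lets a caller report every issue in a single pass, which helps avoid repeated 400 responses.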

Common Errors

| HTTP | Code | Trigger | Fix | Retry |
| --- | --- | --- | --- | --- |
| 400 | invalid_request_error | Missing required fields, malformed messages, or invalid parameter types in the payload. | Validate model, messages, max_tokens, and any tools/schema JSON before retrying. | Retry only after fixing the payload; avoid blind retries. |
| 401 | authentication_error | Missing Authorization header, malformed bearer token, or invalid API key. | Verify Authorization: Bearer <YOUR_API_KEY> format and key validity. | Retry after authentication is fixed. |
| 429 | rate_limit_error | Request rate, concurrency, or quota usage has hit upstream rate limiting. | Apply exponential backoff and inspect concurrency, context size, and current quota consumption. | Use 1s/2s/4s backoff with jitter; reduce concurrency or downgrade workload shape if it persists. |
| 500 | internal_error | Transient upstream instability, tool execution failure, or internal processing issues. | Capture request id and a compact context summary, then retry; escalate if failures persist. | Retry 2-3 times with short delays. |
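
The retry guidance above could be wired up roughly as follows. This sketch retries only 429 and 500, uses the 1s/2s/4s schedule with a small random jitter for both (a simplification, since the table suggests shorter delays for 500), and returns 400/401 immediately. `call_with_retry`, `backoff_delays`, and the `send` callable returning `(status, body)` are hypothetical names, not part of any SDK.

```python
import random
import time
from typing import Callable

RETRYABLE = {429, 500}  # 400 and 401 need a fix, not a retry


def backoff_delays(attempts: int = 3, base: float = 1.0, jitter: float = 0.25) -> list[float]:
    """Delays of 1s, 2s, 4s, ..., each with up to `jitter` seconds added at random."""
    return [base * (2 ** i) + random.uniform(0, jitter) for i in range(attempts)]


def call_with_retry(
    send: Callable[[], tuple[int, dict]],
    attempts: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> tuple[int, dict]:
    """Call `send`, retrying up to `attempts` times while the status is retryable."""
    status, body = send()
    for delay in backoff_delays(attempts):
        if status not in RETRYABLE:
            break
        sleep(delay)
        status, body = send()
    return status, body


# Simulated transport: two rate-limit responses, then success.
responses = iter([(429, {}), (429, {}), (200, {"ok": True})])
slept: list[float] = []
status, body = call_with_retry(lambda: next(responses), sleep=slept.append)
```

Injecting `sleep` keeps the example testable without real waiting; in production the default `time.sleep` applies.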

FAQ

  1. What is gemini-3.5-flash best for?
    It is best for text and coding assistant tasks that need strong intelligence without giving up throughput and response speed.
  2. How is it different from a classic Flash model?
    The main difference is a higher intelligence ceiling, making it more suitable for complex Q&A, long context, and tool-enabled workflows.
  3. What is the fastest way to integrate it?
    Use the OpenAI-compatible Chat Completions path first so you can reuse existing messages, streaming, and retry middleware.
  4. When should max_tokens be constrained carefully?
    Constrain max_tokens carefully for long-context, coding, and structured-output tasks to avoid bloated responses, cost drift, or timeouts.

Related APIs