Gemini 2.0 Flash Thinking

Name: Gemini 2.0 Flash Thinking
Author: Google

Multimodal

Google

Gemini 2.0 Flash Thinking is an enhanced reasoning model that can show its thought process to improve performance and explainability. Combining speed and performance, it also excels at science and math tasks, showing its reasoning as it works through complex problems.

Key specifications

Parameters

Context

Release date

January 21, 2025

Average score

74.3%


Timeline

Key dates in the model's history

Announcement

January 21, 2025

Last updated

July 19, 2025

Today

February 24, 2026

Technical specifications

Parameters

Training tokens

Knowledge cutoff

August 1, 2024

Family

Capabilities

Multimodality, ZeroEval

Benchmark results

Model performance across various tests and benchmarks

Reasoning

Logical reasoning and analysis

GPQA

Challenging science questions that require chain-of-thought reasoning. AI systems have made tremendous strides in answering factual questions, but complex science problems that require multi-step reasoning and domain knowledge remain challenging.

The task is a set of science questions from various domains (physics, chemistry, biology, etc.) that require the model to:

1. Break down complex problems into logical steps
2. Apply scientific principles and formulas correctly
3. Reason through each step sequentially
4. Show calculations when necessary
5. Arrive at accurate conclusions

The questions test both factual knowledge and the ability to use that knowledge in a logical reasoning chain. For example, a physics problem might require calculating forces, using those values to determine whether an object will move, and then explaining the real-world implications. Success requires more than memorized facts: the model must connect concepts across domains and apply them appropriately in novel scenarios, much as human experts solve scientific problems. • Self-reported

74.2%

Multimodality

Working with images and visual data

MMMU

Questions and answers on images and text across various domains • Self-reported

75.4%

Other tests

Specialized benchmarks

AIME 2024

Improved reasoning on competition-level mathematics problems • Self-reported

73.3%
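
The 74.3% average score listed near the top of this page matches the unweighted mean of the three self-reported benchmark scores above. The short Python sketch below checks that arithmetic. That the page computes a simple unweighted mean is an assumption, not something the page states, and the variable names are illustrative only.

```python
# Self-reported scores copied from the benchmark results on this page.
scores = {"GPQA": 74.2, "MMMU": 75.4, "AIME 2024": 73.3}

# Assumption: the page's "average score" is a plain unweighted mean
# over the listed benchmarks.
average = sum(scores.values()) / len(scores)

# Print the mean rounded to one decimal place, as the page displays it.
print(f"{average:.1f}%")
```

If the catalog weights benchmarks differently or includes scores not shown here, the result would differ, so treat this only as a consistency check.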

License and metadata

License

proprietary

Announcement date

January 21, 2025

Last updated

July 19, 2025

Similar models


Gemini 1.5 Pro

Google

Best score: 0.9 (MMLU)

Release: May 2024

Price: $2.50/1M tokens

Gemini 1.5 Flash

Google

Best score: 0.8 (MMLU)

Release: May 2024

Price: $0.15/1M tokens

Gemini 2.5 Flash-Lite

Google

Best score: 0.6 (GPQA)

Release: June 2025

Price: $0.10/1M tokens

Gemini 2.0 Flash

Google

Best score: 0.6 (GPQA)

Release: December 2024

Price: $0.10/1M tokens

Gemini 2.0 Flash-Lite

Google

Best score: 0.5 (GPQA)

Release: February 2025

Price: $0.07/1M tokens

Gemini 3.1 Pro

Google

Best score: 0.9 (GPQA)

Release: February 2026

Price: $2.50/1M tokens

Gemini 2.5 Pro

Google

Best score: 0.8 (GPQA)

Release: May 2025

Price: $1.25/1M tokens

Gemini 2.5 Pro Preview 06-05

Google

Best score: 0.9 (GPQA)

Release: June 2025

Price: $1.25/1M tokens

Recommendations are based on similarity of characteristics: developer organization, multimodality, parameter count, and benchmark performance. Select a model to compare, or go to the full catalog to view all available AI models.
