Self-Evolving · Human-on-the-Loop · End-to-End

EvoScientist

v0.2.2 · Apache 2.0 · Python 3.11+

$ uv tool install EvoScientist

Latest

News & Updates

03 Jun 2026 🥈 Ranked #2 overall — and 🥇 #1 among GPT-5.4-based agents — on ResearchClawBench (Agent Mode)! Leaderboard 👈

18 Apr 2026 🥇 Ranked #1 on DeepResearch Bench at submission time! Leaderboard 👈

13 Apr 2026 🥇 Reclaimed #1 on DeepResearch Bench II at submission time! Leaderboard 👈

26 Mar 2026 🥇 Ranked #1 on AstaBench Data Analysis at submission time! Leaderboard 👈

25 Mar 2026 🥇 Ranked #1 on AstaBench Code & Execution at submission time! Leaderboard 👈

13 Mar 2026 🚀 EvoScientist officially debuts!

11 Mar 2026 ⛳ Technical Report is live! Check it out 👈

06 Mar 2026 🥇 Ranked #1 on DeepResearch Bench II at submission time! Leaderboard 👈

24 Nov 2025 🏆 6/6 accepted at ICAIS 2025 AI Scientist Track — Best Paper & AI Reviewer's Appraisal Award! Details 👈

Releases

Changelog

v0.2.2 11 Jul 2026

New models. Selectable in onboarding and /model: GPT-5.6 (sol, terra, luna) for OpenAI and OpenRouter, plus Grok 4.5 and Tencent Hunyuan HY3 on OpenRouter; tighter config-file permissions and a reworked onboarding OAuth flow for auxiliary models.

v0.2.1 05 Jul 2026

AutoSkills. EvoMemory drafts reusable skills from its own observation clusters for you to review via /autoskills; a new --output-format stream-json for headless / SDK clients; richer slash-command completions; Windows UTF-8 config reads; a TUI welcome-banner fix; langchain-openrouter 0.2.5.

v0.2.0 26 Jun 2026

Autonomy milestone. Scheduled tasks: cron-style recurring runs you set up with /schedule or natural language, running unattended with shell-access gating for safety; self-linking memory that connects related observations into a knowledge graph (complements / contradicts / supersedes) as it grows; a read-only GET /api/models endpoint exposing the model registry to the WebUI picker; the main agent now always keeps its memory tools available; deps: deepagents 0.6.12 / langchain-quickjs 0.3.2.
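The self-linking memory described above can be pictured as a small typed graph. The sketch below is illustrative only: the three relation names come from this entry, while `MemoryGraph`, its methods, and the rule that a superseded note drops out of the "current" view are assumptions, not EvoMemory's actual data model.

```python
from dataclasses import dataclass, field
from enum import Enum


class Relation(Enum):
    # The three link types named in the changelog entry.
    COMPLEMENTS = "complements"
    CONTRADICTS = "contradicts"
    SUPERSEDES = "supersedes"


@dataclass
class MemoryGraph:
    """Hypothetical container: observations as nodes, typed links as edges."""
    notes: dict[str, str] = field(default_factory=dict)
    edges: list[tuple[str, Relation, str]] = field(default_factory=list)

    def add(self, note_id: str, text: str) -> None:
        self.notes[note_id] = text

    def link(self, src: str, rel: Relation, dst: str) -> None:
        # Refuse links to observations that were never stored.
        if src not in self.notes or dst not in self.notes:
            raise KeyError("both observations must exist before linking")
        self.edges.append((src, rel, dst))

    def current(self) -> list[str]:
        # Assumption: an observation that has been superseded is hidden.
        superseded = {dst for _, rel, dst in self.edges
                      if rel is Relation.SUPERSEDES}
        return [n for n in self.notes if n not in superseded]


if __name__ == "__main__":
    g = MemoryGraph()
    g.add("o1", "lr=0.1 diverges")
    g.add("o2", "lr=0.1 converges with warmup")
    g.link("o2", Relation.SUPERSEDES, "o1")
    print(g.current())  # ['o2']
```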

v0.1.9 23 Jun 2026

Hotfix for fresh installs: the first message crashed with "The subagent task tool cannot be exposed via ptc" after deepagents 0.6.11 / langchain-quickjs 0.3 reserved task as the REPL global. Removed task from the code-interpreter PTC allowlist (task() stays available as the REPL global; async dispatch stays in PTC) and pinned deepagents[quickjs]~=0.6.11.

v0.1.8 22 Jun 2026

LangGraph gateway layer: UI-agnostic graph & thread access shared across CLI / TUI / serve / channel; OpenRouter Anthropic prompt caching now on by default (opt out with openrouter_anthropic_prompt_cache=false); slash-command Enter now submits correctly when a command name prefixes another; pre-commit ruff bump.

v0.1.7 16 Jun 2026

Memory retrieval: agents run a per-task preflight over stored observations (search_observations ranked keyword search + read_memory); multi-stage slash-command completions with subcommand awareness; Windows reliability fixes (async MCP tool execution + graph-state recovery after interruptions, cmd.exe path quoting); quoted virtual-path handling; deepagents 0.6.10.

v0.1.6 11 Jun 2026

Session persistence fix: WebUI / langgraph dev threads survive restarts (SQLite checkpointer + scoped thread restore), memory-worker checkpoint cleanup (delete-on-completion + startup purge), short thread IDs in /threads and resume hints.

v0.1.5 11 Jun 2026

Dangerous mode (real-filesystem access with safety checks), LangGraph streaming v3 pipeline, opt-in Anthropic prompt caching via OpenRouter, claude-fable-5, free-scrolling TUI, Windows CI support, public Cloudflare tunnel for EvoSci deploy (--tunnel).

v0.1.4 07 Jun 2026

Auxiliary model for background tasks & tool selection, observation-memory lifecycle, Qwen3.7-Max/Plus (DashScope), UI-backend selection, plus an OpenRouter multi-turn reasoning fix.

v0.1.3 03 Jun 2026

Multimodal handling (image + PDF/doc flatten/hoisting, text-only model fallback), runtime-context middleware, memory middleware → profile files with stream timeline narration, textual CJK-input fix.

v0.1.2 02 Jun 2026

Browser WebUI mode (beta), EvoSci deploy standalone LangGraph server, default model → claude-sonnet-4-6, MiniMax M3, plus sandbox-timeout and async-notifier channel-routing fixes.

v0.1.1 19 May 2026

deepagents 0.6.2 DeltaChannel upgrade, tier-aware skill mounts, status & elapsed-time bar, QQ inline buttons.

v0.1.0 08 May 2026

Async sub-agents (langgraph dev), official Docker image, personal WeChat, sessions-DB compaction.

v0.0.9 26 Apr 2026

Faster startup, in-session model switching, unified slash commands, DeepSeek V4 thinking fix.

v0.0.8 21 Apr 2026

Unified data directory, status bar, enhanced ask-user & auto-mode.

v0.0.7 10 Apr 2026

Global skills directory, Moonshot/Kimi providers, ccproxy fixes, channel improvements.

v0.0.6 03 Apr 2026

Dynamic context management, OpenRouter reasoning, More Effort mode, GLM-5.1.

v0.0.5 27 Mar 2026

Context-retry middleware, OpenAI relay config, Feishu event-loop fix, /compact.

v0.0.4 24 Mar 2026

@file mentions, resume history, Feishu WebSocket, LaTeX setup.

v0.0.3 20 Mar 2026

Voice input (STT), MiniMax/DeepSeek providers, MCP & skill browsers.

v0.0.2 17 Mar 2026

OAuth sign-in, human-in-the-loop & ask_user, headless serve mode.

v0.0.1 13 Mar 2026

First public release of the self-evolving AI Scientist.

Recognition

Awards & Benchmarks

Best Paper Award

ICAIS 2025 · AI Scientist Track

6/6 papers accepted. Best Paper & AI Reviewer's Appraisal Award. Details →

AI-Generated Best Paper

ICAIS 2025

AI-authored research with end-to-end scientific quality. Details →

#1 DeepResearch Bench II

AgentResearchLab · Apr 2026

Reclaimed #1 on DeepResearch Bench II at submission time. Leaderboard →

#1 DeepResearch Bench

DeepResearch Bench · Apr 2026

Ranked #1 on DeepResearch Bench at submission time. Leaderboard →

#1 AstaBench Code & Execution

AllenAI AstaBench · Mar 2026

Ranked #1 on AstaBench Code & Execution at submission time. Leaderboard →

#1 AstaBench Data Analysis

AllenAI AstaBench · Mar 2026

Ranked #1 on AstaBench Data Analysis at submission time. Leaderboard →

Quick Start

Up and running in just seconds

One wizard configures everything — LLM provider, API keys, model selection, and workspace mode. Supports OAuth sign-in for Claude Code and Codex CLI users.

1. Install via uv tool install EvoScientist
2. Run EvoSci onboard and follow the prompts
3. Choose your LLM provider and enter your API key
4. Pick a model and workspace mode. Done.


System Design

Agent pipeline in action

From user input to experimental output — every step orchestrated, every result verified.

Specialized Agents

Purpose-built for every phase of the scientific process

From hypothesis to publication — each agent handles a dedicated stage of the experiment workflow.

Multi-Agent Team

6 specialized sub-agents — plan, research, code, debug, analyze, write — working in concert under a shared LangGraph state machine.

Persistent Memory

Context, preferences, and experimental findings survive across sessions. The system internalizes scholarly taste and builds on prior work.

Literature Research

Deep web search with structured 7-dimension reflection. Finds papers, methods, and baselines with enforced citation rigor.

Code Generation & Debug

Write, execute, and iteratively debug experiment code in a sandboxed workspace with 300s timeout, output limits, and auto-recovery.
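To make the timeout and output limits concrete, here is a minimal sketch using only the Python standard library. It is not EvoScientist's sandbox: a child process gives no real isolation, `run_limited` is a hypothetical helper, and the 10,000-character output cap is an assumed value (only the 300s timeout is stated above).

```python
import subprocess
import sys

TIMEOUT_S = 300            # timeout quoted in the card above
MAX_OUTPUT_CHARS = 10_000  # assumed cap; the page does not give a number


def run_limited(code: str, timeout_s: float = TIMEOUT_S,
                max_chars: int = MAX_OUTPUT_CHARS) -> dict:
    """Run a Python snippet in a child process with time and output caps."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # The child is killed by subprocess.run before this is raised.
        return {"status": "timeout", "stdout": "",
                "truncated": False, "returncode": None}
    out = proc.stdout
    return {
        "status": "ok" if proc.returncode == 0 else "error",
        "stdout": out[:max_chars],           # enforce the output limit
        "truncated": len(out) > max_chars,
        "returncode": proc.returncode,
    }
```

A caller could feed the `status` and `stdout` fields back to the agent so that it can decide whether to retry, which is one plausible reading of "auto-recovery".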

Scientific Workflow

6-phase process: Intake → Plan → Execute → Evaluate → Write → Verify. Baseline-first design with one-variable iteration for scientific rigor.

MCP & EvoSkills

Plug in MCP servers or install research-lifecycle skills from GitHub. Compatible with other AI coding agents out of the box.

Async Sub-Agents

Long-running tasks dispatched to background workers via start_async_task. The main agent stays responsive and is notified on completion — true parallel multi-agent execution.
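The dispatch-and-notify pattern looks roughly like this in plain asyncio. This is a stand-in for the idea, not the start_async_task tool itself: `long_experiment` and `main` are hypothetical names, and a real background worker would run an agent rather than sleep.

```python
import asyncio


async def long_experiment(name: str, seconds: float) -> str:
    # Placeholder for a long-running sub-agent job.
    await asyncio.sleep(seconds)
    return f"{name} finished"


async def main() -> list[str]:
    log: list[str] = []
    # Dispatch to the background; this returns immediately.
    task = asyncio.create_task(long_experiment("ablation", 0.05))
    # Completion notification, delivered by the event loop.
    task.add_done_callback(lambda t: log.append(f"notified: {t.result()}"))
    # The "main agent" keeps working while the task runs.
    log.append("main agent still responsive")
    await task
    await asyncio.sleep(0)  # yield once so pending callbacks have run
    return log


if __name__ == "__main__":
    print(asyncio.run(main()))
    # ['main agent still responsive', 'notified: ablation finished']
```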

Adaptive Context

Per-turn tool filtering keeps only relevant tools visible. Dynamic system-prompt rewriting trims to conversation state. Automatic context overflow recovery prevents runaway sessions.
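Per-turn tool filtering can be illustrated with a simple keyword match. The tool names, keyword sets, and `filter_tools` helper below are all hypothetical, and real relevance scoring is likely more sophisticated than whitespace tokenization.

```python
# Hypothetical tool registry: tool name -> trigger keywords.
TOOLS: dict[str, set[str]] = {
    "web_search": {"search", "paper", "literature", "find"},
    "run_code": {"run", "execute", "script", "train"},
    "write_file": {"write", "save", "draft"},
}


def filter_tools(message: str,
                 tools: dict[str, set[str]] = TOOLS,
                 always_on: frozenset[str] = frozenset()) -> list[str]:
    """Return the tools worth exposing for this turn, sorted by name."""
    words = set(message.lower().split())
    return sorted(
        name for name, keywords in tools.items()
        if name in always_on or words & keywords
    )


if __name__ == "__main__":
    print(filter_tools("find a paper on diffusion"))  # ['web_search']
```

The `always_on` parameter shows how some tools could stay visible regardless of the message, at the cost of a slightly larger prompt.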

Human Control

Approve high-risk tool calls before they run. Voice input via faster-whisper (zh/en/auto). 14 slash commands — /threads, /resume, /compact, and more.
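An approval gate for high-risk tool calls can be as small as the sketch below. The set of risky tool names and the `guarded_call` helper are assumptions for illustration; in an interactive session the `approve` callback would prompt the user instead of returning a fixed answer.

```python
from typing import Any, Callable

# Hypothetical list of tools that require human sign-off.
HIGH_RISK_TOOLS = frozenset({"shell", "delete_file"})


def guarded_call(
    name: str,
    args: dict[str, Any],
    run: Callable[[str, dict[str, Any]], str],
    approve: Callable[[str, dict[str, Any]], bool],
) -> str:
    """Run a tool, asking for approval first if it is high-risk."""
    if name in HIGH_RISK_TOOLS and not approve(name, args):
        return f"{name}: rejected by user"
    return run(name, args)


if __name__ == "__main__":
    def run(name: str, args: dict[str, Any]) -> str:
        return f"{name} ok"

    def deny(name: str, args: dict[str, Any]) -> bool:
        return False

    print(guarded_call("read_file", {}, run, deny))        # read_file ok
    print(guarded_call("shell", {"cmd": "ls"}, run, deny))  # shell: rejected by user
```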

Multi-Provider

Your models, your choice

9 LLM providers supported. One config to switch. Auto-detect model names or specify full IDs directly.

Anthropic

claude-opus-4-6 · claude-sonnet-4-6 · claude-haiku-4-5

OpenAI

gpt-4o · o3-mini · o1

Google

gemini-2.5-pro · gemini-2.5-flash · gemini-2.0-flash

NVIDIA

deepseek-r1 · llama-3.3-70b · nemotron-ultra

SiliconFlow

deepseek-v3 · qwen-plus · glm-4

OpenRouter

any model · unified API · auto-routing

Volcengine

doubao-pro · doubao-lite

DashScope

qwen-max · qwen-turbo

⚙ Ollama & Custom

local / self-hosted

ollama: prefix · custom base_url
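One way to picture "auto-detect model names" is a prefix lookup, sketched below. The prefix table and `resolve_provider` are hypothetical and do not reflect EvoScientist's actual resolution logic; only the ollama: prefix and the model names themselves come from the cards above.

```python
# Hypothetical mapping from model-name prefix to provider.
PREFIX_TO_PROVIDER: dict[str, str] = {
    "claude-": "anthropic",
    "gpt-": "openai",
    "gemini-": "google",
    "doubao-": "volcengine",
}


def resolve_provider(model: str) -> tuple[str, str]:
    """Return (provider, model_id) for a model string."""
    # Explicit local prefix: strip it and route to Ollama.
    if model.startswith("ollama:"):
        return "ollama", model.removeprefix("ollama:")
    for prefix, provider in PREFIX_TO_PROVIDER.items():
        if model.startswith(prefix):
            return provider, model
    raise ValueError(f"cannot infer provider for {model!r}; specify it explicitly")


if __name__ == "__main__":
    print(resolve_provider("claude-sonnet-4-6"))  # ('anthropic', 'claude-sonnet-4-6')
    print(resolve_provider("ollama:llama3"))      # ('ollama', 'llama3')
```

Families such as qwen and deepseek appear under more than one provider in the list above, so this sketch deliberately refuses to guess for them and asks for an explicit provider instead.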

Multi-Channel

One agent, every surface

CLI as the hub. 10 messaging integrations — one agent session, any device.

💬 iMessage · ✈️ · 🎮 Discord · 💼 Slack · 💚 WeChat · 🔔 DingTalk · 🪶 Feishu · 📧 · 🐧 · 🔒 Signal

Official Skill Repository

EvoSkills

13 research-lifecycle skills covering the full pipeline from ideation to publication. Install all with a single command. Also compatible with Claude Code, Cursor, and other AI coding agents.

research-ideation research-survey paper-navigator experiment-pipeline experiment-craft experiment-iterative-coder paper-planning paper-writing paper-review paper-rebuttal academic-slides nano-banana evo-memory

EvoScientist/EvoSkills

Skills

Stay tuned.
Big things are coming.

Benchmarks · More agents · EvoSkills v2

EvoScientist.ai@gmail.com Join Discord