Self-Evolving · Human-on-the-Loop · End-to-End

EvoScientist

v0.1.0 · Apache 2.0 · Python 3.11+
$ uv tool install EvoScientist
Latest

News & Updates

18 Apr 2026 🥇 Ranked #1 on DeepResearch Bench at submission time! Leaderboard 👈
13 Apr 2026 🥇 Reclaimed #1 on DeepResearch Bench II at submission time! Leaderboard 👈
26 Mar 2026 🥇 Ranked #1 on AstaBench Data Analysis at submission time! Leaderboard 👈
25 Mar 2026 🥇 Ranked #1 on AstaBench Code & Execution at submission time! Leaderboard 👈
13 Mar 2026 🚀 EvoScientist officially debuts!
11 Mar 2026 Technical Report is live! Check it out 👈
06 Mar 2026 🥇 Ranked #1 on DeepResearch Bench II at submission time! Leaderboard 👈
24 Nov 2025 🏆 6/6 accepted at ICAIS 2025 AI Scientist Track — Best Paper & AI Reviewer's Appraisal Award! Details 👈
Recognition

Awards & Benchmarks

ICAIS 2025 Awards
Best Paper Award
ICAIS 2025 · AI Scientist Track
6/6 papers accepted. Best Paper & AI Reviewer's Appraisal Award. Details →
AI-Generated Best Paper
AI-authored research with end-to-end scientific quality. Details →
#1 DeepResearch Bench II
AgentResearchLab · Apr 2026
Reclaimed #1 on DeepResearch Bench II at submission time. Leaderboard →
#1 DeepResearch Bench
DeepResearch Bench · Apr 2026
Ranked #1 on DeepResearch Bench at submission time. Leaderboard →
#1 AstaBench Code & Execution
AllenAI AstaBench · Mar 2026
Ranked #1 on AstaBench Code & Execution at submission time. Leaderboard →
#1 AstaBench Data Analysis
AllenAI AstaBench · Mar 2026
Ranked #1 on AstaBench Data Analysis at submission time. Leaderboard →
Quick Start

Up and running
in just seconds

One wizard configures everything — LLM provider, API keys, model selection, and workspace mode. Supports OAuth sign-in for Claude Code and Codex CLI users.

  • Install via uv tool install EvoScientist
  • Run EvoSci onboard and follow the prompts
  • Choose your LLM provider and enter your API key
  • Pick a model and workspace mode — done
System Design

Agent pipeline in action

From user input to experimental output — every step orchestrated, every result verified.

[Pipeline diagram] User → CLI / API → Main Agent → sub-agents (planner-agent, research-agent, code-agent, debug-agent, data-analysis-agent, writing-agent) → Results
Specialized Agents

Purpose-built for every phase
of the scientific process

From hypothesis to publication — each agent handles a dedicated stage of the experiment workflow.

Multi-Agent Team

6 specialized sub-agents — plan, research, code, debug, analyze, write — working in concert under a shared LangGraph state machine.

Persistent Memory

Context, preferences, and experimental findings survive across sessions. The system internalizes scholarly taste and builds on prior work.

Literature Research

Deep web search with structured 7-dimension reflection. Finds papers, methods, and baselines with enforced citation rigor.

Code Generation & Debug

Write, execute, and iteratively debug experiment code in a sandboxed workspace with 300s timeout, output limits, and auto-recovery.
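
The execution limits described above can be illustrated with a minimal sketch. It is not EvoScientist's implementation: `run_experiment`, `RunResult`, and the 10,000-character output cap are assumptions made for this example, and a temporary directory only isolates files, so it is weaker than a real sandbox. Only the 300-second timeout comes from the text.

```python
import subprocess
import sys
import tempfile
from dataclasses import dataclass
from pathlib import Path

TIMEOUT_S = 300            # the 300s limit stated above
MAX_OUTPUT_CHARS = 10_000  # assumed cap; the actual output limit is not documented here


@dataclass
class RunResult:
    ok: bool         # True when the script exited with status 0
    timed_out: bool  # True when the timeout fired before the script finished
    output: str      # stdout followed by stderr, truncated to the cap


def run_experiment(code: str,
                   timeout_s: float = TIMEOUT_S,
                   max_chars: int = MAX_OUTPUT_CHARS) -> RunResult:
    """Run `code` in a fresh interpreter inside a throwaway working directory."""
    with tempfile.TemporaryDirectory() as workdir:
        script = Path(workdir) / "experiment.py"
        script.write_text(code, encoding="utf-8")
        try:
            proc = subprocess.run(
                [sys.executable, str(script)],
                cwd=workdir,
                capture_output=True,
                text=True,
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            # subprocess.run kills the child before raising, so nothing is left running.
            return RunResult(ok=False, timed_out=True, output="")
        combined = proc.stdout + proc.stderr
        return RunResult(ok=proc.returncode == 0,
                         timed_out=False,
                         output=combined[:max_chars])
```

A caller could feed `output` back to a debugging step whenever `ok` is false, which is one plausible way to drive the iterate-and-fix loop.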

Scientific Workflow

6-phase process: Intake → Plan → Execute → Evaluate → Write → Verify. Baseline-first design with one-variable iteration for scientific rigor.

MCP & EvoSkills

Plug in MCP servers or install research-lifecycle skills from GitHub. Compatible with other AI coding agents out of the box.

Async Sub-Agents

Long-running tasks dispatched to background workers via start_async_task. The main agent stays responsive and is notified on completion — true parallel multi-agent execution.
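
The dispatch-and-notify pattern can be sketched with `asyncio`. The function below borrows the name `start_async_task` from the text, but its signature and behavior are assumptions for illustration, not the project's API. It also runs on a single event loop, so it shows concurrency, not the background workers the project describes.

```python
import asyncio
from typing import Awaitable, Callable


def start_async_task(
    name: str,
    work: Callable[[], Awaitable[str]],
    on_done: Callable[[str, str], None],
) -> asyncio.Task:
    """Hypothetical stand-in: schedule `work` in the background, notify when it finishes."""

    async def runner() -> str:
        result = await work()
        on_done(name, result)  # completion notification back to the main agent
        return result

    return asyncio.create_task(runner(), name=name)


async def demo() -> list[str]:
    events: list[str] = []

    async def literature_search() -> str:
        await asyncio.sleep(0.05)  # stands in for a long-running sub-agent job
        return "3 papers found"

    task = start_async_task(
        "research",
        literature_search,
        lambda n, r: events.append(f"{n} done: {r}"),
    )
    # The main agent keeps working while the task is pending.
    events.append("main agent still responsive")
    await task
    return events
```

Running `asyncio.run(demo())` records the main agent's message first and the completion notice second, which is the ordering the feature describes.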

Adaptive Context

Per-turn tool filtering keeps only relevant tools visible. Dynamic system-prompt rewriting trims to conversation state. Automatic context overflow recovery prevents runaway sessions.
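
One simple way to picture per-turn tool filtering is keyword matching against the latest message. This is only a sketch under that assumption: the tool names, the `Tool` type, and the matching rule are hypothetical, and the real system may select tools by entirely different signals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    name: str
    keywords: frozenset[str]  # words that make this tool relevant to a turn


def visible_tools(message: str,
                  tools: list[Tool],
                  always_on: frozenset[str] = frozenset()) -> list[str]:
    """Return names of tools to expose this turn, in registry order."""
    words = set(message.lower().split())
    return [
        t.name
        for t in tools
        if t.name in always_on or words & t.keywords
    ]


# Hypothetical registry used only for this example.
REGISTRY = [
    Tool("web_search", frozenset({"paper", "search", "cite"})),
    Tool("run_code", frozenset({"run", "execute", "debug"})),
    Tool("read_memory", frozenset({"remember"})),
]
```

With this registry, a message about searching for a paper would expose only `web_search`, which keeps the tool list in the prompt short.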

Human Control

Approve high-risk tool calls before they run. Voice input via faster-whisper (zh/en/auto). 14 slash commands — /threads, /resume, /compact, and more.
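
The approval step can be pictured as a small gate in front of each tool call. The sketch below is illustrative only: the `HIGH_RISK` set, the tool names, and the `call_tool` signature are assumptions, not the project's interface.

```python
from typing import Callable

# Hypothetical tool names; which calls count as high-risk is not specified in the text.
HIGH_RISK = {"shell", "delete_file"}


def call_tool(name: str,
              run: Callable[[], str],
              approve: Callable[[str], bool]) -> str:
    """Run a tool, asking `approve` first when the tool is marked high-risk."""
    if name in HIGH_RISK and not approve(name):
        return f"{name}: denied by user"
    return run()
```

In an interactive session `approve` could prompt on the command line; in tests it can be a plain function returning a fixed answer.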

Multi-Provider

Your models, your choice

9 LLM providers supported. One config to switch. Auto-detect model names or specify full IDs directly.

  • Anthropic: claude-opus-4-6, claude-sonnet-4-6, claude-haiku-4-5
  • OpenAI: gpt-4o, o3-mini, o1
  • Google: gemini-2.5-pro, gemini-2.5-flash, gemini-2.0-flash
  • NVIDIA: deepseek-r1, llama-3.3-70b, nemotron-ultra
  • SiliconFlow: deepseek-v3, qwen-plus, glm-4
  • OpenRouter: any model, unified API, auto-routing
  • Volcengine: doubao-pro, doubao-lite
  • DashScope: qwen-max, qwen-turbo
  • Ollama & Custom (local / self-hosted): ollama: prefix, custom base_url
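
Auto-detection of a provider from a model name could work along the lines of the prefix lookup below. This is a sketch, not the project's resolver: the `resolve_provider` function, the provider identifiers, and the prefix table are assumptions. Only the `ollama:` prefix comes from the text. Model families listed under more than one provider (for example the deepseek and qwen names) cannot be resolved by prefix alone, so the sketch asks for an explicit provider in that case.

```python
# Assumed prefix table; covers only families that map to a single provider above.
PREFIX_TO_PROVIDER = {
    "claude-": "anthropic",
    "gpt-": "openai",
    "o1": "openai",
    "o3": "openai",
    "gemini-": "google",
    "doubao-": "volcengine",
}


def resolve_provider(model: str) -> str:
    """Guess the provider for `model`, or raise if the name is ambiguous or unknown."""
    if model.startswith("ollama:"):
        return "ollama"
    for prefix, provider in PREFIX_TO_PROVIDER.items():
        if model.startswith(prefix):
            return provider
    raise ValueError(f"cannot infer provider for {model!r}; specify it explicitly")
```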
Multi-Channel

One agent, every surface

CLI as the hub. 10 messaging integrations — one agent session, any device.

  • 💬 iMessage
  • ✈️ Telegram
  • 🎮 Discord
  • 💼 Slack
  • 💚 WeChat
  • 🔔 DingTalk
  • 🪶 Feishu
  • 📧 Email
  • 🐧 QQ
  • 🔒 Signal
Official Skill Repository

EvoSkills

13 research-lifecycle skills covering the full pipeline from ideation to publication. Install all with a single command. Also compatible with Claude Code, Cursor, and other AI coding agents.

research-ideation, research-survey, paper-navigator, experiment-pipeline, experiment-craft, experiment-iterative-coder, paper-planning, paper-writing, paper-review, paper-rebuttal, academic-slides, nano-banana, evo-memory
[Figure: EvoSkills Research Pipeline Framework]

Stay tuned.
Big things are coming.

Benchmarks · More agents · EvoSkills v2