Task Agents and Sub-Agents

TatsuCode runs several parallel sub-agents alongside your main chat to keep heavy work off the primary model. They handle research, code exploration, read-only code and design review, architectural reasoning, and video and PDF analysis, so your main model can spend its tokens on the work only it can do.

v0.9.107 adds two big improvements here: a new PDF Reader Agent for reliable PDF analysis, and a new /models-taskagents command to assign cost-efficient models to each sub-agent so long sessions go further.

The Sub-Agent Lineup

These are the five user-configurable Task Agents in /models-taskagents:

Agent	What it does	Model requirements
Web Agent	Live web research, page fetches, documentation lookups, direct API calls (faster than scraping)	Any model
Code Explorer Agent	Read-only codebase exploration — find symbols, trace references, lint/analyze with `CodebaseReadOnly`, summarize regions	Any model
Design Reviewer (Architect Agent)	Independent read-only code/design review for non-obvious changes — full code-review passes, refactors, migrations, API/schema design, system boundaries, trade-off and risk analysis	Any reasoning-strong model
YouTube Video Agent	Visual YouTube video analysis — extracts code snippets, on-screen text, step-by-step tutorials, shader/UI techniques directly from the video frames (not just transcripts)	Requires a Google Gemini model via OpenRouter — Gemini is the only family with native YouTube video input. Connect via `/connect`.
PDF Reader Agent	Native-PDF analysis — text, charts, diagrams, scanned content, equations, figures	Model with native PDF input — Claude API, Gemini 2.5+, or GPT-5+

Up to 5 sub-agents can run in parallel on a single problem.
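
As a rough illustration of a parallel cap like this one, the sketch below fans out a batch of placeholder tasks while never running more than five at once. It is not TatsuCode's scheduler: the `fan_out` function, the task names, and the choice to queue extra tasks (rather than reject them) are all assumptions made for the example.

```python
import asyncio

MAX_PARALLEL_SUB_AGENTS = 5  # the cap stated above


async def fan_out(tasks, limit=MAX_PARALLEL_SUB_AGENTS):
    """Run hypothetical sub-agent tasks concurrently, at most `limit` at a time.

    Returns the reports in task order plus the peak concurrency observed.
    """
    limiter = asyncio.Semaphore(limit)
    running = 0
    peak = 0

    async def run_one(task):
        nonlocal running, peak
        async with limiter:  # extra tasks wait here until a slot frees up
            running += 1
            peak = max(peak, running)
            await asyncio.sleep(0.01)  # stand-in for the sub-agent's real work
            running -= 1
            return f"report for {task}"

    # gather() preserves the order of the inputs in its result list.
    reports = await asyncio.gather(*(run_one(t) for t in tasks))
    return reports, peak


reports, peak = asyncio.run(fan_out([f"task-{i}" for i in range(8)]))
```

With eight tasks submitted, the semaphore keeps the observed peak at or below five, and the remaining three start as earlier ones finish.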

The picker auto-filters the model list for capability-restricted agents — the YouTube agent only shows Gemini models (and prompts you to connect OpenRouter if it's missing); the PDF Reader only shows native-PDF-capable models. You don't have to memorize which models qualify.
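
The filtering idea can be sketched as a capability check per agent. Everything named here is hypothetical (the `Model` class, the capability tags `youtube_input` and `native_pdf`, and the placeholder model names); TatsuCode's internal representation is not documented in this page.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Model:
    name: str
    capabilities: frozenset = field(default_factory=frozenset)


# Capability each agent requires; None means "any model" (hypothetical tags).
AGENT_REQUIREMENTS = {
    "web": None,
    "code_explorer": None,
    "design_reviewer": None,
    "youtube_video": "youtube_input",
    "pdf_reader": "native_pdf",
}


def models_for_agent(agent, models):
    """Return only the models that satisfy the agent's capability requirement."""
    required = AGENT_REQUIREMENTS[agent]
    if required is None:
        return list(models)
    return [m for m in models if required in m.capabilities]


# Placeholder catalog, not a list of real model identifiers.
CATALOG = [
    Model("gemini-flash-example", frozenset({"youtube_input", "native_pdf"})),
    Model("claude-example", frozenset({"native_pdf"})),
    Model("small-text-example"),
]

youtube_choices = models_for_agent("youtube_video", CATALOG)
pdf_choices = models_for_agent("pdf_reader", CATALOG)
```

Under these assumptions, the YouTube picker would list only the Gemini placeholder, while the PDF picker would list both PDF-capable placeholders.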

Beyond the user-facing five, TatsuCode also spawns short-lived helper sub-agents automatically (web-page summarization, DevBrowser inspection compression, AGENTS.md creation, conversation compaction). Those run on internal defaults and aren't configured here.

YouTube Video Agent — Visual Analysis via Gemini

The YouTube Video Agent doesn't just pull a transcript. It hands the actual video to a Gemini model that natively accepts YouTube URLs as input, so the agent can:

See on-screen code, terminal output, and configuration that's never spoken aloud
Extract step-by-step instructions from screen recordings and tutorials
Analyze visual techniques — shader effects, UI layouts, animation timing
Summarize the full video without losing visual context

Why Gemini specifically: Google's Gemini models are the only major family with native YouTube video input — you pass the URL directly and the model processes the frames. Other model families (Claude, GPT) cannot ingest YouTube videos this way, which is why this agent is locked to Gemini.

How to enable it: connect OpenRouter via /connect, then open /models-taskagents and assign a Gemini model (e.g., Gemini 2.5 Flash for cost, Gemini 2.5 Pro for depth) to the YouTube Video Agent. Without OpenRouter the picker will tell you what's missing.

PDF Reader Agent

PDF analysis used to depend on pdftoppm (from Poppler) being installed on your machine to rasterize pages. On Windows in particular, Poppler is rarely present — so PDF reads silently degraded to text-only and missed every chart, diagram, and scanned page.

The new PDF Reader Agent fixes that:

Renders pages through TatsuCode's own PDFium pipeline — no Poppler dependency.
Routes to a model with native PDF input (Claude API / Gemini 2.5+ / GPT-5+).
Extracts text and visual content reliably — charts, diagrams, scans, equations, signatures, figures.

It auto-activates whenever the main agent asks about a PDF that could contain visual content. You don't have to do anything special — just attach the PDF and ask.

`/models-taskagents`

Assign a different model to each sub-agent. The idea is simple: scout work doesn't need your most expensive model.

Cost-Saving Rationale

Sub-agent	Recommended assignment	Why
Web Agent	Smaller / faster model	Most web research is summarization — doesn't need heavy reasoning
Code Explorer	Smaller / faster model	Read-only navigation — find symbols, summarize, trace
Design Reviewer / Architect	Match your main model	This is where reasoning quality actually matters — code review, architecture, and trade-offs benefit from stronger models
YouTube Video	A Gemini model (Flash for cost, Pro for depth)	Gemini is required — only family with native YouTube video input. Flash is the cheap default; Pro for dense technical content
PDF Reader	A model with native PDF input	Required for visual extraction (Claude API / Gemini 2.5+ / GPT-5+)

For most users, the following split meaningfully cuts token cost on long sessions without hurting output quality: point the scout-style agents (Web, Code Explorer) at a fast, cheap model such as a Haiku or Flash variant; keep Design Reviewer / Architect and your main chat on Sonnet/Opus or GPT-5.x; and lock YouTube to Gemini Flash and PDF Reader to a native-PDF model.

Token Budget Impact

On long debugging or research sessions, the savings compound. A typical research turn might fan out to 3 sub-agents at once. Running each on a model that's 5-10× cheaper than your main model can stretch a session roughly twice as far before you hit /compact.
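
To see where a figure in that range can come from, here is a small worked calculation. Every number is an illustrative assumption, not a measurement: 20,000 main-model tokens per turn, three sub-agent runs of 10,000 tokens each, and a sub-agent model priced at one eighth of the main model. Prices are in arbitrary units per token.

```python
def turn_cost(main_tokens, sub_tokens, sub_count, main_price, sub_price):
    """Cost of one turn: main-model tokens plus `sub_count` sub-agent runs."""
    return main_tokens * main_price + sub_count * sub_tokens * sub_price


# Illustrative assumptions only.
MAIN_TOKENS = 20_000
SUB_TOKENS = 10_000
SUB_COUNT = 3
MAIN_PRICE = 1.0             # arbitrary cost unit per token
CHEAP_PRICE = MAIN_PRICE / 8  # assumed "8x cheaper" sub-agent model

# Every sub-agent on the main model versus every sub-agent on the cheap model.
all_main = turn_cost(MAIN_TOKENS, SUB_TOKENS, SUB_COUNT, MAIN_PRICE, MAIN_PRICE)
mixed = turn_cost(MAIN_TOKENS, SUB_TOKENS, SUB_COUNT, MAIN_PRICE, CHEAP_PRICE)

# How many mixed-setup turns fit in the budget of one all-main turn.
turns_ratio = all_main / mixed  # about 2.1 with these assumed numbers
```

The ratio depends heavily on how much of each turn is sub-agent work; a session that rarely delegates would see a much smaller gain.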

CLI-Backed Models Cannot Be Task Agents

CLI-backed models (Claude Code, future Gemini CLI) don't appear in the /models-taskagents picker. This is intentional: CLI providers ignore TatsuCode's tool whitelist and run their own loop, so a "read-only Code Explorer" assigned to a CLI provider wouldn't actually be read-only. See CLI vs API/Subscription Providers for the structural reasons.

If you want a CLI provider for delegated exploration, use the dedicated ClaudeAgent tool from your main chat — same delegation, no whitelist mismatch.
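
The whitelist argument can be made concrete with a minimal sketch. It assumes a host-side dispatcher that checks every tool call, and a provider flag saying whether the provider runs its own loop; the tool names, `Provider`, `dispatch_tool`, and `can_be_task_agent` are all hypothetical and only illustrate the reasoning.

```python
from dataclasses import dataclass

# Hypothetical read-only tool names, for illustration only.
READ_ONLY_TOOLS = frozenset({"read_file", "find_symbol", "trace_references"})


@dataclass(frozen=True)
class Provider:
    name: str
    runs_own_loop: bool  # True for CLI-backed providers


class ToolNotAllowed(Exception):
    """Raised when a sub-agent requests a tool outside its whitelist."""


def dispatch_tool(tool, whitelist):
    """Host-side check. It only runs when the host owns the agent loop."""
    if tool not in whitelist:
        raise ToolNotAllowed(tool)
    return f"executed {tool}"


def can_be_task_agent(provider):
    """A provider that runs its own loop never passes through dispatch_tool,
    so the whitelist cannot be enforced and the provider is excluded."""
    return not provider.runs_own_loop


api_provider = Provider("api-example", runs_own_loop=False)
cli_provider = Provider("cli-example", runs_own_loop=True)
```

In this model the read-only guarantee lives entirely in `dispatch_tool`, which is why a provider that bypasses it is filtered out up front rather than trusted to behave.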

When the Main Model Calls a Sub-Agent

You don't usually have to think about this, because the main model decides. The rough heuristics are:

Multi-file code search / symbol lookup / lint analysis → Code Explorer
Web research / API lookups / current info / docs → Web Agent
Full code review, refactor planning, schema/API design, trade-off analysis → Design Reviewer / Architect
YouTube link in the prompt → YouTube Video Agent (Gemini-only, visual)
PDF attached → PDF Reader Agent

Each sub-agent returns a focused report to the main chat, which then continues the conversation with that context.
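
The heuristics above can be approximated as a simple rule-based router. This is only a sketch: in TatsuCode the main model makes the decision, and the `pick_sub_agent` function, the `task` hint values, and the URL pattern below are assumptions introduced for illustration.

```python
import re

# Simplified pattern for common YouTube link shapes (an assumption, not exhaustive).
YOUTUBE_URL = re.compile(
    r"https?://(?:www\.)?(?:youtube\.com/watch\?v=|youtu\.be/)[\w-]+"
)

# Hypothetical task hints mapped to the agents named above.
TASK_TO_AGENT = {
    "code_search": "Code Explorer",
    "web_research": "Web Agent",
    "design_review": "Design Reviewer",
}


def pick_sub_agent(prompt, attachments=(), task=None):
    """Return the sub-agent a prompt would be routed to, or None for no delegation."""
    if any(name.lower().endswith(".pdf") for name in attachments):
        return "PDF Reader Agent"
    if YOUTUBE_URL.search(prompt):
        return "YouTube Video Agent"
    return TASK_TO_AGENT.get(task)


pdf_route = pick_sub_agent("Summarize the charts", attachments=["spec.PDF"])
video_route = pick_sub_agent("Explain https://youtu.be/abc123XYZ_-")
code_route = pick_sub_agent("Where is parse() defined?", task="code_search")
```

A prompt with no attachment, no video link, and no task hint returns `None`, which corresponds to the main model answering directly.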

Next Steps

Commands — /models-taskagents
Models — picking the right model for each role
CLI vs API/Subscription Providers — why CLI providers can't be Task Agents