Supervisor Platform
The supervisor platform enables a multi-agent architecture where a supervisor agent classifies incoming requests and delegates work to specialized worker agents.
Architecture
              User Message
                    |
                    v
          +---------+---------+
          | Supervisor Agent  |  (role: leader)
          | - Classifies      |
          | - Routes          |
          | - Plans           |
          +--+------+------+--+
             |      |      |
       +-----+  +---+---+  +-----------+
       |        |       |              |
+------v---+ +--v----+ +v-------+ +----v-----+
| Coding   | |Infra  | | Ops    | | Verifier |
| Worker   | |Worker | | Worker | | Worker   |
+----+-----+ +---+---+ +---+----+ +----+-----+
     |           |         |           |
+----v-----+ +---v----+ +--v-----+ +---v-----+
|ClaudeCode| |Claude  | |Direct  | |Claude   |
| Toolkit  | |Code TK | |Ops     | |Code TK  |
+----------+ +--------+ +--------+ +---------+
Classification
The supervisor classifies each request into one of these categories:
| Classification | Description |
|---|---|
| `no_action` | No action needed |
| `answer_only` | Can answer directly without a worker |
| `read_only_analysis` | Read-only code/data analysis |
| `code_fix` | Bug fix or small code change |
| `feature_small` | Small feature (1-2 files) |
| `feature_medium` | Medium feature (3-5 files) |
| `feature_large` | Large feature (6+ files) |
| `refactor_scoped` | Scoped refactoring |
| `test_generation` | Generate tests |
| `documentation_update` | Update documentation |
| `infrastructure_change` | Infrastructure/IaC changes |
| `noc_operation` | Runtime operations (kubectl, monitoring) |
| `high_risk_escalation` | Requires human review |
Worker Types
| Worker | Engine | Description |
|---|---|---|
| `coding` | `claude_code` | Repository-based code changes |
| `planning` | `claude_code` | Read-only analysis and planning |
| `infrastructure` | `claude_code` | Terraform, Helm, CDK changes |
| `operations` | `direct_ops` | Runtime ops (kubectl, AWS, GCP) |
| `documentation` | `claude_code` | Documentation updates |
| `verifier` | `claude_code` | Test execution and verification |
| `data_platform` | `claude_code` | Data pipeline/DAG work |
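
The two tables above suggest a lookup from classification to worker. The source does not spell out that mapping, so the following is only a sketch: the `ROUTES` table and the `route` helper are hypothetical, and the pairings are inferred from the table descriptions (for example, sending `test_generation` to the `coding` worker is an assumption).

```python
from typing import Optional

# Hypothetical routing table, inferred from the descriptions above.
# None means the supervisor handles the request without a worker.
ROUTES = {
    "no_action": None,
    "answer_only": None,
    "read_only_analysis": "planning",
    "code_fix": "coding",
    "feature_small": "coding",
    "feature_medium": "coding",
    "feature_large": "coding",
    "refactor_scoped": "coding",
    "test_generation": "coding",  # assumption: could equally be another worker
    "documentation_update": "documentation",
    "infrastructure_change": "infrastructure",
    "noc_operation": "operations",
}


def route(classification: str) -> Optional[str]:
    """Return the worker name for a classification, or None if no worker is needed."""
    if classification == "high_risk_escalation":
        # The table says this category requires human review, so no worker is chosen.
        raise PermissionError("high_risk_escalation requires human review")
    if classification not in ROUTES:
        raise ValueError(f"unknown classification: {classification!r}")
    return ROUTES[classification]
```

Keeping the mapping in a plain dictionary makes it easy to audit which categories reach which worker, and unknown categories fail loudly instead of defaulting to a worker.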
Execution Engines
| Engine Type | Description |
|---|---|
| `code_agent` | Claude Code CLI with MCP plugins |
| `managed_agent` | API-based agents (Anthropic, OpenAI, Google) |
| `direct_ops` | Direct tool execution (kubectl, AWS CLI) |
| `custom` | Custom execution handler |
Execution Targets
| Target Type | Description |
|---|---|
| `local` | Local machine execution |
| `ssh` | Remote execution via SSH |
| `remote_service` | Remote API-based execution |
| `managed_agents` | Managed agent API endpoints |
Execution Limits
Each job has configurable limits:
{
  "network_access": false,
  "allow_dependency_install": false,
  "allow_git_push": false,
  "allow_merge": false,
  "allow_delete_files": false,
  "allow_migrations": false,
  "allow_apply_or_deploy": false,
  "allow_production_change": false,
  "max_runtime_minutes": 15,
  "max_attempts": 3,
  "max_memory_mb": 4096,
  "max_cpus": 2.0
}
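
As a rough illustration of how a caller might load and enforce these limits, the sketch below models them as a frozen dataclass. The class name `ExecutionLimits`, the helpers `from_dict` and `check_action`, and the choice to use the values shown above as defaults are all assumptions, not part of the platform's documented API.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ExecutionLimits:
    # Defaults mirror the example above; treating them as defaults is an assumption.
    network_access: bool = False
    allow_dependency_install: bool = False
    allow_git_push: bool = False
    allow_merge: bool = False
    allow_delete_files: bool = False
    allow_migrations: bool = False
    allow_apply_or_deploy: bool = False
    allow_production_change: bool = False
    max_runtime_minutes: int = 15
    max_attempts: int = 3
    max_memory_mb: int = 4096
    max_cpus: float = 2.0

    @classmethod
    def from_dict(cls, raw: dict) -> "ExecutionLimits":
        """Build limits from a partial dict, rejecting keys that are not known limits."""
        known = {f.name for f in fields(cls)}
        unknown = set(raw) - known
        if unknown:
            raise ValueError(f"unknown limit keys: {sorted(unknown)}")
        return cls(**raw)


def check_action(limits: ExecutionLimits, action: str) -> None:
    """Raise PermissionError unless the matching allow_<action> flag is true."""
    if not getattr(limits, f"allow_{action}"):
        raise PermissionError(f"action '{action}' is disabled by execution limits")
```

For example, `ExecutionLimits.from_dict({"allow_git_push": True})` keeps every other limit at its restrictive value, so `check_action(limits, "merge")` still raises.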
MCP Plugin Support
Workers can use MCP (Model Context Protocol) servers configured per worker:
mcp_servers:
  - name: filesystem
    type: stdio
    command: npx
    args: ["-y", "@anthropic-ai/mcp-filesystem"]
    description: File read/write
  - name: github
    type: http
    url: https://api.githubcopilot.com/mcp/
    description: GitHub PRs and issues
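
The two entries above differ by transport: the `stdio` server is launched from a `command`, while the `http` server is reached at a `url`. A minimal validation sketch under that reading follows; the function `validate_mcp_server` and the rule set in `REQUIRED_BY_TYPE` are hypothetical and only cover the two types shown here.

```python
from __future__ import annotations

# Assumed required keys per transport type, based only on the two examples above.
REQUIRED_BY_TYPE = {
    "stdio": ("command",),
    "http": ("url",),
}


def validate_mcp_server(entry: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means the entry looks valid."""
    problems = []
    if not entry.get("name"):
        problems.append("missing name")
    server_type = entry.get("type")
    if server_type not in REQUIRED_BY_TYPE:
        problems.append(f"unsupported type: {server_type!r}")
        return problems
    for key in REQUIRED_BY_TYPE[server_type]:
        if not entry.get(key):
            problems.append(f"{server_type} server requires '{key}'")
    return problems
```

Returning a list of problems rather than raising on the first one lets a caller report every misconfigured server in a single pass.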
Permission Rules
Workers have granular permission rules:
permissions:
  allow:
    - "Read(/src/**)"
    - "Bash(git diff *)"
    - "Bash(npm run *)"
  deny:
    - "Bash(rm -rf *)"
    - "Bash(git push --force *)"
    - "Edit(.env)"
  ask:
    - "Bash(git push *)"
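
The rules above overlap (`git push --force` matches both a `deny` and an `ask` pattern), so some precedence is needed. The sketch below assumes `deny` wins over `ask`, which wins over `allow`, and that unmatched requests fall back to `ask`; neither assumption is stated in the source. It also uses Python's `fnmatch` globbing as a stand-in for whatever matcher the platform actually uses, and the `decide` helper is hypothetical.

```python
from fnmatch import fnmatchcase

RULES = {
    "allow": ["Read(/src/**)", "Bash(git diff *)", "Bash(npm run *)"],
    "deny": ["Bash(rm -rf *)", "Bash(git push --force *)", "Edit(.env)"],
    "ask": ["Bash(git push *)"],
}


def decide(request: str, rules: dict = RULES) -> str:
    """Return 'deny', 'ask', or 'allow' for a request such as 'Bash(git diff HEAD)'."""
    # Assumed precedence: the most restrictive matching rule wins.
    for outcome in ("deny", "ask", "allow"):
        if any(fnmatchcase(request, pattern) for pattern in rules.get(outcome, [])):
            return outcome
    # Assumed fallback: anything not covered by a rule needs confirmation.
    return "ask"
```

Under these assumptions, `decide("Bash(git push --force origin main)")` yields `deny`, while `decide("Bash(git push origin main)")` yields `ask`.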
Streaming
Supervisor runs support server-sent events (SSE) streaming at three verbosity levels:
| Verbosity | Events Included |
|---|---|
| `full` | All events (tool calls, thinking, results, status) |
| `events` | Tool calls, messages, status changes, approval requests |
| `result` | Messages, errors, and final status only |
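
One way to picture the verbosity levels is as a filter over a stream of typed events. In the sketch below the event type names (`tool_call`, `status_change`, and so on) and both helper functions are hypothetical; the filter sets follow the table literally, so `events` does not include errors because the table does not list them. The frame layout in `to_sse` follows the standard SSE `event:`/`data:` format.

```python
import json

# Hypothetical event type names; the source describes the categories only in prose.
VERBOSITY_FILTERS = {
    "full": None,  # None means "pass everything through"
    "events": {"tool_call", "message", "status_change", "approval_request"},
    "result": {"message", "error", "final_status"},
}


def filter_events(events, verbosity):
    """Yield only the events that the given verbosity level includes."""
    allowed = VERBOSITY_FILTERS[verbosity]
    for event in events:
        if allowed is None or event["type"] in allowed:
            yield event


def to_sse(event):
    """Serialize one event as a server-sent events frame."""
    return f"event: {event['type']}\ndata: {json.dumps(event['data'])}\n\n"
```

A client that only needs the outcome of a run could request `result` and ignore the intermediate tool traffic entirely.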
OOM Recovery
The platform handles out-of-memory failures with automatic retry:
- Retry: If under `max_retries`, retry with `memory_multiplier * current_memory`
- Re-plan: If at retry limit and `enable_supervisor_replan=true`, supervisor re-classifies
- Circuit breaker: After `circuit_breaker_threshold` failures, mark `failed_circuit_open` and escalate
Default retry policy:
{
  "max_retries": 2,
  "memory_multiplier": 2.0,
  "enable_supervisor_replan": true,
  "circuit_breaker_threshold": 3
}
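
The recovery sequence can be sketched as a single decision function. The source does not say how the retry counter and the circuit-breaker failure count relate, so this sketch makes several assumptions: the checks run in the order listed above, only one re-plan is attempted per job, and a job that exhausts its options before reaching the breaker threshold is simply marked `failed`. The names `RetryPolicy` and `next_step` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RetryPolicy:
    # Values taken from the default retry policy above.
    max_retries: int = 2
    memory_multiplier: float = 2.0
    enable_supervisor_replan: bool = True
    circuit_breaker_threshold: int = 3


def next_step(policy, retries_used, failures, current_memory_mb, replanned=False):
    """Return (action, new_memory_mb) after an out-of-memory failure."""
    if retries_used < policy.max_retries:
        # Retry with more memory: memory_multiplier * current_memory.
        return ("retry", int(current_memory_mb * policy.memory_multiplier))
    if policy.enable_supervisor_replan and not replanned:
        # Assumption: the supervisor re-classifies at most once per job.
        return ("replan", None)
    if failures >= policy.circuit_breaker_threshold:
        return ("failed_circuit_open", None)
    # Assumption: no recovery path left and breaker threshold not reached.
    return ("failed", None)
```

With the default policy and a 4096 MB job, the first failure gives `("retry", 8192)`, the second gives `("retry", 16384)`, the third triggers a re-plan, and a further failure after the re-plan opens the circuit.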