self-evolution Topic Execution Plan Template
Harness Engineering execution plan: this is an agent-executable scenario that shows how the control plane coordinates environment, workflow, guardrails, and feedback loops rather than a one-off agent call.
Agent Collaboration: This document is an agent-executable plan. Open this project in an AI coding agent (Claude Code, OpenCode, Codex, etc.) — the agent reads this plan and orchestrates other agents via the orchestrator CLI to collaboratively complete the task, from resource deployment and execution to result verification, fully autonomously.
This document is a generic template for submitting a topic to the orchestrator's self-evolution workflow for execution. Unlike self-bootstrap's linear iteration, self-evolution uses WP03 dynamic candidate generation + competitive selection to explore multiple implementation paths, with the engine automatically selecting the optimal solution.
Applicable scenarios:
- Multiple possible implementation paths exist, and you want to select the best one through competitive comparison
- Topic scope is moderate (1-5 files), suitable for A/B comparison of 2 candidate solutions
- Objectively quantifiable evaluation criteria exist (compilation/tests/clippy/diff size)
Not applicable:
- Topic scope is very large, where a single candidate solution requires multiple iterations to complete (use self-bootstrap)
- Implementation path is clearly unique, making competition meaningless (use self-bootstrap)
- Full QA document governance and ticket collection are needed (use self-bootstrap; self-evolution omits these steps)
Recommended reference examples:
- `docs/showcases/self-evolution-execution.md` (first real-world execution)
- `docs/showcases/self-bootstrap-execution-template.md` (comparison: linear iteration template)
1. Task Objective
Pass the following objective text directly to orchestrator as the topic for this round of self-evolution:
Topic name:
<topic title>

Background:
<Brief description of the current problem, tech debt, defect, or optimization opportunity>

Task objective for this round:
<Describe the expected outcome from orchestrator>

Constraints:
- Prioritize fixing the root cause; superficial workarounds are not acceptable.
- Preserve existing core semantics, compatibility requirements, key events, or state behaviors:
  <behaviors to preserve>
- The final goal is:
  <explicit completion state>
1.1 Expected Output
Produced and delivered autonomously by orchestrator:
- Two competing candidate solutions (generated by the `evo_plan` step and injected as dynamic items via `generate_items`).
- Independent implementation for each candidate (`evo_implement`, item-scoped).
- Automated scoring for each candidate (`evo_benchmark`: compilation/tests/clippy/diff size).
- Engine automatically selects the higher-scoring candidate (`select_best`, WP03 `item_select`).
- Winning candidate is applied and passes final validation (`evo_apply_winner` + `self_test`).
1.2 Non-Goals
This round does not involve humans pre-defining implementation details; does not presume which path should win; does not specify concrete code changes on behalf of orchestrator in the plan document. Implementation paths are autonomously explored and competitively selected by the workflow — humans only observe whether the process deviates from the objective.
1.3 Topic Suitability Checklist
Before using this template, confirm the topic meets the following conditions:
- [ ] At least two implementation paths with substantive differences exist
- [ ] Change scope is manageable (1-5 files), and a single candidate can be completed in one agent invocation
- [ ] Objectively quantifiable comparison dimensions exist (performance, code size, correctness, etc.)
- [ ] Existing tests provide sufficient regression protection without additional QA documentation
2. Execution Method
This round follows the standard self-evolution pipeline:
```text
evo_plan ──[generate_items]──> evo_implement (x2) ──> evo_benchmark (x2) ──> select_best ──> evo_apply_winner ──> evo_align_tests ──> self_test ──> loop_guard
```

Key differences from self-bootstrap:
| Dimension | self-bootstrap | self-evolution |
|---|---|---|
| Loop strategy | Fixed 2 cycles | Fixed 1 cycle |
| Implementation path | Single linear | 2 candidates competing |
| Selection mechanism | None | WP03 item_select (max score) |
| Cost control | Multiple steps, multiple agents | max_parallel=1, no QA/doc steps |
| Safety guarantee | self_test + self_restart | self_test + invariant (compilation_gate) |
Cost note: self-evolution controls cost through a single cycle plus serial candidate execution. Although there are 2 candidate solutions, `max_parallel: 1` ensures multiple agents do not run simultaneously. Total agent invocations are approximately 6 (plan x1 + implement x2 + benchmark x2 + apply_winner x1), plus builtin steps. Compared to self-bootstrap's 2 cycles x multiple steps, the cost is comparable or slightly lower.
The human role is limited to two types:
- Launching and providing the topic objective.
- Monitoring execution status, determining if the process is stuck, and recording results.
3. Startup Steps
3.1 Build and Start the Daemon
In C/S architecture, the CLI (orchestrator) connects to the daemon (orchestratord) via Unix Domain Socket.
cd "$ORCHESTRATOR_ROOT" # your orchestrator project directory
cargo build --release -p orchestratord -p orchestrator-cli
# Start daemon (if not running)
# --foreground keeps log output in foreground; --workers specifies parallel worker count
nohup ./target/release/orchestratord --foreground --workers 2 > /tmp/orchestratord.log 2>&1 &
# Verify daemon is running
ps aux | grep orchestratord | grep -v grep
# Verify queue can be consumed by daemon workers
orchestrator task list -o json

Warning: CLI binary path: the C/S mode CLI is at `target/release/orchestrator` (crates/cli), not the legacy monolithic binary `core/target/release/agent-orchestrator`. Update any symlinks pointing to the old path.
3.2 Initialize Database and Load Resources
orchestrator delete project/self-evolution --force
orchestrator init
orchestrator apply -f your-secrets.yaml --project self-evolution
# Apply any additional secret manifests the same way (always with --project self-evolution)
# To use the Claude native API, comment out the above line (claude-* model configs will take effect)
orchestrator apply -f docs/workflow/execution-profiles.yaml --project self-evolution
# Warning: --project is required, otherwise real AI agents will register in the global space
orchestrator apply -f docs/workflow/self-evolution.yaml --project self-evolution

3.3 Verify Resources Are Loaded
Verify resources are loaded (add --project to limit scope to a project):
orchestrator get workspaces --project self-evolution -o json
orchestrator get agents --project self-evolution -o json

3.4 Create Task (Submit Objective to Orchestrator)
In C/S mode, task create enqueues directly to daemon workers. Task creation automatically starts execution — no separate task start is needed.
self-evolution does not require specifying -t target files — dynamic items are generated at runtime by evo_plan's generate_items and do not depend on static QA file scanning.
orchestrator task create \
-n "<task name>" \
-w self -W self-evolution \
--project self-evolution \
  -g "<Compress the task objective above into a single line and pass it directly as the goal>"

Record the returned <task_id>. The task will be immediately claimed by a worker and begin executing. To wait for completion, use orchestrator task watch <task_id> or poll task info.
4. Monitoring Methods
4.1 Status Monitoring
orchestrator task list
orchestrator task info <task_id>
orchestrator task trace <task_id> # execution timeline with anomaly detection
orchestrator task watch <task_id> # real-time status panel refresh

Key observations:
- Current step (pay special attention to fan-out status of item-scoped steps)
- Whether task status is progressing
- Whether `failed`, `blocked`, or prolonged inactivity appears
4.2 Evolution Process Key Events
self-evolution has the following unique observation points compared to self-bootstrap:
- `items_generated` event: Confirm `evo_plan` successfully generated candidate items:

  ```bash
  orchestrator event list --task <task_id> --type items_generated -o json
  ```

- Dynamic item status: Confirm all candidates were executed:

  ```bash
  orchestrator task items <task_id>
  ```

- Selection result: Confirm `item_select` chose a winner:

  ```bash
  orchestrator store get evolution winner_latest --project self-evolution
  ```
4.3 Log Monitoring
orchestrator task logs --tail 100 <task_id>
orchestrator task logs --tail 200 <task_id>

Key observations:
- Whether `evo_plan` generated candidates with substantive differences (not just superficial variants)
- Whether each `evo_implement` item was implemented independently
- Whether `evo_benchmark` scoring is based on objective metrics and has discriminating power
- Whether `select_best` selected the higher-scoring candidate
- Whether `evo_apply_winner` cleanly applied the winning candidate
4.4 Process / Daemon Monitoring
# daemon process
ps aux | grep orchestratord | grep -v grep
# queue/task status
orchestrator task list -o json
# agent subprocesses (claude -p)
ps aux | grep "claude -p" | grep -v grep
# code changes
git diff --stat

Key observations:
- Whether agent processes are still making progress
- Whether `git diff --stat` shows reasonable ongoing changes
- If there is prolonged zero output, zero diff, or stalled processes, record it as a suspected stall
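The "suspected stall" judgment above can be reduced to a simple heuristic: if neither the log size nor the diff size changes across several consecutive polling samples, flag the task. A minimal sketch, assuming a hypothetical `(log_bytes, diff_lines)` sample format collected by the human monitor (this helper is not part of the orchestrator CLI):

```python
# Hypothetical stall detector: each sample is (log_bytes, diff_lines),
# taken at a fixed polling interval. Flags a stall when the last
# `window` samples are all identical, i.e. zero new output and zero
# new diff across the whole window.
def is_stalled(samples, window=3):
    if len(samples) < window:
        return False  # not enough history to judge
    tail = samples[-window:]
    return all(s == tail[0] for s in tail)

# Progress, then three identical samples -> suspected stall
history = [(100, 5), (220, 9), (220, 9), (220, 9)]
print(is_stalled(history))  # True
```

Feed it fresh samples from `orchestrator task logs` and `git diff --stat` at each poll; a `True` result is the cue to switch from monitor-only mode to recording an exception.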
4.5 Additional Diagnostic Commands
orchestrator task trace <task_id> --json
orchestrator event list --task <task_id> --limit 20

5. Key Checkpoints
5.1 evo_plan Phase Checkpoint
Confirm the output includes:
- 2 structured candidate solutions (JSON format, containing id/name/description/strategy)
- Two candidates with substantive differences (different algorithms, different designs, different trade-offs)
- `items_generated` event has been persisted, with the correct item count
If evo_plan outputs invalid JSON or the candidates are substantively identical, this indicates insufficient prompt differentiation guidance.
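The structural side of this checkpoint can be verified with a short script. The field names (`id`/`name`/`description`/`strategy`) follow this section; the exact payload shape emitted by `evo_plan` may differ, so treat this as a sketch rather than the real schema:

```python
import json

# Fields this checkpoint expects in each candidate (assumed shape).
REQUIRED = {"id", "name", "description", "strategy"}

def check_candidates(raw):
    cands = json.loads(raw)  # raises ValueError on invalid JSON
    assert len(cands) == 2, "expected exactly 2 candidates"
    for c in cands:
        missing = REQUIRED - c.keys()
        assert not missing, f"candidate {c.get('id')} missing {missing}"
    # Substantive difference is hard to check mechanically; at minimum
    # the two strategy descriptions must not be identical text.
    assert cands[0]["strategy"] != cands[1]["strategy"], "strategies identical"
    return cands

raw = json.dumps([
    {"id": "a", "name": "inline-cache", "description": "cache results in place", "strategy": "cache in place"},
    {"id": "b", "name": "precompute", "description": "build a lookup table", "strategy": "precompute table"},
])
check_candidates(raw)  # passes: 2 candidates, distinct strategies
```

Identical strategies tripping the last assertion is exactly the "insufficient prompt differentiation guidance" symptom described above.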
5.2 evo_implement Phase Checkpoint
Confirm:
- Both items produced code changes
- Change scope is consistent with each candidate's strategy description
- No mutual interference (item-scoped isolation is working correctly)
5.3 evo_benchmark Phase Checkpoint
Confirm:
- Both items have score captures
- Scoring is based on objective metrics such as compilation/tests/clippy
- Scores have discriminating power (not all perfect or all zero)
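One way to picture a score with "discriminating power" over these metrics is a weighted sum where compilation and tests dominate and clippy/diff size act as tie-breakers. The weights below are illustrative assumptions, not the actual `evo_benchmark` rubric (which lives in the workflow definition):

```python
# Illustrative benchmark score. Weights are assumptions for this
# sketch, not the real evo_benchmark scoring rules.
def score(compiles, tests_passed, tests_total, clippy_warnings, diff_lines):
    if not compiles:
        return 0.0  # a non-compiling candidate scores zero outright
    s = 50.0                                    # compiles at all
    s += 40.0 * tests_passed / max(tests_total, 1)
    s -= min(5.0, 0.5 * clippy_warnings)        # capped clippy penalty
    s -= min(5.0, diff_lines / 100.0)           # prefer smaller diffs
    return s

a = score(True, 10, 10, 0, 120)  # clean candidate, larger diff
b = score(True, 9, 10, 4, 40)    # one failing test, smaller diff
print(a, b, a > b)
```

A rubric like this naturally avoids the "all perfect or all zero" failure mode called out above, because partial test passes and diff size still separate otherwise-similar candidates.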
5.4 select_best Phase Checkpoint
Confirm:
- `evolution.winner_latest` store entry exists
- The selected candidate has the higher score
- Winner data includes the candidate ID and score
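The `item_select` (max score) behavior being checked here amounts to an argmax over the per-item scores, with the winning id and score persisted as the winner record. A sketch of that selection; the record shape is a hypothetical stand-in for whatever `evolution.winner_latest` actually stores:

```python
# item_select sketch: pick the candidate with the highest score and
# keep its id and score, mirroring the winner data this checkpoint
# expects. The dict shape is an assumption for illustration.
def select_best(scored):
    winner = max(scored, key=lambda c: c["score"])
    return {"winner_id": winner["id"], "score": winner["score"]}

result = select_best([
    {"id": "a", "score": 88.8},
    {"id": "b", "score": 83.6},
])
print(result)  # {'winner_id': 'a', 'score': 88.8}
```

Checking the store entry against the benchmark scores is then a one-line comparison: the persisted id must match the argmax.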
5.5 evo_apply_winner + self_test Phase Checkpoint
Confirm:
- The winning candidate's code compiles
- All tests pass
- `compilation_gate` invariant did not trigger a halt
- Behaviors required to be preserved in the objective still work correctly
6. Success Criteria
The topic is considered complete when all of the following conditions are met:
- orchestrator completed the full `self-evolution` pipeline and exited normally at `loop_guard`.
- Two distinct candidate solutions were actually generated and implemented separately.
- The engine selected the higher-scoring candidate via `item_select`.
- The winning candidate's code passed `self_test` and the `compilation_gate` invariant.
- Key completion state achieved: <fill in the explicit completion condition for the topic here>
- `evolution.winner_latest` store records the selection result.
- This round did not introduce any new compilation or test regressions.
7. Exception Handling
7.1 Evolution-Specific Exception Scenarios
| Exception | Detection Method | Resolution |
|---|---|---|
| evo_plan did not output valid JSON | items_generated event does not exist | Check prompt; may need to adjust JSON output instructions |
| Two candidate solutions are substantively identical | Inspect item label and approach variables | Insufficient prompt differentiation guidance; consider explicitly specifying the differentiation dimension in the goal |
| Both candidates fail to compile | Benchmark scores are both 0 | Invariant will halt; manual analysis needed to determine if the topic is too complex |
| item_select cannot select a winner | Store entry does not exist | Check whether score capture is working correctly |
| Test regression after evo_apply_winner | self_test fails | evo_align_tests should attempt to fix; if it still fails, manual intervention is needed |
| Candidate solution exceeds topic scope | Diff involves unexpected files | Plan prompt scope constraints are insufficient; consider adding scope limits in the goal |
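The "exceeds topic scope" check in the last row can be automated by comparing the files in the diff against the topic's declared scope. A minimal sketch; both file lists are placeholders, and in practice the changed list would come from `git diff --name-only`:

```python
# Scope check sketch: flag any changed file outside the topic's
# declared 1-5 file scope. Both lists here are placeholder examples.
def out_of_scope(changed_files, allowed_files):
    return sorted(set(changed_files) - set(allowed_files))

changed = ["src/cache.rs", "src/lib.rs", "Cargo.toml"]
allowed = ["src/cache.rs", "src/lib.rs"]
print(out_of_scope(changed, allowed))  # ['Cargo.toml']
```

A non-empty result is the signal to tighten the scope constraints in the goal text before re-running the topic.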
7.2 C/S Architecture-Specific Exceptions
| Exception | Detection Method | Resolution |
|---|---|---|
| Daemon not running | CLI reports failed to connect to daemon at .../orchestrator.sock | Start with orchestratord --foreground --workers 2 |
| CLI points to legacy monolithic binary | which orchestrator points to core/target/release/ | Update symlink to target/release/orchestrator |
| Daemon still uses old code after rebuild | Previously fixed bug reappears | Kill old daemon process and start a new one |
| Task starts immediately after task create | task list shows pending or quickly becomes running | In C/S mode, task lifecycle is queue-only; this is normal behavior |
7.3 General Exceptions
If any of the following occur, the human should stop "monitor-only" mode and record the exception:
- `evo_plan` clearly deviates from the topic or fails to generate structured candidates
- `evo_implement` has prolonged zero output or zero code changes
- `self_test` is ineffective or bypassed
- Process is deadlocked with zero output
Recommended recording method:
orchestrator task info <task_id>
orchestrator task logs --tail 200 <task_id>
git diff --stat

Manual takeover for analysis should follow if necessary.
8. Human Role Boundaries
In this plan, the human role is explicitly limited to:
- Providing the objective
- Launching the workflow
- Monitoring status
- Interrupting and recording when exceptions occur
Humans do not write implementation plans for orchestrator in advance, do not preset code changes, and do not predetermine which path should win.
The purpose of this template is to reuse a stable execution method to verify: whether the current orchestrator can autonomously select a better implementation solution through competitive evolution around a clear objective.
9. Selection Guide: self-evolution vs. self-bootstrap
| Decision Dimension | Choose self-evolution | Choose self-bootstrap |
|---|---|---|
| Implementation path | Multiple viable paths; comparison needed | Path is clear or unique |
| Topic scope | Small to medium (1-5 files) | Medium to large (unlimited) |
| Evaluation method | Objectively quantifiable scoring | QA scenario verification needed |
| Iteration requirement | One evolution round is sufficient | Multiple iterations needed for refinement |
| Documentation governance | Not needed | QA/doc synchronized updates needed |
| Cost sensitivity | Medium (2 candidates x 6 agent calls) | Medium (2 cycles x multiple steps) |
| Safety requirement | Invariant compilation gate is sufficient | self_restart bootstrap verification needed |
10. Cleanup
After task completion, clean up agent-produced topic code so the same topic can be re-tested:
# View agent-produced changes
git diff --stat
# Revert all files modified by agent (adjust file list based on diff output)
git checkout HEAD -- <list of modified files>
# Delete new files created by agent
git status --short | grep '^??' | awk '{print $2}' | xargs rm -f
# Confirm working tree is clean
git status --short
# Verify compilation
cargo check

Warning: The agent may modify core files (`context.rs`, `lib.rs`, `Cargo.toml`, etc.). After each execution, always check `git diff --stat` and revert unexpected changes. Infrastructure bug fixes should be committed separately before cleaning up topic code.