
self-evolution Topic Execution Plan Template

Harness Engineering execution plan: this is an agent-executable scenario that shows how the control plane coordinates environment, workflow, guardrails, and feedback loops rather than a one-off agent call.

Agent Collaboration: This document is an agent-executable plan. Open this project in an AI coding agent (Claude Code, OpenCode, Codex, etc.) — the agent reads this plan and orchestrates other agents via the orchestrator CLI to collaboratively complete the task, from resource deployment and execution to result verification, fully autonomously.

This document is a generic template for submitting a topic to the orchestrator's self-evolution workflow for execution. Unlike self-bootstrap's linear iteration, self-evolution uses WP03 dynamic candidate generation + competitive selection to explore multiple implementation paths, with the engine automatically selecting the optimal solution.

Applicable scenarios:

  • Multiple possible implementation paths exist, and you want to select the best one through competitive comparison
  • Topic scope is moderate (1-5 files), suitable for A/B comparison of 2 candidate solutions
  • Objectively quantifiable evaluation criteria exist (compilation/tests/clippy/diff size)

Not applicable:

  • Topic scope is very large, where a single candidate solution requires multiple iterations to complete (use self-bootstrap)
  • Implementation path is clearly unique, making competition meaningless (use self-bootstrap)
  • Full QA document governance and ticket collection are needed (use self-bootstrap; self-evolution omits these steps)

Recommended reference examples:

  1. docs/showcases/self-evolution-execution.md (first real-world execution)
  2. docs/showcases/self-bootstrap-execution-template.md (comparison: linear iteration template)

1. Task Objective

Pass the following objective text directly to orchestrator as the topic for this round of self-evolution:

Topic name: <topic title>

Background: <Brief description of the current problem, tech debt, defect, or optimization opportunity>

Task objective for this round: <Describe the expected outcome from orchestrator>

Constraints:

  1. Prioritize fixing the root cause; superficial workarounds are not acceptable.
  2. Preserve existing core semantics, compatibility requirements, key events, or state behaviors: <behaviors to preserve>
  3. The final goal is: <explicit completion state>
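
For reference, a filled-in objective might look like the following; the topic, background, and constraints are entirely hypothetical, invented for illustration:

```text
Topic name: Deduplicate CLI error formatting

Background: Error messages are formatted in three separate places in the CLI crate, and the copies have drifted apart.

Task objective for this round: Consolidate error formatting into a single code path without changing the rendered output.

Constraints:
1. Prioritize fixing the root cause; superficial workarounds are not acceptable.
2. Preserve existing core semantics: the exact error text printed for each failure case must not change.
3. The final goal is: one shared formatting function, all call sites migrated, all existing tests passing.
```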

1.1 Expected Output

Produced and delivered autonomously by orchestrator:

  1. Two competing candidate solutions (generated by the evo_plan step and injected as dynamic items via generate_items).
  2. Independent implementation for each candidate (evo_implement, item-scoped).
  3. Automated scoring for each candidate (evo_benchmark: compilation/tests/clippy/diff size).
  4. Engine automatically selects the higher-scoring candidate (select_best, WP03 item_select).
  5. Winning candidate is applied and passes final validation (evo_apply_winner + self_test).

1.2 Non-Goals

This round does not involve humans pre-defining implementation details, presuming which path should win, or specifying concrete code changes on behalf of orchestrator in the plan document. Implementation paths are autonomously explored and competitively selected by the workflow; humans only observe whether the process deviates from the objective.

1.3 Topic Suitability Checklist

Before using this template, confirm the topic meets the following conditions:

  • [ ] At least two implementation paths with substantive differences exist
  • [ ] Change scope is manageable (1-5 files), and a single candidate can be completed in one agent invocation
  • [ ] Objectively quantifiable comparison dimensions exist (performance, code size, correctness, etc.)
  • [ ] Existing tests provide sufficient regression protection without additional QA documentation

2. Execution Method

This round follows the standard self-evolution pipeline:

```text
evo_plan ──[generate_items]──> evo_implement (x2) ──> evo_benchmark (x2) ──> select_best ──> evo_apply_winner ──> evo_align_tests ──> self_test ──> loop_guard
```

Key differences from self-bootstrap:

| Dimension | self-bootstrap | self-evolution |
| --- | --- | --- |
| Loop strategy | Fixed 2 cycles | Fixed 1 cycle |
| Implementation path | Single linear | 2 candidates competing |
| Selection mechanism | None | WP03 item_select (max score) |
| Cost control | Multiple steps, multiple agents | max_parallel=1, no QA/doc steps |
| Safety guarantee | self_test + self_restart | self_test + invariant (compilation_gate) |

Cost note: self-evolution controls cost through single cycle + serial candidate execution. Although there are 2 candidate solutions, max_parallel: 1 ensures multiple agents do not run simultaneously. Total agent invocations are approximately 6 (plan x1 + implement x2 + benchmark x2 + apply_winner x1), plus builtin steps. Compared to self-bootstrap's 2 cycles x multiple steps, the cost is comparable or slightly lower.

The human role is limited to two types:

  1. Launching and providing the topic objective.
  2. Monitoring execution status, determining if the process is stuck, and recording results.

3. Startup Steps

3.1 Build and Start the Daemon

In C/S architecture, the CLI (orchestrator) connects to the daemon (orchestratord) via Unix Domain Socket.

```bash
cd "$ORCHESTRATOR_ROOT"   # your orchestrator project directory

cargo build --release -p orchestratord -p orchestrator-cli

# Start daemon (if not running)
# --foreground keeps log output in foreground; --workers specifies parallel worker count
nohup ./target/release/orchestratord --foreground --workers 2 > /tmp/orchestratord.log 2>&1 &

# Verify daemon is running
ps aux | grep orchestratord | grep -v grep
# Verify queue can be consumed by daemon workers
orchestrator task list -o json
```

Warning: CLI binary path: The C/S mode CLI is at target/release/orchestrator (crates/cli), not the legacy monolithic binary core/target/release/agent-orchestrator. Update any symlinks pointing to the old path.
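
If a personal symlink is involved, repointing it might look like the following sketch (the ~/.local/bin location is only an example; use wherever the symlink actually lives):

```bash
# Repoint a stale symlink to the C/S-mode CLI.
# The symlink location is illustrative; the target path comes from the warning above.
ln -sf "$ORCHESTRATOR_ROOT/target/release/orchestrator" ~/.local/bin/orchestrator
which orchestrator   # should now resolve to the new binary
```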

3.2 Initialize Database and Load Resources

```bash
orchestrator delete project/self-evolution --force
orchestrator init
orchestrator apply -f your-secrets.yaml --project self-evolution
# Apply additional secret manifests the same way as needed (always with --project self-evolution)
# To use the Claude native API, comment out the your-secrets.yaml line above (claude-* model configs will take effect)
orchestrator apply -f docs/workflow/execution-profiles.yaml --project self-evolution
# Warning: --project is required, otherwise real AI agents will register in the global space
orchestrator apply -f docs/workflow/self-evolution.yaml --project self-evolution
```

3.3 Verify Resources Are Loaded

Verify resources are loaded (add --project to limit scope to a project):

```bash
orchestrator get workspaces --project self-evolution -o json
orchestrator get agents --project self-evolution -o json
```

3.4 Create Task (Submit Objective to Orchestrator)

In C/S mode, task create enqueues directly to daemon workers. Task creation automatically starts execution — no separate task start is needed.

self-evolution does not require specifying -t target files — dynamic items are generated at runtime by evo_plan's generate_items and do not depend on static QA file scanning.

```bash
orchestrator task create \
  -n "<task name>" \
  -w self -W self-evolution \
  --project self-evolution \
  -g "<Compress the task objective above into a single line and pass it directly as the goal>"
```

Record the returned <task_id>. The task will be immediately claimed by a worker and begin executing. To wait for completion, use orchestrator task watch <task_id> or poll task info.
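
Where unattended waiting is needed, a simple polling loop can stand in for task watch. A minimal sketch, assuming task info -o json exposes a top-level status field with terminal values like completed and failed (the field name and state names are assumptions; check the real JSON output first):

```bash
# Poll until the task reaches a terminal state.
# Assumes `orchestrator task info <id> -o json` contains a top-level `status` field;
# adjust the jq path and terminal-state names to match the actual output.
task_id="<task_id>"
while true; do
  status=$(orchestrator task info "$task_id" -o json | jq -r '.status')
  echo "$(date +%T) status=$status"
  case "$status" in
    completed|failed) break ;;
  esac
  sleep 30
done
```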


4. Monitoring Methods

4.1 Status Monitoring

```bash
orchestrator task list
orchestrator task info <task_id>
orchestrator task trace <task_id>    # execution timeline with anomaly detection
orchestrator task watch <task_id>    # real-time status panel refresh
```

Key observations:

  1. Current step (pay special attention to fan-out status of item-scoped steps)
  2. Whether task status is progressing
  3. Whether failed or blocked states appear, or there is prolonged inactivity

4.2 Evolution Process Key Events

self-evolution has the following unique observation points compared to self-bootstrap:

  1. items_generated event: Confirm evo_plan successfully generated candidate items

     ```bash
     orchestrator event list --task <task_id> --type items_generated -o json
     ```
  2. Dynamic item status: Confirm all candidates were executed

     ```bash
     orchestrator task items <task_id>
     ```
  3. Selection result: Confirm item_select chose a winner

     ```bash
     orchestrator store get evolution winner_latest --project self-evolution
     ```

4.3 Log Monitoring

```bash
orchestrator task logs --tail 100 <task_id>    # quick check
orchestrator task logs --tail 200 <task_id>    # more context when something looks off
```
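
To focus on evolution-specific activity, the log stream can be filtered on the step names from the section 2 pipeline (this assumes the step names appear literally in the log lines):

```bash
# Filter for evolution-specific step activity; step names are taken from the
# pipeline in section 2 and assumed to appear verbatim in the log output
orchestrator task logs --tail 500 <task_id> | \
  grep -E 'evo_plan|evo_implement|evo_benchmark|select_best|evo_apply_winner|evo_align_tests'
```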

Key observations:

  1. Whether evo_plan generated candidates with substantive differences (not just superficial variants)
  2. Whether evo_implement items each implemented independently
  3. Whether evo_benchmark scoring is based on objective metrics and has discriminating power
  4. Whether select_best selected the higher-scoring candidate
  5. Whether evo_apply_winner cleanly applied the winning candidate

4.4 Process / Daemon Monitoring

```bash
# daemon process
ps aux | grep orchestratord | grep -v grep

# queue/task status
orchestrator task list -o json

# agent subprocesses (claude -p)
ps aux | grep "claude -p" | grep -v grep

# code changes
git diff --stat
```

Key observations:

  1. Whether agent processes are still making progress
  2. Whether git diff --stat shows reasonable ongoing changes
  3. If there is prolonged zero output, zero diff, or a stalled process, record it as a suspected stall

4.5 Additional Diagnostic Commands

```bash
orchestrator task trace <task_id> --json
orchestrator event list --task <task_id> --limit 20
```

5. Key Checkpoints

5.1 evo_plan Phase Checkpoint

Confirm the output includes:

  1. 2 structured candidate solutions (JSON format, containing id/name/description/strategy)
  2. Two candidates with substantive differences (different algorithms, different designs, different trade-offs)
  3. items_generated event has been persisted, with the correct item count

If evo_plan outputs invalid JSON or the candidates are substantively identical, this indicates insufficient prompt differentiation guidance.
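
Purely as an illustration, a well-formed evo_plan output might look like the sketch below. The four field names come from the checklist above; every value, and the exact array shape, is invented:

```json
[
  {
    "id": "candidate-a",
    "name": "incremental-refactor",
    "description": "Extract the shared logic behind a trait and migrate call sites one by one",
    "strategy": "minimize diff size; keep every intermediate state compiling"
  },
  {
    "id": "candidate-b",
    "name": "rewrite-module",
    "description": "Replace the module with a single new implementation and adapt callers",
    "strategy": "optimize for final code clarity; accept a larger one-shot diff"
  }
]
```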

5.2 evo_implement Phase Checkpoint

Confirm:

  1. Both items produced code changes
  2. Change scope is consistent with each candidate's strategy description
  3. No mutual interference (item-scoped isolation is working correctly)

5.3 evo_benchmark Phase Checkpoint

Confirm:

  1. Both items have score captures
  2. Scoring is based on objective metrics such as compilation/tests/clippy
  3. Scores have discriminating power (not all perfect or all zero)
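
The actual rubric is defined by the evo_benchmark step, not here; the sketch below only illustrates what objective scoring with discriminating power can look like in a Rust project (the weights and the 200-line threshold are arbitrary example values):

```bash
# Illustrative scoring rubric only; the real evo_benchmark step defines its own.
# Weights and the 200-line diff threshold are arbitrary example values.
score=0
cargo check -q                 && score=$((score + 40))
cargo test -q                  && score=$((score + 40))
cargo clippy -q -- -D warnings && score=$((score + 10))
lines_changed=$(git diff --numstat | awk '{n += $1 + $2} END {print n + 0}')
[ "$lines_changed" -le 200 ]   && score=$((score + 10))
echo "score=$score"
```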

5.4 select_best Phase Checkpoint

Confirm:

  1. evolution.winner_latest store entry exists
  2. The selected candidate has the higher score
  3. Winner data includes the candidate ID and score

5.5 evo_apply_winner + self_test Phase Checkpoint

Confirm:

  1. The winning candidate's code compiles
  2. All tests pass
  3. The compilation_gate invariant did not trigger a halt
  4. Behaviors required to be preserved in the objective still work correctly

6. Success Criteria

The topic is considered complete when all of the following conditions are met:

  1. orchestrator completed the full self-evolution pipeline and exited normally at loop_guard.
  2. Two distinct candidate solutions were actually generated and implemented separately.
  3. The engine selected the higher-scoring candidate via item_select.
  4. The winning candidate's code passed self_test and the compilation_gate invariant.
  5. Key completion state achieved: <fill in the explicit completion condition for the topic here>
  6. evolution.winner_latest store records the selection result.
  7. This round did not introduce any new compilation or test regressions.

7. Exception Handling

7.1 Evolution-Specific Exception Scenarios

| Exception | Detection Method | Resolution |
| --- | --- | --- |
| evo_plan did not output valid JSON | items_generated event does not exist | Check the prompt; the JSON output instructions may need adjustment |
| Two candidate solutions are substantively identical | Inspect item label and approach variables | Insufficient prompt differentiation guidance; consider explicitly specifying the differentiation dimension in the goal |
| Both candidates fail to compile | Benchmark scores are both 0 | The invariant will halt; manual analysis is needed to determine whether the topic is too complex |
| item_select cannot select a winner | Store entry does not exist | Check whether score capture is working correctly |
| Test regression after evo_apply_winner | self_test fails | evo_align_tests should attempt a fix; if it still fails, manual intervention is needed |
| Candidate solution exceeds topic scope | Diff involves unexpected files | Plan prompt scope constraints are insufficient; consider adding scope limits in the goal |

7.2 C/S Architecture-Specific Exceptions

| Exception | Detection Method | Resolution |
| --- | --- | --- |
| Daemon not running | CLI reports failed to connect to daemon at .../orchestrator.sock | Start with orchestratord --foreground --workers 2 |
| CLI points to legacy monolithic binary | which orchestrator points to core/target/release/ | Update symlink to target/release/orchestrator |
| Daemon still uses old code after rebuild | Previously fixed bug reappears | Kill the old daemon process and start a new one |
| Task starts immediately after task create | task list shows pending or quickly becomes running | In C/S mode, the task lifecycle is queue-only; this is normal behavior |
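
For the stale-daemon row above, a restart could look like the following sketch (the pkill pattern is an assumption; confirm the match with ps before killing, and note this reuses the start command from section 3.1):

```bash
# Kill the old daemon and start a freshly built one (same start command as section 3.1).
# The pkill pattern is an assumption; confirm with `ps aux | grep orchestratord` first.
pkill -f orchestratord
cargo build --release -p orchestratord
nohup ./target/release/orchestratord --foreground --workers 2 > /tmp/orchestratord.log 2>&1 &
```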

7.3 General Exceptions

If any of the following occur, the human should stop "monitor-only" mode and record the exception:

  1. evo_plan clearly deviates from the topic or fails to generate structured candidates
  2. evo_implement has prolonged zero output or zero code changes
  3. self_test is ineffective or bypassed
  4. Process is deadlocked with zero output

Recommended recording method:

```bash
orchestrator task info <task_id>
orchestrator task logs --tail 200 <task_id>
git diff --stat
```

If necessary, take over manually for deeper analysis.


8. Human Role Boundaries

In this plan, the human role is explicitly limited to:

  1. Providing the objective
  2. Launching the workflow
  3. Monitoring status
  4. Interrupting and recording when exceptions occur

Humans do not write implementation plans for orchestrator in advance, do not preset code changes, and do not predetermine which path should win.

The purpose of this template is to reuse a stable execution method to verify whether the current orchestrator can autonomously select a better implementation solution through competitive evolution around a clear objective.


9. Selection Guide: self-evolution vs. self-bootstrap

| Decision Dimension | Choose self-evolution | Choose self-bootstrap |
| --- | --- | --- |
| Implementation path | Multiple viable paths; comparison needed | Path is clear or unique |
| Topic scope | Small to medium (1-5 files) | Medium to large (unlimited) |
| Evaluation method | Objectively quantifiable scoring | QA scenario verification needed |
| Iteration requirement | One evolution round is sufficient | Multiple iterations needed for refinement |
| Documentation governance | Not needed | QA/doc synchronized updates needed |
| Cost sensitivity | Medium (2 candidates, ~6 agent calls total) | Medium (2 cycles x multiple steps) |
| Safety requirement | Invariant compilation gate is sufficient | self_restart bootstrap verification needed |

10. Cleanup

After task completion, clean up agent-produced topic code so the same topic can be re-tested:

```bash
# View agent-produced changes
git diff --stat

# Revert all files modified by agent (adjust file list based on diff output)
git checkout HEAD -- <list of modified files>

# Delete new files created by agent
git status --short | grep '^??' | awk '{print $2}' | xargs rm -f

# Confirm working tree is clean
git status --short

# Verify compilation
cargo check
```

Warning: The agent may modify core files (context.rs, lib.rs, Cargo.toml, etc.). After each execution, always check git diff --stat and revert unexpected changes. Infrastructure bug fixes should be committed separately before cleaning up topic code.
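
Separating an infrastructure fix from topic cleanup might look like this sketch (the file path and commit message are invented for illustration):

```bash
# Commit the infrastructure fix first (path and message are illustrative)
git add crates/daemon/src/worker.rs
git commit -m "fix(daemon): bug uncovered during the evolution run"

# Then revert the remaining topic code as described above
git checkout HEAD -- <remaining topic files>
```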