
self-evolution Topic Execution Plan Template

Harness Engineering execution plan: this is an agent-executable scenario that shows how the control plane coordinates environment, workflow, guardrails, and feedback loops rather than a one-off agent call.

Agent Collaboration: This document is an agent-executable plan. Open this project in an AI coding agent (Claude Code, OpenCode, Codex, etc.) — the agent reads this plan and orchestrates other agents via the orchestrator CLI to collaboratively complete the task, from resource deployment and execution to result verification, fully autonomously.

This document is a generic template for submitting a topic to the orchestrator's self-evolution workflow for execution. Unlike self-bootstrap's linear iteration, self-evolution uses WP03 dynamic candidate generation + competitive selection to explore multiple implementation paths, with the engine automatically selecting the optimal solution.

Applicable scenarios:

  • Multiple possible implementation paths exist, and you want to select the best one through competitive comparison
  • Topic scope is moderate (1-5 files), suitable for A/B comparison of 2 candidate solutions
  • Objectively quantifiable evaluation criteria exist (compilation/tests/clippy/diff size)

Not applicable:

  • Topic scope is very large, where a single candidate solution requires multiple iterations to complete (use self-bootstrap)
  • Implementation path is clearly unique, making competition meaningless (use self-bootstrap)
  • Full QA document governance and ticket collection are needed (use self-bootstrap; self-evolution omits these steps)

Recommended reference examples:

  1. docs/showcases/self-evolution-execution.md (first real-world execution)
  2. docs/showcases/self-bootstrap-execution-template.md (comparison: linear iteration template)

1. Task Objective

Pass the following objective text directly to orchestrator as the topic for this round of self-evolution:

Topic name: <topic title>

Background: <Brief description of the current problem, tech debt, defect, or optimization opportunity>

Task objective for this round: <Describe the expected outcome from orchestrator>

Constraints:

  1. Prioritize fixing the root cause; superficial workarounds are not acceptable.
  2. Preserve existing core semantics, compatibility requirements, key events, or state behaviors: <behaviors to preserve>
  3. The final goal is: <explicit completion state>
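
For reference, a filled-in objective might look like the following; the topic, background, and constraints are entirely hypothetical, invented for illustration:

```text
Topic name: Deduplicate CLI error formatting

Background: Error messages are formatted in three separate places in the CLI crate, and the copies have drifted apart.

Task objective for this round: Consolidate error formatting into a single code path without changing the rendered output.

Constraints:
1. Prioritize fixing the root cause; superficial workarounds are not acceptable.
2. Preserve existing core semantics: the exact error text printed for each failure case must not change.
3. The final goal is: one shared formatting function, all call sites migrated, all existing tests passing.
```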

1.1 Expected Output

Produced and delivered autonomously by orchestrator:

  1. Two competing candidate solutions (generated by the evo_plan step and injected as dynamic items via generate_items).
  2. Independent implementation for each candidate (evo_implement, item-scoped).
  3. Automated scoring for each candidate (evo_benchmark: compilation/tests/clippy/diff size).
  4. Engine automatically selects the higher-scoring candidate (select_best, WP03 item_select).
  5. Winning candidate is applied and passes final validation (evo_apply_winner + self_test).

1.2 Non-Goals

This round does not involve humans pre-defining implementation details, presuming which path should win, or specifying concrete code changes on behalf of orchestrator in the plan document. Implementation paths are autonomously explored and competitively selected by the workflow; humans only observe whether the process deviates from the objective.

1.3 Topic Suitability Checklist

Before using this template, confirm the topic meets the following conditions:

  • [ ] At least two implementation paths with substantive differences exist
  • [ ] Change scope is manageable (1-5 files), and a single candidate can be completed in one agent invocation
  • [ ] Objectively quantifiable comparison dimensions exist (performance, code size, correctness, etc.)
  • [ ] Existing tests provide sufficient regression protection without additional QA documentation

2. Execution Method

This round follows the standard self-evolution pipeline:

```text
evo_plan ──[generate_items]──> evo_implement (x2) ──> evo_benchmark (x2) ──> select_best ──> evo_apply_winner ──> evo_align_tests ──> self_test ──> loop_guard
```

Key differences from self-bootstrap:

| Dimension | self-bootstrap | self-evolution |
| --- | --- | --- |
| Loop strategy | Fixed 2 cycles | Fixed 1 cycle |
| Implementation path | Single linear | 2 candidates competing |
| Selection mechanism | None | WP03 item_select (max score) |
| Cost control | Multiple steps, multiple agents | max_parallel=1, no QA/doc steps |
| Safety guarantee | self_test + self_restart | self_test + invariant (compilation_gate) |

Cost note: self-evolution controls cost through single cycle + serial candidate execution. Although there are 2 candidate solutions, max_parallel: 1 ensures multiple agents do not run simultaneously. Total agent invocations are approximately 6 (plan x1 + implement x2 + benchmark x2 + apply_winner x1), plus builtin steps. Compared to self-bootstrap's 2 cycles x multiple steps, the cost is comparable or slightly lower.

The human role is limited to two types:

  1. Launching and providing the topic objective.
  2. Monitoring execution status, determining if the process is stuck, and recording results.

3. Startup Steps

3.1 Build and Start the Daemon

In C/S architecture, the CLI (orchestrator) connects to the daemon (orchestratord) via Unix Domain Socket.

```bash
cd "$ORCHESTRATOR_ROOT"   # your orchestrator project directory

cargo build --release -p orchestratord -p orchestrator-cli

# Start daemon (if not running)
# --foreground keeps log output in foreground; --workers specifies parallel worker count
nohup ./target/release/orchestratord --foreground --workers 2 > /tmp/orchestratord.log 2>&1 &

# Verify daemon is running
ps aux | grep orchestratord | grep -v grep
# Verify queue can be consumed by daemon workers
orchestrator task list -o json
```

Warning: CLI binary path: The C/S mode CLI is at target/release/orchestrator (crates/cli), not the legacy monolithic binary core/target/release/agent-orchestrator. Update any symlinks pointing to the old path.
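
If a personal symlink is involved, repointing it might look like the following sketch (the ~/.local/bin location is only an example; use wherever the symlink actually lives):

```bash
# Repoint a stale symlink to the C/S-mode CLI.
# The symlink location is illustrative; the target path comes from the warning above.
ln -sf "$ORCHESTRATOR_ROOT/target/release/orchestrator" ~/.local/bin/orchestrator
which orchestrator   # should now resolve to the new binary
```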

3.2 Initialize Database and Load Resources

```bash
orchestrator delete project/self-evolution --force
orchestrator init
orchestrator apply -f your-secrets.yaml --project self-evolution
# Apply additional secret manifests the same way as needed (always with --project self-evolution)
# To use the Claude native API, comment out the your-secrets.yaml line above (claude-* model configs will take effect)
orchestrator apply -f docs/workflow/execution-profiles.yaml --project self-evolution
# Warning: --project is required, otherwise real AI agents will register in the global space
orchestrator apply -f docs/workflow/self-evolution.yaml --project self-evolution
```

3.3 Verify Resources Are Loaded

Verify resources are loaded (add --project to limit scope to a project):

```bash
orchestrator get workspaces --project self-evolution -o json
orchestrator get agents --project self-evolution -o json
```

3.4 Create Task (Submit Objective to Orchestrator)

In C/S mode, task create enqueues directly to daemon workers. Task creation automatically starts execution — no separate task start is needed.

self-evolution does not require specifying -t target files — dynamic items are generated at runtime by evo_plan's generate_items and do not depend on static QA file scanning.

```bash
orchestrator task create \
  -n "<task name>" \
  -w self -W self-evolution \
  --project self-evolution \
  -g "<Compress the task objective above into a single line and pass it directly as the goal>"
```

Record the returned <task_id>. The task will be immediately claimed by a worker and begin executing. To wait for completion, use orchestrator task watch <task_id> or poll task info.
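
Where unattended waiting is needed, a simple polling loop can stand in for task watch. A minimal sketch, assuming task info -o json exposes a top-level status field with terminal values like completed and failed (the field name and state names are assumptions; check the real JSON output first):

```bash
# Poll until the task reaches a terminal state.
# Assumes `orchestrator task info <id> -o json` contains a top-level `status` field;
# adjust the jq path and terminal-state names to match the actual output.
task_id="<task_id>"
while true; do
  status=$(orchestrator task info "$task_id" -o json | jq -r '.status')
  echo "$(date +%T) status=$status"
  case "$status" in
    completed|failed) break ;;
  esac
  sleep 30
done
```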


4. Monitoring Methods

4.1 Status Monitoring

```bash
orchestrator task list
orchestrator task info <task_id>
orchestrator task trace <task_id>    # execution timeline with anomaly detection
orchestrator task watch <task_id>    # real-time status panel refresh
```

Key observations:

  1. Current step (pay special attention to fan-out status of item-scoped steps)
  2. Whether task status is progressing
  3. Whether failed or blocked states appear, or there is prolonged inactivity

4.2 Evolution Process Key Events

self-evolution has the following unique observation points compared to self-bootstrap:

  1. items_generated event: Confirm evo_plan successfully generated candidate items

     ```bash
     orchestrator event list --task <task_id> --type items_generated -o json
     ```
  2. Dynamic item status: Confirm all candidates were executed

     ```bash
     orchestrator task items <task_id>
     ```
  3. Selection result: Confirm item_select chose a winner

     ```bash
     orchestrator store get evolution winner_latest --project self-evolution
     ```

4.3 Log Monitoring

```bash
orchestrator task logs --tail 100 <task_id>    # quick check
orchestrator task logs --tail 200 <task_id>    # more context when something looks off
```
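
To focus on evolution-specific activity, the log stream can be filtered on the step names from the section 2 pipeline (this assumes the step names appear literally in the log lines):

```bash
# Filter for evolution-specific step activity; step names are taken from the
# pipeline in section 2 and assumed to appear verbatim in the log output
orchestrator task logs --tail 500 <task_id> | \
  grep -E 'evo_plan|evo_implement|evo_benchmark|select_best|evo_apply_winner|evo_align_tests'
```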

Key observations:

  1. Whether evo_plan generated candidates with substantive differences (not just superficial variants)
  2. Whether evo_implement items each implemented independently
  3. Whether evo_benchmark scoring is based on objective metrics and has discriminating power
  4. Whether select_best selected the higher-scoring candidate
  5. Whether evo_apply_winner cleanly applied the winning candidate

4.4 Process / Daemon Monitoring

```bash
# daemon process
ps aux | grep orchestratord | grep -v grep

# queue/task status
orchestrator task list -o json

# agent subprocesses (claude -p)
ps aux | grep "claude -p" | grep -v grep

# code changes
git diff --stat
```

Key observations:

  1. Whether agent processes are still making progress
  2. Whether git diff --stat shows reasonable ongoing changes
  3. If there is prolonged zero output, zero diff, or a stalled process, record it as a suspected stall

4.5 Additional Diagnostic Commands

```bash
orchestrator task trace <task_id> --json
orchestrator event list --task <task_id> --limit 20
```

5. Key Checkpoints

5.1 evo_plan Phase Checkpoint

Confirm the output includes:

  1. 2 structured candidate solutions (JSON format, containing id/name/description/strategy)
  2. Two candidates with substantive differences (different algorithms, different designs, different trade-offs)
  3. items_generated event has been persisted, with the correct item count

If evo_plan outputs invalid JSON or the candidates are substantively identical, this indicates insufficient prompt differentiation guidance.
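
Purely as an illustration, a well-formed evo_plan output might look like the sketch below. The four field names come from the checklist above; every value, and the exact array shape, is invented:

```json
[
  {
    "id": "candidate-a",
    "name": "incremental-refactor",
    "description": "Extract the shared logic behind a trait and migrate call sites one by one",
    "strategy": "minimize diff size; keep every intermediate state compiling"
  },
  {
    "id": "candidate-b",
    "name": "rewrite-module",
    "description": "Replace the module with a single new implementation and adapt callers",
    "strategy": "optimize for final code clarity; accept a larger one-shot diff"
  }
]
```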

5.2 evo_implement Phase Checkpoint

Confirm:

  1. Both items produced code changes
  2. Change scope is consistent with each candidate's strategy description
  3. No mutual interference (item-scoped isolation is working correctly)

5.3 evo_benchmark Phase Checkpoint

Confirm:

  1. Both items have score captures
  2. Scoring is based on objective metrics such as compilation/tests/clippy
  3. Scores have discriminating power (not all perfect or all zero)
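
The actual rubric is defined by the evo_benchmark step, not here; the sketch below only illustrates what objective scoring with discriminating power can look like in a Rust project (the weights and the 200-line threshold are arbitrary example values):

```bash
# Illustrative scoring rubric only; the real evo_benchmark step defines its own.
# Weights and the 200-line diff threshold are arbitrary example values.
score=0
cargo check -q                 && score=$((score + 40))
cargo test -q                  && score=$((score + 40))
cargo clippy -q -- -D warnings && score=$((score + 10))
lines_changed=$(git diff --numstat | awk '{n += $1 + $2} END {print n + 0}')
[ "$lines_changed" -le 200 ]   && score=$((score + 10))
echo "score=$score"
```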

5.4 select_best Phase Checkpoint

Confirm:

  1. evolution.winner_latest store entry exists
  2. The selected candidate has the higher score
  3. Winner data includes the candidate ID and score

5.5 evo_apply_winner + self_test Phase Checkpoint

Confirm:

  1. The winning candidate's code compiles
  2. All tests pass
  3. The compilation_gate invariant did not trigger a halt
  4. Behaviors required to be preserved in the objective still work correctly

6. Success Criteria

The topic is considered complete when all of the following conditions are met:

  1. orchestrator completed the full self-evolution pipeline and exited normally at loop_guard.
  2. Two distinct candidate solutions were actually generated and implemented separately.
  3. The engine selected the higher-scoring candidate via item_select.
  4. The winning candidate's code passed self_test and the compilation_gate invariant.
  5. Key completion state achieved: <fill in the explicit completion condition for the topic here>
  6. evolution.winner_latest store records the selection result.
  7. This round did not introduce any new compilation or test regressions.

7. Exception Handling

7.1 Evolution-Specific Exception Scenarios

| Exception | Detection Method | Resolution |
| --- | --- | --- |
| evo_plan did not output valid JSON | items_generated event does not exist | Check the prompt; the JSON output instructions may need adjustment |
| Two candidate solutions are substantively identical | Inspect item label and approach variables | Insufficient prompt differentiation guidance; consider explicitly specifying the differentiation dimension in the goal |
| Both candidates fail to compile | Benchmark scores are both 0 | The invariant will halt; manual analysis is needed to determine whether the topic is too complex |
| item_select cannot select a winner | Store entry does not exist | Check whether score capture is working correctly |
| Test regression after evo_apply_winner | self_test fails | evo_align_tests should attempt a fix; if it still fails, manual intervention is needed |
| Candidate solution exceeds topic scope | Diff involves unexpected files | Plan prompt scope constraints are insufficient; consider adding scope limits in the goal |

7.2 C/S Architecture-Specific Exceptions

| Exception | Detection Method | Resolution |
| --- | --- | --- |
| Daemon not running | CLI reports failed to connect to daemon at .../orchestrator.sock | Start with orchestratord --foreground --workers 2 |
| CLI points to legacy monolithic binary | which orchestrator points to core/target/release/ | Update symlink to target/release/orchestrator |
| Daemon still uses old code after rebuild | Previously fixed bug reappears | Kill the old daemon process and start a new one |
| Task starts immediately after task create | task list shows pending or quickly becomes running | In C/S mode, the task lifecycle is queue-only; this is normal behavior |
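
For the stale-daemon row above, a restart could look like the following sketch (the pkill pattern is an assumption; confirm the match with ps before killing, and note this reuses the start command from section 3.1):

```bash
# Kill the old daemon and start a freshly built one (same start command as section 3.1).
# The pkill pattern is an assumption; confirm with `ps aux | grep orchestratord` first.
pkill -f orchestratord
cargo build --release -p orchestratord
nohup ./target/release/orchestratord --foreground --workers 2 > /tmp/orchestratord.log 2>&1 &
```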

7.3 General Exceptions

If any of the following occur, the human should stop "monitor-only" mode and record the exception:

  1. evo_plan clearly deviates from the topic or fails to generate structured candidates
  2. evo_implement has prolonged zero output or zero code changes
  3. self_test is ineffective or bypassed
  4. Process is deadlocked with zero output

Recommended recording method:

```bash
orchestrator task info <task_id>
orchestrator task logs --tail 200 <task_id>
git diff --stat
```

If necessary, take over manually for deeper analysis.


8. Human Role Boundaries

In this plan, the human role is explicitly limited to:

  1. Providing the objective
  2. Launching the workflow
  3. Monitoring status
  4. Interrupting and recording when exceptions occur

Humans do not write implementation plans for orchestrator in advance, do not preset code changes, and do not predetermine which path should win.

The purpose of this template is to reuse a stable execution method to verify whether the current orchestrator can autonomously select a better implementation solution through competitive evolution around a clear objective.


9. Selection Guide: self-evolution vs. self-bootstrap

| Decision Dimension | Choose self-evolution | Choose self-bootstrap |
| --- | --- | --- |
| Implementation path | Multiple viable paths; comparison needed | Path is clear or unique |
| Topic scope | Small to medium (1-5 files) | Medium to large (unlimited) |
| Evaluation method | Objectively quantifiable scoring | QA scenario verification needed |
| Iteration requirement | One evolution round is sufficient | Multiple iterations needed for refinement |
| Documentation governance | Not needed | QA/doc synchronized updates needed |
| Cost sensitivity | Medium (2 candidates, ~6 agent calls total) | Medium (2 cycles x multiple steps) |
| Safety requirement | Invariant compilation gate is sufficient | self_restart bootstrap verification needed |

10. Cleanup

After task completion, clean up agent-produced topic code so the same topic can be re-tested:

```bash
# View agent-produced changes
git diff --stat

# Revert all files modified by agent (adjust file list based on diff output)
git checkout HEAD -- <list of modified files>

# Delete new files created by agent
git status --short | grep '^??' | awk '{print $2}' | xargs rm -f

# Confirm working tree is clean
git status --short

# Verify compilation
cargo check
```

Warning: The agent may modify core files (context.rs, lib.rs, Cargo.toml, etc.). After each execution, always check git diff --stat and revert unexpected changes. Infrastructure bug fixes should be committed separately before cleaning up topic code.
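
Separating an infrastructure fix from topic cleanup might look like this sketch (the file path and commit message are invented for illustration):

```bash
# Commit the infrastructure fix first (path and message are illustrative)
git add crates/daemon/src/worker.rs
git commit -m "fix(daemon): bug uncovered during the evolution run"

# Then revert the remaining topic code as described above
git checkout HEAD -- <remaining topic files>
```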