Self-Evolution Topic Execution Plan
Harness Engineering execution plan: this is an agent-executable scenario that shows how the control plane coordinates environment, workflow, guardrails, and feedback loops rather than a one-off agent call.
Agent Collaboration: This document is an agent-executable plan. Open this project in an AI coding agent (Claude Code, OpenCode, Codex, etc.) — the agent reads this plan and orchestrates other agents via the orchestrator CLI to collaboratively complete the task, from resource deployment and execution to result verification, fully autonomously.
This document is the first real-world test topic for the self-evolution workflow. Unlike self-bootstrap, self-evolution uses WP03 dynamic candidate generation + competitive selection to explore multiple implementation paths, with the engine automatically selecting the best solution.
1. Task Objective
Pass the following objective verbatim to the orchestrator as the topic for this round of self-evolution:
Topic name:
StepTemplate Prompt Variable Parsing Enhancement

Background: The current StepTemplate prompt field uses simple string substitution (`{var_name}`) to inject runtime variables. This approach has the following issues:
- No detection of undefined variables — if a prompt references a non-existent variable, the placeholder `{var_name}` is preserved after substitution, which may confuse the agent.
- No conditional sections — it is not possible to include/exclude a prompt section based on whether a variable exists (e.g., "show the diff section if a diff is available").
- No default value mechanism — there is no way to fall back to a reasonable default when a variable does not exist.

Objectives for this round: Enhance prompt template variable parsing to support the following syntax:
- `{var_name}` — existing behavior, with a warning log when undefined
- `{var_name:-default_value}` — use the default value when the variable is undefined
- `{?var_name}...{/var_name}` — conditional section, included when the variable exists and is non-empty

Constraints:
- Do not introduce external template engine dependencies (e.g., Tera, Handlebars); implement in pure Rust.
- Maintain full backward compatibility with the existing `{var_name}` syntax.
- The ultimate goal: all existing StepTemplate prompts should work without any modifications; the new syntax is an optional enhancement.
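The intended semantics of the three forms can be pictured as a small hand-rolled resolver in pure Rust. This is an illustrative sketch of the target behavior under the constraints above, not the orchestrator's actual implementation; the function name `resolve` and its signature are assumptions.

```rust
use std::collections::HashMap;

// Sketch of the target semantics (not the real implementation):
//   {name}            -> value, or the literal placeholder plus a warning if undefined
//   {name:-default}   -> value, or `default` if undefined
//   {?name}...{/name} -> inner text kept only when `name` exists and is non-empty
fn resolve(template: &str, vars: &HashMap<String, String>) -> String {
    // Pass 1: handle {?name}...{/name} conditional sections.
    let mut expanded = String::new();
    let mut rest = template;
    while let Some(start) = rest.find("{?") {
        expanded.push_str(&rest[..start]);
        let after = &rest[start + 2..];
        let Some(name_end) = after.find('}') else {
            expanded.push_str(&rest[start..]);
            rest = "";
            break;
        };
        let name = &after[..name_end];
        let close = format!("{{/{name}}}");
        let body = &after[name_end + 1..];
        match body.find(&close) {
            Some(end) => {
                // Keep the section only when the variable exists and is non-empty.
                if vars.get(name).map_or(false, |v| !v.is_empty()) {
                    expanded.push_str(&body[..end]);
                }
                rest = &body[end + close.len()..];
            }
            None => {
                // Unterminated section: preserve it literally.
                expanded.push_str(&rest[start..]);
                rest = "";
            }
        }
    }
    expanded.push_str(rest);

    // Pass 2: substitute {name} and {name:-default} placeholders.
    let mut out = String::new();
    let mut rest = expanded.as_str();
    while let Some(start) = rest.find('{') {
        out.push_str(&rest[..start]);
        let after = &rest[start + 1..];
        let Some(end) = after.find('}') else {
            out.push_str(&rest[start..]);
            rest = "";
            break;
        };
        let inner = &after[..end];
        if let Some((name, default)) = inner.split_once(":-") {
            // Default-value form: fall back when the variable is undefined.
            out.push_str(vars.get(name).map(String::as_str).unwrap_or(default));
        } else if let Some(value) = vars.get(inner) {
            out.push_str(value);
        } else {
            // Backward-compatible behavior, now with a warning instead of silence.
            eprintln!("warning: undefined template variable `{inner}`");
            out.push_str(&rest[start..start + end + 2]);
        }
        rest = &after[end + 1..];
    }
    out.push_str(rest);
    out
}
```

Note the ordering: conditional sections must be evaluated before plain substitution, otherwise `{diff}` inside a suppressed `{?diff}...{/diff}` section would be expanded anyway.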
1.1 Expected Outputs
Produced autonomously by the orchestrator:
- Two competing proposals (generated by the `evo_plan` step and injected as dynamic items via `generate_items`).
- Independent implementation for each proposal (`evo_implement`, item-scoped in parallel).
- Automated scoring for each proposal (`evo_benchmark`: compilation/tests/clippy/diff size).
- Engine automatically selects the higher-scoring proposal (`select_best`, WP03 `item_select`).
- The winning proposal is applied and passes final verification (`evo_apply_winner` + `self_test`).
1.2 Non-Goals
- Do not presume which path should win.
- Do not have humans specify the concrete code implementation approach.
- Do not require full QA documentation generation (this round focuses on validating the evolution mechanism).
1.3 Rationale for Topic Selection
This topic was chosen as the first real-world self-evolution test based on the following considerations:
- Appropriate scope: Involves 1-2 files (the template resolution module), with manageable change size suitable for comparing 2 candidate proposals.
- Clear dimensions for comparison: Regex-based approach vs. hand-written parser — the two paths have genuine differences in performance, readability, and correctness.
- Objectively scorable: Compilation pass, test pass, clean clippy, diff size — all are automatable, quantifiable metrics.
- Backward compatibility constraint: Existing tests naturally serve as regression protection, requiring no additional manual verification.
- Self-bootstrap relevant: Improving prompt templates directly enhances the quality of orchestrator's own agent calls.
2. Execution Method
This round follows the self-evolution workflow with the following pipeline:
```
evo_plan ──[generate_items]──> evo_implement (x2) ──> evo_benchmark (x2) ──> select_best ──> evo_apply_winner ──> evo_align_tests ──> self_test ──> loop_guard
```

Key differences from self-bootstrap:
| Dimension | self-bootstrap | self-evolution |
|---|---|---|
| Loop strategy | Fixed 2 cycles | Fixed 1 cycle |
| Implementation paths | Single linear | 2 competing candidates |
| Selection mechanism | None | WP03 item_select (max score) |
| Cost control | Multiple steps, multiple agents | max_parallel=1, no QA/doc steps |
| Safety guarantees | self_test + self_restart | self_test + invariant (compilation_gate) |
3. Launch Steps
3.1 Build and Start the Daemon
In the C/S architecture, the CLI (orchestrator) connects to the daemon (orchestratord) via a Unix Domain Socket.
```bash
cd "$ORCHESTRATOR_ROOT"  # your orchestrator project directory
cargo build --release -p orchestratord -p orchestrator-cli

# Start the daemon (if not already running)
# --foreground keeps log output in the foreground; --workers specifies the number of parallel workers
nohup ./target/release/orchestratord --foreground --workers 2 > /tmp/orchestratord.log 2>&1 &

# Verify the daemon is running
ps aux | grep orchestratord | grep -v grep

# Verify the queue can be consumed by daemon workers
orchestrator task list -o json
```

Warning: CLI binary path: In C/S mode, the CLI is at `target/release/orchestrator` (crates/cli), not the legacy monolithic binary `core/target/release/agent-orchestrator`. Update any symlinks pointing to the old path.
3.2 Initialize Database and Load Resources
```bash
orchestrator delete project/self-evolution --force
orchestrator init
orchestrator apply -f your-secrets.yaml --project self-evolution
# apply additional secret manifests as needed, each with --project self-evolution
# Warning: --project is required; otherwise real AI agents will be registered in the global namespace
orchestrator apply -f docs/workflow/execution-profiles.yaml --project self-evolution
orchestrator apply -f docs/workflow/self-evolution.yaml --project self-evolution
```

3.3 Verify Resources Are Loaded
Verify that resources are loaded (add --project to scope to a specific project):
```bash
orchestrator get workspaces --project self-evolution -o json
orchestrator get workflows --project self-evolution -o json
orchestrator get agents --project self-evolution -o json
```

3.4 Create and Launch the Task
In C/S mode, `task create` directly enqueues to the daemon worker. The task begins executing automatically upon creation; there is no need for a separate `task start`.
```bash
orchestrator task create \
  -n "evo-prompt-template-enhance" \
  -w self -W self-evolution \
  --project self-evolution \
  -g "Enhance StepTemplate prompt variable parsing: support {var:-default} default value syntax and {?var}...{/var} conditional section syntax. Pure Rust implementation, no external template engines. Maintain full backward compatibility with the existing {var} syntax. Undefined variables should produce a warn log instead of silently preserving the placeholder."
```

Record the returned `<task_id>`. The task will be immediately claimed by a worker and begin execution. To wait for completion, use `orchestrator task watch <task_id>` or poll `task info`.
4. Monitoring Methods
4.1 Status Monitoring
```bash
orchestrator task list
orchestrator task info <task_id>
orchestrator task trace <task_id>   # execution timeline with anomaly detection
orchestrator task watch <task_id>   # real-time status panel refresh
```

4.2 Key Events in the Evolution Process
In addition to standard step monitoring, self-evolution has the following specific observation points:
- `items_generated` event: Confirm that `evo_plan` successfully generated 2 candidate items.

  ```bash
  orchestrator event list --task <task_id> --type items_generated -o json
  ```

- Dynamic item status: Confirm both candidates were executed.

  ```bash
  orchestrator task items <task_id>
  ```

- Selection result: Confirm `item_select` chose a winner.

  ```bash
  orchestrator store get evolution winner_latest --project self-evolution
  ```
4.3 Log Monitoring
```bash
orchestrator task logs --tail 100 <task_id>
orchestrator task logs --tail 200 <task_id>
```

Key observations:
- Whether `evo_plan` generated two proposals with substantive differences
- Whether `evo_implement` produced independent implementations for each item
- Whether `evo_benchmark` scoring is based on objective metrics
- Whether `select_best` selected the higher-scoring proposal
- Whether `evo_apply_winner` cleanly applied the winning proposal
4.4 Process / Daemon Monitoring
```bash
# Daemon process
ps aux | grep orchestratord | grep -v grep

# Queue/task status
orchestrator task list -o json

# Agent subprocesses (claude -p)
ps aux | grep "claude -p" | grep -v grep

# Code changes
git diff --stat
```

5. Key Checkpoints
5.1 evo_plan Phase
Confirm the output contains:
- 2 structured candidate proposals (JSON format)
- The two proposals have substantive differences (e.g., regex vs. hand-written parser)
- The `items_generated` event has been persisted
5.2 evo_implement Phase
Confirm:
- Both items produced code changes
- Change scope is consistent with each proposal's description
- No cross-contamination between items (item-scoped isolation)
5.3 evo_benchmark Phase
Confirm:
- Both items have a score capture
- Scoring is based on objective metrics such as compilation/tests/clippy
- Scores are differentiated (not both receiving full marks)
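One way such metrics could be folded into a single comparable number is sketched below. The struct fields, weights, and formula are illustrative assumptions for reasoning about the checkpoint, not the actual `evo_benchmark` scoring logic.

```rust
/// Illustrative benchmark result for one candidate item (field names are
/// assumptions, not the orchestrator's real schema).
struct BenchmarkResult {
    compiles: bool,
    tests_passed: u32,
    tests_total: u32,
    clippy_warnings: u32,
    diff_lines: u32,
}

/// Sketch of a composite score in [0.0, 1.0]: compilation is a hard gate,
/// the test pass-rate dominates, and clippy warnings / diff size apply
/// small capped penalties.
fn score(r: &BenchmarkResult) -> f64 {
    if !r.compiles {
        return 0.0; // a non-compiling candidate can never win
    }
    let test_ratio = if r.tests_total == 0 {
        1.0
    } else {
        f64::from(r.tests_passed) / f64::from(r.tests_total)
    };
    let clippy_penalty = (f64::from(r.clippy_warnings) * 0.02).min(0.2);
    let diff_penalty = (f64::from(r.diff_lines) / 1000.0).min(0.2); // prefer smaller diffs
    (0.6 * test_ratio + 0.4 - clippy_penalty - diff_penalty).max(0.0)
}
```

The hard gate on compilation is what makes the "both scores are 0" failure mode in Section 7.1 detectable: any formula of this shape collapses non-compiling candidates to the floor.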
5.4 select_best Phase
Confirm:
- The `evolution.winner_latest` store entry exists
- The selected proposal has the higher score
- Winner data includes the proposal ID and score
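The max-score selection being checked here can be pictured as follows. This is a sketch only; the struct name and fields are hypothetical, and WP03's actual `item_select` strategy interface is not reproduced.

```rust
/// Hypothetical shape of a scored candidate item (names are illustrative).
#[derive(Debug)]
struct ScoredItem {
    item_id: String,
    score: f64,
}

/// Max-score selection sketch: returns the candidate with the highest score,
/// or None when no candidates were scored (the "cannot choose a winner" case).
fn select_max_score(items: &[ScoredItem]) -> Option<&ScoredItem> {
    items.iter().max_by(|a, b| {
        a.score
            .partial_cmp(&b.score)
            .unwrap_or(std::cmp::Ordering::Equal)
    })
}
```

The `None` branch corresponds to the Section 7.1 row "item_select cannot choose a winner": if score capture failed, there is nothing to maximize over and no store entry is written.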
5.5 evo_apply_winner + self_test Phase
Confirm:
- The winning proposal's code compiles
- All tests pass
- The `compilation_gate` invariant did not trigger a halt
- Existing StepTemplate prompt behavior is unchanged (backward compatible)
6. Success Criteria
The topic is considered complete when all of the following conditions are met:
- The orchestrator completes the full `self-evolution` pipeline and terminates normally at `loop_guard`.
- Two distinct candidate proposals were actually generated and independently implemented.
- The engine selected the higher-scoring proposal via `item_select`.
- The winning proposal's code passes `self_test` and the `compilation_gate` invariant.
- The existing `{var_name}` substitution syntax remains backward compatible.
- The `evolution.winner_latest` store records the selection result.
7. Error Handling
7.1 Evolution-Specific Error Scenarios
| Error | Detection Method | Resolution |
|---|---|---|
| evo_plan does not output valid JSON | items_generated event does not exist | Check the prompt; JSON output instructions may need adjustment |
| Two candidate proposals are essentially identical | Inspect item labels and approach variables | Indicates insufficient differentiation guidance in the prompt |
| Both candidates fail to compile | Benchmark scores are both 0 | Invariant will halt; manual analysis of plan quality needed |
| item_select cannot choose a winner | Store entry does not exist | Check whether score capture is working correctly |
| Tests regress after evo_apply_winner | self_test fails | evo_align_tests should attempt a fix; if it still fails, manual intervention needed |
7.2 C/S Architecture-Specific Errors
| Error | Detection Method | Resolution |
|---|---|---|
| Daemon not running | CLI reports failed to connect to daemon at .../orchestrator.sock | Start with orchestratord --foreground --workers 2 |
| CLI points to legacy monolithic binary | which orchestrator points to core/target/release/ | Update symlink to target/release/orchestrator |
| Daemon still uses old code after rebuild | Previously fixed bug reappears | Kill the old daemon process and start a new one |
| Task starts immediately after task create | task list shows pending or quickly transitions to running | In C/S mode the task lifecycle is queue-only; this is normal behavior |
7.3 General Errors
Same as self-bootstrap: record status, logs, and diff; manually take over if necessary.
8. Human Role Boundaries
Same as self-bootstrap: humans are only responsible for launching, monitoring, judging, and recording.
The additional observation focus for this round is whether the evolution mechanism itself works:
- Whether candidate generation produces meaningful differentiation
- Whether competitive evaluation is based on objective metrics
- Whether the selection result is reasonable
- Whether the overall pipeline produces higher-quality code than linear self-bootstrap
These observations will determine whether the self-evolution workflow should replace or supplement self-bootstrap in future topics.
9. Post-Test Cleanup
After the task completes, clean up the agent-produced topic code so the same fixture can be tested again:
```bash
# Revert all files modified by the agent (preserve infrastructure bug fixes)
git checkout HEAD -- Cargo.lock core/Cargo.toml \
  core/src/collab/context.rs core/src/collab/mod.rs \
  core/src/selection.rs crates/daemon/src/server.rs

# Delete new files created by the agent
rm -f core/src/collab/template.rs

# Confirm working tree is clean
git status --short

# Verify compilation
cargo check
```

Warning: The agent may modify core files such as `context.rs`, `lib.rs`, and `Cargo.toml`. After each execution, be sure to check `git diff --stat` and revert unexpected changes.