How Guildhall works

Guildhall is a local service that keeps AI work from disappearing into a chat fog. It watches the project, frames the work, sends the right helper into the right lane, and keeps the receipts where you can inspect them later.

The point is not a gigantic prompt. The point is a visible loop: plan the work, do the work, inspect the work, and remember what changed.

Each run pulls from a few kinds of state:

Project state: the registered project, ./guildhall.yaml, provider setup, local commands, memory files, and current task queue.
Planning state: project goals, task blueprints, open questions, acceptance criteria, out-of-scope boundaries, and change orders.
Codebase orientation: the Corpus Map and likely target files that point helpers toward existing modules, helpers, components, design tokens, tests, and conventions.
Execution state: active worktree, checkpoint, previous attempts, verification output, and unresolved reviewer feedback.
Inspection state: reviewer rubrics, guild specialists, gate results, release readiness, and your decisions.
Learning state: project habits and cross-project preferences that can be accepted, ignored, or scoped.

The guild

Guildhall splits work by job, because "one agent does everything" gets messy fast.

Role	Job	Watch out for
Spec agent	Turns rough intent into a buildable blueprint.	Guessing when it needs your product call.
Coordinator	Keeps the project coherent across tasks.	Treating every task like a tiny island.
Worker	Makes focused changes against the accepted plan.	Inventing a new helper or component because it was convenient in the moment.
Reviewer	Checks the result against the plan and evidence.	Mistaking confidence for correctness.
Gate checker	Runs commands and records the outcome.	Pretending a passing command proves the product is good.

The split is not lore for lore's sake. Each role catches a different kind of mistake:

A spec agent can over-plan or ask vague questions.
A coordinator can lose the project-level thread.
A worker can optimize for the local file and miss the system.
A reviewer can rubber-stamp plausible prose.
A gate checker can confuse passing commands with product quality.

Guildhall’s job is to make those mistakes harder to miss. The plan says what good means. The worker names what it reused. Reviewers compare the result to the plan. Gates keep command output attached. The coordinator keeps an eye on whether the whole project still makes sense.

Structured flexibility

Software work is not deterministic, and pretending otherwise is how tools become brittle. Guildhall uses lanes instead: enough structure to stop drift, enough room to solve the real problem.

For example:

A task blueprint can name the goal, non-goals, likely files, and checks without dictating every edit.
A question card can offer good choices and still let you type the better answer.
A setting can inherit from your global defaults, stay project-specific, or become a new default when the same preference keeps showing up.
The Corpus Map can point to the likely helper, package, or component, while the worker still reads the files before editing.

That is the middle path between "just chat with an agent" and "please enjoy this mandatory process cathedral."
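The setting inheritance described above can be pictured as a small lookup that also reports where the value came from. This is only a sketch under assumptions: `resolve_setting`, `ResolvedSetting`, and the three layers (project, global, built-in) are hypothetical names chosen for illustration, not Guildhall's actual API or storage format.

```python
from dataclasses import dataclass
from typing import Any, Mapping


@dataclass(frozen=True)
class ResolvedSetting:
    """Hypothetical result type: the effective value plus its provenance."""
    value: Any
    source: str  # "project", "global", or "built-in"


def resolve_setting(
    key: str,
    project: Mapping[str, Any],
    global_defaults: Mapping[str, Any],
    built_in: Mapping[str, Any],
) -> ResolvedSetting:
    """Pick the most specific layer that defines the key.

    Assumed order: a project-specific value wins, then the user's global
    default, then the built-in default. Raises KeyError if no layer has it.
    """
    if key in project:
        return ResolvedSetting(project[key], "project")
    if key in global_defaults:
        return ResolvedSetting(global_defaults[key], "global")
    return ResolvedSetting(built_in[key], "built-in")


# Illustrative data only; these keys and commands are made up.
BUILT_IN = {"test_command": "make test", "ask_before_split": True}
GLOBAL = {"test_command": "npm test"}
PROJECT = {"test_command": "pnpm test"}

effective = resolve_setting("test_command", PROJECT, GLOBAL, BUILT_IN)
```

Returning the source alongside the value is what would let a settings screen show whether a lever is inherited or project-specific.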

Making good output more likely

Good output usually comes from a chain of smaller wins:

The task is framed clearly enough that an agent knows what success means.
The helper sees the project state and constraints that matter for that task.
The task points toward the existing helpers, tokens, tests, and conventions that matter.
The run leaves durable evidence: a blueprint, question, code diff, checkpoint, review finding, gate result, or decision.
Another role or deterministic check inspects that evidence.

That is why Guildhall invests in blueprints, context packets, the Corpus Map, design-system summaries, review rubrics, checkpoints, and gates. None of those exist to make the app feel busy. They exist because plausible-looking work is still wrong if it ignores the project around it.

Feedback loops

Guildhall asks for feedback where it changes the outcome.

Your feedback belongs on product calls, risk calls, taste calls, and release judgment. If the answer changes what the product is, who it serves, or whether you are comfortable shipping it, Guildhall asks.

Reviewer feedback belongs where another lens catches a class of mistake: architecture fit, product flow, accessibility, security, test coverage, design-system consistency, or release readiness.

Copy review is part of that loop. When a task changes docs or a visible surface, Guildhall brings in the Copywriter to check the small words too: button labels, nav labels, status text, tooltips, headings, empty states, and capitalization. Those details are where an app starts to feel either calm and intentional or weirdly stitched together.

Command feedback belongs where commands can prove something: tests, typechecks, builds, lint, browser checks, and release scripts. Guildhall records those outcomes so the next run does not have to rediscover whether a claim was actually verified.

How Guildhall learns

Guildhall can notice useful patterns: the command that always proves a project is healthy, the component library a repo already uses, the way you prefer tasks to be split, or the kind of question you never want buried in a transcript.

It does not treat every correction as a new law. A lesson has a scope and a source. Some lessons belong only to the current task, some become project habits, and some are candidates for your global defaults. Broad changes stay off until you approve them.

That keeps learning useful without making the product spooky. You can inspect what Guildhall wants to remember in Settings -> Memory, choose whether it applies here or everywhere, and ignore suggestions that were only true once.
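One way to picture "a lesson has a scope and a source" is a small record with an approval flag. The sketch below is hypothetical: the `Lesson` and `Scope` names are invented for illustration, and it assumes that anything broader than the current task counts as a broad change that waits for approval. The text does not say exactly where that line is drawn.

```python
from dataclasses import dataclass
from enum import Enum


class Scope(Enum):
    TASK = "task"        # applies only to the current task
    PROJECT = "project"  # a project habit
    GLOBAL = "global"    # a candidate for global defaults


@dataclass
class Lesson:
    """Hypothetical record for something Guildhall wants to remember."""
    text: str
    scope: Scope
    source: str            # where the lesson came from, e.g. a task id
    approved: bool = False

    def is_active(self) -> bool:
        # Assumption: task-scoped lessons apply right away;
        # broader lessons stay off until the user approves them.
        return self.scope is Scope.TASK or self.approved


def active_lessons(lessons: list[Lesson]) -> list[Lesson]:
    """Return only the lessons that should influence the next run."""
    return [lesson for lesson in lessons if lesson.is_active()]


# Illustrative data only.
lessons = [
    Lesson("Use the existing date helper", Scope.TASK, "task-12"),
    Lesson("Run typecheck before tests", Scope.PROJECT, "task-9"),
    Lesson("Prefer small tasks", Scope.GLOBAL, "task-3", approved=True),
]
```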

Recovery uses the same trail. When a worker gets stuck, Guildhall can reread the right files, retry a bounded step, ask a sharper question, or turn the blocker into a visible change order. The goal is not to hide failure. The goal is to make the next move obvious.

See Memory, learning, and recovery for the inspectable memory layers and recovery paths.

Auditability as product quality

Auditability is not an enterprise checkbox. It is how Guildhall keeps AI work from becoming a magic trick.

The system records:

what you asked for
what questions were asked and answered
what blueprint was accepted
what context the worker received
what files changed
what checkpoint the worker left behind
what reviewers found
what gates ran
what decisions changed the plan
what settings or learned behaviors affected execution

This matters because good work is not only “the diff looks okay today.” Good work is being able to answer, tomorrow, why the change was made, what tradeoff was accepted, what verification ran, and whether the agent followed the project’s existing architecture instead of inventing a parallel one.

The trail still has to be readable. You need a quick status when you are scanning, a review packet when you are checking a task, and deeper history only when something smells off.
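As a rough illustration of the trail above, each recorded item could be an event tagged with a task and a kind. Everything here is an assumption made for the example: the event kinds are shorthand for the list above, and `AuditEvent`, `to_jsonl`, and `missing_evidence` are hypothetical helpers, not Guildhall's real storage format.

```python
import json
from dataclasses import asdict, dataclass

# Hypothetical kinds, one per item in the list above.
EVENT_KINDS = {
    "request", "question", "blueprint", "context", "files_changed",
    "checkpoint", "review", "gate", "decision", "setting",
}


@dataclass(frozen=True)
class AuditEvent:
    task_id: str
    kind: str
    summary: str

    def __post_init__(self) -> None:
        # Reject kinds outside the known set so the trail stays queryable.
        if self.kind not in EVENT_KINDS:
            raise ValueError(f"unknown event kind: {self.kind}")


def to_jsonl(events: list[AuditEvent]) -> str:
    """Serialize events one JSON object per line, in recorded order."""
    return "\n".join(json.dumps(asdict(e), sort_keys=True) for e in events)


def missing_evidence(
    events: list[AuditEvent],
    task_id: str,
    required: tuple[str, ...] = ("blueprint", "review", "gate"),
) -> list[str]:
    """List the required kinds that a task has not recorded yet."""
    seen = {e.kind for e in events if e.task_id == task_id}
    return [kind for kind in required if kind not in seen]


# Illustrative data only.
trail = [
    AuditEvent("task-1", "request", "Add export button"),
    AuditEvent("task-1", "blueprint", "Accepted blueprint v2"),
    AuditEvent("task-1", "gate", "Tests passed"),
]
```

A check like `missing_evidence` is one way a quick status view could flag a task that has a diff but no review on record.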

The context principle

Helpers need enough context to make good local decisions without stuffing the entire repository into the prompt. That means Guildhall prefers compact maps, summaries, pointers, and current state over full-file dumps.

When a helper needs more, it can read the specific file, search the repo, or run a command. The injected context is a navigation layer, not a substitute for evidence.

This is especially important for reuse. Guildhall points work toward existing shared code before anyone edits:

helpers, services, schemas, and packages for backend/runtime work
components, design tokens, and interaction patterns for UI work
tests and fixtures that show the intended contract
docs and decisions that explain why the system has its current shape

If the same idea appears twice, it may be time for a small abstraction. If it is still a one-off, the project does not need a brand-new framework-shaped hat.

Two pages cover the most important pieces:

Agent context explains what each agent receives before it acts.
Corpus Map explains how Guildhall builds and refreshes the compact codebase map.

The work loop

Guildhall’s default loop is:

Survey the project and capture facts.
Draft or revise the blueprint.
Ask bounded questions only when your answer changes the plan.
Promote ready work into implementation.
Give the worker scoped context, likely files, and Corpus Map guidance.
Record checkpoints and verification evidence as work changes.
Inspect the result with reviewers and deterministic gates.
Finish, revise, escalate, or record a change order.

This is why Guildhall has visible task states, transcripts, review packets, checkpoints, and settings provenance. The product is not just the final diff. It is also the trail that lets you trust the diff.
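The default loop can be read as a small state machine. The sketch below is a simplification under assumptions: the state names are hypothetical, revising is modeled as going back to implementation, and a change order is modeled as going back to the blueprint. Guildhall's real task states may differ.

```python
# Hypothetical states and the transitions each one allows.
ALLOWED = {
    "survey": {"blueprint"},
    "blueprint": {"question", "ready"},
    "question": {"blueprint"},        # an answer feeds back into the plan
    "ready": {"implementing"},        # promoted into implementation
    "implementing": {"inspecting"},   # checkpoints are recorded along the way
    "inspecting": {"done", "implementing", "escalated", "change_order"},
    "change_order": {"blueprint"},    # assumption: the plan is revised
}


def advance(current: str, nxt: str) -> str:
    """Move to the next state, refusing transitions the loop does not allow."""
    if nxt not in ALLOWED.get(current, set()):
        raise ValueError(f"cannot go from {current!r} to {nxt!r}")
    return nxt


def run(path: list[str]) -> str:
    """Walk a sequence of states, starting from its first entry."""
    state = path[0]
    for nxt in path[1:]:
        state = advance(state, nxt)
    return state


# One pass that needs a revision before finishing.
final_state = run([
    "survey", "blueprint", "ready", "implementing",
    "inspecting", "implementing", "inspecting", "done",
])
```

Making the allowed moves explicit is what prevents a task from jumping from "ready" to "done" without passing through inspection.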

Finish what is already moving

Guildhall is biased toward finishing active work before starting something new. That sounds small, but it is one of the main ways the product stays calm.

When work is already in progress, in review, or in gate checks, Guildhall keeps driving that work toward a clear outcome before it grabs unrelated tasks. A finished task with a review packet is more useful than five half-started tasks with lively transcripts.

The loop looks like this:

Finish anything waiting on gates.
Resolve work already in review.
Continue active implementation.
Claim the next ready task only when the active lane is clear.

That keeps the queue honest. Blocked work is marked as blocked instead of quietly falling behind fresher work. Ready work is claimed explicitly so two agents do not collide. Every handoff needs evidence: what changed, what was checked, what still needs a decision, and where to resume.
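The finish-first ordering above amounts to a priority pick over task states. This is a minimal sketch, assuming hypothetical state names (`gate_check`, `in_review`, `in_progress`, `ready`, `blocked`) and a hypothetical `pick_next` function. It is not Guildhall's scheduler.

```python
from dataclasses import dataclass
from typing import Optional

# Lower number means handled first. Blocked tasks are not listed,
# so they are never picked and stay visibly blocked.
PRIORITY = {"gate_check": 0, "in_review": 1, "in_progress": 2, "ready": 3}


@dataclass
class Task:
    id: str
    state: str
    claimed_by: Optional[str] = None


def pick_next(tasks: list[Task]) -> Optional[Task]:
    """Return the task to work on next, or None if nothing is workable.

    Because "ready" has the lowest priority, a ready task is only chosen
    when no task is in gates, review, or active implementation.
    """
    candidates = [t for t in tasks if t.state in PRIORITY]
    return min(candidates, key=lambda t: PRIORITY[t.state], default=None)


def claim(task: Task, agent: str) -> Task:
    """Claim a ready task explicitly so two agents do not collide."""
    if task.state != "ready" or task.claimed_by is not None:
        raise ValueError(f"task {task.id} cannot be claimed")
    task.claimed_by = agent
    task.state = "in_progress"
    return task


# Illustrative queue only.
queue = [
    Task("a", "ready"),
    Task("b", "in_review"),
    Task("c", "blocked"),
    Task("d", "gate_check"),
]
```

With this queue, `pick_next` returns task `d` first; task `a` is only reached once `d` and `b` have moved on.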

Where Guildhall stays quiet

More structure is not always better. A tiny project does not need to think about release maps, coordinator rosters, design-system maturity, or settings inheritance unless those concepts become useful.

Guildhall keeps advanced detail available, not mandatory. It asks when the project gets complicated, suggests a split when a task is too broad, or surfaces a setting when the default keeps being wrong. Until then, it quietly does the ordinary thing.

What makes this different from a long chat

A long chat can remember a lot, but it is fragile. Important state gets buried in prose, and each new task depends on the model rediscovering the project.

Guildhall keeps durable state first-class:

task status and acceptance criteria live in structured task state
questions live as answerable cards
project settings live as levers with provenance
codebase orientation, including design-system maturity, lives in the Corpus Map
review findings live on the task
verification output is tied to gates and checkpoints
learned behavior must be accepted before it becomes a default

The result feels less like "one clever conversation" and more like a small local team with a shared project room.
