Most AI coding assistants fail the same way: they try to be every expert simultaneously. Ask for a code review and you receive feedback on product strategy, naming conventions, missing tests, and deployment risks all in one undifferentiated response. Ask for a feature plan and the AI starts writing code before you have finished describing the problem. This blending of perspectives is not a bug you can prompt your way out of — it is a fundamental characteristic of how large language models respond when they lack a clear role constraint. gstack’s gear system was engineered specifically to break this pattern.
This article goes deep on gstack’s gear architecture: what gears are at the technical level, how each of the three built-in personas is structured, the mechanics behind persona switching, and — most importantly — how to extend the system with your own custom gears. If you have not yet installed gstack or are unfamiliar with the basic concept, start with the gstack overview before continuing here.
The Gears Architecture
A “gear” in gstack is a focused system context — a curated set of behavioral instructions and epistemic priorities that tell the AI model which professional role to inhabit for the duration of an interaction. Each gear is implemented as a skill file inside ~/.claude/skills/gstack/skills/. When you invoke a slash command like /code-review or /plan, Claude Code loads that file, executes its instructions as a session-level context, and enters the corresponding gear state for everything that follows in that interaction thread.
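In concrete terms, a gear skill file is just Markdown with YAML frontmatter, and loading it amounts to reading the file and injecting the body as the session context. A minimal Python sketch of that layout (the helper names here are hypothetical, not Claude Code internals):

```python
from pathlib import Path

# Illustrative sketch only; this is not Claude Code's actual loader.
# It shows the layout a gear skill file follows: YAML frontmatter between
# '---' delimiters, then the Markdown body that becomes the session-level
# system context.
SKILLS_DIR = Path.home() / ".claude" / "skills" / "gstack" / "skills"

def split_skill(text: str) -> tuple[str, str]:
    """Split a skill file's raw text into (frontmatter, body)."""
    _, frontmatter, body = text.split("---", 2)
    return frontmatter.strip(), body.strip()

def load_skill(command: str) -> tuple[str, str]:
    """Read the skill file for a slash command like 'code-review'."""
    return split_skill((SKILLS_DIR / f"{command}.md").read_text())
```

The split into frontmatter and body matters because, as described below, the frontmatter carries machine-readable metadata while the body carries the behavioral instructions.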
The key engineering insight behind gears is that role clarity produces output quality. When a model has explicit instructions telling it “you are thinking as a QA Engineer right now — your job is to find things that break, not to solve them or redesign the architecture,” it can apply that lens with far more precision than when it tries to balance all relevant professional perspectives simultaneously. Garry Tan, gstack’s creator, describes the failure mode of the latter approach as a “mushy blend” — outputs that are simultaneously a little bit of everything and not very much of anything.
The gear architecture borrows directly from how high-performing human teams are actually structured. When a senior engineering leader reviews a pull request, they are not simultaneously playing product manager, QA analyst, and technical writer. They are applying the specific judgment that comes from their role. Gears replicate this by giving the AI a role boundary it is instructed to respect throughout the interaction.
Structurally, the three built-in gears map to three of the most important decision-making roles in early-stage product development:
| Gear | Persona | Primary Question |
|---|---|---|
| Founder | Product visionary / strategic decision-maker | Does this solve the right problem, the right way? |
| Engineering Manager | Technical leader / code quality steward | Is this built correctly and sustainably? |
| QA Engineer | Quality advocate / edge-case finder | What will break, and under what conditions? |
Each gear has its own epistemics. The Founder gear is prompted to reason about opportunity costs, user impact, and product-market fit. The Engineering Manager gear reasons about technical debt, system design, and long-term maintainability. The QA Engineer gear reasons about failure modes, edge cases, and the gap between developer assumptions and real-world user behavior. The same codebase, evaluated through each lens in sequence, produces three qualitatively different and mutually complementary sets of feedback.
Founder Gear
The Founder gear activates when you invoke any command associated with product strategy and high-level planning. The core commands are /plan, /design-review, /retrospective, /rfc, and /metrics.
Philosophical stance: The Founder gear is optimized for judgment calls that cannot be resolved by pure technical analysis. It asks “what should we build?” before “how should we build it?” It explicitly deprioritizes implementation details unless the implementation choice has direct strategic consequences. When you use the Founder gear, you are asking the AI to inhabit the perspective of someone who has to make product decisions under uncertainty with limited resources — a description that applies equally to a solo indie developer and a YC founder running a seed-stage startup.
Prompt structure: At the system prompt level, the Founder gear establishes a cognitive frame around business judgment. The prompt instructs the model to evaluate every response against criteria like: Does this create real user value? What is the opportunity cost of this decision? What assumptions are being made that could turn out to be wrong? What is the minimum viable version of this that would validate the core hypothesis?
The prompt also contains explicit negative constraints — instructions about what the Founder gear should not do. It should not critique variable naming. It should not suggest architectural refactors unless they are directly related to the strategic question at hand. It should not conflate “we can build this” with “we should build this.” These negative constraints are just as important as the positive instructions, because they are what prevent the mushy blend from creeping back in.
The /plan command in practice:
When you run /plan followed by a feature description, the Founder gear produces a structured output that typically includes:
- A restatement of the problem from the user’s perspective, not the developer’s
- A scope boundary: what is explicitly in and out for this iteration
- Two or three alternative approaches with explicit trade-off analysis
- Success criteria that can be evaluated before the feature ships
- A prioritized implementation sequence with the rationale for the ordering
```
/plan Add real-time collaboration to the document editor — multiple users editing the same doc simultaneously
```
The Founder gear will engage with questions like: Is real-time collaboration actually what users need, or is async commenting sufficient for the use case? What is the MVP version that proves the concept without building full OT/CRDT conflict resolution? What does “success” look like in week one after launch?
The /design-review command in practice:
/design-review is a dual-gear command: it invokes both Founder and Engineering Manager perspectives in sequence. The Founder pass evaluates the proposed design from a product and strategic standpoint. The Engineering Manager pass evaluates it for technical soundness and implementation risk. This command intentionally produces deeper, more expensive output because the stakes of a design decision are higher than those of a routine code review.
```
/design-review We are considering switching our authentication system from session-based to JWT-based tokens before the public launch
```
A design review at this stage will probe whether the switch is driven by a real technical constraint or a fashionable architectural preference, what the migration path looks like for existing users, and whether the timing relative to launch creates unnecessary risk.
Engineering Manager Gear
The Engineering Manager gear is the most frequently used gear for developers who are past the planning phase and actively shipping code. It activates on /code-review, /ship, /post-ship-docs, /standup, and /changelog.
Philosophical stance: The Engineering Manager gear thinks in terms of consequences over time. A piece of code is not just correct or incorrect today — it is maintainable or unmaintainable over the next twelve months, it creates or reduces technical debt, it is consistent or inconsistent with the patterns the rest of the codebase establishes. This temporal dimension is what distinguishes Engineering Manager feedback from simple linting or code correctness checks.
The gear is also calibrated to think about the human beings who will maintain this code. It considers readability as a first-class concern, not a nice-to-have. It will flag code that is technically correct but cognitively expensive to reason about. It evaluates whether new patterns introduced in a change are self-documenting or require tribal knowledge to understand.
Prompt structure: The Engineering Manager gear’s system prompt establishes evaluation criteria across several dimensions:
- Correctness: Does the code do what it is intended to do? Are there logical errors?
- Completeness: Are error cases handled? Are all code paths covered?
- Architecture: Is the approach consistent with the system’s existing patterns? Does it introduce the right level of abstraction?
- Maintainability: Will the next engineer to touch this code understand what it does and why?
- Performance: Are there obvious performance hazards, such as O(n²) operations where an O(n log n) approach is straightforward, unnecessary database queries in loops, or memory leaks?
- Standards adherence: Does the code follow the team’s established conventions?
The prompt explicitly instructs the Engineering Manager gear to prioritize these concerns in order — correctness before style, architecture before naming. This prevents the gear from filling a review with naming bikeshedding while missing a structural design issue.
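That ordering can be made concrete. A hypothetical sketch of the correctness-first, style-last reporting order (the dimension names mirror the gear's criteria above; the finding data shape is an assumption for illustration):

```python
# Findings are reported correctness-first and style-last, so a structural
# issue is never buried under naming nits. The PRIORITY list mirrors the
# Engineering Manager gear's evaluation dimensions.
PRIORITY = ["correctness", "completeness", "architecture",
            "maintainability", "performance", "standards"]

def order_findings(findings: list[dict]) -> list[dict]:
    """Sort review findings by the gear's dimension priority."""
    rank = {dim: i for i, dim in enumerate(PRIORITY)}
    return sorted(findings, key=lambda f: rank[f["dimension"]])
```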
The /code-review command in practice:
The /code-review command is designed to be run against a specific piece of code — a function, a module, a pull request diff, or a file. You typically invoke it after pasting or referencing the code you want reviewed:
```
/code-review
Here is the new user authentication middleware I wrote:
[paste code here]
```
The Engineering Manager gear produces output structured around the priority hierarchy described above. It leads with the most critical findings — potential bugs, architectural mismatches, security concerns — and follows with lower-priority observations. Every finding includes the specific line or pattern being flagged and an explanation of why it is a concern, not just what it is.
Critically, the Engineering Manager gear does not mix in QA findings (that is the QA gear’s job) and does not second-guess the product decision that motivated the code (that is the Founder gear’s job). You get pure engineering judgment with a clear scope boundary.
The /ship command as a pre-flight checklist:
/ship is the Engineering Manager gear’s most opinionated command. Before it confirms a feature is ready to deploy, it runs through a structured checklist that covers regression risk, rollback readiness, monitoring and alerting, documentation status, and dependency review. The checklist output is detailed enough to serve as a genuine pre-deployment gate rather than a rubber stamp.
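A minimal sketch of that kind of pre-flight gate, in Python. The checklist items come from the description above; the pass/fail mechanics are an illustrative assumption, not gstack's implementation:

```python
# Items from the /ship checklist described in the article. The gate only
# clears when every item is explicitly confirmed; anything unconfirmed
# is treated as blocking, which is what makes it a gate and not a rubber stamp.
SHIP_CHECKLIST = [
    "regression risk assessed",
    "rollback plan ready",
    "monitoring and alerting in place",
    "documentation updated",
    "dependencies reviewed",
]

def ship_gate(status: dict) -> tuple[bool, list[str]]:
    """Return (ready, blocking_items) for an {item: bool} status map."""
    blocking = [item for item in SHIP_CHECKLIST if not status.get(item, False)]
    return (not blocking, blocking)
```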
QA Engineer Gear
The QA Engineer gear is the most adversarial of the three personas, adversarial in the best possible sense. Its job is to think like someone who wants the software to fail: not to make developers feel bad, but because finding failures in a controlled environment is far cheaper than finding them in production.
Philosophical stance: The QA Engineer gear operates from a fundamentally different epistemic position than the other two gears. The Founder gear asks “what should this do?” and the Engineering Manager gear asks “is this built correctly?” The QA Engineer gear asks “under what conditions will this not do what it is supposed to do?” It assumes the code will encounter edge cases. It assumes users will behave in ways the developer did not anticipate. It assumes that every system boundary is a potential failure point.
This adversarial stance is not accidental or theatrical — it reflects the actual intellectual disposition of excellent QA engineers. The QA gear is prompted to generate test scenarios that the developer almost certainly did not think of, because thinking of them requires temporarily setting aside the assumption that the happy path is the only path.
Prompt structure: The QA Engineer gear’s system prompt instructs the model to systematically enumerate failure modes across several categories:
- Boundary conditions: What happens at the minimum and maximum allowed values? What happens at values just outside those boundaries?
- Null and empty states: What happens when expected data is missing, empty, or null?
- Concurrency: What happens when two operations execute simultaneously? Are there race conditions or ordering dependencies?
- Error propagation: When a downstream dependency fails, does the error surface cleanly or does the system enter an undefined state?
- User behavior deviations: How does the feature behave when users do not follow the intended flow — clicking buttons in unexpected order, navigating back mid-process, refreshing at critical moments?
- Integration boundaries: What assumptions does this code make about the data it receives from external sources? What happens if those assumptions are violated?
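To make the boundary-condition and empty-state categories concrete, here is what that enumeration looks like applied to a hypothetical `paginate()` helper (the function and its limits are assumptions made for the example):

```python
# A hypothetical helper a QA pass might probe. The happy path is trivial;
# the interesting cases sit at the boundaries.
def paginate(items: list, page_size: int) -> list[list]:
    if page_size < 1:
        raise ValueError("page_size must be >= 1")
    return [items[i:i + page_size] for i in range(0, len(items), page_size)]

# Cases a QA pass would enumerate, beyond the happy path:
assert paginate([], 10) == []                    # empty input
assert len(paginate(list(range(10)), 10)) == 1   # exactly one full page
assert len(paginate(list(range(11)), 10)) == 2   # just past the boundary
```

Each assertion above corresponds to one of the categories in the list: empty states, values at the boundary, and values just past it.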
The /qa command in practice:
/qa is the most technically sophisticated command in the gstack suite. When invoked against a running web application, it launches a browser automation session, executes a structured test protocol, evaluates the application across functional and performance dimensions, and produces a health_score — a numerical quality indicator saved alongside a full report to .gstack/qa-reports/.
The health score aggregates findings across weighted dimensions: core functionality (highest weight), error handling, performance benchmarks, UI interaction correctness, and API response validity. A score of 100 indicates no detected issues across all categories; real-world scores on actively developed applications typically fall between 70 and 90, with specific findings attached to each point of deduction.
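A hedged sketch of how such a weighted aggregation could work. The dimensions come from the description above; the specific weights and the scoring model are assumptions, not gstack's published formula:

```python
# Assumed weights for illustration; core functionality carries the highest
# weight, per the article. Weights sum to 1.0 so a perfect run scores 100.
WEIGHTS = {
    "core_functionality": 0.40,
    "error_handling": 0.20,
    "performance": 0.15,
    "ui_interaction": 0.15,
    "api_validity": 0.10,
}

def health_score(dimension_scores: dict) -> float:
    """Aggregate per-dimension 0-100 scores into one weighted health score."""
    return round(sum(WEIGHTS[d] * dimension_scores[d] for d in WEIGHTS), 1)
```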
The persistent report in .gstack/qa-reports/ is a meaningful feature. Unlike chat-based QA feedback that disappears when the session ends, the saved reports create an audit trail you can diff over time — watching the health score trend up as issues are fixed, or catching regressions when a new change introduces a decline.
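Because the reports persist on disk, catching a regression can be as simple as comparing scores between runs. A sketch, assuming each report is a JSON document with a top-level `health_score` field (the report schema here is an assumption for illustration):

```python
import json

def score_delta(old_report: str, new_report: str) -> float:
    """Compare two saved QA reports; negative deltas flag a regression."""
    old = json.loads(old_report)["health_score"]
    new = json.loads(new_report)["health_score"]
    return new - old
```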
How Persona Switching Works
From a technical standpoint, gear switching in gstack happens through Claude Code’s skill execution model. When you invoke a slash command, Claude Code reads the corresponding Markdown file from ~/.claude/skills/gstack/skills/ and injects its contents as the session’s active system context. This context injection happens at the start of the command execution and establishes the behavioral frame for that interaction thread.
The Markdown skill files are not simple prompt templates. They contain structured sections that Claude Code’s skill runner interprets and applies in a specific order:
```markdown
---
name: code-review
description: Engineering Manager code review — architecture, maintainability, standards
gear: engineering-manager
---

## Role

You are an experienced Engineering Manager conducting a structured code review...

## Evaluation Criteria

Apply the following criteria in priority order...

## Output Format

Structure your response with the following sections...

## Constraints

Do not comment on business strategy or product decisions.
Do not generate new feature ideas from the code.
Focus exclusively on the engineering quality of what is written...
```
The gear: frontmatter field is used by the gstack system to tag which persona a command belongs to. This tagging enables features like the dual-gear /design-review command, which sequences the Founder pass and the Engineering Manager pass by loading and executing two gear contexts in order.
Persona switching between commands within the same working session is stateless at the session level — each command invocation establishes its own fresh context rather than accumulating state from previous invocations. This means running /plan followed by /code-review in the same Claude Code session does not produce a muddled hybrid of both gears. Each command is a clean context reset to its designated persona.
This stateless model has an important implication: gears are most effective when invoked for focused, single-concern tasks rather than as wrappers around open-ended conversations. The gear establishes the frame; you bring the specific artifact to evaluate.
For a deeper comparison of how role-based persona structuring works in other multi-agent frameworks, see how MetaGPT handles custom roles and actions — a different architectural approach to the same fundamental challenge of specializing AI behavior for specific tasks.
Customizing Existing Gears
Because gstack is installed as a local git clone, every skill file is directly editable. Customizing an existing gear means opening the corresponding Markdown file and modifying the system context, evaluation criteria, output format, or constraints sections.
When customization makes sense:
The default gear prompts are calibrated for general-purpose use across a wide range of software projects. But your project has specific characteristics — a particular language, a particular architecture pattern, established team conventions, a specific tech stack — that general-purpose prompts cannot anticipate. Customization lets you encode that project-specific knowledge directly into the gear’s context.
Common customizations include:
- Adding language-specific evaluation criteria to the Engineering Manager gear (e.g., Python-specific patterns, TypeScript strict mode considerations, Go idiom adherence)
- Adding project architecture context so the gear understands your specific service boundaries and does not flag intentional patterns as mistakes
- Adjusting output format to match your team’s code review conventions or issue tracking format
- Adding negative constraints that reflect your team’s agreed-upon trade-offs (e.g., “do not flag the use of any-casting in TypeScript — we have intentionally accepted this trade-off”)
How to customize safely:
Before editing any gear file, create a backup copy:
```bash
cp ~/.claude/skills/gstack/skills/code-review.md \
   ~/.claude/skills/gstack/skills/code-review.md.backup
```
Then open the original file in your editor and make your changes. Test the modified gear on a few representative tasks to confirm it behaves as expected before relying on it in production workflows.
If you are using a project-level install (the .claude/skills/gstack/ approach described in the installation guide), project-specific and team-wide customizations belong in the project copy; personal workflow preferences go in your global install.
One important caution: be careful about adding instructions that inadvertently relax gear constraints rather than refine them. The value of the gear system comes from its focused lens — an Engineering Manager gear that has been modified to also provide product strategy feedback has lost part of what made it useful. Additions should sharpen the focus for your specific context, not broaden it back toward the mushy blend.
Creating a New Persona
The most powerful use of the gear architecture is adding entirely new personas that do not exist in the default gstack distribution. If your workflow involves recurring tasks that have a distinct professional perspective — security auditing, accessibility review, performance engineering, API design review, documentation writing — those are strong candidates for a custom gear.
Step 1: Define the persona’s epistemic frame
Before writing a single line of prompt, answer these questions in plain language:
- What professional role does this persona inhabit?
- What is the single most important question this persona asks when evaluating anything?
- What does this persona explicitly not care about (constraints that prevent drift into other gears)?
- What output structure would a real professional in this role produce?
Take a security auditor as a concrete example:
- Role: Application security engineer
- Primary question: What vectors exist for unauthorized access or data exfiltration?
- Negative constraints: Does not optimize for performance, does not improve code readability, does not suggest feature additions
- Output structure: Findings categorized by severity (Critical / High / Medium / Low), each with a description, reproduction steps, and recommended remediation
Step 2: Write the skill file
Create a new Markdown file in ~/.claude/skills/gstack/skills/ named after your command:
```markdown
---
name: security-audit
description: Security Engineer audit — vulnerability analysis, threat modeling, remediation
gear: security-engineer
---

## Role

You are a senior application security engineer conducting a structured
security audit. Your job is to identify vulnerabilities, attack vectors,
and security misconfigurations in the code or system design presented to you.
You think adversarially: you assume an attacker has read this code and
is actively looking for ways to exploit it.

## Evaluation Criteria

Evaluate the input across the following security domains, in priority order:

1. **Authentication and authorization** — Are identity checks correct and
   consistently applied? Are there privilege escalation paths?
2. **Input validation** — Is all user-controlled input validated and
   sanitized before use? Are injection attacks (SQL, command, XSS) possible?
3. **Data exposure** — Is sensitive data encrypted at rest and in transit?
   Are secrets hardcoded or logged?
4. **Dependency risk** — Are there known-vulnerable dependencies?
5. **Error handling** — Do error messages leak system internals?

## Output Format

Produce a structured security report with the following sections:

### Summary

One paragraph: overall risk posture and most critical finding.

### Findings

For each finding:

- **Severity:** Critical / High / Medium / Low
- **Category:** (from Evaluation Criteria above)
- **Description:** What the vulnerability is and why it is a risk
- **Location:** File, function, or line reference
- **Remediation:** Specific fix recommendation

### Risk Assessment

Overall score: Critical / High / Medium / Low / Informational

## Constraints

- Do not suggest feature enhancements unrelated to security.
- Do not comment on code style, naming, or architecture unless directly
  relevant to a security finding.
- Do not hallucinate vulnerabilities — flag only what is directly evidenced
  in the code presented.
```
Step 3: Verify the skill is discoverable
Open a new Claude Code session and type /sec — the autocomplete should show /security-audit as an available command. If it does not appear, confirm the file is saved to the correct directory and that the filename uses only lowercase letters and hyphens (no spaces, no uppercase).
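That naming rule can be checked programmatically. A sketch (deliberately strict: the article specifies lowercase letters and hyphens only, and does not say whether digits are allowed, so this pattern excludes them):

```python
import re

def valid_skill_filename(name: str) -> bool:
    """True if a skill filename uses only lowercase letters and hyphens."""
    return bool(re.fullmatch(r"[a-z]+(-[a-z]+)*\.md", name))
```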
Step 4: Test with representative inputs
Run your new gear against several realistic inputs before relying on it. A good test set for the security-audit gear would include: a route handler with JWT validation, a SQL query builder, a form submission endpoint, and a configuration file that may contain secrets. Evaluate whether the outputs match what a real security engineer would produce and refine the prompt accordingly.
For a comparative look at how other multi-agent systems handle role specialization at the framework level, the approach MetaGPT takes with custom roles and actions offers instructive contrasts — MetaGPT encodes roles in Python class hierarchies with typed message passing, while gstack uses prompt-level behavioral constraints. Neither is strictly superior; they reflect different trade-offs between control, flexibility, and implementation complexity.
The theoretical foundation for why role-constrained prompting works is covered in depth in our guide to prompt engineering for AI agents, including techniques like chain-of-thought prompting, system prompt layering, and negative constraint design that are directly applicable to writing effective gear prompts.
Frequently Asked Questions
Does switching gears consume extra context window?
Each gear’s skill file injects its system context at the start of a command invocation. The context size varies by gear — simpler gears with concise prompts consume a few hundred tokens, while complex multi-criteria gears like /qa or /design-review consume more. In practical terms, the context cost of a gear is small relative to the content you provide for it to evaluate. You are unlikely to hit context window limits because of gear overhead on typical code review or planning tasks.
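If you want a rough sense of a custom gear's overhead, the common heuristic of about four characters per token gives a ballpark. This is an approximation, not a real tokenizer; actual counts depend on the model's tokenizer:

```python
def approx_tokens(skill_text: str) -> int:
    """Rough token estimate for a skill file's context cost (chars / 4)."""
    return max(1, len(skill_text) // 4)
```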
The more relevant context consideration is that each command invocation is a fresh context by default. If you want to preserve the results of one gear’s analysis as input to the next gear’s evaluation (for example, using the Founder gear’s plan as context for the Engineering Manager gear’s architecture review), you should explicitly paste or reference the previous output in your next command invocation. Gears do not automatically chain their context across separate invocations.
Can I mix two gears in a single session?
Technically yes, but it is not recommended for the use cases the single-gear commands are designed for. The /design-review command is the deliberate exception — it is specifically engineered as a dual-gear command that sequences Founder and Engineering Manager perspectives with a clear handoff between them.
For other use cases, mixing two gears in a single interaction tends to recreate exactly the mushy blend problem the gear system was designed to solve. If you find yourself wanting to mix gears frequently, that is a signal that you might benefit from a custom gear that specifically integrates the two perspectives you need — with explicit priority rules about which lens takes precedence when they conflict.
The more effective pattern is sequential gear use: run /plan to completion, act on the output, then run /code-review as a separate command on your implementation. The separation produces cleaner, more actionable outputs than a hybrid would.
How do I prevent prompt leakage between gears?
Prompt leakage — where behavioral context from one gear bleeds into a subsequent gear interaction — is generally not a concern in gstack’s implementation because each slash command invocation establishes a clean context reset. The previous command’s system context does not persist as active behavioral instructions into the next command.
However, there is a subtler form of leakage to be aware of: conversational context. If you have been discussing a product strategy question in a Founder gear session and then invoke /code-review, Claude Code will have your recent conversation in its context window. The Engineering Manager gear’s system prompt will be active, but the model can still see the prior product discussion and may allow it to influence the code review response.
If you want a truly clean gear switch, start a fresh Claude Code session rather than switching commands within an ongoing conversation. For most use cases this is unnecessary overhead, but for sensitive or high-stakes reviews where you want maximum gear purity, the new session approach guarantees it.
What happens if two gear files have conflicting instructions?
Single-command invocations activate exactly one gear file, so conflicts between different gears are not an issue during normal use — you are always in one gear at a time.
Conflicts can arise if you create a custom gear and its instructions overlap with or contradict gstack’s internal configuration (for example, output formatting conventions that gstack applies globally). In practice, this is rare because the per-command skill files are authoritative for everything within their scope, and gstack’s global configuration does not impose behavioral constraints that would conflict with well-written gear prompts.
The more common issue is internal contradictions within a single custom gear file — instructions that are inconsistent with each other (e.g., “be comprehensive” and “be concise” without defining which takes precedence in which context). When writing custom gear prompts, prioritize explicit rules over implicit ones: if two instructions could conflict, add a meta-instruction that resolves the tie (“when comprehensiveness and conciseness conflict, prefer conciseness for findings below High severity”).
Next Steps
You now have a complete picture of gstack’s gear architecture — how the built-in Founder, Engineering Manager, and QA Engineer personas are structured, the mechanics of how persona switching works, and the full process for building and registering custom gears.
The highest-leverage next action is to run all three built-in gears against a real feature in your current project. Start with /plan on a feature you are about to build to establish the product-level frame. Implement the feature, then run /code-review on your implementation. Finally, run /qa against your development server. Experiencing the three-gear workflow end-to-end on a concrete task will give you a much sharper intuition for when each gear is appropriate than any documentation can.
If you are ready to build your first custom gear, use the security-audit example in the “Creating a New Persona” section above as a template. Adapt the role definition, evaluation criteria, output format, and constraints to match the professional perspective you want to encode. The quality of the output you get from a custom gear is directly proportional to the precision with which you define what that persona does — and, just as importantly, what it explicitly does not do.