When AI-assisted development goes wrong, it is rarely because the AI lacked capability. It is almost always because the interaction lacked structure. You asked a general question, got a general answer, and then spent the next hour figuring out which parts of that answer actually mattered for the specific problem you were trying to solve. gstack exists to close that gap — not by making Claude smarter, but by making your interactions with it more deliberate.
This guide is a hands-on tour through the real workflow scenarios where gstack’s gear system produces the most concrete value. Each use case below comes with an actual command invocation, representative output, and a description of what the specific AI persona contributes that a generic prompt would not. By the end, you will have a clear picture of how the full gear-switching sequence — plan, design, code, review, QA, ship — works as an integrated development workflow rather than a collection of disconnected prompts.
Why Structure Matters in AI-Assisted Development
Every developer who has used an LLM for more than a few weeks has run into the same failure pattern. You describe a problem, the AI produces a response, and the response is simultaneously too broad and too shallow — touching on strategic considerations, implementation details, edge cases, and deployment concerns all at once, without going deep enough on any single dimension to be truly actionable.
This is not a bug in the AI’s reasoning. It is the predictable consequence of asking a general-purpose system to perform without any role constraint. A senior engineer reviewing a pull request does not simultaneously think about product-market fit, test coverage, infrastructure costs, and coding style. They think about code quality and architecture. The product thinking happened earlier, in a different meeting, with a different participant. The QA thinking will happen later, with a third perspective. Role clarity is what makes professional teams effective at each phase of a project.
gstack replicates that role clarity in Claude Code sessions. By activating a specific gear — Founder, Engineering Manager, or QA Engineer — you constrain the AI’s perspective to exactly the domain that is relevant to your current task. The result is output that is deeper, more specific, and easier to act on than what you get from an unconstrained general-purpose prompt.
The difference becomes obvious the moment you compare outputs. Ask Claude without gstack to “review this authentication module” and you will get a blend of security considerations, naming conventions, test coverage gaps, and maybe a comment about whether a simpler implementation would serve users better. Invoke /code-review on the same file and the Engineering Manager gear focuses on architecture quality, consistency with established patterns, long-term maintainability, and specific code-level issues — nothing else. The signal-to-noise ratio is dramatically higher.
The following use cases show this difference in concrete terms, across the five workflow phases where gstack produces the most value.
Use Case 1: Feature Planning with the Founder Gear
The scenario that benefits most from structured AI assistance is also the one where developers most often skip structure entirely: early-stage feature planning. When you are excited about an idea, the instinct is to open your editor and start building. The cost of that instinct shows up three weeks later when you realize you built the wrong thing, or the right thing in the wrong order, or the right thing in a way that does not actually fit how users behave.
The /plan command activates the Founder gear, which approaches your feature description from a product strategy perspective. It asks the questions a disciplined founder would ask before writing a single line of code: What problem does this solve? Who has this problem? What does success look like? What are we not building?
Scenario: You are building a SaaS analytics dashboard and you want to add a CSV export feature. A user in a support ticket mentioned wanting to download their data, and you have been meaning to build it for two weeks.
Invocation:
/plan Add a CSV export feature so users can download their dashboard data
Representative Founder gear output:
## Feature Plan: CSV Data Export
### Problem Statement
Users need to take their analytics data outside the dashboard — for
offline analysis, sharing with stakeholders who lack access, or feeding
into external tools. The current closed-data experience creates friction
at the moment of highest user engagement (when they have found something
meaningful in the data).
### Scope (This Iteration)
IN SCOPE:
- Export current view's data (filtered, date-ranged) as CSV
- Email delivery for large exports (>10k rows)
- Export history in user settings (last 5 exports)
OUT OF SCOPE:
- Excel (.xlsx) format — prioritize after CSV adoption confirmed
- Scheduled/automated exports — separate feature, distinct user need
- API access to raw data — different persona, different workflow
### User Stories
- As an analyst, I want to export the last 30 days of event data so I
can build a custom cohort analysis in Google Sheets
- As a team lead, I want to share this week's metrics with my manager
without giving them dashboard access
### Success Criteria
- Export completes in <3s for datasets under 5,000 rows
- Email delivery triggers within 60s for large exports
- Zero data integrity issues (column headers match, no truncation)
### Key Risks
- Large tenant datasets could cause timeout or memory pressure — mitigate
with streaming CSV generation and async delivery for large payloads
- Feature scope creep toward Excel/scheduled exports — hold the line
### Implementation Sequence
1. Streaming CSV endpoint with query-param filtering
2. Frontend "Export" button in dashboard toolbar
3. Async job queue + email delivery for large exports
4. Export history in settings page
Notice what this output does not include: it does not write any code, it does not comment on database schema, and it does not mention deployment. The Founder gear stays in its lane. You get a product document — the kind of brief that you could share with a stakeholder or use to align a team — rather than a half-implemented feature.
This is the /plan command’s primary value: it forces a deliberate thinking step before the execution step, and it produces a portable artifact you can reference throughout the feature’s development lifecycle.
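The streaming mitigation named under Key Risks is a pattern worth seeing concretely. Here is a minimal Python sketch of generator-based CSV streaming (the row source and field names are hypothetical; this illustrates the pattern, not gstack's output or the article's actual Node.js implementation):

```python
import csv
import io
from typing import Iterable, Iterator


def stream_csv(rows: Iterable[dict], fieldnames: list[str]) -> Iterator[str]:
    """Yield CSV text one chunk at a time so the full dataset never sits in memory."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    yield buffer.getvalue()
    for row in rows:
        # Reuse one small buffer per row instead of accumulating the whole file.
        buffer.seek(0)
        buffer.truncate()
        writer.writerow(row)
        yield buffer.getvalue()


# In a real endpoint, rows would come from a server-side DB cursor, not a list.
chunks = list(stream_csv(
    [{"event": "click", "count": 3}, {"event": "view", "count": 7}],
    fieldnames=["event", "count"],
))
```

Because each chunk is emitted as soon as it is produced, memory use stays flat regardless of dataset size, which is exactly why the plan pairs streaming with async delivery for large exports.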
Use Case 2: Code Review with the Engineering Manager
Once implementation is done, most developers do a quick self-review and push. gstack gives you a second set of expert eyes before the code reaches a human reviewer — which means your PRs come in cleaner, review cycles are shorter, and you catch architectural issues before they get merged into the codebase.
The /code-review command activates the Engineering Manager gear: a persona that thinks about code the way a senior engineering leader does. It weighs long-term maintainability, not just immediate correctness. It looks for patterns that will scale and patterns that will not. It evaluates consistency with the existing codebase. It does not nitpick variable names (that is a linter’s job) and it does not drift into product strategy (that is the Founder gear’s job).
Scenario: You have implemented the CSV export feature from Use Case 1. You are about to open a pull request and you want a substantive review before sending it to a human teammate.
Invocation:
/code-review
After invoking the command, gstack’s Engineering Manager gear examines the code in your current working context — recent changes, open files, or a specific file path you point it to. A representative review might look like this:
Representative Engineering Manager gear output:
## Code Review: CSV Export Feature
### Architecture Assessment
The streaming CSV approach is correct for this use case. Using Node.js
streams avoids loading entire datasets into memory, which will matter
when enterprise tenants have 500k+ rows.
One concern: the export logic lives directly in the route handler
(routes/export.js). As export formats grow (Excel, JSON), this will
become hard to maintain. Recommend extracting an ExportService class
now, before the pattern sets.
### Specific Issues
**routes/export.js:47** — The `Promise.all` here runs all DB queries in
parallel without rate limiting. Under load, this could exhaust the
connection pool. Use `p-limit` or a sequential queue for the inner loop.
**services/csv.js:23** — `JSON.stringify(value)` is used to escape
field values, but JSON escaping is not CSV escaping: every string is
wrapped in quotes, and embedded quotes are backslash-escaped (`\"`)
instead of doubled (`""`) as RFC 4180 requires. Use a CSV library like
`papaparse` or `fast-csv` for correct field escaping.
**lib/email.js:89** — The large-export email path has no retry logic.
If the email provider returns a 429 or 5xx, the job silently fails.
Add exponential backoff with a max of 3 retries.
### Missing Patterns
- No test coverage for the >5,000 row async path — this is the
riskiest code path and it has zero test cases
- Error response shape is inconsistent with the rest of the API
(other endpoints return `{ error: string }`, this returns `{ message: string }`)
- No rate limiting on the export endpoint — a single user could
trigger hundreds of expensive DB queries per minute
### What Is Working Well
- Streaming implementation is well-structured
- The export history service is cleanly separated from the main flow
- Column ordering matches the dashboard display order (good UX consideration)
### Recommended Before Merge
1. Extract ExportService (30 min refactor)
2. Fix CSV field escaping with a library
3. Add retry logic to email delivery
4. Add at least two integration tests for the async export path
This is the kind of review output that makes your human reviewer’s job easier. The architectural concerns are surfaced before they get discussed in a PR thread. The specific line references give you a clear action list. The “what is working well” section prevents the review from feeling purely critical.
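The escaping issue called out at services/csv.js:23 generalizes beyond JavaScript. A quick way to see why JSON escaping breaks CSV parsing, using Python for illustration (the stdlib `csv` module applies RFC 4180 quoting, which is the behavior a library like `papaparse` provides on the Node.js side):

```python
import csv
import io
import json

value = 'She said "hi", then left'

# JSON escaping: backslash-escapes embedded quotes, which CSV readers
# do not understand as an escape sequence.
json_field = json.dumps(value)      # "She said \"hi\", then left"

# CSV escaping per RFC 4180: doubles embedded quotes and wraps the field.
buf = io.StringIO()
csv.writer(buf).writerow([value])
csv_field = buf.getvalue().strip()  # "She said ""hi"", then left"

# Only the CSV form round-trips through a CSV parser back to the original.
roundtrip = next(csv.reader(io.StringIO(csv_field)))[0]
```

A CSV reader fed the JSON-escaped field would keep the backslashes as literal characters, which is exactly the kind of silent data-integrity bug the review exists to catch.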
The consistency benefit compounds over time. Every time you run /code-review, you get the same Engineering Manager lens applied to your code. Over months, this creates a measurable improvement in your baseline code quality — not because the AI is magic, but because consistent evaluation against consistent criteria produces consistent learning.
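The unbounded `Promise.all` issue at routes/export.js:47 also has a direct analogue in any async runtime. Here is the fix pattern sketched in Python (`run_query` is a hypothetical stand-in for the database call; `p-limit` plays the same role as the semaphore in Node.js):

```python
import asyncio


async def run_query(i: int) -> int:
    # Hypothetical stand-in for an expensive DB query.
    await asyncio.sleep(0)
    return i * 2


async def run_all(n: int, max_concurrent: int = 5) -> list[int]:
    # Cap in-flight queries so a large export cannot exhaust the connection pool.
    sem = asyncio.Semaphore(max_concurrent)

    async def bounded(i: int) -> int:
        async with sem:
            return await run_query(i)

    return await asyncio.gather(*(bounded(i) for i in range(n)))


results = asyncio.run(run_all(20))
```

All twenty queries are still scheduled up front, but at most five hold a connection at any moment, which preserves throughput without the failure mode the review flags.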
Use Case 3: QA Testing with the QA Engineer
The /qa command is the most technically sophisticated in gstack’s suite. It activates the QA Engineer gear, which thinks like a quality assurance professional whose entire job is to find things that break. Unlike /code-review, which operates on source code, /qa operates on your running application — it executes a test protocol against your live dev server and produces a structured quality report.
The command supports two modes that control the depth of testing:
/qa --quick runs a 30-second smoke test. It covers the critical paths — can the app load, do the primary user flows complete without errors, are there obvious broken states — without going into edge cases or visual inspection. Use this as a pre-commit sanity check or before a demo.
/qa --exhaustive runs the full QA protocol. In addition to functional tests, it covers visual consistency, accessibility issues, edge case inputs, error handling paths, and performance indicators. This mode can take several minutes on a complex application. Use it before a release or when you want a comprehensive quality baseline.
Scenario: You have finished the CSV export feature and your dev server is running at localhost:3000. You want a quality check before opening the pull request.
Invocation:
/qa --quick http://localhost:3000
What the QA Engineer gear does:
The command launches a headless browser session, navigates to your application, and executes the test protocol defined in gstack’s QA skill. It tests page load, navigation, primary interactions, console errors, and network failures. When the protocol completes, it calculates a health_score — an integer from 0 to 100 representing overall application quality — and saves a full Markdown report to .gstack/qa-reports/ in your project directory.
Representative QA report output (.gstack/qa-reports/2026-04-08-quick.md):
# QA Report — Quick Smoke Test
**Date:** 2026-04-08
**Target:** http://localhost:3000
**Mode:** --quick (30s)
**health_score: 74 / 100**
---
## Summary
The application loads and core navigation is functional. The CSV export
flow has a critical failure in the large-dataset path that prevents the
feature from working for enterprise-tier users. Two console errors are
present on initial load that suggest a race condition in data fetching.
---
## Test Results
| Test | Status | Notes |
|---|---|---|
| Page load < 3s | PASS | 1.2s on first load |
| Navigation (all primary routes) | PASS | All 6 routes accessible |
| Dashboard renders with data | PASS | — |
| CSV export < 5k rows | PASS | Completed in 2.1s |
| CSV export > 5k rows (async path) | FAIL | No feedback after submission; email not delivered in test |
| Login / logout flow | PASS | — |
| 404 page | PASS | Returns correct error UI |
| No console errors on load | FAIL | 2 errors: "Cannot read property 'data' of undefined" (line 247, dashboard.js) |
---
## Critical Issues
### [CRITICAL] Large-export async path — no user feedback
When submitting a CSV export request that triggers the async email flow,
the UI returns immediately with no confirmation message, loading state,
or error state. In the test environment, the email was not delivered
within the 60-second success criterion. Users have no way to know
whether their export is processing or has failed.
**Suggested fix:** Add a toast/notification after async export submission.
Add a "check your email" confirmation state. Verify the email job queue
is processing in the test environment.
### [MEDIUM] Console errors on dashboard load
Two "Cannot read property 'data' of undefined" errors appear in the
console during initial dashboard render. These suggest a race condition
between data fetching and component rendering. The UI appears correct
despite the errors, indicating the component recovers, but the errors
indicate fragile state handling that will cause failures under slower
network conditions.
---
## health_score Breakdown
| Category | Score | Weight |
|---|---|---|
| Functionality (core paths) | 80/100 | 50% |
| Error handling | 60/100 | 25% |
| Performance | 90/100 | 15% |
| Console cleanliness | 55/100 | 10% |
**Weighted total: 74/100**
---
## Recommended Actions Before Merge
1. Fix the async export UX — add submission feedback and verify email delivery
2. Resolve the console errors in dashboard.js around the data fetch timing
3. Re-run `/qa --quick` after fixes to confirm health_score > 85
The health_score is the most useful quick-reference indicator. A score above 90 suggests the feature is ready to review. A score between 70 and 90, like the 74 in this example, means there are real issues that should be fixed before shipping but the feature is not broken. A score below 70 indicates significant problems that warrant stopping and debugging before continuing.
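The weighted total itself is a plain weighted average. A sketch of the arithmetic, using hypothetical category scores with the weights shown in the example report (the actual scoring rubric is internal to gstack's QA skill):

```python
# Hypothetical per-category scores paired with the example report's weights.
categories = {
    "functionality": (80, 0.50),
    "error_handling": (60, 0.25),
    "performance": (90, 0.15),
    "console_cleanliness": (55, 0.10),
}

# 80*0.50 + 60*0.25 + 90*0.15 + 55*0.10 = 74
health_score = round(sum(score * weight for score, weight in categories.values()))
```

The heavy weighting toward core-path functionality is why a single failed critical flow can drop an otherwise healthy application into the 70–90 band.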
The report is saved to .gstack/qa-reports/ as a permanent artifact. This gives you a historical record of quality at each point in the development cycle, which is valuable both for your own reference and as documentation of due diligence if a bug report comes in after shipping.
After fixing the issues identified in this quick report, run /qa --exhaustive before the final PR review for a full quality picture — including edge cases and visual consistency that the 30-second smoke test does not cover.
Use Case 4: Shipping a Release
The /ship command activates the Engineering Manager gear for a pre-deployment verification checklist. It is designed to be the final gate before a feature goes to production — a structured confirmation that the implementation is complete, the obvious risks are addressed, and the deployment will not introduce regressions.
Scenario: The CSV export feature has passed /qa --exhaustive with a health score of 91, the PR has been approved, and you are preparing to deploy to production.
Invocation:
/ship
Representative Engineering Manager gear output:
## Pre-Ship Checklist — CSV Export Feature
### Code Completeness
✅ Feature implementation merged to main
✅ All PR review comments resolved
✅ No open TODOs or FIXME markers in changed files
⚠️ One FIXME in lib/email.js:103 — "handle rate limit retry" — this was
identified in code review. Confirm this is addressed before deploying.
### Quality Gates
✅ QA health_score: 91/100 (exhaustive mode, 2026-04-08)
✅ No critical issues in last QA report
✅ Test suite passing (confirm: run `npm test` now if not done in last hour)
### Risk Assessment
MEDIUM: The async email delivery path is new infrastructure. Monitor
the job queue depth and email delivery rate for the first 2 hours
post-deploy. Set up an alert if job queue depth exceeds 50.
LOW: CSV streaming has not been tested at the actual production data
volumes of your largest tenants. Consider a synthetic load test before
enabling for enterprise tier, or enable for non-enterprise first with
a feature flag.
### Rollback Plan
The export endpoint is additive (new route, no schema changes). Rollback
is safe: revert the deployment and no data loss occurs. The export
history table was added via a reversible migration.
### Deployment Sequence
1. Deploy backend (export endpoint, job queue, email service)
2. Wait 2 minutes — confirm job queue is processing (check dashboard)
3. Deploy frontend (export button, confirmation UI)
4. Test the full export flow manually in production with a small dataset
5. Enable for all users (remove feature flag if used)
### Post-Deploy Monitoring (First 2 Hours)
- Watch: job queue depth (alert > 50)
- Watch: export endpoint error rate (alert > 1%)
- Watch: email delivery rate (alert < 95%)
- Check: no spike in support tickets mentioning "export"
### Verdict
🟡 CONDITIONAL GO — resolve the FIXME in lib/email.js:103 before deploying.
Once confirmed, this feature is ready to ship.
The /ship command’s value is in its systematic nature. Under deadline pressure, developers skip checklist items. The Engineering Manager gear has no deadline pressure — it will always check the same things in the same order and flag the same categories of risk. The FIXME callout in this example is exactly the kind of thing that gets overlooked in the final push to ship and surfaces as a production incident three days later.
The “Verdict” line is particularly useful. A green GO means the checklist is clear. A yellow CONDITIONAL GO means there are specific items to address before deploying. A red NO-GO means the feature has issues that make deployment risky regardless of timeline pressure.
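The retry logic flagged in both the code review and the ship checklist follows a standard exponential-backoff pattern. A sketch in Python (the sender callable and the set of retryable statuses are hypothetical; the original code lives in lib/email.js):

```python
import time

# Statuses worth retrying: rate limits and transient server errors.
RETRYABLE = {429, 500, 502, 503, 504}


def send_with_retry(send, payload, max_retries=3, base_delay=1.0):
    """Retry transient failures with exponential backoff: 1s, 2s, 4s."""
    for attempt in range(max_retries + 1):
        status = send(payload)
        if status not in RETRYABLE:
            return status
        if attempt < max_retries:
            time.sleep(base_delay * (2 ** attempt))
    return status


# Example: a fake sender that returns 429 twice, then succeeds.
responses = iter([429, 429, 200])
result = send_with_retry(
    lambda p: next(responses), {"to": "user@example.com"}, base_delay=0.0
)
```

The key property is that a 429 from the email provider becomes a delayed retry rather than a silently dropped job, which is the failure mode the checklist refuses to ship.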
Use Case 5: Browser Automation for Dev Tasks
The /browser command activates the QA Engineer gear with browser control capabilities. Unlike /qa, which runs a predefined protocol, /browser is an interactive command that you direct with specific tasks. It is useful for a wide range of development scenarios that require a running browser: reproducing user-reported bugs, verifying a UI fix in a specific browser state, checking that a third-party integration works end to end, or automating a repetitive manual testing flow.
Scenario 1 — Bug reproduction: A user reports that the CSV export button is not visible when the dashboard is in “compact mode.” You want to reproduce the issue without manually clicking through the UI.
Invocation:
/browser Navigate to http://localhost:3000/dashboard, enable compact mode via Settings > Display > Compact, then take a screenshot and check if the Export button is visible in the toolbar.
The QA Engineer gear launches a browser session, executes the described navigation sequence, captures a screenshot, and reports what it observed. You get a reproduction confirmation or a “could not reproduce” finding, both of which are useful — the former gives you a repro environment to debug, the latter tells you to investigate the specific browser or user state that triggered the bug.
Scenario 2 — Integration verification: You have just wired up a new Stripe webhook for subscription events. You want to confirm the full flow works before shipping.
Invocation:
/browser Go to http://localhost:3000/billing/upgrade, complete a test checkout using Stripe test card 4242 4242 4242 4242, then verify the account dashboard shows "Pro" plan status within 30 seconds.
Scenario 3 — Accessibility spot-check: Before shipping a new page, run a quick keyboard navigation check.
Invocation:
/browser Navigate to http://localhost:3000/export-history and test full keyboard navigation — confirm all interactive elements are reachable via Tab, all buttons are activatable via Enter/Space, and no focus traps exist.
The /browser command’s power comes from its open-ended instruction format. You describe what you want to verify in natural language, and the QA Engineer gear translates that into a real browser session. This is particularly valuable for scenarios that are time-consuming to set up manually — complex user flows, specific authentication states, or interactions that require multiple steps to reach.
For deeper explorations of what autonomous browser agents can do in development contexts, see our article on AutoGPT use cases, which covers browser control as part of a fully autonomous agent workflow.
Combining Commands in a Full Development Workflow
The real power of gstack is not any individual command — it is the way the commands chain together into a structured workflow that mirrors how disciplined engineering teams actually operate. Each gear is designed to hand off cleanly to the next, with each phase producing an artifact that informs the next phase.
Here is a complete feature development cycle using the full gstack command sequence:
Phase 1 — Planning (Founder Gear)
/plan Add two-factor authentication support for all user accounts
Output: A product brief with scope, user stories, success criteria, and risk assessment. This document becomes the reference point for all subsequent phases.
Phase 2 — Design Review (Founder + Engineering Manager Gears)
/design-review
After sketching the implementation approach — what library to use, where to hook into the auth flow, how to handle recovery codes — invoke /design-review with your approach. The combined Founder + Engineering Manager perspective will flag both product-level concerns (are we over-complicating the UX for the user value delivered?) and technical concerns (is this library actively maintained, does it support the auth providers you use?).
Phase 3 — Implementation
Write the code. gstack does not interfere with implementation — this is the phase where you are driving, using Claude Code’s standard capabilities alongside your own engineering judgment.
Phase 4 — Code Review (Engineering Manager Gear)
/code-review
Before opening a PR, run the Engineering Manager review on your implementation. Fix the issues it surfaces. The review’s action items become your pre-PR checklist. This saves your human reviewer time and ensures the PR arrives in a state where the high-value discussion is about architecture and trade-offs, not obvious bugs and style inconsistencies.
Phase 5 — QA (QA Engineer Gear)
/qa --quick
Run the smoke test first. If the health_score is above 85 and there are no critical issues, proceed to exhaustive mode:
/qa --exhaustive
Fix any critical or medium issues before moving to ship. Document the final health_score in the PR description — it gives reviewers a quantified quality signal.
Phase 6 — Ship (Engineering Manager Gear)
/ship
Run the pre-deployment checklist. Address any CONDITIONAL GO items. When the verdict is green, deploy.
Phase 7 — Post-Ship Documentation (Engineering Manager Gear)
/post-ship-docs
After a successful deployment, generate documentation from the code and commit history. The Engineering Manager gear produces a technical changelog, updated API documentation if applicable, and notes on architectural decisions made during the feature cycle. This is the phase most developers skip — and the /post-ship-docs command makes the cost of skipping it zero.
Phase 8 — Retrospective (Founder Gear)
/retrospective
After the feature has been live for a week, run the Founder gear’s retrospective command. It will prompt for data on what worked, what did not, what the health_score trend looked like, and what you would do differently. The output is a structured retrospective document you can use to improve your process on the next cycle.
This eight-phase sequence represents gstack’s full value proposition: not a collection of useful individual tools, but a structured workflow system that uses gear-switching to bring the right perspective to each phase of a software project. For comparison, see how CrewAI’s flows and pipelines approach structured multi-agent coordination at the framework level — a different architecture for a similar goal of structured, role-aware AI collaboration.
Frequently Asked Questions
Can I use gstack commands in any order?
Yes — the commands are independent and there is no enforced sequence. You can invoke /qa without having run /plan, and you can run /code-review multiple times throughout development rather than only at the end. The suggested sequence in this article reflects a workflow that produces consistent, high-quality results, but the system is flexible by design. Many developers use only a subset of commands — for example, just /code-review and /qa — and get significant value from that targeted application.
The one sequence that is worth preserving is /qa before /ship. The pre-deployment checklist in /ship expects that you have recent QA data — it will reference the health_score from your last report and flag if the report is stale. Running the commands out of order does not break anything, but running /ship without a recent /qa report means the checklist is working with less information than it was designed to use.
How does gstack handle projects with multiple team members?
gstack is a personal tool that runs inside each developer’s Claude Code session — there is no shared server, no central configuration, and no team-wide state. For team alignment, the recommended approach is the project-level install described in the installation guide. When everyone on the team uses the same project-level gstack install, they are all running identical command definitions, which means /code-review applies the same Engineering Manager criteria for every reviewer on the team.
The .gstack/qa-reports/ directory, however, is worth committing to source control if you want a shared quality audit trail. Each developer’s QA report runs are saved there, and committing those reports gives the team a chronological record of quality checks across the feature’s development. This is optional but useful on teams that want traceable quality documentation.
Does using gstack cost more Claude API tokens?
Each gstack command invocation uses the system prompt context of the selected gear persona on top of the content you are analyzing. This adds a small fixed overhead per command — roughly equivalent to a few hundred tokens per invocation — but because the gear’s focused output is typically shorter and more targeted than a general-purpose response, the total token cost per useful insight is often lower than ad-hoc prompting.
The /qa and /browser commands that involve browser automation do consume more tokens due to the multi-step nature of the session — each browser action involves a round-trip exchange. For most developers, this remains within normal Claude Code usage patterns. If token cost is a concern, use /qa --quick for routine checks and reserve /qa --exhaustive for pre-release runs.
Can I create my own custom workflow commands?
Yes, and this is one of gstack’s most powerful features. Each slash command is defined by a Markdown file in ~/.claude/skills/gstack/skills/. You can create new files in that directory to define custom commands, and they will appear in the slash command palette alongside the 13 built-in commands.
For example, to create a custom /security-review command, create a file at ~/.claude/skills/gstack/skills/security-review.md with a system prompt that defines the security reviewer persona and the evaluation criteria you care about. The file format mirrors the existing command files, which you can inspect as templates.
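As a purely illustrative sketch, the body of such a file might read something like the following. This is an assumption about content, not gstack's actual format; copy one of the built-in command files to get the exact structure it expects:

```markdown
# Security Review

You are a senior application security engineer. Review the code in the
current context for:

- Input validation and injection risks (SQL, command, template)
- Authentication and session-handling mistakes
- Secrets committed to source or written to logs
- Dependencies with known vulnerabilities

Report findings as CRITICAL / MEDIUM / LOW with file:line references.
Do not comment on style, architecture, or product scope.
```

Note how the final instruction deliberately excludes other gears' concerns, which is the same role-constraint principle the built-in commands use.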
Custom commands can use any of the three gear personas as a base or define entirely new behavioral contexts. Teams that have well-established review criteria — specific security requirements, compliance checks, or domain-specific quality standards — often find that encoding those criteria as a gstack command produces more consistent results than relying on each reviewer to remember the full checklist independently.
Next Steps
You have now seen gstack applied across the full development lifecycle: planning with the Founder gear, reviewing with the Engineering Manager, testing with the QA Engineer, and shipping with the final pre-deployment checklist. The most immediate action is to run these commands on a real feature you are currently working on, rather than reading about them further.
Start with /qa --quick on your current dev server. The health_score and report give you an immediate baseline, and seeing the QA Engineer gear identify real issues in your actual codebase makes the value of the gear system concrete in a way that no documentation can match.
If you have not yet installed gstack, the installation guide covers the complete setup in under five minutes. And if you want to explore how other AI agent frameworks approach structured multi-step workflows from a different architectural angle, our guide to CrewAI flows and pipelines is a useful companion read — it shows how role-based agent coordination works at the framework level when you need to build your own structured pipelines rather than using a pre-built command system like gstack.