
What Is MetaGPT? A Multi-Agent Software Company in Code

#metagpt #multi-agent #software-company #product-manager #engineer #open-source

What Is MetaGPT?

MetaGPT is an open-source multi-agent framework with a unique twist: it models a software company, not just an AI assistant. Each agent plays a specific corporate role — Product Manager, Architect, Engineer, QA Engineer — and they collaborate via a shared message board to build software from a single natural-language requirement.

You give MetaGPT one sentence: "Create a snake game in Python". It outputs:

  • A PRD (Product Requirements Document)
  • A system design document with architecture decisions
  • API specifications
  • Fully working Python code
  • Unit tests
  • A test report

All generated sequentially, by different agents, each checking the previous agent’s work.

MetaGPT was first published in a paper by Sirui Hong et al. in 2023. The repository reached 30,000+ GitHub stars within months, making it one of the fastest-growing AI agent projects on GitHub. The framework has since evolved significantly and is actively maintained.

The Software Company Metaphor

The key insight behind MetaGPT is that human teams use Standard Operating Procedures (SOPs). A product manager doesn’t just hand a vague idea to engineers — there’s a requirements document, design review, code review, and QA process. These SOPs exist because they prevent errors and miscommunication.

MetaGPT encodes the same SOPs for AI agents:

User Requirement
  → Product Manager → PRD (requirements)
  → Architect → System Design + API specs
  → Engineer(s) → Code implementation
  → QA Engineer → Unit tests + test report

Each role passes structured, typed outputs to the next — not raw text. This structured communication is what makes MetaGPT more reliable than “have one agent do everything.”
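The structured-handoff idea can be sketched in plain Python. This is a conceptual illustration, not MetaGPT's actual classes — the PRD and Design dataclasses and the stubbed role functions are invented for the example; in MetaGPT each step is an LLM call:

```python
from dataclasses import dataclass

# Typed artifacts: each stage consumes the previous stage's structured output.
@dataclass
class PRD:
    requirement: str
    user_stories: list[str]

@dataclass
class Design:
    prd: PRD
    components: list[str]

def product_manager(requirement: str) -> PRD:
    # In MetaGPT this would be an LLM call producing a full PRD document.
    return PRD(requirement, user_stories=[f"As a user, I want: {requirement}"])

def architect(prd: PRD) -> Design:
    # The Architect reads a typed PRD, not free-form text.
    return Design(prd, components=["cli", "core", "tests"])

design = architect(product_manager("Create a snake game in Python"))
print(design.components)  # the Engineer would receive this typed object next
```

Because each handoff is a typed object rather than raw text, a downstream role can rely on specific fields (user stories, component list) instead of re-parsing prose.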

Built-in Roles

Product Manager

Turns a one-line requirement into a detailed PRD with:

  • User stories
  • Competitive analysis
  • Feature requirements
  • UI/UX notes

Architect

Reads the PRD and produces:

  • System design (components, data flow)
  • Technology stack recommendations
  • API interface definitions

Engineer

Reads the design documents and writes:

  • Implementation code (Python, JavaScript, or other)
  • Code that matches the API specs exactly

QA Engineer

Writes unit tests for the code and runs them, producing a test report.

Installation

Requirements: Python 3.9+, Node.js 16+ (Node.js is used to render the Mermaid diagrams in the generated design documents)

pip install metagpt

Initialize the configuration file:

metagpt --init-config

This creates ~/.metagpt/config2.yaml. Edit it to add your API key:

llm:
  api_type: "openai"
  model: "gpt-4o-mini"
  api_key: "sk-your-key-here"

Or for Anthropic Claude:

llm:
  api_type: "anthropic"
  model: "claude-sonnet-4-20250514"
  api_key: "sk-ant-your-key-here"

Running Your First Project

Command Line

metagpt "Create a CLI tool that converts Markdown files to HTML"

MetaGPT will create a workspace directory with all the generated files:

workspace/
  cli_markdown_converter/
    docs/
      prd.md
      system_design.md
      api_spec.md
    src/
      converter.py
      utils.py
    tests/
      test_converter.py
    README.md
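To give a sense of the artifacts, here is roughly what the Engineer might put in converter.py for this requirement. This is a hand-written illustrative sketch, not actual MetaGPT output, and it handles only a small Markdown subset:

```python
import re

def convert_markdown(md: str) -> str:
    """Convert a small subset of Markdown (headings, bold, paragraphs) to HTML."""
    html_lines = []
    for line in md.splitlines():
        line = line.strip()
        if not line:
            continue
        # Headings: one to six leading '#' characters.
        m = re.match(r"(#{1,6})\s+(.*)", line)
        if m:
            level = len(m.group(1))
            html_lines.append(f"<h{level}>{m.group(2)}</h{level}>")
            continue
        # Inline bold: **text** -> <strong>text</strong>
        line = re.sub(r"\*\*(.+?)\*\*", r"<strong>\1</strong>", line)
        html_lines.append(f"<p>{line}</p>")
    return "\n".join(html_lines)

print(convert_markdown("# Title\n\nHello **world**"))
```

The generated test_converter.py would then exercise exactly these functions, since the QA Engineer reads the same API spec the Engineer implemented against.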

Python API

import asyncio
from metagpt.software_company import generate_repo, ProjectRepo

async def main():
    repo: ProjectRepo = await generate_repo(
        "Create a REST API with FastAPI for a todo list app"
    )
    print(repo)  # prints the directory structure and files

asyncio.run(main())

Using Individual Roles

You can use MetaGPT roles individually instead of running the full company:

from metagpt.roles import ProductManager, Engineer, QaEngineer
from metagpt.context import Context
import asyncio

async def main():
    context = Context()

    pm = ProductManager(context=context)
    engineer = Engineer(context=context)
    qa = QaEngineer(context=context)

    # Generate PRD
    prd = await pm.run("Create a URL shortener service")
    print("PRD generated")

    # Generate code from PRD
    code = await engineer.run(prd)
    print("Code generated")

    # Generate tests
    test_report = await qa.run(code)
    print("Tests run")

asyncio.run(main())

Creating Custom Roles

MetaGPT’s real power is extensibility — you can define custom roles for your domain:

from metagpt.roles.role import Role
from metagpt.actions import Action
from metagpt.schema import Message

class WriteAPIDoc(Action):
    name: str = "WriteAPIDoc"

    async def run(self, code: str) -> str:
        prompt = f"Write API documentation in OpenAPI 3.0 YAML format for this code:\n\n{code}"
        return await self._aask(prompt)

class APIDocWriter(Role):
    name: str = "APIDocWriter"
    profile: str = "API Documentation Specialist"

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.set_actions([WriteAPIDoc])

    async def _act(self) -> Message:
        # Take the most recent message (the code to document) from memory.
        msg = self.get_memories(k=1)[0]
        doc = await self.rc.todo.run(msg.content)
        return Message(content=doc, role=self.profile, cause_by=type(self.rc.todo))

MetaGPT vs Other Frameworks

| Feature            | MetaGPT                | CrewAI            | AutoGen                    |
|--------------------|------------------------|-------------------|----------------------------|
| Primary focus      | Software development   | General workflows | Conversational multi-agent |
| Agent coordination | SOP-based (structured) | Task-based        | Conversation-based         |
| Output type        | Code + docs + tests    | Any task output   | Text + code                |
| Best for           | Full software projects | Business workflows| Research, Q&A              |
| Setup complexity   | Medium                 | Low               | Medium                     |

MetaGPT shines when you want to generate complete, structured software artifacts — not just code snippets. For general task automation, CrewAI or LangChain agents are simpler.

Strengths and Limitations

Strengths:

  • Produces structured, professional-quality documents alongside code
  • Role separation keeps each agent's context small and focused, instead of one overloaded prompt doing everything
  • Strong for greenfield projects where you have a clear requirement
  • The SOP model catches more logical errors than single-agent approaches

Limitations:

  • Heavier setup than single-agent tools
  • Long generation time (5–15 minutes for a full project)
  • LLM quality greatly affects output — GPT-4o produces significantly better results than GPT-4o-mini
  • Struggles with brownfield (existing) codebases — designed for new projects

Frequently Asked Questions

How does MetaGPT differ from simply asking ChatGPT to write code?

ChatGPT writes code in one shot with no structured process. MetaGPT runs a multi-stage pipeline: requirements → design → code → tests. Each stage involves different “specialists” reviewing the previous stage’s output. This catches more errors and produces more complete, documented projects. The output quality difference is significant for anything more complex than a single script.

What language does MetaGPT generate code in?

Primarily Python, but it can generate JavaScript, TypeScript, Go, and others depending on what you specify in the requirement. Add “in TypeScript using React” or “in Go” to your requirement string.

Does MetaGPT run the generated code?

The QA Engineer role runs the generated tests. The framework doesn’t run the application itself — that’s up to you. However, since the code and tests are generated together, passing tests are a strong quality signal.

How much does a typical MetaGPT run cost?

With GPT-4o: a simple project (200–500 lines of code) costs roughly $0.50–2.00 in API tokens. Complex projects can cost $3–10. Using gpt-4o-mini reduces costs by ~10x but significantly reduces quality on architectural decisions.
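As a rough back-of-the-envelope check (the prices are GPT-4o list prices at the time of writing and will change; the token counts are assumptions for a simple project):

```python
# Assumed GPT-4o list prices, USD per 1M tokens — check current pricing.
INPUT_PRICE = 2.50
OUTPUT_PRICE = 10.00

def run_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the API cost of one MetaGPT run in USD."""
    return input_tokens / 1e6 * INPUT_PRICE + output_tokens / 1e6 * OUTPUT_PRICE

# A simple project: ~150k input tokens (documents re-fed between stages)
# and ~50k output tokens across all roles.
print(f"${run_cost(150_000, 50_000):.2f}")  # → $0.88
```

Note that input tokens dominate the count (each stage re-reads the previous documents), but output tokens dominate the cost at these prices.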

Can I run MetaGPT with a local LLM?

Yes. Configure api_type: "ollama" and a local model in config2.yaml. The quality will be noticeably lower than GPT-4o, but it works for simple projects and is completely free. Qwen2.5-Coder models are a commonly recommended local option.
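A config2.yaml fragment for Ollama might look like this (the base_url and model tag are assumptions — match them to your local setup):

```yaml
llm:
  api_type: "ollama"
  base_url: "http://127.0.0.1:11434/api"
  model: "qwen2.5-coder:14b"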
