The Problem with AI in 2025
Few people would dispute that AI tools have gotten genuinely impressive. You can ask Claude to write a technical design doc and it'll produce something better than most humans could in the same time. You can ask Copilot to complete a function and it'll nail the implementation most of the time. You can spin up a Cursor session and move fast on a new codebase.
But all of that is still you, running the tools.
Each AI interaction is a one-shot conversation. You describe the problem. You get an answer. You take it somewhere else, apply some judgment, do the next thing. The context doesn't flow with you. The AI doesn't know what you worked on yesterday, what's blocked, what's urgent, or what decision you made in that meeting last Thursday that changes everything about the current approach.
You're not gaining real power over your workload. You're just getting a faster typewriter.
The real bottleneck isn't model capability. It's coordination. No single AI tool can do your whole job, and the tools aren't designed to talk to each other.
Think about what it actually takes to ship a feature: someone has to design it, spec it, implement it, test it, document it, review it, and coordinate the handoffs between all of those. A single ChatGPT window can help you with one of those things at a time. But it has no memory of the others. It's not tracking what's blocked. It's not nudging the reviewer. It's not updating the doc when the spec changes.
You're still doing all of that. The tools just make each individual step a little faster.
What We Actually Want
What we want (what we think most builders actually want) is a team.
Not a team of people to manage (that's expensive and complicated in its own ways), but a team of specialized agents that handle different parts of the work, coordinate with each other, and actually know the context of your project without you having to re-explain it every time.
Imagine having:
- A programmer that can write and iterate on real code, run tests, and open PRs
- A researcher that can dig into a technical question, synthesize findings, and file them in a shared knowledge base
- A writer that keeps your docs current, drafts changelogs, and can produce a blog post from raw notes
- A reviewer that reads your PRs with genuine critical judgment, not just "LGTM"
- An orchestrator that knows your projects, your priorities, and which agent should be working on what, right now
That's a workforce. Not a single AI assistant you talk at, but a system that works while you're doing other things.
We're Not Describing a Fantasy. We're Running It.
Here's the part that matters: PAW isn't a product roadmap. It's a system we've been running on ourselves for months.
The orchestrator is real. The workers are real. Right now, as you read this, there are programmer agents, reviewer agents, and researcher agents running against real tasks in our own project queues. They've written code that shipped. They've reviewed PRs that caught real bugs. They've produced research that changed how we think about architecture decisions.
We're not pitching you on a demo. We built this because we needed it, and then we kept improving it because it worked.
PAW started as a personal project. One of us built it to practice the AI-native development workflow we kept reading about but couldn't find sensibly implemented anywhere. The first version was rough. It had a lot of hard edges, a lot of manual wiring, a lot of "well if you know what you're doing this is powerful." That's how most good tools start. We happened to be at the University of Michigan at the time, but PAW was built because we needed it, not because the university commissioned or directed the work.
We've been filing down those edges ever since. And now we think it's ready for other people to use.
What Makes PAW Different
There are a lot of "AI agent" products appearing right now. Most of them are thin wrappers around a model with some tool-calling bolted on. They're demos, not infrastructure.
PAW is different in a few specific ways:
Real multi-agent orchestration
PAW's orchestrator doesn't just hand a task to one agent and wait. It manages a queue of work across multiple specialized workers, routes tasks to the right agent based on type and priority, enforces dependencies between tasks, and enforces quality gates: a reviewer has to approve before code is considered done.
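To make that concrete, here is a minimal sketch of the scheduling idea, not PAW's actual code. The Task fields, the ROUTES table, and the "lower number means more urgent" convention are all assumptions made up for illustration. It picks the most urgent queued task whose dependencies are done, and treats code as done only after a reviewer has approved it.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    id: str
    kind: str                 # hypothetical task types: "code", "research", "docs"
    priority: int             # assumed convention: lower number = more urgent
    depends_on: list[str] = field(default_factory=list)
    status: str = "queued"
    approved: bool = False    # set by a reviewer; only matters for code


# Hypothetical mapping from task type to agent type.
ROUTES = {"code": "programmer", "research": "researcher", "docs": "writer"}


def is_done(task: Task) -> bool:
    # Quality gate: code counts as done only once a reviewer has approved it.
    if task.kind == "code":
        return task.status == "done" and task.approved
    return task.status == "done"


def next_assignment(tasks: list[Task]):
    """Return (task_id, agent_type) for the most urgent unblocked task, or None."""
    by_id = {t.id: t for t in tasks}
    ready = [
        t for t in tasks
        if t.status == "queued" and all(is_done(by_id[d]) for d in t.depends_on)
    ]
    if not ready:
        return None
    task = min(ready, key=lambda t: t.priority)
    return task.id, ROUTES[task.kind]
```

In this sketch a docs task that depends on a code task stays unscheduled until the code is both finished and approved, even if the docs task has been sitting in the queue the whole time.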
Model routing
Not every task needs the same model. A quick doc update doesn't need to go to the most expensive, highest-capability model. PAW routes tasks to the appropriate model based on complexity, cost, and latency requirements. This matters for both quality and economics when you're running agents continuously.
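One simple way to picture this kind of routing: choose the cheapest model that is capable enough and fast enough for the task. The sketch below is an illustration under that assumption, not a description of PAW's router. The tier names, capability scale, costs, and latencies are invented numbers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelTier:
    name: str
    capability: int           # made-up scale: 1 (basic) to 3 (strongest)
    cost_per_task: float      # illustrative numbers only
    typical_latency_s: float  # illustrative numbers only


# Hypothetical tiers; a real deployment would list actual models here.
TIERS = [
    ModelTier("small", 1, 0.01, 2.0),
    ModelTier("medium", 2, 0.10, 8.0),
    ModelTier("large", 3, 1.00, 30.0),
]


def route(required_capability: int, max_latency_s: float) -> str:
    """Pick the cheapest tier that meets the capability and latency limits."""
    candidates = [
        m for m in TIERS
        if m.capability >= required_capability
        and m.typical_latency_s <= max_latency_s
    ]
    if not candidates:
        raise ValueError("no model tier satisfies the constraints")
    return min(candidates, key=lambda m: m.cost_per_task).name
```

Under these made-up numbers, a quick doc update would be routed with something like route(1, 60.0) and land on the small tier, while a task that needs the strongest model only goes to the large tier if it can tolerate the latency.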
Worker lifecycle management
Agents spin up, do work, and spin down. They report status. They can get blocked and escalate. They handle retries and failures gracefully. This sounds mundane until you've watched a naive "agent loop" spiral into an infinite retry loop at 3am because one API call failed. Worker lifecycle is infrastructure, and we treat it that way.
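The 3am failure mode usually comes down to a retry loop with no upper bound. A minimal sketch of the alternative, assuming a hypothetical Escalation exception and a step callable (neither is taken from PAW): retry a fixed number of times with exponential backoff, then stop and escalate.

```python
import time


class Escalation(Exception):
    """Hypothetical signal that a worker gave up and needs attention."""


def run_with_retries(step, max_attempts=3, base_delay_s=1.0, sleep=time.sleep):
    """Call step() up to max_attempts times, then escalate instead of looping."""
    last_error = None
    for attempt in range(max_attempts):
        try:
            return step()
        except Exception as exc:
            last_error = exc
            # Back off 1x, 2x, 4x, ... the base delay, but not after the final try.
            if attempt < max_attempts - 1:
                sleep(base_delay_s * 2 ** attempt)
    raise Escalation(f"gave up after {max_attempts} attempts") from last_error
```

The sleep parameter is injectable only so the behavior can be exercised without real waiting. The important property is that the loop always terminates, either with a result or with an explicit escalation.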
Persistent memory and context
Agents share a knowledge base. When the researcher produces findings, the programmer can access them. When the reviewer leaves feedback, the programmer learns from it across tasks, not just the current one. Context doesn't disappear between sessions.
Task management that's actually integrated
Tasks aren't just prompts. They have state (queued, in_progress, blocked, done), they have dependencies, they have assignment to specific agent types, and they feed back into the orchestrator's scheduling decisions. You can see the queue, intervene, reprioritize. It's transparent.
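The four states above suggest a small state machine. Here is one way it could look. The state names come from the list above, but the allowed transitions are our guess for illustration rather than PAW's exact rules.

```python
# Assumed transition table: which states a task may move to from each state.
ALLOWED = {
    "queued": {"in_progress"},
    "in_progress": {"blocked", "done", "queued"},  # "queued" models a reprioritized task
    "blocked": {"queued", "in_progress"},
    "done": set(),                                 # terminal in this sketch
}


def transition(current: str, new: str) -> str:
    """Return the new state, or raise if the move is not allowed."""
    if new not in ALLOWED.get(current, set()):
        raise ValueError(f"illegal transition: {current} -> {new}")
    return new
```

Making illegal moves fail loudly is what keeps the queue trustworthy: a task cannot jump from queued to done without ever having been picked up.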
Where We're Headed
The current version of PAW runs on your own infrastructure: a Mac mini, a Linux box, a cloud VM. You bring the compute and the API keys; PAW provides the orchestration layer and the agent library. That's the right model for builders who want control and don't want their work flowing through someone else's servers.
But we know that's not the right model for everyone. So the roadmap has two tracks:
- Docker-native self-hosting: a clean container deployment that makes it genuinely easy to run PAW on any machine, with proper volume mounts, health checks, and upgrade paths. No more "well you have to understand the internals to run it."
- A managed platform: for people who want the workforce without the infrastructure. We run the orchestrator and workers; you connect your repos, define your projects, and let PAW get to work.
Beyond deployment, we're investing in the agent ecosystem itself. More specialized workers. Better quality gates. A community of builders who can contribute agent skills and share what works.
The vision here isn't "better chatbot." It's that within a few years, every serious builder will have a crew of AI agents working alongside them, not as a novelty, but as infrastructure as normal as CI/CD. We want to be the platform that makes that real.
A Note on How We Think About This
We're not trying to replace human judgment. The orchestrator doesn't make architectural decisions; it executes them. The programmer agent doesn't decide what to build; it implements what's been planned. The reviewer doesn't have final say; its feedback goes to a human before anything merges.
PAW is force multiplication, not replacement. It takes the work that's well-defined enough to delegate and actually delegates it: competently, persistently, and without losing context. That frees up the human in the loop for the work that actually requires human judgment: setting direction, making tradeoffs, deciding what matters.
That's the product we're building. And we're already using it to build it.
If that sounds like something you want to try, request early access. We're onboarding builders in small batches and we want to hear what you're working on.