The vibe is the problem. Not the coding.

StackRanked · 2026-04-05 · Tags: vibe coding, spec before coding, AI product management, structured thinking, why vibe coding fails

Mo Bitar spent two years vibe coding, accumulated months of agent-generated commits he was proud of, then read the whole codebase cover to cover and called it slop. He's back to writing by hand.

His Hacker News post hit 865 points and 634 comments. A parallel thread, "Do you have any evidence that agentic coding works?", hit 461 points and 455 more. Add a third related thread and the combined comments run close to 2,000, with the same diagnosis surfacing again and again:

The AI gives you exactly what you asked for. The problem is what you asked for was fuzzy.

"Agents write units of changes that look good in isolation," Bitar wrote. "But respect for the whole, there is not."

That's not an AI problem. That's a spec problem.


The real failure mode

Here's what the HN threads keep describing but never naming cleanly:

You have an idea in your head. It's detailed enough to feel real. You start generating. The AI produces something impressive. You generate more. Three weeks later the codebase is a mess of plausible-looking code that no one — including the AI — can navigate cleanly.

The failure wasn't at generation. It was at the moment you opened the chat window without first answering the questions that define whether an idea is real.

Experienced engineers in those threads had good results for one consistent reason: they understood their domain well enough to give the AI a real constraint. The people drowning had one thing in common: they were using generation to figure out what they actually wanted.

That's using AI as a thinking tool when you haven't done the thinking yet.


What you actually need to answer before generating

This isn't a PRD. PRDs are documents. This is 30 minutes of answering uncomfortable questions (on paper, in a doc, out loud to yourself) before touching a chat interface. A worked example of all five answers follows the list.

1. What does success look like, specifically?

Not "a feature that lets users do X." A concrete outcome. "User can do X in under 3 clicks. X does not require a support email. X works on mobile without a different flow." If you can't describe the surface area of success, the AI will make it up.

2. What are you explicitly not building?

The AI will scope-creep in all directions unless you tell it what's out of scope. This is the fastest thinking you'll ever do. It costs nothing to write "out of scope: role permissions, audit logging, undo functionality." It costs weeks if you don't.

3. What already exists that this has to work with?

Not your whole tech stack. The specific parts this feature touches. What are the data models? What are the existing patterns you want this to follow? What are the things you've already built that set a precedent?

The best AI coding results in those HN threads consistently came from one setup: an existing, well-structured codebase with clear patterns to follow. The AI follows examples extremely well. It invents architecture extremely poorly.

4. What are the failure modes you're worried about?

Not a full risk analysis. Just: what about this keeps you up at night? Where does it break badly if the AI makes a dumb assumption? Write it down. If you can name it, you can constrain it. If you can't name it, that's your signal that you don't understand the problem yet.

5. What's the smallest working slice?

This one does more work than any other. "Implement the messaging feature" is a death sentence. "Allow one user to send a plain text message to another user, no attachments, no threading, no read receipts" is a building block. The AI can build building blocks. It cannot build architectures.
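
To make this concrete, here's what thirty minutes of answers might look like for that messaging slice. This is a hypothetical sketch, not a template, and the table names and patterns in it are invented for illustration:

  • Success: user A sends a plain-text message to user B, and it appears in B's inbox on next page load. No support email required. Works on mobile with the same flow.
  • Out of scope: attachments, threading, read receipts, group chats, push notifications, editing.
  • Existing surface: the users table, the current REST route conventions, and the soft-delete pattern already used for comments.
  • Failure modes: a message delivered to the wrong user; unbounded message length; an inbox query with no pagination.
  • Smallest slice: one sender, one recipient, plain text only, fetched on page load.

Five answers, one page. None of it is a prompt. It's the constraint set you'll check every diff against.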


Why this isn't "just write a better prompt"

"My prompt sucked. It was under-specified." That's how Bitar describes the first instinct when generation goes wrong. And it leads people to write longer and longer prompts — full spec docs, multi-page requirements, detailed descriptions of edge cases.

Those fail too.

The problem isn't the length. It's that writing a long prompt is still using the AI session to do the thinking. You're discovering your requirements in real time, in the chat window, with something that will confidently execute on every ambiguity.

The five questions above are not a prompt. They're something you complete before you touch the AI. The output is not a document you hand to the AI. It's clarity you've achieved for yourself — which then makes every interaction with the AI faster, tighter, and more correctable.

When you know what success looks like, you can tell when the AI is diverging. When you've named the failure modes, you can catch them in review. When you've defined the smallest slice, the AI stays on the rails.


The people getting good results

Across all three HN threads, the engineers reporting consistent wins shared a profile:

  • They understand the domain deeply enough to know when output is wrong
  • They give the AI a small, well-scoped task against an existing codebase with patterns to follow
  • They review every diff against what they already knew they wanted

None of them were letting AI figure out the design. They were using AI to execute against a design they'd already worked out in their heads — or on paper.

"When I know exactly what I want, I can finally stay focused on the problem I'm trying to solve," as one commenter put it.

That's the unlock. Not better prompts. Not better models. Knowing exactly what you want before you start generating.


What StackRanked does with this

The reason most builders skip the pre-work isn't laziness. It's that there's no good forcing function. There's no place where you write down what you actually know about the problem before you build it.

StackRanked is that forcing function. Before you build anything, you spec it: what's the outcome, what's out of scope, what does the data model look like, what's the minimum viable slice. The spec lives in the platform, and the AI doesn't start building until you've signed off on the spec.

That's not waterfall. That's the 30 minutes of thinking that prevents three weeks of slop.

Start for free at stackranked.ai →


Three threads. Nearly 2,000 comments. Same failure mode every time. The problem isn't the coding. It's everything that happens before you generate the first line.