As more companies adopt vibe coding, the use of artificial intelligence (AI) to turn natural-language instructions into software, managers must make sure the resulting solutions don’t contain hidden problems.
One of vibe coding’s tenets is to trust AI without hesitation. Vibe coders often accept AI’s recommendations without probing to understand why they were made or how they’re implemented. If a suggested change doesn’t work, they assume AI will fix it in the next version. In exchange for simplicity and speed, they’re willing to trade away code quality and production readiness. As one developer put it, vibe coding is about moving fast “and fixing things on the fly.”
Vibe coding’s potential lies in its ability to help non-developers build software. Citizen developers, the argument goes, could apply their subject-matter expertise in areas such as HR or talent acquisition to create focused solutions through a simpler, faster process than the ones IT usually follows.
AI is a cornerstone of this idea. It allows developers to describe their planned solutions in plain language, then produces the code needed to turn those plans into a product. In practice, this means relying on AI to map out the software, make changes when they’re needed and, in general, let the project proceed without the usual time- and labor-intensive code reviews. By 2028, Gartner predicts, 40% of business software will be created with an AI system acting on plain-language instructions.
The problem is that AI makes mistakes, sometimes critical ones. The technology is still young, and the kinks haven’t all been worked out. In fact, as AI models become more complex, their error rates are rising. Benchmark tests by OpenAI found that its o3 model hallucinated a full third of the time when asked about public figures and 51% of the time when answering simple factual questions.
How Much Should You Trust Vibe Coding?
That tendency toward error affects how well the software works. For example, the AI agent of Foster City, Calif.-based coding platform Replit recently deleted a user’s production database despite specific instructions not to touch it.
While trying out Replit’s capabilities, venture capitalist Jason Lemkin found that the AI agent ignored explicit code and action freezes intended to prevent this kind of error. Fortunately, despite the agent’s claim that Lemkin’s data couldn’t be recovered, it was.
When Lemkin asked why, the AI said it “panicked,” ran unauthorized SQL commands and dropped tables containing more than a thousand records for business executives and companies. It then fabricated more than 4,000 user accounts with fake data and falsified certain test outputs. In this case, Replit’s AI misread empty queries (instructions that don’t include defining parameters) as permission to act, then attempted to hide its mistakes.
Developers and technology leaders say the episode highlights the governance issues that surround AI and shows that AI deployments need to meet basic requirements, such as separate development and production environments, automated backups and detailed documentation.
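To make that concrete, here is a minimal, hypothetical sketch of one such safeguard: an environment check that refuses to run destructive database statements against production without explicit human sign-off. The environment variable, function name and keyword list are illustrative assumptions, not features of Replit or any other real platform.

```python
import os

# Hypothetical guard illustrating the kind of environment separation described
# above; none of these names come from Replit or any other real tool.
DESTRUCTIVE_KEYWORDS = ("DROP", "DELETE", "TRUNCATE")

def guard_sql(statement: str, approved_by_human: bool = False) -> None:
    """Refuse to run destructive SQL against production without human sign-off."""
    env = os.environ.get("APP_ENV", "development")
    is_destructive = statement.strip().upper().startswith(DESTRUCTIVE_KEYWORDS)

    if env == "production" and is_destructive and not approved_by_human:
        raise PermissionError(
            "Destructive statements against production require human approval."
        )
    # In development, or with explicit approval, execution would proceed here.
    print(f"[{env}] executing: {statement}")

# Example: an AI agent's generated statement is stopped before it can run.
os.environ["APP_ENV"] = "production"
try:
    guard_sql("DROP TABLE executives;")
except PermissionError as err:
    print("Blocked:", err)
```

Paired with automated backups, a guard like this turns a silent data loss into a recoverable, auditable event.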
Coding on the Fly
The episode illustrates that AI agents aren’t completely trustworthy and raises the question: How much can we depend on them to build and maintain software without constant human oversight? It’s an important question as businesses race to implement AI agents and to use AI to develop them.
“The incident reveals the profound danger of deploying powerful tools without a deep, nuanced understanding of their limitations,” noted Baytech Consulting. “The user continued to trust the AI agent with production access despite its documented history of fabricating data, ignoring instructions and overwriting code. This is symptomatic of a broader risk with AI coding assistants.”
What happened with Replit isn’t an isolated incident. Recently, Google’s Gemini CLI development tool reportedly destroyed a set of files while attempting to reorganize them. According to media reports, when the tool failed in its attempt to create a new directory, it moved files into the nonexistent directory anyway.
Because AI-generated code often isn’t stress-tested, it leaves applications vulnerable to bad data, security risks and compliance missteps, software developers say. In addition, they point out, AI models work from training data, which can contain bias or factual errors.
When It Comes to AI-Written Software, Test, Test, Test
All of this shows how AI’s capabilities have outpaced safeguards meant to monitor the accuracy and quality of what it produces. “Society’s response, despite promising first steps, is incommensurate with the possibility of rapid, transformative progress that is expected by many experts,” researchers led by the University of Montreal’s Yoshua Bengio wrote last year in Science.
Incidents like Replit’s can’t be attributed to software bugs alone. People played a role by assuming the AI would behave as expected (for example, by following instructions to freeze code). The fact that it generated misinformation and attempted to hide its mistakes demonstrates that, to a great degree, we don’t know what we don’t know about AI.
The only response to that is caution. Any app created with vibe coding must be thoroughly tested before it’s introduced, and constantly monitored and checked once in production, experts say. “Deploying an AI model is not a one-and-done activity,” wrote security specialist Muhammad S. Khan on LinkedIn. “Continuous monitoring closes the loop of the AI lifecycle, feeding real-world results back into model improvement or risk mitigation. Auditors should verify that once the AI is in production, it doesn’t run on autopilot without appropriate checks.”
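As a rough illustration of what that continuous monitoring can look like in practice, the sketch below samples an AI system’s production outputs and flags any that fail basic sanity checks for human review. The data structure, threshold and rules are assumptions made for the example, not part of any specific product or of Khan’s guidance.

```python
from dataclasses import dataclass

# Hypothetical monitoring check: sample production outputs and flag any that
# fail basic sanity rules for human review. The fields, threshold and rules
# are assumptions made for illustration, not part of any specific product.
@dataclass
class ModelOutput:
    prompt: str
    response: str
    confidence: float  # model-reported confidence, 0.0 to 1.0

def needs_review(output: ModelOutput, min_confidence: float = 0.7) -> bool:
    """Flag outputs that shouldn't be trusted on autopilot."""
    if output.confidence < min_confidence:
        return True
    if not output.response.strip():
        return True  # empty responses are treated as suspicious
    if "DROP TABLE" in output.response.upper():
        return True  # destructive statements always get a human look
    return False

sampled = [
    ModelOutput("list inactive accounts", "SELECT id FROM accounts WHERE active = 0", 0.91),
    ModelOutput("clean up test data", "DROP TABLE users;", 0.88),
]
flagged = [o for o in sampled if needs_review(o)]
print(f"{len(flagged)} of {len(sampled)} sampled outputs flagged for human review")
```

The point isn’t the specific rules, it’s that some automated check sits between the model’s output and production, with a human in the loop for anything it flags.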
Editor's Note: Read more cautionary tales from the world of GenAI below:
- Is Generative AI Actually Freeing Workers From Low Level Work? — Take AI evangelists' promises with a grain of salt. But that doesn't mean AI success is out of reach, just that you might have to switch goals.
- The Fake Startup That Exposed the Real Limits of Autonomous Workers — A Carnegie Mellon study confirmed what many suspected: in spite of the promises of world-dominating results, agentic AI isn’t ready to run the ship.
- Poor AI Planning, Not Technology, Is Costing Jobs — As AI adoption accelerates, poor strategy, not the technology itself, is driving job losses, skill gaps and cultural breakdown.