For the past decade and a half, I’ve been exploring the intersection of technology, education, and design as a professor of cognitive science and design at UC San Diego. Some of you might have read my recent piece for O’Reilly Radar where I detailed my journey adding AI chat capabilities to Python Tutor, the free visualization tool that’s helped millions of programming students understand how code executes. That experience got me thinking about my evolving relationship with generative AI as both a tool and a collaborator.
I’ve been intrigued by this emerging practice called “vibe coding,” a term coined by Andrej Karpathy that’s been making waves in tech circles. Simon Willison describes it perfectly: “When I talk about vibe coding I mean building software with an LLM without reviewing the code it writes.” The concept is both liberating and slightly terrifying: you describe what you need, the AI generates the code, and you simply run it without scrutinizing each line, trusting the overall “vibe” of what’s been created.
My relationship with this approach has evolved considerably. In my early days of using AI coding assistants, I was that person who meticulously reviewed every single line, often rewriting significant portions. But as these tools have improved, I’ve found myself gradually letting go of the steering wheel in certain contexts. Yet I couldn’t fully embrace the pure “vibe coding” philosophy; the professor in me needed some quality assurance. This led me to develop what I’ve come to call “vibe checks,” strategic verification points that provide confidence without reverting to line-by-line code reviews. It’s a middle path that’s worked surprisingly well for my personal projects, and today I want to share some insights from that journey.
Vibe Coding in Practice: Converting 250 HTML Files to Markdown
I’ve found myself increasingly turning to vibe coding for those one-off scripts that solve specific problems in my workflow. These are typically tasks where explaining my intent is actually easier than writing the code myself, especially for data processing or file manipulation jobs where I can easily verify the results.
Let me walk you through a recent example that perfectly illustrates this approach. For a class I teach, I had students submit responses to a survey using a proprietary web app that provided an HTML export option. This left me with 250 HTML files containing valuable student feedback, but it was buried in a mess of unnecessary markup and styling code. What I really wanted was clean Markdown versions that preserved just the text content, section headers, and (critically) any hyperlinks students had included in their responses.
Rather than writing this conversion script myself, I turned to Claude with a straightforward request: “Write me a Python script that converts these HTML files to Markdown, preserving text, basic formatting, and hyperlinks.” Claude suggested using the BeautifulSoup library (a solid choice) and generated a complete script that would process all files in a directory, creating a corresponding Markdown file for each HTML source.
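The post doesn't reproduce the script Claude wrote, so what follows is only a rough sketch of what such a converter might look like, not the actual generated code. To stay dependency-free it swaps BeautifulSoup for Python's built-in `html.parser.HTMLParser`, and it assumes that each `NAME.html` gets a `NAME.md` written next to it. It keeps text, `h1` to `h6` headers, paragraph breaks, and links; `MarkdownConverter`, `html_to_markdown`, and `convert_directory` are hypothetical names.

```python
import re
from html.parser import HTMLParser
from pathlib import Path


class MarkdownConverter(HTMLParser):
    """Minimal HTML-to-Markdown converter: keeps text, h1-h6 headers, paragraph breaks, links."""

    SKIP = {"script", "style", "head"}  # elements whose contents never reach the output
    BREAKS = {"p", "div", "li", "br", "ul", "ol", "table", "tr"}  # treated as paragraph breaks

    def __init__(self):
        super().__init__(convert_charrefs=True)  # decode &amp; and friends in text
        self.parts = []      # Markdown fragments, joined at the end
        self.skip_depth = 0  # > 0 while inside a SKIP element
        self.href = None     # href of the <a> element we are inside, if any

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1
        elif self.skip_depth:
            return
        elif re.fullmatch(r"h[1-6]", tag):
            self.parts.append("\n\n" + "#" * int(tag[1]) + " ")
        elif tag in self.BREAKS:
            self.parts.append("\n\n")
        elif tag == "a" and dict(attrs).get("href"):
            self.href = dict(attrs)["href"]
            self.parts.append("[")

    def handle_endtag(self, tag):
        if tag in self.SKIP:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif self.skip_depth:
            return
        elif re.fullmatch(r"h[1-6]", tag) or tag in self.BREAKS:
            self.parts.append("\n\n")
        elif tag == "a" and self.href:
            self.parts.append(f"]({self.href})")
            self.href = None

    def handle_data(self, data):
        if not self.skip_depth:
            self.parts.append(re.sub(r"\s+", " ", data))  # collapse runs of whitespace

    def markdown(self):
        # Strip each line, squeeze blank-line runs to one blank line, end with a newline.
        lines = [line.strip() for line in "".join(self.parts).split("\n")]
        return re.sub(r"\n{3,}", "\n\n", "\n".join(lines)).strip() + "\n"


def html_to_markdown(html):
    parser = MarkdownConverter()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return parser.markdown()


def convert_directory(directory):
    """Write NAME.md next to every NAME.html in `directory`."""
    for html_path in sorted(Path(directory).glob("*.html")):
        html = html_path.read_text(encoding="utf-8", errors="replace")
        html_path.with_suffix(".md").write_text(html_to_markdown(html), encoding="utf-8")
```

For example, `convert_directory("survey_export")` (a hypothetical folder name) would write one `.md` per `.html`. Real survey exports will almost certainly contain structures a sketch this small mishandles, which is exactly the kind of gap the vibe checks later in this post are meant to surface.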
(In retrospect, I realized I probably could have used Pandoc for this conversion task. But in the spirit of vibe coding, I just went with Claude’s suggestion without overthinking it. Part of the appeal of vibe coding is bypassing that research phase where you compare different approaches; you just describe what you want and roll with what you get.)
True to the vibe coding philosophy, I didn’t review the generated code line by line. I simply saved it as a Python file, ran it on my directory of 250 HTML files, and waited to see what happened. This “run and see” approach is what makes vibe coding both liberating and slightly nerve-wracking: you’re trusting the AI’s interpretation of your needs without verifying the implementation details.
Trust and Risk in Vibe Coding: Running Unreviewed Code
The moment I hit “run” on that vibe-coded script, I realized something that might make many developers cringe: I was executing completely unreviewed code on my actual computer with real data. In traditional software development, this would be considered reckless at best. But the dynamics of trust feel different with modern AI tools like Claude 3.7 Sonnet, which has built up a reputation for generating reasonably safe and functional code.
My rationalization was partly based on the script’s limited scope. It was just reading HTML files and creating new Markdown files alongside them, not deleting or modifying existing files or sending data over the network. Of course, that’s assuming the code did exactly what I asked and nothing more! I had no guarantees that it didn’t include some unexpected behavior, since I hadn’t looked at a single line.
This highlights a trust relationship that’s evolving between developers and AI coding tools. I’m much more willing to vibe code with Claude or ChatGPT than I would be with an unknown AI tool from some obscure website. These established tools have reputations to maintain, and their parent companies have strong incentives to prevent their systems from generating malicious code.
That said, I’d love to see operating systems develop a “restricted execution mode” specifically designed for vibe coding scenarios. Imagine being able to specify: “Run this Python script, but only allow it to CREATE new files in this specific directory, prevent it from overwriting existing files, and block internet access.” This lightweight sandboxing would provide peace of mind without sacrificing convenience. (I mention restricting only writes rather than reads because Python scripts typically need to read various system files from across the filesystem, making read restrictions impractical.)
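No mainstream operating system offers exactly this mode as a one-liner, but Python's audit hooks (`sys.addaudithook`, Python 3.8+) can approximate the policy from inside the interpreter. This is my own illustrative sketch, not an existing tool: `check_event` and `install_guard` are hypothetical names, and the rules (reads allowed anywhere, writes allowed only to new files under one directory, every `socket.*` event blocked) simply mirror the wish list above. The Python docs are explicit that audit hooks are not a security sandbox; code can bypass them through `ctypes`, `subprocess`, or native extensions, so treat this as a tripwire for honest mistakes rather than protection against malicious code.

```python
import os
import sys

# Any of these open() flags means the caller intends to write or create a file.
WRITE_FLAGS = os.O_WRONLY | os.O_RDWR | os.O_APPEND | os.O_CREAT | os.O_TRUNC


def check_event(event, args, allowed_dir):
    """Raise PermissionError if an audit event breaks the policy; otherwise return None."""
    if event.startswith("socket."):
        # socket.connect, socket.getaddrinfo, ...: block all network activity.
        raise PermissionError(f"network access blocked: {event}")
    if event != "open" or len(args) != 3:
        return None
    path, _mode, flags = args  # the "open" audit event carries (path, mode, flags)
    if not isinstance(flags, int) or not isinstance(path, (str, bytes, os.PathLike)):
        return None
    if not flags & WRITE_FLAGS:
        return None  # reads are allowed anywhere
    root = os.path.realpath(allowed_dir)
    target = os.path.realpath(os.fsdecode(path))
    try:
        inside = os.path.commonpath([root, target]) == root
    except ValueError:  # e.g. paths on different drives (Windows)
        inside = False
    if not inside:
        raise PermissionError(f"write outside {root} blocked: {target}")
    if os.path.exists(target):
        raise PermissionError(f"overwriting existing file blocked: {target}")
    return None


def install_guard(allowed_dir):
    """Apply the policy to the rest of this process. Audit hooks cannot be removed."""
    root = os.path.realpath(allowed_dir)
    sys.addaudithook(lambda event, args: check_event(event, args, root))
```

A wrapper could call `install_guard("md_out")` and then `runpy.run_path("convert.py")` (both names hypothetical). Raising `PermissionError` is a deliberate choice: it is an `OSError` subclass, so code that already tolerates failed writes (Python's bytecode-cache writer, for instance) should treat a blocked write like any other write failure.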
Why not just use VMs, containers, or cloud services? Because for personal-scale projects, the convenience of working directly on my own machine is hard to beat. Setting up Docker or uploading 250 HTML files to some cloud service introduces friction that defeats the purpose of quick, convenient vibe coding. What I want is to keep that convenience while adding just enough safety guardrails.
Vibe Checks: Simple Scripts to Verify AI-Generated Code
OK, now come the “vibe checks.” As I mentioned earlier, the nice thing about these personal data processing tasks is that I can often get a sense of whether the script did what I intended just by examining the output. For my HTML-to-Markdown conversion, I could open up several of the resulting Markdown files and see if they contained the survey responses I expected. This manual spot-checking works reasonably well for 250 files, but what about 2,500 or 25,000? At that scale, I’d need something more systematic.
This is where vibe checks come into play. A vibe check is essentially a simpler script that verifies a basic property of the output from your vibe-coded script. The key here is that it should be much simpler than the original task, making it easier to verify its correctness.
For my HTML-to-Markdown conversion project, I realized I could use a straightforward principle: Markdown files should be smaller than their HTML counterparts, since we’re stripping away all the tags. But if a Markdown file is dramatically smaller (say, less than 40% of the original HTML size), that might indicate incomplete processing or content loss.
So I went back to Claude and vibe coded a check script. This script simply:
- Found all corresponding HTML/Markdown file pairs
- Calculated the size ratio for each pair
- Flagged any Markdown file smaller than 40% of its HTML source
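The post doesn't show the check script either, so here is a minimal sketch of the three steps listed above, assuming each `NAME.md` sits next to its `NAME.html`. `find_suspicious` is a hypothetical name, the 0.40 threshold comes from the text, and reporting HTML files that have no Markdown partner at all is my own small addition.

```python
import sys
from pathlib import Path


def find_suspicious(directory, threshold=0.40):
    """Return (file name, ratio) for each conversion that looks incomplete.

    ratio is Markdown size / HTML size; None means the .md file is missing.
    """
    flagged = []
    for html_path in sorted(Path(directory).glob("*.html")):
        md_path = html_path.with_suffix(".md")  # assumes NAME.md sits next to NAME.html
        if not md_path.exists():
            flagged.append((html_path.name, None))
            continue
        html_size = html_path.stat().st_size
        if html_size == 0:
            continue  # nothing to compare against
        ratio = md_path.stat().st_size / html_size
        if ratio < threshold:
            flagged.append((md_path.name, ratio))
    return flagged


if __name__ == "__main__":
    for name, ratio in find_suspicious(sys.argv[1] if len(sys.argv) > 1 else "."):
        print(f"{name}: " + ("missing" if ratio is None else f"{ratio:.0%} of HTML size"))
```

Run it as, say, `python check_sizes.py survey_export` (hypothetical script and folder names). The whole check is a glob, two `stat` calls, and a division, which is what keeps it small enough to review by hand.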
And lo and behold, the vibe check caught several files where the conversion was incomplete! The original script had failed to properly extract content from certain HTML structures. I took these problematic files, went back to Claude, and had it refine the original conversion script to handle these edge cases.
After a few iterations of this feedback loop of converting, checking, identifying issues, and refining, I eventually reached a point where there were no more suspiciously small Markdown files (well, there were still a few below 40%, but manual inspection confirmed these were correct conversions of HTML files with unusually high markup-to-content ratios).
Now you might reasonably ask: “If you’re vibe coding the vibe check script too, how do you know that script is correct?” Would you need a vibe check for your vibe check? And then a vibe check for that check? Well, thankfully, this recursive nightmare has a practical solution. The vibe check script is typically an order of magnitude simpler than the original task: in my case, just comparing file sizes rather than parsing complex HTML. This simplicity made it feasible for me to manually review and verify the vibe check code, even while avoiding reviewing the more complex original script.
Of course, my file size ratio check isn’t perfect. It can’t tell me whether the content was converted with the proper formatting or whether all hyperlinks were preserved correctly. But it gave me reasonable confidence that no major content was missing, which was my primary concern.
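The hyperlink gap could be covered by another check in the same spirit. This is only an illustrative sketch, not part of the original project: `LinkCollector`, `missing_links`, and `check_directory` are hypothetical names, the standard library's `html.parser` collects external `href` values, and each URL is then simply searched for in the Markdown text. That test is deliberately crude (it ignores link text, and a URL that is a prefix of another surviving URL would slip through), but like the size check it is short enough to verify by eye.

```python
from html.parser import HTMLParser
from pathlib import Path


class LinkCollector(HTMLParser):
    """Collect the href of every external <a> tag (attribute values arrive unescaped)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        href = dict(attrs).get("href")
        if tag == "a" and href and href.startswith(("http://", "https://")):
            self.hrefs.append(href)


def missing_links(html, markdown):
    """Return external URLs present in the HTML but absent from the Markdown text."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return [url for url in collector.hrefs if url not in markdown]


def check_directory(directory):
    """Map each NAME.md to the URLs it lost relative to NAME.html (only files with losses)."""
    problems = {}
    for html_path in sorted(Path(directory).glob("*.html")):
        md_path = html_path.with_suffix(".md")
        if md_path.exists():
            lost = missing_links(html_path.read_text(encoding="utf-8", errors="replace"),
                                 md_path.read_text(encoding="utf-8", errors="replace"))
            if lost:
                problems[md_path.name] = lost
    return problems
```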
Vibe Coding + Vibe Checking: A Pragmatic Middle Ground
The take-home message here is simple but powerful: When you’re vibe coding, always build in vibe checks. Ask yourself: “What simpler script could verify the correctness of my main vibe-coded solution?” Even an imperfect verification mechanism dramatically increases your confidence in results from code you never actually reviewed.
This approach strikes a nice balance between the speed and creative flow of pure vibe coding and the reliability of more rigorous software development methodologies. Think of vibe checks as lightweight tests: not the comprehensive test suites you’d write for production code, but enough verification to catch obvious failures without disrupting your momentum.
What excites me about the future is the potential for AI coding tools to suggest appropriate vibe checks automatically. Imagine if Claude or similar tools could not only generate your requested script but also proactively offer: “Here’s a simple verification script you might want to run afterward to ensure everything worked as expected.” I suspect that if I had specifically asked for this, Claude could have suggested the file size comparison check, but having this built into the system’s default behavior would be incredibly valuable. I can envision specialized AI coding assistants that operate in a semi-autonomous mode, writing code, generating appropriate checks, running those checks, and involving you only when human verification is truly needed.
Combine this with the kind of sandboxed execution environment I mentioned earlier, and you’d have a vibe coding experience that’s both freeing and trustworthy: powerful enough for real work but with guardrails that prevent catastrophic mistakes.
And now for the meta twist: This entire blog post was itself the product of “vibe blogging.” At the start of our collaboration, I uploaded my previous O’Reilly article, “Using Generative AI to Build Generative AI,” as a reference document. This gave Claude the opportunity to analyze my writing style, tone, and typical structure, much like how a human collaborator might read my previous work before helping me write something new.
Instead of writing the entire post in one go, I broke it down into sections and provided Claude with an outline for each section, one at a time. For every section, I included key points I wanted to cover and sometimes specific phrasings or concepts to include. Claude then expanded these outlines into fully formed sections written in my voice. After each section was drafted, I reviewed it (my own version of a “vibe check”), providing feedback and requesting revisions until it matched what I wanted to say and how I wanted to say it.
This iterative, section-by-section approach mirrors the vibe coding methodology I’ve discussed throughout this post. I didn’t need to write every sentence myself, but I maintained control over the direction, messaging, and final approval. The AI handled the execution details based on my high-level guidance, and I performed verification checks at strategic points rather than micromanaging every word.
What’s particularly interesting is how this process demonstrates the same principles of trust, verification, and iteration that I advocated for in vibe coding. I trusted Claude to generate content in my style based on my outlines, but I verified each section before moving to the next. When something didn’t quite match my intent or tone, we iterated until it did. This balanced approach, leveraging AI capabilities while maintaining human oversight, seems to be the sweet spot for collaborative creation, whether you’re generating code or content.
Epilogue: Behind the Scenes with Claude
[Claude speaking]
Looking back at our vibe blogging experiment, I should acknowledge that Philip noted the final product doesn’t fully capture his authentic voice, despite having his O’Reilly article as a reference. But in keeping with the vibe philosophy itself, he chose not to invest excessive time in endless refinements, accepting good-enough rather than perfect.
Working section by section without seeing the full structure upfront created challenges, similar to painting parts of a mural without seeing the complete design. I initially fell into the trap of copying his outline verbatim rather than transforming it properly.
This collaboration highlights both the utility and the limitations of AI-assisted content creation. I can approximate writing styles and expand outlines but still lack the lived experience that gives human writing its authentic voice. The best results came when Philip provided clear direction and feedback.
The meta-example perfectly illustrates the core thesis: Generative AI works best when paired with human guidance, finding the right balance between automation and oversight. “Vibe blogging” has value for drafts and outlines, but like “vibe coding,” some form of human verification remains essential to ensure the final product truly represents what you want to say.
[Philip speaking so that humans get the final word…for now]
OK, this is the only part that I wrote by hand: My parting thought when reading over this post is that I’m not satisfied with the writing quality (sorry Claude!), but if it weren’t for an AI tool like Claude, I would not have written it in the first place due to lack of time and energy. I had enough energy today to outline some rough ideas, then let Claude do the “vibe blogging” for me, but not enough to fully write, edit, and fret over the wording of a full 2,500-word blog post all by myself. Thus, just like with vibe coding, one of the great joys of “vibe-ing” is that it greatly lowers the activation energy of getting started on creative personal-scale prototypes and tinkering-style projects. To me, that’s pretty inspiring.