Robots Don’t Need Code Anymore — And Your Supply Chain Is About to Get Weird
The multi-billion-dollar industrial programming industry just became obsolete for physical automation. Nobody’s talking about it.
Here’s something that’ll keep supply chain executives up at night: in a lab in Beijing last month, a robot arm figured out how to fold laundry without a single line of task-specific code.
Not impressed?
Let me reframe that. For sixty years, every robot in every factory has been running on rigid, pre-programmed instructions. Move arm to coordinate X. Grip with force Y. Repeat 47,000 times. If literally anything changes — the shirt is wrinkled differently, the lighting shifts, someone sneezes — the whole thing falls apart like a house of cards in a windstorm.
Vision-Language-Action models just torched that entire paradigm.
And I’m watching Fortune 500 companies completely miss what’s happening while they’re still debugging their RPA implementations from 2019.
The Revolution Nobody Expected
I’ve spent two decades building automation systems across 110+ startups. I’ve seen every wave of “intelligent” robotics hype come and crash against the rocks of reality. Boston Dynamics’ backflipping robots? Incredible engineering, zero commercial viability. Amazon’s warehouse bots? Brilliant for one specific task, helpless the moment you ask them to do anything else.
VLA models are different.
They’re different in a way that makes my enterprise architecture brain light up like a Christmas tree, because they solve the fundamental problem that’s plagued physical automation since the 1960s: generalization.
A traditional robot is essentially an idiot savant with perfect muscle memory. It can perform one task with inhuman precision, but ask it to do something 2% different and you might as well be asking your toaster to file your taxes.
VLA models combine computer vision, natural language understanding, and physical action into one integrated system. The robot doesn’t just see pixels. It understands context, interprets instructions given in plain English, and figures out how to manipulate objects it’s never encountered before.
This isn’t incremental improvement.
This is architectural revolution.
Listen to our podcast episodes about the most interesting AI developments happening right now!!! Latest episode is here:
The $60M Burnout: What Happens When You Sell Your Soul to the AI Gods
Listen (and watch) all our episodes here! YouTube
Want to chat about the future of AI? Bring your idea, startup, project, or initiative to a world-recognized AI expert and actual practitioner.
Book your 15 minutes here: https://calendly.com/indigi/jf-ai
Why Everyone’s Getting This Wrong
The robotics industry is making the same mistake the enterprise software world made with AI in 2023: they think this is about better tools for doing the same old things.
It’s not.
When you give machines the ability to understand natural language, perceive their environment semantically, and improvise physical actions based on intent rather than instructions, you’re not building better robots. You’re creating a new category of labor that sits somewhere between human and machine, and the implications are going to ripple through the global economy like a magnitude 9 earthquake.
Let me be specific about what’s actually happening in research labs right now — because the gap between what’s possible today and what most executives think is possible is staggering.
Google’s RT-2 model, which they casually dropped in 2023, can control a robot arm to perform tasks it was never explicitly trained on. You can tell it “pick up the extinct animal” when there’s a toy dinosaur on the table surrounded by other objects, and it figures it out. Not through pre-programming. Through reasoning about language, visual semantics, and physical interaction.
That’s not impressive robotics.
That’s alien intelligence operating in physical space.
The Stack Just Exploded
For sixty years, the robotics stack looked like this: sensors → control algorithms → actuators. Simple. Deterministic. Completely inadequate for real-world messiness.
The new stack looks nothing like that. Now you’ve got vision transformers processing visual data, large language models handling reasoning and instruction interpretation, action models translating intent into motor commands, and safety layers ensuring the robot doesn’t accidentally recreate that scene from The Terminator in your warehouse.
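To make the contrast concrete, here is a minimal sketch of how those layers compose into a single control step. Every class and method name below is a placeholder I’m using for illustration, not any real VLA library’s API; the “models” are stubs so the example runs on its own.

```python
# Illustrative composition of the new stack: perception -> language reasoning ->
# action decoding -> safety filtering. All classes are stand-in stubs, not a real VLA API.
from dataclasses import dataclass
from typing import List

import numpy as np


@dataclass
class MotorCommand:
    joint_velocities: List[float]  # rad/s, one per joint
    gripper_closed: bool


class VisionEncoder:
    """Stands in for a vision transformer that turns camera frames into scene features."""
    def encode(self, rgb_frame: np.ndarray) -> np.ndarray:
        return rgb_frame.mean(axis=(0, 1))  # crude global feature, purely for illustration


class LanguageReasoner:
    """Stands in for the LLM that grounds a plain-English instruction in the scene."""
    def plan(self, instruction: str, scene_features: np.ndarray) -> str:
        return f"grasp the object referenced by: {instruction}"


class ActionDecoder:
    """Stands in for the action head that maps a plan to low-level motor commands."""
    def decode(self, plan: str) -> MotorCommand:
        return MotorCommand(joint_velocities=[0.1] * 6, gripper_closed=False)


class SafetyLayer:
    """Clamps whatever the model proposes to hard limits before it reaches the actuators."""
    def __init__(self, max_joint_speed: float = 0.5):
        self.max_joint_speed = max_joint_speed

    def filter(self, cmd: MotorCommand) -> MotorCommand:
        clipped = [max(-self.max_joint_speed, min(v, self.max_joint_speed))
                   for v in cmd.joint_velocities]
        return MotorCommand(joint_velocities=clipped, gripper_closed=cmd.gripper_closed)


def vla_step(frame: np.ndarray, instruction: str) -> MotorCommand:
    features = VisionEncoder().encode(frame)
    plan = LanguageReasoner().plan(instruction, features)
    return SafetyLayer().filter(ActionDecoder().decode(plan))


if __name__ == "__main__":
    frame = np.zeros((480, 640, 3))  # stand-in for a camera image
    print(vla_step(frame, "pick up the toy dinosaur"))
```

The stubs are trivial on purpose. The point is the shape: perception, reasoning, action, and safety live in separate layers you can swap, constrain, and audit independently.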
This matters because it changes everything about how you deploy physical automation.
Traditional robots required armies of engineers programming every movement, every contingency, every error state. The cost of deployment was measured in millions of dollars and months of integration time. A VLA-powered robot? You demonstrate a task a few times, give it natural language instructions, and turn it loose.
The economics just shifted by two orders of magnitude.
Where This Gets Interesting (And Terrifying)
I’m watching three sectors where VLA deployment is about to go from research curiosity to commercial necessity:
Logistics is the obvious one. Amazon, DHL, FedEx — they’re all testing VLA-powered systems that can handle the insane variety of package shapes, sizes, and handling requirements that currently require human workers. But here’s what nobody’s talking about: the real unlock isn’t warehouse automation. It’s last-mile delivery in environments that change constantly. Robots that can navigate apartment buildings, operate elevators, read door numbers, and adapt to unexpected obstacles.
That’s not science fiction. That’s 2025.
Healthcare is where things get wild. Surgical robots today are essentially very expensive video game controllers for human surgeons. VLA models enable semi-autonomous surgical assistance — systems that can understand anatomical context, anticipate surgeon intent, and perform routine sub-tasks while the human focuses on high-stakes decisions. We’re talking about multiplying surgeon productivity by 3-5x within the decade.
The implications for global healthcare access are staggering.
Manufacturing is about to experience its third revolution. The first was assembly lines. The second was industrial robots. The third is adaptive manufacturing where production lines reconfigure themselves based on demand, robots learn new tasks through observation rather than programming, and small-batch custom production becomes as economically viable as mass production.
If you’re running manufacturing operations and you’re not gaming out this scenario, you’re planning for a world that won’t exist in 36 months.
The Middleware Gold Rush
Here’s where it gets commercially interesting for anyone building in this space. VLA models are powerful, but they’re also fundamentally unpredictable. You cannot have a robot in a hospital operating room that occasionally decides to improvise in creative ways.
This creates a massive opportunity for middleware companies that sit between the generalist AI models and physical hardware, adding safety constraints, verification layers, and human-override mechanisms.
Think of it like this: VLA models are incredible drivers, but you still need guardrails, speed limits, and emergency brakes.
Companies like Embodi (which I’m watching closely) are building exactly this layer — systems that let you harness the flexibility of VLA reasoning while maintaining the safety and predictability that regulated industries require. They’re adding human-in-the-loop override capabilities, defining safe operational boundaries, and creating audit trails for compliance.
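To make that layer concrete, here is a rough sketch of what the gating logic might look like. The class and field names are mine, not Embodi’s product or anyone’s shipping API; it simply illustrates the three ingredients this section keeps coming back to: allow-lists, human override, and an audit trail.

```python
# Illustrative gating layer: allow-listed actions run autonomously, everything else
# waits for a human, and every decision lands in an audit trail. Not any vendor's API.
import json
import time
from typing import Callable, Dict, List


class SafetyMiddleware:
    def __init__(self, allowed_actions: List[str],
                 request_human_approval: Callable[[Dict], bool]):
        self.allowed_actions = set(allowed_actions)
        self.request_human_approval = request_human_approval
        self.audit_log: List[Dict] = []

    def execute(self, proposed: Dict, actuate: Callable[[Dict], None]) -> bool:
        """Run one proposed action through the gate; return True if it was executed."""
        entry = {"ts": time.time(), "proposed": proposed}
        if proposed["action"] in self.allowed_actions:
            actuate(proposed)
            entry["outcome"] = "executed"
        elif self.request_human_approval(proposed):
            actuate(proposed)
            entry["outcome"] = "executed_after_human_override"
        else:
            entry["outcome"] = "blocked"
        self.audit_log.append(entry)
        return entry["outcome"] != "blocked"

    def export_audit_trail(self, path: str) -> None:
        """Dump the decision history for compliance review."""
        with open(path, "w") as f:
            json.dump(self.audit_log, f, indent=2)


# Usage: "pick" and "place" run on their own; anything else asks an operator.
gate = SafetyMiddleware(
    allowed_actions=["pick", "place"],
    request_human_approval=lambda a: input(f"Allow {a['action']}? [y/N] ").lower() == "y",
)
gate.execute({"action": "pick", "target": "tote_17"}, actuate=print)
```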
That’s the infrastructure play.
The Framework Nobody’s Built Yet
If you’re a CTO or VP of Operations trying to figure out how to actually deploy this stuff, here’s the framework I’m using with portfolio companies. Nobody else is talking about it this way, which means you’re getting this before it becomes conventional wisdom:
Stage One: Identify High-Variance Tasks
Look for operations where your current automation fails because of environmental variability. These are your VLA candidates. Not the repetitive tasks that traditional robots handle fine — the messy, inconsistent processes that currently require human flexibility.
In logistics, that’s handling returns with unpredictable packaging. In healthcare, that’s patient mobility assistance where every body is different. In manufacturing, that’s quality inspection across product variations.
Stage Two: Build Safety Envelopes
Define the operational boundaries within which VLA improvisation is acceptable. This isn’t about constraining the AI to be less intelligent — it’s about defining the playing field. Flight-control systems use the same principle: autopilot can handle thousands of scenarios, but it operates within strictly defined parameters.
Your safety envelope needs three components: environmental constraints (physical spaces where the robot operates), action constraints (what the robot is allowed to do), and override mechanisms (how humans intervene when needed).
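Here is one hypothetical way to encode those three components as a declarative envelope the control loop checks before every action. The bounds, action names, and force limit below are made-up illustrations, not recommendations.

```python
# Minimal encoding of the three envelope components: environmental constraints,
# action constraints, and an override mechanism. All values are illustrative.
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class SafetyEnvelope:
    # Environmental constraints: axis-aligned workspace the end effector may enter (metres).
    workspace_bounds: Tuple[Tuple[float, float], ...] = ((-0.5, 0.5), (-0.5, 0.5), (0.0, 1.0))
    # Action constraints: what the robot may do, and how hard it may push.
    allowed_actions: List[str] = field(default_factory=lambda: ["pick", "place", "inspect"])
    max_force_newtons: float = 20.0
    # Override mechanism: a callable a human operator (or watchdog) can trip at any time.
    emergency_stop: Callable[[], bool] = lambda: False

    def permits(self, action: str, position: Tuple[float, float, float], force: float) -> bool:
        if self.emergency_stop():
            return False
        in_bounds = all(lo <= p <= hi for p, (lo, hi) in zip(position, self.workspace_bounds))
        return in_bounds and action in self.allowed_actions and force <= self.max_force_newtons


envelope = SafetyEnvelope()
print(envelope.permits("pick", position=(0.2, 0.1, 0.4), force=8.0))  # True
print(envelope.permits("cut", position=(0.2, 0.1, 0.4), force=8.0))   # False: action not allowed
```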
Stage Three: Create Learning Loops
VLA models improve with exposure to edge cases. Design your deployment so every unusual situation generates training data. When the robot encounters something unexpected, that becomes tomorrow’s normal scenario.
This requires infrastructure most companies don’t have: vision recording systems, action logging, and feedback mechanisms that let human operators quickly correct and reinforce good behavior.
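A sketch of what that backbone could look like, assuming the policy reports a novelty or uncertainty score per step; the schema and field names are my assumptions, not a standard.

```python
# Illustrative episode logger: record each step, flag high-novelty or human-corrected
# ones, and export them as the next fine-tuning batch. Schema is assumed, not standard.
import json
import time
from typing import Dict, List, Optional


class EpisodeLogger:
    def __init__(self, novelty_threshold: float = 0.8):
        self.novelty_threshold = novelty_threshold
        self.episodes: List[Dict] = []

    def record(self, observation_ref: str, instruction: str, action: Dict,
               novelty_score: float, human_correction: Optional[Dict] = None) -> None:
        """Store one step; novelty_score would come from the policy's own uncertainty."""
        self.episodes.append({
            "ts": time.time(),
            "observation": observation_ref,   # e.g. a path to the recorded video segment
            "instruction": instruction,
            "action": action,
            "novelty": novelty_score,
            "correction": human_correction,   # the operator's fix, if they stepped in
        })

    def edge_cases(self) -> List[Dict]:
        """Episodes worth routing to human review and the next training run."""
        return [e for e in self.episodes
                if e["novelty"] >= self.novelty_threshold or e["correction"] is not None]

    def export_training_batch(self, path: str) -> None:
        with open(path, "w") as f:
            json.dump(self.edge_cases(), f, indent=2)


log = EpisodeLogger()
log.record("cam0/ep_042.mp4", "restack the damaged return", {"action": "pick", "grip": 0.6},
           novelty_score=0.93)
print(len(log.edge_cases()))  # 1: today's edge case becomes tomorrow's training example
```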
Stage Four: Measure Different Metrics
Traditional robotics optimizes for speed and precision. VLA systems optimize for adaptability and learning rate. Your KPIs need to shift from “how many units per hour” to “how quickly does the system handle novel situations without human intervention.”
That’s a fundamentally different measurement framework.
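One hedged example of how you might operationalize it: track the share of novel situations the system resolves without a human stepping in, and watch how that share trends week over week. The event fields below are illustrative.

```python
# Illustrative adaptability metrics: autonomy rate on novel situations, tracked over time,
# instead of raw units per hour. Event fields ("novel", "human_intervened") are assumed.
from typing import Dict, List


def novel_autonomy_rate(events: List[Dict]) -> float:
    """Share of novel situations handled end-to-end without a human stepping in."""
    novel = [e for e in events if e["novel"]]
    if not novel:
        return 1.0  # nothing novel happened, so nothing needed intervention
    autonomous = [e for e in novel if not e["human_intervened"]]
    return len(autonomous) / len(novel)


def weekly_trend(weeks: Dict[str, List[Dict]]) -> Dict[str, float]:
    """The learning rate you actually manage to: adaptability week over week."""
    return {week: round(novel_autonomy_rate(evts), 2) for week, evts in weeks.items()}


events_w1 = [{"novel": True, "human_intervened": True},
             {"novel": True, "human_intervened": False},
             {"novel": False, "human_intervened": False}]
events_w2 = [{"novel": True, "human_intervened": False},
             {"novel": True, "human_intervened": False}]
print(weekly_trend({"2025-W01": events_w1, "2025-W02": events_w2}))
# {'2025-W01': 0.5, '2025-W02': 1.0}
```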
The Uncomfortable Questions
If robots can learn through observation and natural language instruction, what happens to the multi-billion-dollar industrial programming industry? If physical labor can be performed by systems that cost $50K and learn new tasks in days rather than months, what happens to labor economics in developing economies that built their growth models on manufacturing employment?
If healthcare robots can multiply surgeon productivity 5x, but we still have the same number of medical schools graduating the same number of doctors, where does that capacity go?
These aren’t hypothetical questions for 2040. These are strategic planning considerations for 2026.
What You Should Do Tomorrow
Stop thinking about robots as automated tools and start thinking about them as general-purpose labor that happens to be non-human. That mental model shift changes everything about how you deploy them.
Identify three high-variance physical tasks in your operations that currently require human flexibility. Document why current automation fails. That’s your VLA target list.
Find the middleware companies building safety layers for VLA deployment in your industry. They’re usually small, usually unfunded, usually led by robotics PhDs who understand both the potential and the risks. Those are your integration partners.
Build relationships with academic labs working on embodied AI. Google DeepMind, Stanford’s IRIS lab, Berkeley’s RAIL lab. The gap between research and commercialization in this field is measured in months, not years.
The Brutal Reality
Most companies will miss this wave entirely. They’ll wait for “mature” solutions, established vendors, and proven case studies. By the time those exist, the competitive advantage will be gone.
The companies that win are the ones willing to deploy imperfect systems, learn from failures, and iterate rapidly. That’s uncomfortable for enterprise buyers trained on Six Sigma and zero-defect mentalities.
Too bad.
The physical world is about to become programmable through natural language. Either you figure out how to leverage that capability, or you compete against organizations that do.
I know which bet I’m making.
Links & Resources:
Google DeepMind RT-2: Vision-Language-Action Models https://www.deepmind.google/discover/blog/rt-2-new-model-translates-vision-and-language-into-action/ Google’s research on using vision-language models to control robots, demonstrating generalization to unseen tasks through natural language instructions and visual reasoning.
Stanford IRIS Lab — Embodied Intelligence Research https://irislab.stanford.edu/ Leading academic research on integrating perception, language, and action for robotic systems operating in unstructured environments.
Berkeley RAIL: Robot Learning Lab https://rail.eecs.berkeley.edu/ Research on enabling robots to learn manipulation tasks through demonstration, language guidance, and interaction with the physical world.
Physical Intelligence — VLA Commercial Deployment https://www.physicalintelligence.company/ One of the first startups commercializing vision-language-action models for real-world robotic applications in logistics and manufacturing.