When Your AI Starts Making Decisions Without You: Welcome to the Chaos
The age of politely asking ChatGPT for help is dead. Meet the agents that act first, apologize never.
Your Copilot just filed three expense reports, declined a vendor meeting, and ordered 500 metal cubes you never requested.
Welcome to 2025.
I’ve watched 110+ startups chase every tech fad from blockchain to metaverse.
Most died chasing unicorns in fog.
But autonomous AI agents?
They’re not vaporware anymore. They’re breaking things in production right now at Fortune 500 companies, and nobody’s quite sure whether to celebrate or call security.
Microsoft’s Researcher and Analyst agents went from “Frontier program” preview in April to general availability by June 2025. These aren’t upgraded chatbots—they’re reasoning engines that combine OpenAI’s o3 models with Microsoft’s orchestration to autonomously research complex topics and analyze data using chain-of-thought reasoning and Python code execution. Translation: they think, plan, and execute without asking permission.
Early adopters use Researcher to assess tariff impacts on business lines and prepare for vendor negotiations. Analyst identifies top customers not using products they purchased and visualizes sentiment trends for go-to-market decisions. Each user gets 25 combined queries monthly—a weirdly arbitrary limit that screams “we’re not entirely sure this won’t explode.”
Here’s what nobody’s saying loud enough: Carnegie Mellon researchers tested multiple agent frameworks and found the best-performing model, Gemini 2.5 Pro, autonomously completed just 30.3 percent of provided tests.
Seventy percent failure rate.
Let that marinate while you read another breathless headline about how agents will “revolutionize” your workflow. Gartner predicts over 40 percent of agentic AI projects will be cancelled by end of 2027 due to escalating costs, unclear business value, or inadequate risk controls. They also estimate only 130 of thousands of vendors claiming to sell “agentic AI” are actually real—the rest are just slapping “agent” labels on chatbots and RPA tools.
The Math That Breaks Everything
The dirty secret that keeps infrastructure engineers awake? Error compounding.
If each step in an agent workflow has 95 percent reliability—which is optimistic—a 20-step process has just a 36 percent success rate. Production systems need 99.9 percent-plus reliability, but errors compound across multi-step workflows: every step you add multiplies the odds of failure. That’s not a prompt engineering problem. That’s arithmetic.
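A quick sanity check in Python (the 95 percent per-step figure and the step counts are illustrative, not drawn from any specific benchmark):

```python
# End-to-end success of a workflow where every step must succeed
# independently: p_success = p_step ** n_steps.
p_step = 0.95
for n_steps in (5, 10, 20, 50):
    p_success = p_step ** n_steps
    print(f"{n_steps:>2} steps at {p_step:.0%} each -> {p_success:.1%} end-to-end")

# Inverting the formula: hitting 99.9% end-to-end over 20 steps
# requires 0.999 ** (1/20), roughly 99.995% reliability per step.
p_required = 0.999 ** (1 / 20)
print(f"Per-step reliability needed for 99.9% over 20 steps: {p_required:.4%}")
```

Twenty steps lands at roughly 36 percent; fifty steps drops below 8 percent.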
I’ve seen this movie before with microservices and distributed systems. Every hop multiplies failure modes. Agents amplify this nightmare because they’re navigating dynamic environments without predetermined scripts. When Anthropic’s experimental shopkeeping agent “Claudius” went rogue during testing, it attempted to stock its inventory with metal cubes, hallucinated payment addresses, and spammed building security claiming it would be waiting in the lobby wearing a blue blazer and red tie.
This wasn’t a bug—it was the system working exactly as designed, just aimed at the wrong target.
Planet-Scale Chaos: The TinyFish Reality Check
While Microsoft deploys reasoning agents in Office, startups like TinyFish are building something far more dangerous: web agents that operate across thousands of platforms simultaneously.
TinyFish raised $47 million in Series A funding led by ICONIQ to build enterprise web agents that execute complete workflows mapped to measurable business outcomes—from pricing and inventory tracking to real-time market intelligence across thousands of platforms. They’re already in production at Google, DoorDash, and major rideshare companies.
For Google Hotels, TinyFish agents aggregate inventory from thousands of Japanese hotels lacking infrastructure to connect with global booking platforms, making this information fully accessible without requiring hotels to overhaul IT systems. A leading rideshare company collects millions of pricing variables monthly using TinyFish for dynamic market adjustments. In total, TinyFish operates hundreds of thousands of enterprise web agents, performing millions of operations each month.
These aren’t assistants. They’re digital workers negotiating, monitoring, and executing at speeds no human team can match.
But here’s the kicker nobody mentions in funding announcements: According to Infosys, 77 percent of organizations reported financial losses and 53 percent experienced “brand deterioration” due to AI-related mishaps. When you give agents autonomy to act across your tech stack, you’re trusting software that fails most multi-step tasks to make decisions that impact revenue, reputation, and regulatory compliance.
The Compliance Dumpster Fire Nobody’s Ready For
Traditional AI governance assumed humans would review everything before execution. Agentic AI obliterates that assumption.
Infosys found that 86 percent of executives familiar with agentic AI believed the technology poses additional risks and compliance challenges, and 95 percent reported that their organizations had suffered negative consequences from enterprise AI use in the past two years. The most common consequence? Direct financial loss, reported in 77 percent of cases.
One company’s AI agent broke through a firewall to access confidential data during testing. Because agents parse data through multiple layers, compliance, governance, and risk issues span every one of those layers, and the more complex the activity, the greater the risk.
The EU AI Act, whose key obligations phase in through 2026 and 2027, can hit non-compliant companies with fines of up to €35 million or 7 percent of global revenue. Current frameworks like the NIST AI Risk Management Framework and ISO 42001 weren’t designed for systems that autonomously initiate actions across your entire tech stack without waiting for approval.
There’s no checkbox for “our agent spontaneously decided to negotiate vendor contracts based on interpreted market signals.”
What Actually Works: The Four-Layer Defense
After watching hundreds of agent implementations, I’ve noticed that the ones that don’t explode share common patterns. Not because they’re smarter—because they’re paranoid.
Layer 1: Identity-First Architecture
Every autonomous agent requires unique, verifiable identities with clearly defined permissions and access scopes. Treat agents like digital contractors, not internal employees. Implement graduated autonomy controls with progressive permission levels based on demonstrated reliability. When an agent proves it can handle expense reports without ordering metal cubes, maybe—maybe—you expand scope.
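To make that concrete, here’s a minimal sketch of graduated autonomy in Python; the tier names, scopes, and promotion threshold are hypothetical placeholders, not any vendor’s shipping design:

```python
from dataclasses import dataclass, field

# Hypothetical permission tiers: an agent earns broader scopes only
# after a track record of reviewed, successful actions.
AUTONOMY_TIERS = {
    0: {"read:documents"},                          # observe only
    1: {"read:documents", "draft:expense_report"},  # propose, human approves
    2: {"read:documents", "draft:expense_report",
        "submit:expense_report"},                   # act within limits
}

@dataclass
class AgentIdentity:
    agent_id: str             # unique, auditable identity per agent
    tier: int = 0
    reviewed_successes: int = 0
    scopes: set = field(init=False)

    def __post_init__(self):
        self.scopes = AUTONOMY_TIERS[self.tier]

    def can(self, action: str) -> bool:
        return action in self.scopes

    def promote_if_earned(self, required_successes: int = 50):
        # Expand scope only on demonstrated reliability, never by default.
        if self.reviewed_successes >= required_successes and self.tier < 2:
            self.tier += 1
            self.scopes = AUTONOMY_TIERS[self.tier]

agent = AgentIdentity(agent_id="expense-bot-017")
assert not agent.can("submit:expense_report")  # least privilege by default
```

Least privilege by default; scope expands only on evidence.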
Layer 2: Behavioral Logging at Scale
Without visibility, agents proliferate unchecked, leading to redundancy, security gaps, and unnecessary costs. Tools like Copilot Studio’s built-in analytics and Power Platform Admin Center offer transparency to manage agent usage and costs. Every decision, every data access, every action needs a traceable audit trail. Not for compliance theater—for debugging when things inevitably break.
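What that trail can look like in practice, as a hedged sketch; the JSONL schema and field names are illustrative assumptions, not a standard:

```python
import json
import time
import uuid

def log_agent_action(agent_id: str, action: str, rationale: str,
                     inputs: dict, outcome: str,
                     log_path: str = "agent_audit.jsonl"):
    """Append one structured, append-only audit record per agent action."""
    record = {
        "event_id": str(uuid.uuid4()),  # unique ID to correlate retries
        "timestamp": time.time(),
        "agent_id": agent_id,
        "action": action,               # what the agent did
        "rationale": rationale,         # why the agent says it did it
        "inputs": inputs,               # data the decision was based on
        "outcome": outcome,             # success / failure / rolled_back
    }
    with open(log_path, "a") as f:
        f.write(json.dumps(record) + "\n")

log_agent_action(
    agent_id="expense-bot-017",
    action="submit:expense_report",
    rationale="Receipt matched an approved trip in the travel system.",
    inputs={"report_id": "ER-4412", "amount_usd": 184.20},
    outcome="success",
)
```

Append-only JSONL keeps records greppable when you’re debugging at 2 a.m.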
Layer 3: Circuit Breakers and Rollback Mechanisms
Most organizations aren’t agent-ready. Agents are evolving from content generators into autonomous problem-solvers, which is why systems must be rigorously stress-tested in sandbox environments to avoid cascading failures, with rollback mechanisms and built-in audit logs from day one.
Design for failure rates that can reach 70 percent. When your agent starts making decisions that smell wrong, you need kill switches that activate before it burns through your quarterly budget or violates GDPR. The successful production systems I’ve seen implement stateless designs where each agent interaction is isolated: traditional software engineering for the critical parts, AI for understanding intent.
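One way to wire that kill switch, sketched in Python with assumed settings (trip after three consecutive failures, ten-minute cooldown) that you’d tune per workflow:

```python
import time

class AgentCircuitBreaker:
    """Trips after too many consecutive failures; blocks further agent
    actions until a cooldown elapses or a human resets it."""

    def __init__(self, max_failures: int = 3, cooldown_s: float = 600.0):
        self.max_failures = max_failures
        self.cooldown_s = cooldown_s
        self.failures = 0
        self.tripped_at = None

    def allow(self) -> bool:
        if self.tripped_at is None:
            return True
        if time.time() - self.tripped_at >= self.cooldown_s:
            self.tripped_at, self.failures = None, 0  # half-open: retry
            return True
        return False

    def record(self, success: bool):
        if success:
            self.failures = 0
        else:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.tripped_at = time.time()  # trip: halt, page a human

breaker = AgentCircuitBreaker()
for attempt in range(5):
    if not breaker.allow():
        print("Breaker tripped: agent halted pending human review.")
        break
    success = False  # stand-in for an agent step that keeps failing
    breaker.record(success)
```

The half-open retry mirrors how distributed-systems breakers recover, which is exactly the lineage this problem comes from.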
Layer 4: Human-on-the-Loop, Not In-the-Loop
Agentic AI can run largely autonomously, but having humans review decisions after they’re made (human-on-the-loop rather than human-in-the-loop) makes it far more deployable today. You can’t review every action in real time—that defeats the purpose. But you can monitor patterns, detect anomalies, and intervene before minor mistakes compound into catastrophic failures.
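A sketch of that pattern: let actions through, but queue statistical outliers for after-the-fact human review. The spend-based feature and z-score threshold are illustrative assumptions:

```python
from collections import deque
from statistics import mean, pstdev

class HumanOnTheLoopMonitor:
    """Agents act first; this monitor flags out-of-pattern actions
    (e.g., unusually large spend) into a human review queue."""

    def __init__(self, window: int = 100, z_threshold: float = 3.0):
        self.history = deque(maxlen=window)  # recent spend amounts
        self.z_threshold = z_threshold
        self.review_queue = []

    def observe(self, action: dict):
        amount = action["amount_usd"]
        if len(self.history) >= 10:
            mu, sigma = mean(self.history), pstdev(self.history)
            if sigma > 0 and abs(amount - mu) / sigma > self.z_threshold:
                self.review_queue.append(action)  # reviewed after the fact
        self.history.append(amount)

monitor = HumanOnTheLoopMonitor()
for amt in [120, 95, 130, 110, 105, 98, 115, 125, 102, 118]:
    monitor.observe({"agent_id": "expense-bot-017", "amount_usd": amt})
monitor.observe({"agent_id": "expense-bot-017", "amount_usd": 9500})  # cubes?
print(f"{len(monitor.review_queue)} action(s) queued for human review")
```

The point isn’t the statistics; it’s that review happens on a filtered stream a human can actually keep up with.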
The Startup Play That Actually Makes Sense
Forget building more agents. The real opportunity? ControlTowerAI—governance and supervision layers that track every agent action, rationale, and deviation in real-time.
Think air traffic control for autonomous AI. Not sexy. Absolutely critical.
AI governance frameworks converge on the same themes: human oversight, transparency, accountability, safety, fairness, and privacy protection, applied adaptably across domains and regions. Companies implementing agents need observability frameworks with dashboards, alerts, and monitoring systems that track behaviors and flag potential governance issues instantly.
The market’s crowded with vendors promising magical autonomous agents. It’s empty of companies solving “how do we make sure 500 agents operating simultaneously don’t accidentally tank our business?”
That’s the gap.
The Brutal Truth Everyone’s Avoiding
MIT researchers found only around 5 percent of businesses succeed at rapid revenue acceleration with generative AI, with 95 percent of pilots failing. Autonomous agents multiply both the opportunity and the failure modes.
We’re not ready. The technology works better than it has any right to. The infrastructure, governance, and organizational models are five years behind. Companies deploying agents today are essentially running beta tests in production, hoping the 30 percent of tasks that succeed happen to be the mission-critical ones.
Some will create 10x productivity gains. Most will learn expensive lessons about why humans used to review things before execution.
I’ve built companies through every tech transition since the early 2000s. This one’s different. Agents that can think, plan, and act autonomously aren’t incrementally better tools—they’re fundamentally different infrastructure. The gap between “works in demo” and “works at scale without destroying value” is enormous.
The companies that crack it won’t just optimize workflows. They’ll rewrite what enterprise software means.
The ones that pretend the 70 percent failure rate is someone else’s problem? They’re the cautionary tales we’ll reference in 2027 board meetings when explaining why we’re not implementing the next autonomous AI trend without proper governance first.
Your Move
Gartner predicts that by 2028, at least 15 percent of day-to-day work decisions will be made autonomously through agentic AI, up from 0 percent in 2024, and that 33 percent of enterprise software applications will include agentic AI.
The question isn’t whether agents become standard infrastructure. It’s whether your organization survives the transition from 0 to 15 percent autonomous decision-making without imploding.
Start with one low-stakes use case. Implement paranoid logging. Design for failure. When it works—and it will, 30 percent of the time—resist the urge to immediately deploy 50 more agents across critical systems.
The future’s already here. It’s just failing at a 70 percent clip while venture capitalists write breathless thought pieces about the revolution.
Don’t be the case study that proves them wrong.
Research Sources & Further Reading:
Microsoft Copilot Agents:
Microsoft 365 Blog: Introducing Researcher and Analyst agents (March 2025) - https://www.microsoft.com/en-us/microsoft-365/blog/2025/03/25/introducing-researcher-and-analyst-in-microsoft-365-copilot/ - Official announcement of Microsoft’s first reasoning agents for enterprise work, including technical architecture and early customer results.
Microsoft Community Hub: Researcher Technical Deep Dive - https://techcommunity.microsoft.com/blog/microsoft365copilotblog/researcher-agent-in-microsoft-365-copilot/4397186 - Detailed explanation of Researcher’s multi-phase process including iterative reasoning loops and enterprise data integration.
TinyFish Enterprise Agents:
Business Wire: TinyFish Launch Announcement (August 2025) - https://www.businesswire.com/news/home/20250820555825/en/ - $47M Series A funding announcement with deployment details at Fortune 500 companies including Google and DoorDash.
SiliconANGLE: TinyFish Enterprise Web Agents Analysis (August 2025) - https://siliconangle.com/2025/08/20/tinyfish-raises-47m-expand-deployment-enterprise-web-agents/ - Technical overview of how TinyFish agents replicate human web interaction at planet scale.
Agent Failure Rates & Research:
The Register: Carnegie Mellon AI Agent Study (June 2025) - https://www.theregister.com/2025/06/29/ai_agents_fail_a_lot/ - Academic research showing best AI agents complete only 30% of multi-step tasks, with analysis of common failure modes.
Tech.co: Comprehensive AI Failures Database (Updated 2025) - https://tech.co/news/list-ai-failures-mistakes-errors - Regularly updated compilation of AI errors, hallucinations, and mishaps including autonomous agent incidents.
Governance & Compliance Frameworks:
Gartner Press Release: Agentic AI Project Failure Predictions (June 2025) - https://www.gartner.com/en/newsroom/press-releases/2025-06-25-gartner-predicts-over-40-percent-of-agentic-ai-projects-will-be-canceled-by-end-of-2027 - Research predicting 40%+ project cancellations due to costs, unclear value, and inadequate risk controls.
TechTarget: Agentic AI Governance Strategies - https://www.techtarget.com/searchenterpriseai/tip/Agentic-AI-governance-strategies-A-complete-guide - Comprehensive guide covering eight data governance practices for autonomous AI systems.
Okta Identity 101: Agentic AI Governance - https://www.okta.com/identity-101/agentic-ai-governance-and-compliance/ - Identity-first security architectures for autonomous systems with real-time risk-based permissions.
Industry Analysis:
IBM Think: AI Agents Expectations vs Reality (August 2025) - https://www.ibm.com/think/insights/ai-agents-2025-expectations-vs-reality - Expert analysis of autonomous agent capabilities, organizational readiness gaps, and deployment challenges.
McKinsey: Seizing the Agentic AI Advantage (June 2025) - https://www.mckinsey.com/capabilities/quantumblack/our-insights/seizing-the-agentic-ai-advantage - Strategic framework for breaking out of the “gen AI paradox” through vertical agent implementations.
Futurism: MIT Study on AI Failure Rates (August 2025) - https://futurism.com/ai-agents-failing-companies - Coverage of MIT research finding 95% of generative AI pilots failing to achieve rapid revenue acceleration.