AI of the Coast: The 5-Year Roadmap to General AI

30% of Your Equipment Will Fail Without Warning—We Built the System That Sees It Coming

The $2.1 Billion Lie Every Facility Manager Tells Themselves

Jiri "Skzites" Fiala's avatar
Jiri "Skzites" Fiala
Dec 06, 2025

Here’s a fun exercise.

Walk into any major facility management company and ask about their predictive maintenance capabilities.

They’ll show you dashboards.

Beautiful dashboards.

Color-coded alerts and real-time sensor feeds streaming across monitors like the control room of a science fiction movie. Temperature readings, vibration data, power consumption graphs—thousands of data points updating every few seconds.

Then ask them one simple question: what percentage of equipment failures did you predict last quarter?

The silence tells you everything.

Despite billions invested in IoT infrastructure, 30% of equipment failures remain completely unplanned.

Not reduced from some higher number.

Just... 30%.

Sitting there, immovable, mocking every sensor installation and every dashboard deployment like a monument to wasted capital expenditure.

The industry’s reactive maintenance costs exceeded $2.1 billion last year.

Average downtime hovers around 22%. And the executives responsible for these numbers keep buying more sensors, as if the problem were insufficient data rather than insufficient intelligence applied to the data they already have.

I’ve seen this movie before.

Why Sensors Don’t Solve Problems

Let me explain something that every IoT vendor prays you never realize.

A sensor measures.

That’s it.

A temperature sensor tells you the current temperature. A vibration sensor tells you the current vibration frequency. An electrical meter tells you current power draw.

What no sensor can tell you is whether that temperature reading means anything. Whether that vibration pattern indicates imminent bearing failure or normal operation within acceptable tolerances. Whether that power spike is the beginning of motor degradation or just someone turning on a vacuum cleaner.

The data isn’t the insight. The data is the raw material that might, under very specific conditions, with exactly the right processing, become insight.

Every facility management company I’ve worked with has the same fundamental architecture: sensors feeding databases, databases feeding dashboards, dashboards feeding humans who have neither the time nor the expertise to interpret what they’re seeing.

The CEO gets a weekly report showing 47 anomalies flagged.

Nobody can tell which of those 47 actually matter.

So they respond to all of them. Or none of them. Both approaches fail at roughly the same rate.

The Predictive Maintenance Intelligence Platform

We built PMIP for a hospital network—12 facilities, critical HVAC systems, medical equipment where failure doesn’t just mean inconvenience but potentially means patients on operating tables when the climate control dies.

The architecture we deployed represents what I call the Multi-Modal Prediction Stack—a framework that emerged from watching dozens of predictive maintenance implementations fail because they tried to solve everything with one technique.

Layer one: LSTM neural networks for time-series anomaly detection. Long Short-Term Memory networks excel at patterns that unfold over time—the slow degradation curves that human analysts miss because the change from Tuesday to Wednesday is imperceptible, even though the change from January to March spells catastrophe. We deployed these across hundreds of IoT sensor feeds, analyzing not individual readings but trajectories and correlations.
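To make layer one concrete, here’s a minimal sketch of the idea in PyTorch: an LSTM that forecasts the next reading for each sensor window and flags the windows it can no longer predict well. The window handling, hidden size, and three-sigma cutoff are illustrative choices, not the production configuration.

```python
# Minimal sketch of layer one: an LSTM forecaster whose prediction error
# serves as the anomaly score. Hyperparameters here are illustrative only.
import torch
import torch.nn as nn

class SensorForecaster(nn.Module):
    def __init__(self, n_features: int, hidden: int = 64):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_features)

    def forward(self, window: torch.Tensor) -> torch.Tensor:
        # window: (batch, timesteps, n_features) -> predicted next reading
        _, (h_n, _) = self.lstm(window)
        return self.head(h_n[-1])

def anomaly_scores(model: nn.Module, windows: torch.Tensor,
                   targets: torch.Tensor) -> torch.Tensor:
    """Mean absolute prediction error per window; a rising score means the
    trajectory no longer looks like anything the model learned as normal."""
    with torch.no_grad():
        return (model(windows) - targets).abs().mean(dim=1)

# Usage sketch: calibrate a cutoff on known-healthy data, then alert on
# live windows whose error sits well above that distribution.
# healthy = anomaly_scores(model, healthy_windows, healthy_targets)
# cutoff = healthy.mean() + 3 * healthy.std()
# alerts = anomaly_scores(model, live_windows, live_targets) > cutoff
```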

Layer two: LLM-powered knowledge management. Here’s the counterintuitive insight that nobody talks about. The most valuable predictive data often isn’t in sensor feeds at all—it’s buried in 15 years of maintenance logs, technician notes, and equipment repair histories. Unstructured text that no traditional analytics system can touch. We ingested this historical knowledge using Claude Sonnet 4, transforming tribal knowledge locked in filing cabinets and legacy databases into queryable intelligence.
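For layer two, the mechanical part is simpler than it sounds: feed each log entry to the model with a tight extraction prompt and store the structured result. A minimal sketch, assuming the Anthropic Python SDK, with a placeholder model ID and schema (the article only says Claude Sonnet 4 did the ingestion):

```python
# Sketch of layer two: turning free-text maintenance logs into structured,
# queryable records. The model ID, prompt, and JSON fields are placeholders.
import json
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

PROMPT = (
    "Extract from this maintenance log entry: asset_id, component, "
    "failure_mode, action_taken, and date. Reply with JSON only.\n\n{entry}"
)

def structure_log_entry(entry: str) -> dict:
    response = client.messages.create(
        model="claude-sonnet-4-20250514",  # placeholder model ID
        max_tokens=512,
        messages=[{"role": "user", "content": PROMPT.format(entry=entry)}],
    )
    return json.loads(response.content[0].text)

# e.g. structure_log_entry("2019-03-12: swapped comp unit on AHU-7, bearing noise")
# would be expected to yield something like
# {"asset_id": "AHU-7", "component": "compressor", "failure_mode": "bearing noise", ...}
```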

Layer three: Computer vision for physical inspection. Thermal imaging analysis, visual deterioration detection, equipment condition assessment. The sensors measure internal states, but external signs often provide earlier warning—discoloration, wear patterns, alignment drift. The computer vision models complement sensor data with inspection intelligence.
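Layer three can start far simpler than a trained detection model. A hedged sketch of the most basic version: compare a thermal frame against a known-good baseline of the same equipment and flag sustained hotspots. The 15-degree delta and minimum area are placeholder values, not what the hospital deployment used.

```python
# Sketch of a baseline-delta thermal check. Real deployments would add trained
# detection models; this is the simplest useful version of the idea.
import numpy as np

def hotspot_mask(current: np.ndarray, baseline: np.ndarray,
                 delta_c: float = 15.0) -> np.ndarray:
    """Boolean mask of pixels running at least `delta_c` degrees hotter than
    the baseline thermal image (both arrays in degrees Celsius)."""
    return (current - baseline) > delta_c

def inspection_flag(current: np.ndarray, baseline: np.ndarray,
                    min_area_px: int = 50) -> bool:
    """Raise an inspection flag only when the hot region is large enough to be
    more than sensor noise."""
    return int(hotspot_mask(current, baseline).sum()) >= min_area_px
```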

Layer four: Survival analysis and RUL prediction. Remaining Useful Life algorithms that don’t just flag anomalies but estimate time-to-failure with confidence intervals. The difference between knowing something is degrading and knowing you have 23 days before critical failure determines whether you can schedule maintenance during planned downtime or scramble during a crisis.
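For layer four, here’s a minimal sketch of what RUL estimation with an uncertainty band can look like, using a Weibull accelerated-failure-time model from the lifelines library. The column names, the toy data, and the percentile band are assumptions for illustration; the article doesn’t specify the production model.

```python
# Sketch of layer four: fit a Weibull AFT model on historical run-to-failure
# records (with censoring) and report a time band, not a single point.
import pandas as pd
from lifelines import WeibullAFTFitter

history = pd.DataFrame({
    "hours_to_event": [8200, 9100, 7600, 10300, 8800, 7900, 9600, 8400],
    "failed":         [1, 1, 1, 0, 1, 1, 0, 1],   # 0 = still running (censored)
    "avg_vibration":  [4.1, 3.8, 4.6, 3.2, 4.0, 4.4, 3.5, 4.2],
})

aft = WeibullAFTFitter()
aft.fit(history, duration_col="hours_to_event", event_col="failed")

# For an in-service asset, schedule against the pessimistic end of the band.
asset = pd.DataFrame({"avg_vibration": [4.5]})
earliest = aft.predict_percentile(asset, p=0.90)  # time by which ~10% of similar assets have failed
median   = aft.predict_median(asset)
latest   = aft.predict_percentile(asset, p=0.10)  # time by which ~90% have failed
```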

The Real Numbers

Development cost: €95,000.

AI agent processing: 700 hours.

Monthly infrastructure: €18,000 for combined GPU compute and IoT data processing.

Total pilot investment: €140,000.

Full implementation scales to €520,000 in internal cost and €1.5 million in customer pricing, plus a €55,000 monthly subscription and per-asset fees.

Those numbers will strike some people as expensive.

Compare them to the alternative: a single unplanned chiller failure in a hospital data center runs $150,000 to $300,000 once you factor in equipment damage, emergency repair premiums, and downtime costs.

The system paid for itself with the second prevented failure.

The pilot ran 16 weeks.

Full implementation takes 9-12 months.

ROI materializes within 12-18 months, faster if you’re in a sector where downtime carries regulatory or safety implications.

What Actually Happened

The hospital network achieved 87% prediction accuracy on equipment failures, with 2-4 week advance warning. Unplanned downtime dropped 45%. Critical equipment availability hit 99.2%.

Annual maintenance costs fell by €2.8 million through optimized preventive scheduling.

But here’s the honest part: not everything worked immediately.

The first three months were brutal. The LSTM models generated false positives at an unacceptable rate—flagging normal operational variations as anomalies because we hadn’t properly calibrated for the specific behavior patterns of medical-grade HVAC systems. The maintenance staff, already skeptical of yet another technology initiative, started ignoring alerts entirely.

We rebuilt the threshold calibration system from scratch. Added a feedback loop where technicians could mark predictions as accurate or false. The models learned. False positives dropped from 34% to under 8% by month four.
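The feedback mechanism itself doesn’t need to be exotic. Here’s a sketch of the kind of recalibration loop I mean, where technician verdicts on past alerts drive the threshold; the 10% false-alarm target and the grid search are illustrative, not the exact logic we shipped.

```python
# Sketch of the feedback loop: technicians label each past alert as a real
# issue or a false alarm, and the alerting threshold is periodically re-fit
# to cap the false-alarm rate. Numbers and search strategy are illustrative.
import numpy as np

def recalibrate_threshold(scores: np.ndarray, confirmed: np.ndarray,
                          max_false_alarm_rate: float = 0.10) -> float:
    """Pick the lowest anomaly-score threshold whose historical alerts would
    have kept the false-alarm fraction under the target.

    scores    -- anomaly scores of past alerts
    confirmed -- 1 if a technician confirmed a real fault, 0 if false alarm
    """
    candidates = np.sort(np.unique(scores))
    for t in candidates:
        fired = scores >= t
        if not fired.any():
            break
        false_alarm = (fired & (confirmed == 0)).sum() / fired.sum()
        if false_alarm <= max_false_alarm_rate:
            return float(t)
    return float(candidates[-1])  # fall back to the strictest threshold
```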

The knowledge management layer faced a different challenge. The historical maintenance logs contained years of inconsistent terminology, abbreviations, and missing context. “Replaced compressor” in 2015 might reference the same equipment as “swapped comp unit” in 2019, but the system needed explicit training to recognize the connection.
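One way to bootstrap those connections before full retraining: embed log entries and treat high cosine similarity as a candidate link for a human to confirm. A sketch assuming the sentence-transformers library; the model name and the 0.7 cutoff are placeholders, and this isn’t necessarily how the production system learned the mappings.

```python
# Sketch of embedding-based linking of inconsistent log terminology.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder embedding model

def likely_same_event(entry_a: str, entry_b: str, cutoff: float = 0.7) -> bool:
    """Flag two log entries as a candidate match when their embeddings are
    close enough in cosine similarity; a technician confirms the link."""
    emb = model.encode([entry_a, entry_b], convert_to_tensor=True)
    return float(util.cos_sim(emb[0], emb[1])) >= cutoff

# Paraphrased entries such as "Replaced compressor on chiller 3" and
# "swapped comp unit, chiller #3" would be surfaced for review this way.
```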

These are the implementation realities that vendor demos never show you.

The Hybrid Architecture That Makes It Work

Data sovereignty matters enormously in this domain. Hospital equipment data, proprietary maintenance procedures, facility-specific failure patterns—none of this should leave the client’s infrastructure.

We deployed a hybrid architecture: on-premise LLM processing for all sensitive data and equipment-specific analysis; cloud APIs only for general equipment knowledge, industry benchmarking, and pattern recognition training that benefits from cross-client anonymized data.

The boundary between local and cloud processing isn’t arbitrary. It follows data sensitivity precisely: anything that could identify specific equipment, specific facilities, or specific operational patterns stays on-premise. Only abstract patterns—“reciprocating compressor degradation curve type 3”—transit to cloud infrastructure.
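In code, that boundary reduces to a routing rule. A simplified sketch with hypothetical field names and stub backends, just to show the shape of the decision:

```python
# Sketch of the sensitivity-based routing rule. The request fields and the two
# backend functions are hypothetical stand-ins for the real pipelines.
from dataclasses import dataclass

@dataclass
class AnalysisRequest:
    payload: dict
    identifies_asset: bool      # names a specific piece of equipment
    identifies_facility: bool   # ties the data to a specific site
    operational_detail: bool    # reveals facility-specific usage patterns

def process_on_premise(payload: dict) -> str:
    return "handled-on-premise"   # stand-in for the local LLM / local models

def process_via_cloud_api(payload: dict) -> str:
    return "handled-in-cloud"     # stand-in for the anonymized cloud call

def route(request: AnalysisRequest) -> str:
    # Anything that could identify equipment, a facility, or an operational
    # pattern never leaves the client's infrastructure.
    sensitive = (request.identifies_asset
                 or request.identifies_facility
                 or request.operational_detail)
    if sensitive:
        return process_on_premise(request.payload)
    return process_via_cloud_api(request.payload)
```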

This architecture decision added approximately €30,000 to development costs and created ongoing operational complexity.

Worth every cent.


The alternative—explaining to a hospital CIO why their equipment performance data lives on external servers—is a conversation that ends implementations before they begin.

What We Actually Learned & the Detailed Architecture

First insight: multimodal beats single-modal at every scale.
