Website Downtime Statistics: What the Data Actually Shows (2026)
How much does downtime really cost? What causes most outages? How long do they last? We compiled the published research on website and infrastructure downtime so you don't have to.
Everyone in the monitoring industry throws around the same statistic: downtime costs $5,600 per minute. The number is cited in blog posts, vendor pitch decks, and executive presentations, often without a source. It traces to a 2014 Gartner estimate; a related Ponemon Institute study sponsored by Emerson Network Power (now Vertiv) put the average cost of unplanned data centre downtime at $7,900 per minute. Both figures describe enterprise data centres. Website downtime for a small SaaS startup costs considerably less per minute, and considerably more for an e-commerce company on Black Friday.
This article compiles what the actual published research shows: real costs, real causes, real timelines, and what the numbers mean for teams running web services.
The Cost of Downtime: What Research Actually Shows
The $5,600 Figure (and Why It Varies So Widely)
The frequently cited “$5,600 per minute” figure comes from Gartner’s 2014 estimate, while the Ponemon/Emerson research produced the higher per-minute numbers. Both are averages across large enterprise organisations, weighted towards financial services, healthcare, and retail. For a mid-market company, the real number is lower; for a large financial institution, it’s often far higher.
ITIC’s (Information Technology Intelligence Consulting) annual reliability surveys provide more granular data. Their 2022 Global Server Hardware and Server OS Reliability Report found:
- 91% of enterprises say one hour of downtime now costs their organisation more than $300,000
- 44% of enterprises say one hour of downtime costs more than $1 million
- These figures have increased year-over-year as businesses become more dependent on digital infrastructure
The range is enormous. ITIC surveyed companies across size and industry, and the spread ran from a few thousand dollars for a small regional business to tens of millions for a large financial exchange.
E-Commerce: A Clearer Signal
E-commerce provides the clearest direct measurement of downtime cost because revenue per minute is more straightforward to calculate. A retailer doing $10 million/day in online revenue loses approximately $6,900 per minute when the site is unavailable. During peak periods like Black Friday, where revenue per minute can be 5–10x normal, the calculus changes dramatically.
The 2018 Amazon Prime Day outage illustrated this starkly. Amazon’s UK and US sites experienced loading issues for approximately 63 minutes. Based on Amazon’s reported 2018 annual revenue (~$232 billion) and the traffic concentration during Prime Day, analyst estimates placed the potential revenue impact at between $72 million and $99 million—roughly $1.1–$1.6 million per minute of disruption. Amazon has not officially disclosed a figure.
For smaller retailers, Shopify publishes aggregate platform performance data. During Black Friday 2023, Shopify processed over $4.2 billion in total sales—approximately $4.5 million per minute at peak. Any platform-wide outage during that window would have had immediate, measurable consequences for tens of thousands of merchants simultaneously.
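The per-minute arithmetic above is simple enough to sketch. A minimal helper, assuming revenue is spread evenly across the day (the daily figure and peak multiplier are illustrative, not measured values):

```python
def downtime_cost_per_minute(daily_revenue: float, peak_multiplier: float = 1.0) -> float:
    """Estimate revenue lost per minute of total unavailability.

    Assumes revenue is spread evenly across the day, then scaled by a
    peak multiplier (e.g. 5-10x during a Black Friday traffic spike).
    """
    return daily_revenue / (24 * 60) * peak_multiplier

# A retailer doing $10M/day loses roughly $6,900 per minute normally,
# and far more during a peak window:
normal = downtime_cost_per_minute(10_000_000)                    # ~6,944
peak = downtime_cost_per_minute(10_000_000, peak_multiplier=8)   # ~55,556
```

The even-spread assumption is the weakest part of any such estimate: real traffic is lumpy, which is exactly why a peak-hour outage costs disproportionately more than the daily average suggests.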
How Often Does Downtime Actually Happen?
Data Centre and Infrastructure Outages
The Uptime Institute’s annual Global Data Centre Survey tracks data centre outages worldwide. Their 2023 report found:
- 54% of data centre operators reported a significant outage in the past three years
- 25% of those outages were classified as severe or serious
- The most common cause of significant outages was power-related failure (43% of incidents), followed by software and IT systems issues (22%) and network issues (18%)
- Only 15% of outages were caused by hardware failures—a significant shift from a decade ago, when hardware was the dominant cause
This shift toward software, configuration, and network causes matters for monitoring strategy. Hardware failures are often detected immediately by physical infrastructure alerts. Software failures—a bad deployment, a configuration error, a dependency outage—can be subtle, partial, or slow to surface.
Website and API Availability
Availability data for public-facing web services is harder to aggregate across the industry because most companies don’t publish outage durations. Public status-page data from major SaaS providers offers a partial picture.
Published post-mortems from major providers document notable incidents:
- GitHub (October 2018): 24-hour partial outage during database migration. Availability dropped to approximately 99.66% for the month.
- AWS us-east-1 (December 2021): Multi-hour outage affecting AWS services globally. Impacted a significant portion of the internet including major streaming, delivery, and SaaS services.
- Cloudflare (June 2022): Global outage affecting routing in 19 data centres for approximately 60 minutes.
- Facebook/Meta (October 2021): BGP routing configuration error took down Facebook, Instagram, and WhatsApp for approximately 6 hours globally.
What these incidents have in common: they were caused by configuration changes, not hardware failures. The Facebook outage was triggered by a BGP route update that accidentally withdrew Facebook’s own routing prefixes. The Cloudflare outage was caused by a routing policy change. The AWS outage was triggered by a misconfigured automation.
The Causes of Downtime: What the Data Shows
Software Deployment is the Biggest Risk
Multiple post-mortem databases and incident surveys consistently show that deployment and configuration changes cause the majority of production incidents. Google’s Site Reliability Engineering book, published with data from Google’s internal SRE practice, notes that configuration changes represent one of the highest-risk periods for any service.
A survey by PagerDuty of their customer base found that approximately 70% of incidents are triggered by change events—deployments, configuration updates, or infrastructure changes—rather than spontaneous hardware or network failures.
This is why monitoring needs to catch problems quickly after deployments. A monitoring system checking every 5 minutes might miss an incident that resolves (or causes cascading failures) within that window.
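The detection window is easy to quantify. A small sketch, assuming the worst case where an incident begins immediately after a passing check (the intervals and confirmation counts below are illustrative):

```python
def worst_case_detection_delay(check_interval_s: int, confirmation_checks: int = 1) -> int:
    """Worst-case seconds between an incident starting and an alert firing.

    An incident can begin just after a check passes, so the first failing
    check arrives up to one full interval later; requiring N consecutive
    failures before alerting adds (N - 1) further intervals.
    """
    return check_interval_s * confirmation_checks

# 5-minute checks with 2 confirmations: up to 10 minutes undetected.
five_min = worst_case_detection_delay(300, confirmation_checks=2)   # 600 s
# 30-second checks with the same policy: at most 1 minute undetected.
thirty_s = worst_case_detection_delay(30, confirmation_checks=2)    # 60 s
```

For a company losing thousands of dollars per minute, the difference between those two worst cases is the entire budget argument for frequent checks.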
Human Error in Infrastructure
The Uptime Institute’s research consistently attributes 70–80% of data centre outages to human error in some form—either directly (an engineer running the wrong command) or indirectly (missing or inadequate processes, insufficient testing of change procedures). This doesn’t mean humans are careless; it reflects that most infrastructure failures trace back to a decision, procedure, or automation written by a person.
Third-Party Dependencies
As services have become more interconnected, third-party dependency failures have become a larger share of total incidents. Your application might be perfectly healthy, but if your payment processor, CDN, DNS provider, or authentication service has an outage, your users experience your service as unavailable.
The AWS December 2021 outage demonstrated this clearly: organisations with no direct AWS relationship experienced service disruptions because services they depended on—Slack, Venmo, iRobot, and others—were running on AWS.
How Long Does Downtime Last?
Mean Time to Detect (MTTD)
Many organisations don’t find out about outages from their monitoring tools—they find out from customer complaints. A 2021 survey by Catchpoint found that for 40% of incidents, the first notification came from an external source (a customer, a social media report, or a partner) rather than an internal monitoring system.
This indicates that monitoring coverage is still far from universal. An incident that begins at 3am and generates no external customer reports until business hours—because users aren’t active—might go undetected for hours.
Mean Time to Resolve (MTTR)
PagerDuty’s 2023 State of Digital Operations report provides MTTR benchmarks across industries:
- Technology companies: Median MTTR of approximately 30–60 minutes per incident
- Financial services: Median MTTR of 60–90 minutes
- Healthcare: Median MTTR of 90–180 minutes
- Retail: MTTR varies significantly; e-commerce incidents are often resolved faster due to direct revenue pressure
MTTR has improved over the past five years as observability tooling has matured. But the single largest improvement factor across organisations is faster detection—getting an alert to an on-call engineer within minutes of an incident starting, rather than after a user has already filed a support ticket.
What Happens After Downtime: The Reputational Tail
The direct revenue cost of an outage is usually the immediate focus, but the reputational impact often lasts longer.
User Trust Degrades After Outages
A survey by Statista in 2022 found that 44% of users said they would switch to a competitor after a service was unavailable for more than one hour. For B2B SaaS companies with contractual SLAs, the consequences are more direct: SLA breaches trigger credits, and repeated breaches trigger cancellation clauses.
Akamai has published research showing that 53% of mobile web users abandon pages that take longer than 3 seconds to load. While this is a performance measurement rather than a downtime measurement, it demonstrates that user tolerance for unavailability is extremely low—and that tolerance is shaped by what alternatives exist.
The SLA Penalty Math
For a SaaS company with a standard 99.9% uptime SLA (which allows for roughly 8.7 hours of downtime per year), breaching that SLA means issuing service credits. A typical SLA credit is 10–25% of monthly fees per hour of breach.
For a customer paying $10,000/month on a 99.9% SLA:
- 1 hour of unplanned downtime → 10% credit = $1,000 credit
- If 100 such customers are affected → $100,000 in service credits for a single incident
This is before factoring in churn risk from customers who choose not to renew rather than claim credits.
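The credit arithmetic above generalises. A sketch, with the 10%-per-hour rate from the example; the cap parameter is an assumption (most SLAs limit total credits to some fraction of the monthly fee, but terms vary):

```python
def sla_credit(monthly_fee: float, breach_hours: float,
               credit_rate: float = 0.10, cap: float = 1.0) -> float:
    """Service credit owed to one customer for an SLA breach.

    credit_rate is the fraction of the monthly fee credited per hour of
    breach; cap limits total credits to a fraction of the monthly fee
    (an assumption -- check your actual SLA terms).
    """
    credit = monthly_fee * credit_rate * breach_hours
    return min(credit, monthly_fee * cap)

# One hour of breach for a $10,000/month customer at 10%/hour:
per_customer = sla_credit(10_000, breach_hours=1)   # 1,000.0
total = 100 * per_customer                          # 100,000 across 100 customers
```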
What 99.9% Uptime Actually Means
Uptime percentages are frequently misunderstood as annual figures, but they’re typically calculated monthly:
| Uptime SLA | Monthly downtime allowed | Annual downtime allowed |
|---|---|---|
| 99% | 7.2 hours/month | 3.65 days/year |
| 99.9% | 43.8 minutes/month | 8.77 hours/year |
| 99.95% | 21.9 minutes/month | 4.38 hours/year |
| 99.99% | 4.4 minutes/month | 52.6 minutes/year |
| 99.999% | 26 seconds/month | 5.26 minutes/year |
Most SaaS services operate between 99.9% and 99.99% in practice. The jump from 99.9% to 99.99% requires significantly more investment in redundancy, testing, and incident response—but also reduces allowable monthly downtime from 43 minutes to under 5 minutes.
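The allowed-downtime figures in the table above follow directly from the SLA percentage. A one-line calculation, using an average 730-hour month and 8,766-hour year (some SLAs use a 30-day month instead, which shifts the numbers slightly):

```python
def allowed_downtime_minutes(sla_percent: float, period_hours: float) -> float:
    """Minutes of downtime permitted by an SLA over a given period."""
    return period_hours * 60 * (1 - sla_percent / 100)

# 99.9% over an average month (730 hours) allows ~43.8 minutes:
monthly = allowed_downtime_minutes(99.9, 730)     # ~43.8
# 99.99% over a year (8,766 hours) allows ~52.6 minutes:
annual = allowed_downtime_minutes(99.99, 8766)    # ~52.6
```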
Google’s SRE book explicitly discusses accepting the appropriate level of risk for a given service. The argument is that a service targeting 99.99% uptime when its users are on 99% reliable networks is spending engineering budget on reliability improvements users can’t perceive.
The Monitoring Gap
The most consistent finding across downtime research is the gap between organisations that have monitoring and organisations that have monitoring that actually catches incidents quickly.
Checking a service every 5 minutes from a single location is not the same as 30-second checks from 35 global locations. An outage affecting only users in Southeast Asia—because a CDN edge node is misconfigured—might not be visible to a monitor running from a single US server.
The most effective monitoring setups share common characteristics:
- Frequent checks (≤60 seconds) so incidents are caught immediately
- Multi-region coverage so regional issues aren’t missed
- Diverse check types — not just HTTP status codes, but DNS resolution, SSL validity, response content, and TCP connectivity
- Fast alert routing with clear escalation paths so the right person is notified immediately
The cost of not having monitoring is, on average, hours of undetected downtime per incident. The cost of having it is a few dollars a month.
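A minimal sketch of one such check using only Python’s standard library, separating the HTTP fetch from the pass/fail evaluation so the content check is testable. The URL and expected substring are placeholders; a production monitor would add DNS, TLS-expiry, and TCP checks plus multi-region scheduling and alert routing:

```python
import socket
import urllib.error
import urllib.request

def evaluate(status: int, body: str, expected_substring: str) -> dict:
    """Pass/fail logic for one check.

    Status-code-only checks miss pages that return 200 with an error body,
    which is why the response content is verified as well.
    """
    if status != 200:
        return {"up": False, "reason": f"status {status}"}
    if expected_substring not in body:
        return {"up": False, "reason": "expected content missing"}
    return {"up": True, "reason": "ok"}

def check_http(url: str, expected_substring: str, timeout: float = 10.0) -> dict:
    """Run one HTTP check: connectivity, status code, and response content.

    A real monitor would run this every <=60s from multiple regions and
    alert when locations report failure; the URL here is a placeholder.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
            return evaluate(resp.status, body, expected_substring)
    except (urllib.error.URLError, socket.timeout) as exc:
        return {"up": False, "reason": str(exc)}
```

Splitting fetch from evaluation also makes the failure modes explicit: a connection error, a bad status, and a wrong response body are three different problems with three different likely causes.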
Key Takeaways
- The “$5,600 per minute” figure is a large-enterprise average. Your actual cost per minute of downtime depends on your revenue model, traffic, and industry.
- Most outages trace back to change events and human error, not hardware failures; this makes post-deployment monitoring critical.
- 40% of incidents are first detected by customers, not monitoring systems—an indication that monitoring coverage remains insufficient across the industry.
- MTTR has improved, but the biggest gains come from faster detection, not faster resolution.
- Reputational and contractual costs (churn, SLA credits) often exceed the direct revenue cost of an outage.
- 99.9% uptime allows 43 minutes of downtime per month—most teams are surprised by how much that is.
Start monitoring in 30 seconds
StatusApp gives you 30-second checks from 35+ global locations, instant alerts, and beautiful status pages. Free plan available.