Monitoring for SaaS Companies: Beyond Simple Uptime Checks
SaaS companies need more than basic uptime monitoring. Learn how to build a monitoring strategy that covers APIs, webhooks, background jobs, and user-facing performance.
If you are running a SaaS product, “is the website up?” is the bare minimum monitoring question. Your customers depend on your APIs, webhooks, background processing, and integrations working correctly around the clock. A comprehensive monitoring strategy is the difference between catching issues in seconds and finding out from angry customer support tickets.
Why SaaS Monitoring Is Different
SaaS products have unique monitoring challenges:
- Multi-tenant architecture: An issue affecting one customer segment might not affect others
- API-first design: Your API is the product, not just a website
- Integration dependencies: You depend on (and are depended upon by) third-party services
- Background processing: Queues, cron jobs, and async workflows run silently until they break
- Global user base: Performance needs to be consistent across regions
- SLA commitments: You have contractual uptime obligations
Traditional website monitoring covers only a small fraction of these concerns. Here is how to cover the rest.
The SaaS Monitoring Stack
1. API Endpoint Monitoring
Your API is your product’s interface with the world. Monitor it like the critical asset it is.
What to monitor:
- Authentication endpoints (login, token refresh)
- Core CRUD operations (create, read, update, delete)
- Search and query endpoints
- Webhook delivery endpoints
- Public API endpoints (if you have a developer platform)
How to monitor effectively:
Do not just check for a 200 status code. Validate the response body:
{
  "type": "api",
  "name": "User API - List Users",
  "url": "https://api.example.com/v1/users",
  "method": "GET",
  "headers": {
    "Authorization": "Bearer ${MONITORING_API_KEY}"
  },
  "expectedStatus": 200,
  "expectedBody": "\"users\"",
  "maxResponseTime": 2000,
  "interval": 30
}
This catches scenarios where your API returns 200 but with an empty response, an error message in the body, or degraded performance.
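To make the validation logic concrete, here is a minimal sketch of how a check result could be evaluated against the configuration above. The `evaluateCheck` function and the `{ status, body, elapsedMs }` result shape are illustrative assumptions, not StatusApp's implementation, and `maxResponseTime` is assumed to be in milliseconds.

```javascript
// Evaluate one API check result against a monitor config.
// `result` is a hypothetical shape: { status, body, elapsedMs }.
function evaluateCheck(config, result) {
  const failures = [];
  if (result.status !== config.expectedStatus) {
    failures.push(`status ${result.status}, expected ${config.expectedStatus}`);
  }
  if (config.expectedBody && !result.body.includes(config.expectedBody)) {
    failures.push(`body is missing ${config.expectedBody}`);
  }
  // Assumption: maxResponseTime is expressed in milliseconds.
  if (result.elapsedMs > config.maxResponseTime) {
    failures.push(`took ${result.elapsedMs} ms, limit ${config.maxResponseTime} ms`);
  }
  return { ok: failures.length === 0, failures };
}

const config = { expectedStatus: 200, expectedBody: '"users"', maxResponseTime: 2000 };

// A 200 response whose body carries an error instead of the users list.
const degraded = evaluateCheck(config, {
  status: 200,
  body: '{"error":"upstream timeout"}',
  elapsedMs: 2400,
});
console.log(degraded.ok, degraded.failures);
```

A status-only check would have passed this response; the body and timing rules are what flag it.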
2. GraphQL Monitoring
If your API uses GraphQL, standard HTTP monitoring is not enough. Most GraphQL servers return 200 even when the response body contains an errors array, so you need to validate the response payload.
query HealthCheck {
  currentUser {
    id
    email
  }
  systemStatus {
    database
    cache
    queue
  }
}
StatusApp’s GraphQL monitor type handles this natively — it sends your query and validates the response structure.
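If you want to run the same kind of validation in your own tooling, the sketch below shows one possible approach: treat a response as healthy only when it has no `errors` entries and every required field under `data` is present. The `validateGraphQLResponse` helper and the list of required paths are assumptions made for illustration; they are not part of StatusApp's API.

```javascript
// Treat a GraphQL response as healthy only if it has no `errors`
// and every required path under `data` resolves to a non-null value.
function validateGraphQLResponse(payload, requiredPaths) {
  const problems = [];
  if (Array.isArray(payload.errors) && payload.errors.length > 0) {
    problems.push(...payload.errors.map((e) => `error: ${e.message}`));
  }
  for (const path of requiredPaths) {
    // Walk the dotted path, stopping early at the first null/undefined node.
    const value = path
      .split('.')
      .reduce((node, key) => (node == null ? node : node[key]), payload.data);
    if (value == null) problems.push(`missing: ${path}`);
  }
  return { ok: problems.length === 0, problems };
}

// Paths taken from the HealthCheck query above.
const requiredPaths = [
  'currentUser.id',
  'systemStatus.database',
  'systemStatus.cache',
  'systemStatus.queue',
];

// HTTP status was 200, but the systemStatus resolver failed.
const partial = {
  data: { currentUser: { id: '42', email: 'monitor@example.com' }, systemStatus: null },
  errors: [{ message: 'systemStatus resolver timed out' }],
};
console.log(validateGraphQLResponse(partial, requiredPaths));
```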
3. Webhook Delivery Monitoring
If your SaaS sends webhooks to customers (payment events, status updates, data sync), failed webhook delivery is a silent failure that customers notice before you do.
Use heartbeat monitoring for webhook processors:
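// In your webhook delivery worker
async function processWebhook(event) {
  try {
    await deliverWebhook(event);
  } catch (error) {
    handleFailure(event, error);
    return;
  }
  // Ping StatusApp heartbeat after successful delivery.
  // A failed ping must not be treated as a failed delivery.
  try {
    await fetch('https://heartbeat.statusapp.io/YOUR_HEARTBEAT_ID');
  } catch (pingError) {
    console.warn('Heartbeat ping failed:', pingError);
  }
}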
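If the heartbeat is not pinged within the expected interval, either the webhook processor has stalled, deliveries are failing, or no events arrived at all. Set the interval with your quietest traffic period in mind.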
4. Background Job Monitoring
SaaS products rely on background jobs for billing, email sending, data processing, report generation, and more. These jobs fail silently.
Set up heartbeat monitors for each critical job:
| Job | Expected Interval | Alert After |
|---|---|---|
| Billing processor | Every hour | 90 minutes |
| Email digest | Daily at 9 AM | 10 AM |
| Data export | On demand, max 4 hours | 5 hours |
| Cache warmup | Every 15 minutes | 20 minutes |
| Database backup | Daily at 2 AM | 3 AM |
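The "Alert After" column boils down to a simple comparison between the last ping time and an allowed window. The following is a rough sketch of that logic; the `overdueJobs` function and the monitor field names (`lastPingAt`, `alertAfterMs`) are hypothetical and only illustrate the idea.

```javascript
const MINUTE = 60 * 1000;

// Return the names of jobs whose last ping is older than their alert window.
// Field names are illustrative, not a StatusApp data model.
function overdueJobs(monitors, now) {
  return monitors
    .filter((m) => now - m.lastPingAt > m.alertAfterMs)
    .map((m) => m.job);
}

const now = Date.parse('2024-01-01T12:00:00Z');
const monitors = [
  // Pinged 60 minutes ago, allowed 90: still healthy.
  { job: 'Billing processor', alertAfterMs: 90 * MINUTE, lastPingAt: now - 60 * MINUTE },
  // Pinged 45 minutes ago, allowed 20: overdue.
  { job: 'Cache warmup', alertAfterMs: 20 * MINUTE, lastPingAt: now - 45 * MINUTE },
];
console.log(overdueJobs(monitors, now)); // only 'Cache warmup' is overdue
```

Setting the alert window somewhat longer than the expected interval, as in the table, leaves room for normal variation in job duration.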
5. SSL and Certificate Monitoring
An expired SSL certificate on your API endpoint means every customer integration breaks simultaneously. This is a company-wide incident. Monitor SSL certificates with alerts at 30, 14, and 7 days before expiration.
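As an illustration of the 30/14/7-day schedule, the sketch below maps a certificate's expiry date to an alert level. The function names and the level labels (`notice`, `warning`, `critical`) are assumptions chosen for this example.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Whole days remaining until the certificate's expiry date.
function daysUntilExpiry(notAfter, now) {
  return Math.floor((new Date(notAfter) - new Date(now)) / DAY_MS);
}

// Map remaining days onto the 30/14/7-day alert schedule.
// The level names are illustrative.
function certAlertLevel(daysLeft) {
  if (daysLeft < 0) return 'expired';
  if (daysLeft <= 7) return 'critical';
  if (daysLeft <= 14) return 'warning';
  if (daysLeft <= 30) return 'notice';
  return 'ok';
}

const daysLeft = daysUntilExpiry('2024-03-01T00:00:00Z', '2024-02-20T00:00:00Z');
console.log(daysLeft, certAlertLevel(daysLeft));
```

In Node.js, one possible source for the expiry date is the `valid_to` field of the object returned by `getPeerCertificate()` on a TLS socket.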
6. DNS Monitoring
DNS issues are notoriously difficult to debug in the moment. Monitor your DNS records:
- A/AAAA records: Your primary domain and API subdomain
- CNAME records: CDN and service aliases
- MX records: If you use a custom domain for email
- TXT records: SPF, DKIM, and DMARC for email deliverability
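A DNS monitor is essentially a comparison between the records you expect and the records a resolver returns. Here is a minimal sketch of that comparison; `diffRecords` and `recordsMatch` are hypothetical helpers, and the addresses come from the reserved documentation ranges.

```javascript
// Compare the records you expect with what a resolver returned.
// Order is ignored because resolvers may rotate answers.
function diffRecords(expected, actual) {
  const want = new Set(expected);
  const got = new Set(actual);
  return {
    missing: expected.filter((r) => !got.has(r)),
    unexpected: actual.filter((r) => !want.has(r)),
  };
}

function recordsMatch(expected, actual) {
  const { missing, unexpected } = diffRecords(expected, actual);
  return missing.length === 0 && unexpected.length === 0;
}

// One expected A record has been replaced by an unknown address.
const drift = diffRecords(
  ['192.0.2.10', '192.0.2.11'],
  ['192.0.2.11', '203.0.113.5'],
);
console.log(drift);
```

In Node.js, the `actual` list could come from `dns.promises.resolve4(hostname)`, which resolves to an array of address strings. `resolveTxt` returns each record as an array of string chunks, so those would need to be joined before comparing.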
7. Server and Infrastructure Monitoring
Even with managed cloud providers, you need visibility into:
- CPU utilization: Sustained high CPU indicates scaling needs
- Memory usage: Memory leaks accumulate slowly, then crash the process suddenly
- Disk usage: Log files and database storage grow over time
- Network I/O: Traffic spikes and bandwidth limits
StatusApp’s server monitoring agent provides real-time visibility into these metrics from a lightweight process.
Building Your Status Page
Every SaaS company needs a public status page. It is where your customers check when something feels wrong, and it is where you communicate during incidents.
A good status page includes:
- Component status: Break your service into components (API, Dashboard, Webhooks, Data Processing)
- Current incidents: Real-time updates during issues
- Uptime history: 90-day uptime visualization for each component
- Subscriber notifications: Email/SMS alerts for subscribers
StatusApp includes status pages on all plans. You can connect it to a custom domain (status.yourapp.com) and embed it in your documentation.
SLA Monitoring and Reporting
Most SaaS contracts include uptime SLAs (commonly 99.9% or 99.95%). You need data to prove compliance.
| SLA | Allowed Downtime (Monthly) | Allowed Downtime (Annual) |
|---|---|---|
| 99.0% | 7h 18m | 3d 15h 39m |
| 99.9% | 43m 49s | 8h 45m 57s |
| 99.95% | 21m 55s | 4h 22m 58s |
| 99.99% | 4m 23s | 52m 35s |
StatusApp’s analytics track your actual uptime to the second, making SLA reporting straightforward. You can generate reports showing uptime by time period, region, and monitor.
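The allowances in the table above follow from simple arithmetic: the length of the period multiplied by the fraction of time the SLA permits you to be down. The sketch below assumes an average year of 365.25 days and a month of one twelfth of that; with those assumptions the results should land within a second or so of the table, depending on rounding. The function names are illustrative.

```javascript
// Assumption: average year of 365.25 days, month = year / 12.
const SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60;
const SECONDS_PER_MONTH = SECONDS_PER_YEAR / 12;

// Seconds of downtime permitted by an SLA percentage over a period.
function allowedDowntimeSeconds(slaPercent, periodSeconds) {
  return (periodSeconds * (100 - slaPercent)) / 100;
}

// Format seconds as "Xd Yh Zm Ws", dropping leading zero units
// and truncating fractional seconds.
function formatDuration(totalSeconds) {
  let rest = Math.floor(totalSeconds);
  const parts = [];
  for (const [unit, size] of [['d', 86400], ['h', 3600], ['m', 60], ['s', 1]]) {
    const count = Math.floor(rest / size);
    rest -= count * size;
    if (count > 0 || parts.length > 0) parts.push(`${count}${unit}`);
  }
  return parts.join(' ') || '0s';
}

for (const sla of [99.0, 99.9, 99.95, 99.99]) {
  const monthly = formatDuration(allowedDowntimeSeconds(sla, SECONDS_PER_MONTH));
  const annual = formatDuration(allowedDowntimeSeconds(sla, SECONDS_PER_YEAR));
  console.log(`${sla}%: ${monthly} per month, ${annual} per year`);
}
```

If your contract defines the measurement period differently (for example, calendar months or a 30-day rolling window), substitute that period length.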
Alert Strategy for SaaS
Who Gets Alerted?
- On-call engineer: All critical alerts (PagerDuty or SMS)
- Engineering team: High-priority alerts (Slack channel)
- Support team: Customer-facing service degradation (email or Slack)
- Management: Extended outages exceeding 15 minutes (email)
Alert Thresholds
Avoid alerting on every minor fluctuation:
- Response time: Alert when p95 latency exceeds 2x its normal baseline, not on individual slow requests
- Error rate: Alert when error rate exceeds 1%, not on single errors
- Availability: Use confirmation checks (2-3 failures from different locations) before triggering
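These three rules can be expressed as a small decision function. The sketch below is one possible reading of them: "normal" is taken to mean a baseline p95 you supply, and the window shape (`responseTimesMs`, `errors`, `total`, `failedChecks`) is a hypothetical structure chosen for the example.

```javascript
// Nearest-rank percentile over a window of response times (ms).
function percentile(values, p) {
  if (values.length === 0) return 0;
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(0, rank - 1)];
}

// Apply the three rules: p95 vs. baseline, error rate,
// and confirmation from at least two distinct locations.
function shouldAlert(window, baselineP95Ms) {
  const p95 = percentile(window.responseTimesMs, 95);
  const errorRate = window.total === 0 ? 0 : window.errors / window.total;
  const failingLocations = new Set(window.failedChecks.map((c) => c.location)).size;
  return {
    slow: p95 > 2 * baselineP95Ms,
    erroring: errorRate > 0.01,
    down: failingLocations >= 2,
  };
}

// Nineteen fast requests and one slow outlier: p95 stays at 100 ms.
const sample = {
  responseTimesMs: [...Array(19).fill(100), 5000],
  errors: 0,
  total: 20,
  failedChecks: [{ location: 'us-east' }],
};
console.log(shouldAlert(sample, 100));
```

With this sample, the single 5-second request and the single-location failure do not trigger an alert, which is the behavior the thresholds are meant to produce.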
The Complete SaaS Monitoring Checklist
Here is what a comprehensive SaaS monitoring setup looks like:
- Homepage and login page (Website monitors)
- Core API endpoints with response validation (API monitors)
- GraphQL queries with response checks (GraphQL monitors)
- SSL certificates for all domains (SSL monitors)
- DNS records for primary and API domains (DNS monitors)
- Domain expiration dates (Domain monitors)
- Background job heartbeats (Heartbeat monitors)
- Server resources: CPU, memory, disk (Server monitors)
- Database connectivity (TCP monitor)
- Cache layer (Redis/Memcached) connectivity (TCP monitor)
- Public status page configured and linked
- Alert channels configured and tested
- SLA reporting baseline established
StatusApp’s Business plan (500 monitors at $49/month) covers even complex SaaS architectures comfortably.
Build a monitoring setup your SaaS deserves. Start with StatusApp free and expand as you grow.