GraphQL Monitoring: Why Traditional HTTP Checks Aren't Enough
GraphQL APIs break conventional monitoring assumptions. Learn why standard HTTP checks miss GraphQL failures and how to monitor GraphQL APIs effectively.
GraphQL has fundamentally changed how APIs work, but most monitoring tools have not caught up. They still treat every HTTP endpoint the same way: send a request, check for a 200 status code, done. For GraphQL, this approach can miss most real failures, because they are reported in the response body rather than in the status code.
The GraphQL Monitoring Problem
Here is why traditional HTTP monitoring fails for GraphQL:
Everything Returns 200
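The most fundamental difference: GraphQL servers typically return HTTP 200 even when things go wrong. A successful query, an authorization failure raised inside a resolver, and a complete resolver crash can all return the same status code, with the failure reported only in the response body. Some servers do return 4xx for malformed requests or for authentication handled before the GraphQL layer, but the status code alone is never enough to tell whether a query succeeded.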
Traditional API:
GET /api/users → 200 (success)
GET /api/users → 401 (auth failure)
GET /api/users → 500 (server error)
GraphQL:
POST /graphql → 200 (success)
POST /graphql → 200 (auth failure, error in body)
POST /graphql → 200 (resolver crash, error in body)
POST /graphql → 200 (partial data, some resolvers failed)
A monitoring tool that only checks for HTTP 200 will report “all clear” while your GraphQL API is completely broken.
Single Endpoint, Multiple Services
REST APIs have many endpoints, each serving a specific resource. A failure in the users service affects /api/users but not /api/products. You can monitor each independently.
GraphQL has one endpoint (/graphql) that serves everything. A monitor hitting /graphql might succeed because the query it sends happens to resolve against a healthy service, while other queries fail because they depend on a broken service.
Partial Failures
GraphQL can return partial data — some fields resolve successfully while others error:
{
  "data": {
    "user": {
      "name": "Alice",
      "orders": null
    }
  },
  "errors": [{
    "message": "Orders service unavailable",
    "path": ["user", "orders"]
  }]
}
This is a successful HTTP response (200) with valid JSON, but the orders service is down. Traditional monitoring sees a healthy endpoint.
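A monitor can tell these cases apart by parsing the body instead of stopping at the status code. The following is a minimal sketch, not a definitive implementation: the function name `classify_graphql_response` and the three state labels are assumptions made for illustration, and it relies only on the standard `data` and `errors` keys of a GraphQL response.

```python
import json
from typing import Any, Dict


def classify_graphql_response(status_code: int, body_text: str) -> Dict[str, Any]:
    """Classify a GraphQL HTTP response as 'ok', 'partial', or 'failed'."""

    def failed(message: str) -> Dict[str, Any]:
        return {"state": "failed", "failed_paths": [], "messages": [message]}

    if status_code != 200:
        return failed(f"HTTP {status_code}")
    try:
        body = json.loads(body_text)
    except json.JSONDecodeError:
        return failed("response body is not valid JSON")
    if not isinstance(body, dict):
        return failed("response body is not a JSON object")

    errors = [e for e in (body.get("errors") or []) if isinstance(e, dict)]
    data = body.get("data")

    if errors:
        # Errors alongside data mean some resolvers failed: a partial result.
        state = "partial" if data is not None else "failed"
    else:
        state = "ok" if data is not None else "failed"

    # "path" points at the field whose resolver failed, e.g. ["user", "orders"].
    failed_paths = [".".join(str(p) for p in e["path"]) for e in errors if e.get("path")]
    messages = [str(e.get("message", "")) for e in errors]
    return {"state": state, "failed_paths": failed_paths, "messages": messages}
```

Applied to the response above, this reports the state `partial` with `user.orders` as the failing path, which is the signal a status-code check never sees.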
How to Monitor GraphQL Properly
1. Send Real Queries
Do not just ping the /graphql endpoint. Send actual GraphQL queries that exercise your resolvers:
query MonitoringCheck {
  # Check user service
  currentUser {
    id
  }
  # Check product service
  products(limit: 1) {
    id
    name
  }
  # Check order service
  recentOrders(limit: 1) {
    id
    status
  }
}
This single query validates three backend services. If any of them fails, the response will contain an errors array.
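If you want to run such a check from your own script, a query is sent as a JSON POST body with a `query` key. Here is a minimal sketch using only the Python standard library. The helper names `build_graphql_request` and `run_check`, the bearer-token header, and the timeout are assumptions for illustration; your API may authenticate differently.

```python
import json
import urllib.request
from typing import Optional, Tuple

MONITORING_QUERY = """
query MonitoringCheck {
  currentUser { id }
  products(limit: 1) { id name }
  recentOrders(limit: 1) { id status }
}
"""


def build_graphql_request(url: str, query: str, token: Optional[str] = None) -> urllib.request.Request:
    """Build a POST request carrying the query as a JSON body."""
    payload = json.dumps({"query": query}).encode("utf-8")
    headers = {"Content-Type": "application/json", "Accept": "application/json"}
    if token:
        # Assumption: the API accepts a bearer token.
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(url, data=payload, headers=headers, method="POST")


def run_check(url: str, query: str, token: Optional[str] = None, timeout: float = 10.0) -> Tuple[int, str]:
    """Send the query and return (status code, body text).

    urlopen raises urllib.error.HTTPError for 4xx/5xx responses,
    so callers should treat that exception as a failed check.
    """
    request = build_graphql_request(url, query, token)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status, response.read().decode("utf-8")
```

The body text returned by `run_check` still has to be inspected for errors; a 200 here only means the server answered.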
2. Validate Response Content
Check the response body for the presence of expected data and the absence of errors:
{
  "type": "graphql",
  "url": "https://api.example.com/graphql",
  "query": "query { systemHealth { status } }",
  "expectedBody": "\"status\":\"ok\"",
  "notExpectedBody": "\"errors\"",
  "interval": 30
}
StatusApp’s GraphQL monitor type supports this natively — send a query, check the response, alert on unexpected content.
3. Monitor Multiple Query Types
Create separate monitors for different query categories:
Health check query (30-second interval):
query HealthCheck {
  systemHealth {
    database
    cache
    search
    fileStorage
  }
}
Critical business query (60-second interval):
query CriticalPath {
  me { id }
  featuredProducts(limit: 1) { id }
  cart { itemCount }
}
Complex query performance (5-minute interval):
query PerformanceCheck {
  products(
    filter: { category: "test" }
    sort: { field: CREATED_AT, direction: DESC }
    limit: 10
  ) {
    id
    name
    variants { id price }
    reviews(limit: 5) { rating }
  }
}
4. Track Response Time by Query Complexity
GraphQL query performance varies dramatically based on complexity. A simple field lookup might take 10ms while a deeply nested query with multiple joins takes 2 seconds.
Monitor queries at different complexity levels:
- Simple: Single-level queries (target: under 100ms)
- Medium: 2-3 levels of nesting (target: under 500ms)
- Complex: Deep nesting with pagination (target: under 2 seconds)
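The tiers above translate directly into per-query latency budgets. A minimal sketch, assuming you time each check yourself: the tier names and thresholds come from the list above, while `within_budget` and `timed` are hypothetical helper names.

```python
import time
from typing import Any, Callable, Tuple

# Latency targets per complexity tier, in milliseconds.
LATENCY_BUDGET_MS = {"simple": 100.0, "medium": 500.0, "complex": 2000.0}


def within_budget(tier: str, elapsed_ms: float) -> bool:
    """Return True if the measured time is strictly under the tier's target."""
    return elapsed_ms < LATENCY_BUDGET_MS[tier]


def timed(fn: Callable[[], Any]) -> Tuple[Any, float]:
    """Run fn and return (result, elapsed milliseconds)."""
    start = time.perf_counter()
    result = fn()
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return result, elapsed_ms
```

Alerting per tier keeps a slow complex query from hiding behind a fast health check, and the reverse.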
5. Monitor Schema Changes
Unexpected schema changes can break clients:
query SchemaCheck {
  __schema {
    queryType { name }
    mutationType { name }
  }
  __type(name: "User") {
    fields { name }
  }
}
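If a field disappears from the schema, this query’s response will change, and your content validation will catch it. Keep in mind that many production servers disable introspection, in which case this check has to run with credentials or in an environment where introspection is allowed.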
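One way to make the schema check precise is to compare the introspected field names against a baseline you maintain. This is a sketch under assumptions: the baseline set in the usage note is hypothetical, and `missing_fields` is an illustrative name. It reads the standard `__type { fields { name } }` shape returned by the query above.

```python
from typing import Any, Dict, Set


def missing_fields(introspection_body: Dict[str, Any], expected: Set[str]) -> Set[str]:
    """Return the expected field names that are absent from the introspected type."""
    type_info = (introspection_body.get("data") or {}).get("__type")
    if type_info is None:
        # The type itself is gone (or introspection is disabled): everything is missing.
        return set(expected)
    actual = {field["name"] for field in (type_info.get("fields") or [])}
    return expected - actual
```

For example, with a baseline of `{"id", "name", "email"}` and a response that lists only `id` and `name`, the function returns `{"email"}`, which you can alert on by name rather than diffing raw response text.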
GraphQL-Specific Failure Modes
N+1 Query Problems
GraphQL makes it easy to write queries that cause N+1 database queries on the backend. A query that works fine with 10 items might time out with 1000 items.
Monitor representative queries at realistic data volumes, not just against empty test databases.
Resolver Timeouts
Individual resolvers can time out while others succeed, resulting in partial data. Monitor for the presence of errors in responses, not just for the presence of data.
Rate Limiting
GraphQL’s flexible queries can consume wildly different amounts of server resources. A single complex query might consume more resources than 100 simple ones, so many servers enforce complexity or rate limits. Monitor for rate-limiting and complexity-limit errors:
{
  "errors": [{
    "message": "Query complexity exceeds limit",
    "extensions": { "code": "QUERY_COMPLEXITY_LIMIT" }
  }]
}
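Matching on the machine-readable `extensions.code` is usually more stable than matching on the message text. A minimal sketch: the code `QUERY_COMPLEXITY_LIMIT` is taken from the example above, but error codes are server-specific, so treat the `LIMIT_CODES` set and both function names as assumptions to adapt.

```python
from typing import Any, Dict, List

# Assumption: codes your server uses for rate or complexity limits.
LIMIT_CODES = {"QUERY_COMPLEXITY_LIMIT"}


def error_codes(body: Dict[str, Any]) -> List[str]:
    """Collect extensions.code from every error that has one."""
    codes = []
    for error in body.get("errors") or []:
        code = (error.get("extensions") or {}).get("code")
        if code:
            codes.append(code)
    return codes


def is_limited(body: Dict[str, Any]) -> bool:
    """Return True if any error carries a known limit code."""
    return any(code in LIMIT_CODES for code in error_codes(body))
```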
Authentication and Authorization Failures
A GraphQL API might be accessible but return null data due to auth issues:
{
  "data": {
    "currentUser": null
  }
}
This is technically a valid response with no errors array, but if your monitoring query should return a user, a null value most likely indicates an auth failure. Validate specific expected values in your monitoring queries.
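A small check for required non-null fields catches this case even when the response has no errors. The sketch below assumes dotted paths such as `currentUser.id` as a convention for naming fields; `null_paths` is a hypothetical helper, and it does not handle lists.

```python
from typing import Any, Iterable, List


def null_paths(data: Any, required: Iterable[str]) -> List[str]:
    """Return the required dotted paths that are missing or null in data."""
    missing = []
    for path in required:
        node = data
        for key in path.split("."):
            # A null or absent value anywhere along the path fails the check.
            if not isinstance(node, dict) or node.get(key) is None:
                missing.append(path)
                break
            node = node[key]
    return missing
```

For the response above, `null_paths(body["data"], ["currentUser.id"])` returns `["currentUser.id"]`, so the monitor fails even though the HTTP status and JSON are both fine.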
Dedicated Health Check Query
Consider creating a dedicated health query that checks all backend services:
type SystemHealth {
  database: ServiceStatus!
  cache: ServiceStatus!
  search: ServiceStatus!
  fileStorage: ServiceStatus!
  emailService: ServiceStatus!
  paymentGateway: ServiceStatus!
}
enum ServiceStatus {
  OK
  DEGRADED
  DOWN
}
type Query {
  systemHealth: SystemHealth!
}
This gives your monitoring a single query that validates your entire backend stack.
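On the monitoring side, the per-service values can be reduced to a single verdict. A minimal sketch, assuming the `systemHealth` object arrives as a JSON map of service name to one of the enum values above. Treating unknown values as `DOWN` is a design choice made here, not something the schema requires, and the function names are illustrative.

```python
from typing import Dict, List

# Higher number means worse; mirrors the ServiceStatus enum.
SEVERITY = {"OK": 0, "DEGRADED": 1, "DOWN": 2}


def overall_status(system_health: Dict[str, str]) -> str:
    """Return the worst status reported by any service."""
    worst = "OK"
    for status in system_health.values():
        if status not in SEVERITY:
            status = "DOWN"  # assumption: unknown values are treated as failures
        if SEVERITY[status] > SEVERITY[worst]:
            worst = status
    return worst


def unhealthy_services(system_health: Dict[str, str]) -> List[str]:
    """Return the sorted names of services that are not OK."""
    return sorted(name for name, status in system_health.items() if status != "OK")
```

Reporting the unhealthy service names alongside the overall status lets the alert say which dependency is failing, not just that something is.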
Monitoring GraphQL Subscriptions
If you use GraphQL subscriptions (WebSocket-based real-time data), standard HTTP monitoring will not cover them. You need:
- WebSocket connection monitoring: Can clients establish a connection?
- Subscription delivery monitoring: Are messages being delivered within expected latency?
- Connection stability: Do connections stay alive or drop frequently?
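For WebSocket monitoring, use TCP monitors to verify the WebSocket port is accepting connections, and heartbeat monitors within your subscription handler to verify messages are flowing. A TCP check only proves the port is open; it does not confirm that the WebSocket handshake or the subscription itself works, so the heartbeat is what actually verifies delivery.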
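The heartbeat side can be as simple as recording when the last subscription message was delivered and flagging silence beyond a threshold. This is a sketch under assumptions: `HeartbeatTracker` is a hypothetical class, the threshold is yours to choose, and how the result is reported to a heartbeat monitor is left out.

```python
import time
from typing import Optional


class HeartbeatTracker:
    """Tracks the time of the last delivered subscription message."""

    def __init__(self, max_silence_seconds: float) -> None:
        self.max_silence_seconds = max_silence_seconds
        self.last_seen: Optional[float] = None

    def record(self, now: Optional[float] = None) -> None:
        """Call this from the subscription handler on every delivered message."""
        self.last_seen = time.monotonic() if now is None else now

    def is_stale(self, now: Optional[float] = None) -> bool:
        """Return True if no message has arrived within the allowed silence."""
        if self.last_seen is None:
            return True
        current = time.monotonic() if now is None else now
        return current - self.last_seen > self.max_silence_seconds
```

The optional `now` argument exists only to make the logic easy to test; in production the tracker falls back to the monotonic clock.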
Recommended Setup
| Monitor | Type | Query | Interval |
|---|---|---|---|
| Health check | GraphQL | systemHealth query | 30 sec |
| Auth validation | GraphQL | currentUser query | 60 sec |
| Critical queries | GraphQL | Business-critical paths | 60 sec |
| Complex queries | GraphQL | Performance-sensitive queries | 5 min |
| Schema integrity | GraphQL | Introspection query | 15 min |
| WebSocket endpoint | TCP | Connection check | 60 sec |
StatusApp’s GraphQL monitor type makes this straightforward: define your query, set expected response content, and get alerted when things break.
Monitor your GraphQL APIs properly. Start with StatusApp — purpose-built GraphQL monitoring included on all plans.