GraphQL Monitoring: Why Traditional HTTP Checks Aren't Enough

GraphQL has changed how APIs are built and consumed, but many monitoring tools have not caught up. They still treat every HTTP endpoint the same way: send a request, check for a 200 status code, done. For GraphQL, this approach can miss many real failures, because the failure is usually reported in the response body rather than in the status code.

The GraphQL Monitoring Problem

Here is why traditional HTTP monitoring fails for GraphQL:

Everything Returns 200

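The most fundamental difference: GraphQL servers typically return HTTP 200 even when things go wrong. Depending on the server and its configuration, a successful query, an authentication failure, and a complete resolver crash can all come back with the same status code, with the failure described only in the response body. Some servers do return a 4xx code for malformed or invalid requests, but errors raised while executing a query commonly arrive as 200.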

Traditional API:

GET /api/users → 200 (success)
GET /api/users → 401 (auth failure)
GET /api/users → 500 (server error)

GraphQL:

POST /graphql → 200 (success)
POST /graphql → 200 (auth failure, error in body)
POST /graphql → 200 (resolver crash, error in body)
POST /graphql → 200 (partial data, some resolvers failed)

A monitoring tool that only checks for HTTP 200 can report “all clear” while your GraphQL API is completely broken.
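To make the difference concrete, here is a minimal Python sketch of a check that looks past the status code. The function name and the three result labels ("ok", "partial", "failed") are illustrative assumptions, not part of any particular monitoring product.

```python
import json


def classify_graphql_response(status_code: int, body_text: str) -> str:
    """Classify a GraphQL-over-HTTP response as 'ok', 'partial', or 'failed'.

    The labels are hypothetical; adapt them to your own alerting levels.
    """
    if status_code != 200:
        return "failed"
    try:
        payload = json.loads(body_text)
    except json.JSONDecodeError:
        return "failed"
    if not isinstance(payload, dict):
        return "failed"

    errors = payload.get("errors") or []
    data = payload.get("data")
    if not errors:
        # No errors reported: healthy only if some data came back.
        return "ok" if data is not None else "failed"
    # Errors next to non-empty data means some resolvers succeeded.
    return "partial" if data else "failed"
```

A status-only check would treat all three outcomes as healthy as long as the server answered with 200.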

Single Endpoint, Multiple Services

REST APIs have many endpoints, each serving a specific resource. A failure in the users service affects /api/users but not /api/products. You can monitor each independently.

GraphQL has one endpoint (/graphql) that serves everything. A monitor hitting /graphql might succeed because the query it sends happens to resolve against a healthy service, while other queries fail because they depend on a broken service.

Partial Failures

GraphQL can return partial data — some fields resolve successfully while others error:

{
  "data": {
    "user": {
      "name": "Alice",
      "orders": null
    }
  },
  "errors": [{
    "message": "Orders service unavailable",
    "path": ["user", "orders"]
  }]
}

This is a successful HTTP response (200) with valid JSON, but the orders service is down. Traditional monitoring sees a healthy endpoint.
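A monitor can use the path entries in the errors array to report which part of the response failed. The following is a small sketch in Python; the function name and the "<request>" placeholder for errors without a path are assumptions made for illustration.

```python
def failed_paths(payload: dict) -> list:
    """Return dotted paths of the fields that errored, e.g. 'user.orders'."""
    paths = []
    for error in payload.get("errors") or []:
        path = error.get("path")
        if path:
            # Path segments can be field names or list indexes.
            paths.append(".".join(str(segment) for segment in path))
        else:
            # Errors without a path usually apply to the whole request.
            paths.append("<request>")
    return paths


response = {
    "data": {"user": {"name": "Alice", "orders": None}},
    "errors": [
        {"message": "Orders service unavailable", "path": ["user", "orders"]}
    ],
}
print(failed_paths(response))
```

Including the failed path in the alert tells whoever is on call that the orders resolver is the problem, not the endpoint as a whole.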

How to Monitor GraphQL Properly

1. Send Real Queries

Do not just ping the /graphql endpoint. Send actual GraphQL queries that exercise your resolvers:

query MonitoringCheck {
  # Check user service
  currentUser {
    id
  }
  # Check product service
  products(limit: 1) {
    id
    name
  }
  # Check order service
  recentOrders(limit: 1) {
    id
    status
  }
}

This single query exercises three backend services. If any of them fails, the response will contain an errors array that identifies the failing field.
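As a rough sketch, a query like this can be sent from a script using only the Python standard library. The URL, the bearer-token header, and the function names below are assumptions for illustration; use whatever authentication your API actually expects.

```python
import json
import urllib.request
from typing import Optional

MONITORING_QUERY = """
query MonitoringCheck {
  currentUser { id }
  products(limit: 1) { id name }
  recentOrders(limit: 1) { id status }
}
"""


def build_request(url: str, query: str,
                  token: Optional[str] = None) -> urllib.request.Request:
    """Build the POST request that carries a GraphQL query as JSON."""
    body = json.dumps({"query": query}).encode("utf-8")
    headers = {"Content-Type": "application/json", "Accept": "application/json"}
    if token:
        # Assumption: the API uses bearer tokens.
        headers["Authorization"] = "Bearer " + token
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def run_check(url: str, query: str, token: Optional[str] = None,
              timeout: float = 10.0) -> dict:
    """Send the query and return the parsed JSON body.

    urlopen raises urllib.error.HTTPError for 4xx/5xx responses, so the
    caller should treat that exception as a failed check.
    """
    request = build_request(url, query, token)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling run_check("https://api.example.com/graphql", MONITORING_QUERY) returns the parsed body, which still has to be inspected for errors and null fields before the check counts as passed.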

2. Validate Response Content

Check the response body for the presence of expected data and the absence of errors:

{
  "type": "graphql",
  "url": "https://api.example.com/graphql",
  "query": "query { systemHealth { status } }",
  "expectedBody": "\"status\":\"ok\"",
  "notExpectedBody": "\"errors\"",
  "interval": 30
}

StatusApp’s GraphQL monitor type supports this natively — send a query, check the response, alert on unexpected content.
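If you script your own checks, keep in mind that a plain substring match on serialized JSON is sensitive to whitespace and key order. A hedged alternative is to parse the body and compare a value at a known path. The helper names below are hypothetical and this is only a sketch in Python.

```python
import json
from typing import Any


def value_at(payload: Any, path: list) -> Any:
    """Walk dict keys and list indexes; return None if the path is absent."""
    current = payload
    for key in path:
        if isinstance(current, dict) and key in current:
            current = current[key]
        elif (isinstance(current, list) and isinstance(key, int)
              and 0 <= key < len(current)):
            current = current[key]
        else:
            return None
    return current


def body_is_healthy(body_text: str, path: list, expected: Any) -> bool:
    """True when the body parses, has no errors, and path == expected."""
    try:
        payload = json.loads(body_text)
    except json.JSONDecodeError:
        return False
    if not isinstance(payload, dict) or payload.get("errors"):
        return False
    return value_at(payload, path) == expected


compact = '{"data":{"systemHealth":{"status":"ok"}}}'
pretty = '{"data": {"systemHealth": {"status": "ok"}}}'
path = ["data", "systemHealth", "status"]
print(body_is_healthy(compact, path, "ok"), body_is_healthy(pretty, path, "ok"))
```

Both bodies pass the structural check, while a substring search for "status":"ok" without a space would match only the compact one.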

3. Monitor Multiple Query Types

Create separate monitors for different query categories:

Health check query (30-second interval):

query HealthCheck {
  systemHealth {
    database
    cache
    search
    fileStorage
  }
}

Critical business query (60-second interval):

query CriticalPath {
  me { id }
  featuredProducts(limit: 1) { id }
  cart { itemCount }
}

Complex query performance (5-minute interval):

query PerformanceCheck {
  products(
    filter: { category: "test" }
    sort: { field: CREATED_AT, direction: DESC }
    limit: 10
  ) {
    id
    name
    variants { id price }
    reviews(limit: 5) { rating }
  }
}
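If you run these checks from your own scheduler rather than a hosted monitor, the three intervals above could be modeled roughly as follows. This is a sketch in Python; the Monitor class and due_monitors function are hypothetical names, and timestamps are plain seconds.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Monitor:
    name: str
    interval_seconds: int
    last_run: Optional[float] = None  # None means the monitor never ran


MONITORS = [
    Monitor("HealthCheck", 30),
    Monitor("CriticalPath", 60),
    Monitor("PerformanceCheck", 300),
]


def due_monitors(monitors: list, now: float) -> list:
    """Return the names of monitors whose interval has elapsed."""
    return [
        m.name
        for m in monitors
        if m.last_run is None or now - m.last_run >= m.interval_seconds
    ]
```

Keeping the expensive query on the longest interval keeps the monitoring itself from becoming a meaningful source of load.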

4. Track Response Time by Query Complexity

GraphQL query performance varies dramatically based on complexity. A simple field lookup might take 10ms while a deeply nested query with multiple joins takes 2 seconds.

Monitor queries at different complexity levels:

Simple: Single-level queries (target: under 100ms)
Medium: 2-3 levels of nesting (target: under 500ms)
Complex: Deep nesting with pagination (target: under 2 seconds)
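One way to apply these targets automatically is to estimate a query's nesting and pick the matching budget. The Python sketch below counts braces instead of parsing GraphQL properly, and it assumes that "levels of nesting" means selection sets below the top-level fields. Both the interpretation and the function names are assumptions.

```python
BUDGET_MS = {"simple": 100, "medium": 500, "complex": 2000}


def nesting_levels(query: str) -> int:
    """Estimate selection-set nesting below the top-level fields.

    Braces inside argument lists, strings, and comments are ignored.
    This is a heuristic, not a GraphQL parser.
    """
    depth = max_depth = parens = 0
    in_string = in_comment = escaped = False
    for ch in query:
        if in_comment:
            in_comment = ch != "\n"
            continue
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
            continue
        if ch == '"':
            in_string = True
        elif ch == "#":
            in_comment = True
        elif ch == "(":
            parens += 1
        elif ch == ")":
            parens -= 1
        elif ch == "{" and parens == 0:
            depth += 1
            max_depth = max(max_depth, depth)
        elif ch == "}" and parens == 0:
            depth -= 1
    return max(max_depth - 1, 0)


def tier_for(query: str) -> str:
    """Map nesting to the simple / medium / complex tiers listed above."""
    levels = nesting_levels(query)
    if levels <= 1:
        return "simple"
    if levels <= 3:
        return "medium"
    return "complex"


def within_budget(query: str, elapsed_ms: float) -> bool:
    """True when the measured time is under the tier's target."""
    return elapsed_ms < BUDGET_MS[tier_for(query)]
```

Under this interpretation, a query such as query { me { id } } is held to the 100ms target, while a query with two or three levels of nested selections gets the 500ms target.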

5. Monitor Schema Changes

Unexpected schema changes can break clients:

query SchemaCheck {
  __schema {
    queryType { name }
    mutationType { name }
  }
  __type(name: "User") {
    fields { name }
  }
}

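If a field disappears from the User type, this query’s response will change, and content validation that asserts on the expected field names will catch it. Some production servers disable introspection, in which case this check has to run against an environment where it is enabled.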
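A scripted version of this check can compare the introspected field names against a stored baseline. The sketch below is in Python; the baseline set and the function name are assumptions, and the response shape follows the standard __type introspection result.

```python
def schema_drift(introspection: dict, expected_fields: set) -> dict:
    """Compare field names from a __type response with a baseline."""
    type_info = (introspection.get("data") or {}).get("__type") or {}
    actual = {field["name"] for field in type_info.get("fields") or []}
    return {
        "missing": sorted(expected_fields - actual),  # fields that vanished
        "added": sorted(actual - expected_fields),    # fields not in baseline
    }


# Hypothetical baseline and response for the User type.
baseline = {"id", "name", "email"}
response = {
    "data": {
        "__type": {
            "fields": [{"name": "id"}, {"name": "name"}, {"name": "nickname"}]
        }
    }
}
print(schema_drift(response, baseline))
```

Alerting on a non-empty "missing" list catches removals, while "added" is usually informational only.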

GraphQL-Specific Failure Modes

N+1 Query Problems

GraphQL makes it easy to write queries that cause N+1 database queries on the backend. A query that works fine with 10 items might time out with 1000 items.

Monitor representative queries at realistic data volumes, not just against empty test databases.
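The growth is easy to see in a toy model. The Python sketch below uses an in-memory stand-in for a database that simply counts calls. It is not tied to any GraphQL server library, and all names in it are made up for illustration.

```python
class CountingDatabase:
    """In-memory stand-in for a database that counts every query."""

    def __init__(self, orders_by_user: dict):
        self._orders = orders_by_user
        self.query_count = 0

    def list_user_ids(self) -> list:
        self.query_count += 1
        return list(self._orders)

    def orders_for(self, user_id: int) -> list:
        self.query_count += 1
        return self._orders[user_id]


def resolve_users_with_orders(db: CountingDatabase) -> dict:
    """Naive resolution: one query for the list, one more per user."""
    return {uid: db.orders_for(uid) for uid in db.list_user_ids()}


def queries_needed(user_count: int) -> int:
    """Count the queries issued when resolving user_count users."""
    db = CountingDatabase({i: [] for i in range(user_count)})
    resolve_users_with_orders(db)
    return db.query_count


print(queries_needed(10), queries_needed(1000))
```

The same resolver that issues 11 queries for 10 users issues 1001 for 1000 users, which is why a check against a near-empty dataset says little about production latency.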

Resolver Timeouts

Individual resolvers can time out while others succeed, resulting in partial data. Monitor for the presence of errors in responses, not just for the presence of data.

Rate Limiting

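GraphQL’s flexible queries can consume wildly different amounts of server resources. A single complex query might consume more resources than 100 simple ones, so many servers enforce complexity or rate limits. Monitor for the resulting errors, which are typically reported in the response body (the exact message and code depend on your server):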

{
  "errors": [{
    "message": "Query complexity exceeds limit",
    "extensions": { "code": "QUERY_COMPLEXITY_LIMIT" }
  }]
}
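A check can look for these errors by code rather than by message text. In the Python sketch below, QUERY_COMPLEXITY_LIMIT comes from the example above, while RATE_LIMITED is a hypothetical code added for illustration. Replace both with whatever your server actually emits.

```python
# Error codes are server-specific; RATE_LIMITED is a made-up example.
LIMIT_CODES = {"QUERY_COMPLEXITY_LIMIT", "RATE_LIMITED"}


def limit_errors(payload: dict) -> list:
    """Return the limit-related error codes found in a response."""
    hits = []
    for error in payload.get("errors") or []:
        code = (error.get("extensions") or {}).get("code")
        if code in LIMIT_CODES:
            hits.append(code)
    return hits


response = {
    "errors": [
        {
            "message": "Query complexity exceeds limit",
            "extensions": {"code": "QUERY_COMPLEXITY_LIMIT"},
        }
    ]
}
print(limit_errors(response))
```

If a monitoring query itself starts tripping the limit, that is worth a separate alert, since it usually means the limit or the query changed.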

Authentication and Authorization Failures

A GraphQL API might be accessible but return null data due to auth issues:

{
  "data": {
    "currentUser": null
  }
}

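This is technically a valid response, with no errors array at all. But if your monitoring query runs with credentials that should return a user, a null value most likely indicates an auth failure. Validate specific expected values in your monitoring queries rather than only checking that errors are absent.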
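In a scripted check, this amounts to asserting that certain top-level fields are present and non-null. The following is a minimal Python sketch; the function name and the list of required fields are assumptions.

```python
def null_required_fields(payload: dict, required: list) -> list:
    """Return the required top-level data fields that are missing or null."""
    data = payload.get("data") or {}
    return [name for name in required if data.get(name) is None]


response = {"data": {"currentUser": None}}
print(null_required_fields(response, ["currentUser"]))
```

A non-empty result should fail the check even though the response carries no errors.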

Dedicated Health Check Query

Consider creating a dedicated health query that checks all backend services:

type SystemHealth {
  database: ServiceStatus!
  cache: ServiceStatus!
  search: ServiceStatus!
  fileStorage: ServiceStatus!
  emailService: ServiceStatus!
  paymentGateway: ServiceStatus!
}

enum ServiceStatus {
  OK
  DEGRADED
  DOWN
}

type Query {
  systemHealth: SystemHealth!
}

This gives your monitoring a single query that validates your entire backend stack.
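On the monitoring side, the response to such a query can be reduced to one overall status plus a list of unhealthy services. The Python sketch below assumes the enum values serialize as the strings "OK", "DEGRADED", and "DOWN", and treats anything unrecognized as down. The function name is hypothetical.

```python
SEVERITY = {"OK": 0, "DEGRADED": 1, "DOWN": 2}


def overall_status(payload: dict) -> tuple:
    """Return (worst status, sorted names of services that are not OK)."""
    health = (payload.get("data") or {}).get("systemHealth")
    if not health:
        # No health data at all is treated as a full outage.
        return "DOWN", []
    unhealthy = sorted(name for name, status in health.items() if status != "OK")
    worst = max(health.values(), key=lambda status: SEVERITY.get(status, 2))
    # Unknown status strings are reported as DOWN.
    return (worst if worst in SEVERITY else "DOWN"), unhealthy


response = {
    "data": {
        "systemHealth": {
            "database": "OK",
            "cache": "DEGRADED",
            "search": "OK",
        }
    }
}
print(overall_status(response))
```

Mapping DEGRADED to a warning and DOWN to a page keeps a slow cache from waking anyone up at night.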

Monitoring GraphQL Subscriptions

If you use GraphQL subscriptions (WebSocket-based real-time data), standard HTTP monitoring will not cover them. You need:

WebSocket connection monitoring: Can clients establish a connection?
Subscription delivery monitoring: Are messages being delivered within expected latency?
Connection stability: Do connections stay alive or drop frequently?

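For WebSocket monitoring, use TCP monitors to verify the WebSocket port is accepting connections, and heartbeat monitors within your subscription handler to verify messages are flowing. A TCP check only confirms that the port is open; it does not exercise the WebSocket handshake or the subscription protocol, so the heartbeat is what tells you whether data is actually being delivered.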
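The heartbeat side can be as simple as recording when the last subscription message arrived and flagging silence beyond a threshold. The following is a sketch in Python; the class name, the threshold, and the injectable clock are illustrative assumptions.

```python
import time


class HeartbeatTracker:
    """Tracks the last subscription message and flags prolonged silence."""

    def __init__(self, max_silence_seconds: float, clock=time.monotonic):
        self.max_silence_seconds = max_silence_seconds
        self._clock = clock      # injectable so the logic can be tested
        self._last_seen = None   # None until the first message arrives

    def record_message(self) -> None:
        """Call this from the subscription handler for each message."""
        self._last_seen = self._clock()

    def is_stale(self) -> bool:
        """True if no message has arrived within the allowed silence."""
        if self._last_seen is None:
            return True
        return self._clock() - self._last_seen > self.max_silence_seconds
```

The subscription handler calls record_message() for every delivered message, and a periodic job reports to the heartbeat monitor only while is_stale() is false.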

Recommended Setup

Monitor	Type	Query	Interval
Health check	GraphQL	systemHealth query	30 sec
Auth validation	GraphQL	currentUser query	60 sec
Critical queries	GraphQL	Business-critical paths	60 sec
Complex queries	GraphQL	Performance-sensitive queries	5 min
Schema integrity	GraphQL	Introspection query	15 min
WebSocket endpoint	TCP	Connection check	60 sec

StatusApp’s GraphQL monitor type makes this straightforward: define your query, set expected response content, and get alerted when things break.

Monitor your GraphQL APIs properly. Start with StatusApp — purpose-built GraphQL monitoring included on all plans.