Health checks

Overview

This document describes health checks for Runway services deployed on Cloud Run.

Runway service owners can define three types of health check:

startup: determines if a container has started and is ready to receive traffic.
liveness: determines whether to restart a container. Depends on a successful startup probe.
readiness: determines whether a container should receive traffic. Unlike startup and liveness probes, a failing readiness probe removes the container from the load balancer without restarting it.

For information on defining probes, refer to the Runway manifest schema. Cloud Run’s guide provides more detail and GCP-recommended best practices.

NOTE: All Cloud Run services have a default TCP startup probe which tries to open a TCP connection on the container port.
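As a rough illustration of what a TCP probe checks, the sketch below tries to open a TCP connection and reports success or failure. It is not Cloud Run's implementation. The names `tcpProbe`, `listenLocal`, and `defaultProbeTimeout` are hypothetical, and the one-second timeout is an assumption.

```go
package main

import (
	"net"
	"time"
)

// defaultProbeTimeout is an assumed value for this sketch, not a Cloud Run default.
const defaultProbeTimeout = time.Second

// tcpProbe reports whether a TCP connection to addr can be opened within timeout.
// The probe passes as soon as the connection is established; no data is sent.
func tcpProbe(addr string, timeout time.Duration) bool {
	conn, err := net.DialTimeout("tcp", addr, timeout)
	if err != nil {
		return false
	}
	_ = conn.Close()
	return true
}

// listenLocal opens a listener on a free loopback port, standing in for a
// container that has started listening. It returns the address and a stop function.
func listenLocal() (addr string, stop func(), err error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return "", nil, err
	}
	return ln.Addr().String(), func() { _ = ln.Close() }, nil
}
```

Calling `tcpProbe(addr, defaultProbeTimeout)` against an address returned by `listenLocal` should succeed while the listener is open and fail once `stop` has been called. This shows why a TCP probe only proves the port is open, not that the application can serve requests.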

Readiness probes

Readiness probes differ from startup and liveness probes in two important ways:

initial_delay_seconds is not supported.
success_threshold is supported (unique to readiness probes).

`failure_threshold` limit

Cloud Run enforces a maximum failure_threshold of 3 for readiness probes. This is stricter than startup and liveness probes, which allow much higher values.

Setting failure_threshold above 3 causes the deployment to fail with the following error:

failure_threshold must be a number between 0 and 3.
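A check like the following could catch an out-of-range value before deployment. This is a hypothetical helper, not part of Runway tooling; the function name and the extra "got" detail in the message are assumptions.

```go
package main

import "fmt"

// maxReadinessFailureThreshold mirrors the limit described above for readiness probes.
const maxReadinessFailureThreshold = 3

// validateReadinessFailureThreshold returns an error when n falls outside
// the range 0 to 3 that the deployment error message describes.
func validateReadinessFailureThreshold(n int) error {
	if n < 0 || n > maxReadinessFailureThreshold {
		return fmt.Errorf(
			"failure_threshold must be a number between 0 and %d, got %d",
			maxReadinessFailureThreshold, n,
		)
	}
	return nil
}
```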

A typical readiness probe configuration looks like:

spec:
  readiness_probe:
    path: /-/readiness
    period_seconds: 5
    timeout_seconds: 2
    failure_threshold: 3  # maximum allowed by Cloud Run
    success_threshold: 1

If your service uses LabKit v2, the /-/readiness endpoint is registered automatically by the httpserver package and runs all registered checks concurrently.
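For services that do not use LabKit, the general idea of running checks concurrently behind one endpoint can be sketched with the Go standard library alone. This is not LabKit's httpserver API. `Check`, `runChecks`, `readinessHandler`, `statusFor`, and the two sample checks are hypothetical names, and the two-second deadline is an assumption chosen to match timeout_seconds in the configuration above.

```go
package main

import (
	"context"
	"errors"
	"net/http"
	"net/http/httptest"
	"sync"
	"time"
)

// Check is a hypothetical readiness check; it returns nil when healthy.
type Check func(ctx context.Context) error

// runChecks runs every check in its own goroutine and returns the failures by name.
func runChecks(ctx context.Context, checks map[string]Check) map[string]error {
	var (
		mu sync.Mutex
		wg sync.WaitGroup
	)
	failures := make(map[string]error)
	for name, check := range checks {
		wg.Add(1)
		go func(name string, check Check) {
			defer wg.Done()
			if err := check(ctx); err != nil {
				mu.Lock()
				failures[name] = err
				mu.Unlock()
			}
		}(name, check)
	}
	wg.Wait()
	return failures
}

// readinessHandler responds 200 when all checks pass and 503 otherwise.
// The deadline only helps if the checks themselves honour ctx.
func readinessHandler(checks map[string]Check) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		ctx, cancel := context.WithTimeout(r.Context(), 2*time.Second)
		defer cancel()
		if failures := runChecks(ctx, checks); len(failures) > 0 {
			http.Error(w, "not ready", http.StatusServiceUnavailable)
			return
		}
		w.WriteHeader(http.StatusOK)
	})
}

// Sample checks used only for illustration.
func alwaysOK(context.Context) error   { return nil }
func alwaysFail(context.Context) error { return errors.New("dependency unavailable") }

// statusFor exercises the handler in-process and returns the status a probe would see.
func statusFor(checks map[string]Check) int {
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodGet, "/-/readiness", nil)
	readinessHandler(checks).ServeHTTP(rec, req)
	return rec.Code
}
```

In a real service the handler would be mounted with something like `http.Handle("/-/readiness", readinessHandler(checks))`. Because the checks run concurrently, the response time tracks the slowest check, not the sum of all checks, which helps keep the probe within its timeout.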

Examples

Example of services using HTTP probes

Example of services using readiness probes

Secrets Manager (OpenBao)

Example of services using gRPC probes

secret detection