Incident Response

Make it stop

Flux

We use Flux to periodically reconcile the Kubernetes cluster state with the declarative intended state. During incident response, this can undo ad-hoc fixes, preventing mitigation of the incident. In this situation, reconciliation can be suspended with the flux CLI:

flux suspend kustomization "${RUNWAY_SERVICE_ID:?}" -n runway-provisioner

You can then change cluster resources, e.g. with kubectl patch, without Flux undoing these changes shortly after. However, other controllers are still active and may interfere with your chagnes, such as the “Vertical Pod Autoscaler” which controls a Pod’s memory allocation.

After the incident, make sure to re-enable Flux reconciliation. This is also done using the Flux CLI:

flux resume kustomization "${RUNWAY_SERVICE_ID:?}" -n runway-provisioner