Cloudflare Integration
Summary
Currently, Runway services that have an external load balancer enabled are exposed to the Internet, with DDoS protection provided by Google’s Cloud Armor.
If service owners would prefer to have their service fronted by Cloudflare, there is currently no self-service option and it would have to be configured out-of-band (for example: docs.runway.gitlab.com is configured in config-mgmt).
This blueprint outlines how Runway will implement Cloudflare integration so that service owners can enable their services to be protected by Cloudflare in a self-service fashion.
Motivation
Runway does not currently provide self-service functionality to have services fronted by Cloudflare and does not meet production readiness criteria for WAF.
Cloudflare is the CDN that sits in front of most of our production services, so the Runway platform should provide a frictionless experience for service owners to have their services fronted by Cloudflare.
Example Runway workload interested in this feature: GitLab Secrets Manager
Goals
- To allow service owners to enable their service to be fronted by Cloudflare.
- To protect Runway endpoints with a standard set of WAF rules.
Non-goals
- Ability to use a segregated Cloudflare zone for each Runway service instead of a shared zone.
  Rationale: security concerns from having a Cloudflare API token with the ability to create/destroy zones in the GitLab account (unable to lock it down to a certain pattern of zones).
- Ability to customize WAF rules.
  Rationale: start with a standard set of WAF rules for all Runway workloads. This can be iterated on if some level of customization is required. If extensive customization is required for a service, it would likely warrant deploying an out-of-band isolated Cloudflare zone for the service.
- Ability to set rate limits.
  Rationale: this is arguably a desirable feature but to keep the first iteration small, this will be added later once requirements are better understood.
Design
Shared Cloudflare Zone
We will create a Cloudflare zone shared by all Runway workloads (staging and production): svc.gitlab.net
This Cloudflare zone will be created in config-mgmt and imported into provisioner. The reason for not creating the zone directly in provisioner is that we would have to give provisioner a fairly permissive Cloudflare API token in order to manage zones. Instead, we let config-mgmt, which already has a permissive Cloudflare API token, create the zone, and we import it into provisioner.
DNS endpoints:
- Production: <service>.svc.gitlab.net
- Staging: <service>.staging.svc.gitlab.net
The origin for these services will be their existing <service>[.staging].runway.gitlab.net hostname.
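The naming scheme above can be sketched as a small mapping from a service name and environment to its Cloudflare hostname and its origin. This is a minimal illustration only: the function names and the `environment` parameter are hypothetical and not part of Runway; only the hostname patterns come from this blueprint.

```python
# Hostname patterns taken from this blueprint; everything else is illustrative.
CLOUDFLARE_ZONE = "svc.gitlab.net"
ORIGIN_DOMAIN = "runway.gitlab.net"


def _hostname(service: str, environment: str, domain: str) -> str:
    """Build <service>[.staging].<domain> for the given environment."""
    if environment == "production":
        return f"{service}.{domain}"
    if environment == "staging":
        return f"{service}.staging.{domain}"
    raise ValueError(f"unknown environment: {environment!r}")


def cloudflare_hostname(service: str, environment: str) -> str:
    """Hostname of the Cloudflare-fronted endpoint in the shared zone."""
    return _hostname(service, environment, CLOUDFLARE_ZONE)


def origin_hostname(service: str, environment: str) -> str:
    """Existing Runway hostname that Cloudflare uses as the origin."""
    return _hostname(service, environment, ORIGIN_DOMAIN)
```

For a hypothetical service named `example`, the staging endpoint would be `example.staging.svc.gitlab.net` with origin `example.staging.runway.gitlab.net`.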
Provisioner will configure any WAF rules that apply to all services. The WAF rules to be configured are TBD, but we will be working closely with the Foundations team to establish the rules.
Provisioning
Service owners will automatically get a Cloudflare-protected endpoint for their workload if they have an external load balancer enabled. If needed, this can be disabled. For example: AI gateway would disable the Cloudflare-protected endpoint as this service is routed via Cloud Connector so it does not need a .svc.gitlab.net endpoint.
TLS
Advanced Certificate Manager (paid add-on) will be enabled for svc.gitlab.net and we will use Total TLS to issue certificates for each proxied hostname.
Restricting inbound traffic
Reconciler will determine whether to restrict inbound access to a Runway workload to Cloudflare only, based on the following flowchart:
Notes
- spec.network_policies.cloudflare is a runway.yml setting that takes highest precedence. If enabled, inbound traffic to the load balancer will be restricted to Cloudflare.
- The logic behind “Service is part of the list of existing services?” is that we do not want to cause an outage for existing services by restricting inbound traffic, as existing services are currently receiving traffic directly to the load balancer. Service owners will need to migrate traffic to their .svc.gitlab.net Cloudflare endpoint, then enable spec.network_policies.cloudflare.
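The precedence described in these notes could be sketched roughly as follows. This is not the Reconciler's implementation: the function name, its parameters, and the example service names are hypothetical. Two branches are assumptions that the notes do not confirm: that an explicit `false` also wins over the other checks, and that services outside the existing-services list are restricted by default.

```python
from typing import AbstractSet, Optional


def restrict_to_cloudflare(
    service: str,
    explicit_setting: Optional[bool],
    existing_services: AbstractSet[str],
) -> bool:
    """Decide whether inbound traffic should be limited to Cloudflare.

    explicit_setting models spec.network_policies.cloudflare from runway.yml;
    None means the service owner did not set it.
    """
    # 1. The runway.yml setting takes highest precedence when present.
    #    (Assumption: an explicit False is honoured as well as True.)
    if explicit_setting is not None:
        return explicit_setting

    # 2. Existing services still receive traffic directly on the load
    #    balancer, so they are left open to avoid causing an outage.
    if service in existing_services:
        return False

    # 3. Assumption: any other (new) service is restricted by default.
    return True
```

With a hypothetical existing service `legacy-svc`, the call `restrict_to_cloudflare("legacy-svc", None, {"legacy-svc"})` leaves access open until the owner migrates traffic and sets `spec.network_policies.cloudflare`.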
Adding proxied DNS records
Reconciler will use a Cloudflare API token with DNS read and write access to the svc.gitlab.net zone only, for the purpose of managing proxied DNS records, as shown in the following diagram:
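As a rough sketch of what managing a proxied record involves, the snippet below builds (but does not send) a create-record request for the Cloudflare DNS records API. The helper name, its parameters, and the choice of a CNAME record pointing at the Runway origin are assumptions for illustration; the blueprint does not specify the record type or how Reconciler calls the API.

```python
CLOUDFLARE_API = "https://api.cloudflare.com/client/v4"


def proxied_record_request(
    zone_id: str, api_token: str, service: str, staging: bool = False
) -> dict:
    """Describe the HTTP request that would create a proxied DNS record.

    Nothing is sent; the returned dict can be handed to any HTTP client.
    """
    env = ".staging" if staging else ""
    name = f"{service}{env}.svc.gitlab.net"
    origin = f"{service}{env}.runway.gitlab.net"
    return {
        "method": "POST",
        "url": f"{CLOUDFLARE_API}/zones/{zone_id}/dns_records",
        "headers": {"Authorization": f"Bearer {api_token}"},
        "json": {
            "type": "CNAME",   # assumption: record type is not stated in the blueprint
            "name": name,
            "content": origin,
            "proxied": True,   # route traffic through Cloudflare
            "ttl": 1,          # 1 means "automatic" in the Cloudflare API
        },
    }
```

Because the token only has DNS read and write access to the shared zone, a request like this is limited to records under svc.gitlab.net.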
Observability
We will add a new instance of cloudflare-exporter to scrape the shared zone. Similar to Cloud Connector, we can leverage cloudflare_zone_firewall_events_count to alert on anomalies.