Infrastructure Upgrade Woes: Traefik Pass-Through Routing Broke ACME
I've been doing some upgrades on my home IT infrastructure. Twice recently I've been “got” by unfortunate backward-incompatible changes that were made in software I rely on. This one's in Traefik, which is otherwise pretty stable and reliable.
My certificates started expiring. That's when I noticed. The automatic renewal attempts were failing.
Briefly, the behaviour change affects Traefik “routers” (route configurations) that are intended as a full pass-through. When I originally set these up, the traffic they passed through included the ACME challenges used for issuing TLS/SSL certificates. Some of these routers pass a whole block of domain name-space to further Traefik instances which handle specific subdomains; some route to VMs running other services such as YunoHost; and all of these downstream services expect to manage their own certificates through the ACME protocol.
When I updated to Traefik v2.11.2 or v3.0.0-rc4 (or anything later), the ACME handling in the downstream services (including those “further Traefik instances”) stopped working. They sent certificate requests to Let's Encrypt, which duly sent a challenge, but the challenge never reached them and so they were denied a certificate. The cause was that the primary Traefik instance had started intercepting ACME challenges, even on routers defined with the “passthrough” option.
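For context, this is roughly what one of the downstream Traefik instances needs in order to manage its own certificates with the TLS challenge. It is a minimal sketch, not my actual configuration: the resolver name `letsencrypt`, the entry point name, the email address, the host name and the backend URL are all placeholders.

```yaml
# downstream instance, static configuration (sketch; names are placeholders)
entryPoints:
  websecure:
    address: ":443"
certificatesResolvers:
  letsencrypt:                      # hypothetical resolver name
    acme:
      email: "admin@example.org"    # placeholder
      storage: "acme.json"
      tlsChallenge: {}              # answer ACME TLS challenges on :443
```

```yaml
# downstream instance, dynamic configuration (sketch)
http:
  routers:
    app:
      entryPoints:
        - websecure
      rule: "Host(`app.example.org`)"   # placeholder domain
      service: app
      tls:
        certResolver: letsencrypt       # this instance requests the certificate
  services:
    app:
      loadBalancer:
        servers:
          - url: "http://127.0.0.1:8080"  # placeholder backend
```

The downstream instance can only answer the challenge if the upstream instance forwards the challenge connection to it untouched, and that is the part that stopped happening.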
This backwards-incompatible change was reported by someone in May 2024. Traefik decided not to revert it but to work on a new option to be able to select the old behaviour. The new option, `AllowACMEByPass`, was released in September 2024 in v2.11.9 (with a bug fix in v2.11.10) and, as far as I can see, in v3.1.3 (corresponding bug fix in v3.1.4), although there it is noted in the change log only as “Merge v2.11 into v3.1”.
Two Kinds of Pass-Through
Now there are two kinds of “passthrough” behaviour available:
- pass-through where ACME TLS challenge is handled by (this instance of) Traefik;
- pass-through where ACME TLS challenge is passed through along with the rest of the traffic.
I am not yet clear about the effects, if any, on non-passthrough routers, on non-TLS routers, and/or on any other kinds of ACME challenge. (The older TLS-SNI-01 challenge was withdrawn a while ago, but HTTP-01 is still in use and the option's documentation mentions HTTP challenges, so those may be affected too.)
Requesting a certificate via ACME of course requires knowing which domains to request. For the first kind (the new kind) of passthrough, this instance of Traefik has to know the domains it is routing. But I had used a `HostSNIRegexp` rule here, to pass through all possible subdomains of a domain; only my downstream Traefik or YunoHost instance knew exactly what domains these were. Therefore I have to either get Traefik to revert to the old behaviour, or tell the upstream Traefik the exact set of domains.
The new `AllowACMEByPass` option is a behaviour switch which can be specified per entry-point. At present its documentation does not explain well enough how it affects each router.
Requesting Clear Documentation
I have added the following comment on issue #10684, Let's encrypt TLS Challenge failing when behind a traefik TCP Router:
The documentation for this new behaviour switch could do with being clearer.
As background, in my case I am trying to configure Traefik with some routers doing a full pass-through including ACME pass-through (some to further Traefik instances, some to other services that manage their own ACME). This was working in a previous Traefik version, I'm just trying to restore the previous behaviour. I've read the issue, describing why and when the behaviour was changed. What I'm missing is a precise description. I don't ask here just to get a one-off answer for my particular case today, I want to ask that the docs be clarified so it's clear for everyone every time.
So far it seems there is just this documentation of `AllowACMEByPass` entrypoint option, and some abbreviations of that in a couple of other places.
It says it determines if “a [...] router can handle”... but it is an entrypoint option so it applies to many routers. It needs to state the rule that determines what kind of handling applies to each specific router. In other words, I don't just want to know how to configure Traefik such that “a (unspecified) router can handle” but I want to decide which particular routers will pass through ACME and which will handle it for me. (I gather from previous reading around, that the decision is also related to whether a certificate resolver is configured at all (globally) and/or on the router specifically; but it needs to be stated as a precise rule.)
Clarify which types of routers this affects. (I happen to know it's particularly useful for TLS pass-through routers, as that's my use case. It ought to make clear whether it also affects non-TLS and/or non-passthrough routers.)
There are now two kinds of TLS pass-through: where ACME is handled by (this instance of) Traefik, and where ACME is passed through along with the rest of the traffic. The documentation of router option `passthrough` should now state that there are these two kinds, and should say (or point to documentation on) how to select the desired kind.
For the record, this is what the documentation said at the time when I posted this message:
AllowACMEByPass
Optional, Default=false
`allowACMEByPass` determines whether a user defined router can handle ACME TLS or HTTP challenges instead of the Traefik dedicated one. This option can be used when a Traefik instance has one or more certificate resolvers configured, but is also used to route challenges connections/requests to services that could also initiate their own ACME challenges.
No Certificate Resolvers configured
It is not necessary to use the `allowACMEByPass' option certificate option if no certificate resolver is defined. In fact, Traefik will automatically allow ACME TLS or HTTP requests to be handled by custom routers in this case, since there can be no concurrency with its own challenge handlers.
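As I read that description, the option only matters on an instance that has at least one certificate resolver of its own. The following is an untested sketch of that situation, based only on the quoted text; the resolver name `myresolver`, the email address and the storage path are placeholders, and the documentation does not say which individual routers end up handling a challenge.

```yaml
# upstream instance, static configuration (sketch; names are placeholders)
certificatesResolvers:
  myresolver:                       # hypothetical: this instance also issues its own certificates
    acme:
      email: "admin@example.org"    # placeholder
      storage: "acme.json"
      tlsChallenge: {}
entryPoints:
  web_https:
    address: ":443"
    # With a resolver defined above, challenges are handled by Traefik itself
    # unless this is set. With no resolver defined at all, the quoted
    # documentation says user-defined routers may handle them anyway.
    allowACMEByPass: true
```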
Fixing It
In my case, where I already have `certResolver` set to null on my passthrough routers, I was able to upgrade to v3.1.4 and add `allowACMEByPass: true` on my HTTPS entrypoint. This seems to be doing what I want, reverting to the old behaviour of passing through ACME challenges on such routers.
```yaml
# in static configuration (traefik.yml)
entryPoints:
  web_https:
    address: ":443"
    allowACMEByPass: true
    # ...
```
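```yaml
# in dynamic configuration (a file in 'rules/' dir in my case)
tcp:
  routers:
    passthrough-1-https:
      entryPoints:
        - web_https
      # TCP routers match on the TLS SNI name, hence HostSNIRegexp
      rule: HostSNIRegexp(`...`)
      service: "passthrough-1-https"
      tls:
        passthrough: true
        certResolver: ~   # null: no certificate resolver on this router
```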
All posts © Julian Foad and licensed CC-BY-ND except quotes, translations, or where stated otherwise