Directing Traefik
My homelab has changed a lot over the years, and nowhere is this more true than the reverse proxy sitting at the core of it all. As hosted services have sprawled, and some access requirements have widened to the WAN and Tailscale, I’ve wrestled with too many reverse proxies trying to get a stable and scalable solution out of them. Caddy served my homelab well for a good while[1], but ultimately it’s Traefik that I’ve settled on long term. I want to take you on the journey I had setting up Traefik, guided by some goals for exposing our homelab services and how to implement them using Traefik.
Setup, direction, and pre-reqs
So why Traefik?
I admit I’m somewhat biased: the majority of my homelab is still centred around Docker, and Traefik is honestly unmatched for integration with the Docker ecosystem. Shared config can be defined centrally, with per-container (or per-stack) config defined in the compose labels themselves. This allows us to define how a service is accessible entirely from the compose file of the service itself. It does have a bit of a learning curve, and Caddy is genuinely a much simpler solution, but for a Docker-centric homelab I consider it a worthwhile endeavour. As we go on to accomplish some of the later goals on our list, Traefik’s router system makes these much more viable, at the cost of that steeper learning curve.
Goals and assumptions
So starting off, we will assume that our homelab currently satisfies the following assumptions:
- We are running some services in Docker containers.
- There is some way to access our machine from the WAN via a domain name that we own (this could be a static IP with port forwarding, a Cloudflare tunnel[2], or however you would like to expose your machine to the WAN).
- We can control DNS locally, and have a record for our domain pointing to our public IP (if not using tunnels).
Installing Docker, setting up services, and forwarding ports if required are beyond the scope of this post. You should be able to find plenty of information on these online if you get stuck, or want to learn more about any of these assumed areas. Also, fair warning: we will be using split-horizon DNS to achieve a public/private access model for our homelab. I fully recognise this isn’t for everyone; it’s just the mechanism that we’ll be making use of in this post.
Using Traefik, we will aim to:
- Have a mechanism to mark a Docker service as private or internet facing.
- Require only minimal config per service, with most config defined centrally.
- Avoid leaking information about what private services are running.
Local DNS
With our domain name pointing to our public IP address, or to nothing at all if we’re using tunnels, we need to do some additional setup. Let’s assume the public IP address model for now. When you visit your own domain from inside your homelab, your machine is going to try to access it through your external IP address. On a router running something like (Opn/Pf)Sense, you may have NAT hairpinning which automatically sends this back to your homelab machine, but this often isn’t the case with consumer routers. As our Traefik access paradigm hinges on the IP accessing the resource, we need to set up some local DNS.
Split-horizon DNS just means that our domain is going to resolve to a different IP depending on whether we are inside our LAN, or outside on the wider internet. If your router supports it, this can be accomplished by adding a custom DNS rule for toaster.dog pointing to the IP of your server. If your router doesn’t support override rules directly, you can spin up Docker containers for something like PiHole or Bind9 to achieve the same goal. Again, the details on how to do this are outside of this post’s scope, but provided you are able to run nslookup toaster.dog and get a response of your server’s IP address, then you’re all good to continue.
As an added benefit, even without opening up your homelab to the WAN at all, you’re now able to use this domain name in place of your machine’s IP. For instance, if you were running Jellyfin on 192.168.0.200:5000, provided you have a rule mapping toaster.dog -> 192.168.0.200, you could now access it at toaster.dog:5000. We can’t fully make use of subdomains till later (which would be something like jellyfin.toaster.dog), but it’s a lot nicer than just using its IP every time.[3]
Traefik config
The naive approach
We’ll start off with exposing a service “caddy”, which will run a webpage. Following any “get started with Traefik” guide, you’ll end up with a Traefik config and compose file looking something like this:
# traefik.yml
entrypoints:
  websecure:
    address: :443

# traefik_dynamic_config.yml
http:
  middlewares:
    homelabIpAllowList:
      ipAllowList:
        sourceRange:
          - "192.168.0.0/24"

# caddy compose file
labels:
  - traefik.enable=true
  - traefik.http.routers.caddy.service=caddy
  - traefik.http.routers.caddy.tls=true
  - traefik.http.routers.caddy.tls.certresolver=letsencrypt
  - traefik.http.routers.caddy.entrypoints=websecure
  - traefik.http.routers.caddy.rule=Host(`blog.toaster.dog`)
  # Only present if the service is private only. The @file suffix is needed
  # because the middleware is defined in the file provider, not in Docker labels.
  - traefik.http.routers.caddy.middlewares=homelabIpAllowList@file
  - traefik.http.services.caddy.loadbalancer.server.port=80
If you wanted this service to be private, another solution that I found and originally used was simply appending && ClientIP(`192.168.0.0/24`) to the router rule. Either way it’s very simple to get off the ground, however there are a few issues here. There’s a lot of boilerplate required for each and every service, and there’s a much more opaque issue involving our goal of avoiding information leaks.
When blog.toaster.dog is marked as private with homelabIpAllowList, a random user on the internet is correctly not able to see the service. However, visiting a private site gives an attacker a 403 Forbidden HTTP response, whereas if they visit fake.toaster.dog no router at all will match, and they get a 404 Not Found HTTP response.
A visitor from the WAN may not be able to see what is running on that domain, but they know something is running on it. Maybe somewhat pedantic, but it’s something we can avoid, so let’s try a different approach.
A better solution
Something worth mentioning before we try our next solution is that Traefik can have as many entrypoints as you like, and an entrypoint does not necessarily need its port exposed on the host machine. By defining additional Traefik entrypoints without publishing their ports from the Docker container, we get entrypoints which can only be reached by being routed to from other Traefik entrypoints. Consider our new Traefik config file:
# traefik.yml
entrypoints:
  web:
    address: :80
    http:
      redirections:
        entryPoint:
          to: websecure
          scheme: https
  websecure:
    address: :443
  private:
    address: :621 # NOT open on the host
  public:
    address: :666 # NOT open on the host
All we’ve done here is define two new entrypoints, public and private. Their associated ports are not open on the host machine; we only open up :80 and :443. To use these entrypoints, we need to route to them from other entrypoints, and we will do so using a “front door” style setup on websecure. Let’s now consider the associated dynamic config for Traefik:
# traefik_dynamic_config.yml
http:
  routers:
    catch_all_private:
      entryPoints:
        - "websecure"
      tls:
        certResolver: "letsencrypt"
      rule: HostRegexp(`^.*.?toaster\.dog$`) && (ClientIP(`192.168.0.0/24`) || ClientIP(`192.168.1.0/24`))
      service: "private_entrypoint"
      priority: 9001 # Ensure it matches before anything else
    catch_all_public:
      entryPoints:
        - "websecure"
      tls:
        certResolver: "letsencrypt"
      rule: HostRegexp(`^.*.?toaster\.dog$`)
      service: "public_entrypoint"
      priority: 9000 # Ensure it matches right after private
  services:
    private_entrypoint:
      loadBalancer:
        servers:
          - url: http://127.0.0.1:621
    public_entrypoint:
      loadBalancer:
        servers:
          - url: http://127.0.0.1:666
This setup makes it such that the websecure entrypoint only has one job: to tell the difference between public and private traffic and send it off to the associated entrypoint. The config on a given service now looks like this:
# caddy compose file
labels:
  - traefik.enable=true
  - traefik.http.routers.caddy.entrypoints=public # or `private`
  - traefik.http.routers.caddy.rule=Host(`blog.toaster.dog`)
  - traefik.http.services.caddy.loadbalancer.server.port=80 # Caddy's port, unrelated to Traefik
Our per-service compose is now much cleaner. We only need to set the entrypoint to “public” or “private” to set the visibility of the service, and as we are on a second layer of routing, we don’t even need to define TLS behaviour. This is handled for us at the websecure layer; traffic to public is routed internally and flows on from whatever websecure was using. The information leakage problem is also now solved, as we are no longer using middleware which returns a 403 Forbidden; requests which do not match any rule, and private services accessed from the public side, will both now get a 404 Not Found.
This does of course introduce one new problem: we currently can’t access our public services from inside our own homelab, as we get sent to the private entrypoint without ever seeing public. This can be fixed easily by adding a third router, and ensuring it matches last as a final fallback:
# traefik_dynamic_config.yml
http:
  routers:
    public_access_from_private:
      entryPoints:
        - "private"
      rule: HostRegexp(`(?i)^.*.?toaster\.dog$`)
      service: "public_entrypoint"
      priority: 2 # Ensure it matches *last*
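For completeness, the “not open on the host” part comes down to which ports the Traefik container itself publishes. A rough sketch of what that compose file could look like is below; the image tag, the mount paths, and the service name are my assumptions rather than anything taken from the configs above, so adjust to taste.

```yaml
# compose file for Traefik itself (sketch; tag and paths are assumptions)
services:
  traefik:
    image: traefik:v3.0
    ports:
      - "80:80"    # web
      - "443:443"  # websecure
      # 621 (private) and 666 (public) are deliberately NOT published,
      # so they are only reachable from inside the Traefik container
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro  # lets Traefik read container labels
      - ./traefik.yml:/etc/traefik/traefik.yml:ro     # static config
      - ./traefik_dynamic_config.yml:/etc/traefik/traefik_dynamic_config.yml:ro
      - ./ssl-certs:/ssl-certs                        # ACME storage
```

With this layout, the http://127.0.0.1:621 and http://127.0.0.1:666 service URLs loop back to Traefik inside its own container, which is why the internal entrypoints never need to be published. The static config would also need the Docker and file providers enabled for the labels and the dynamic config file to be picked up.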
Extra credit
Fantastic, everything is stable and works finally! So let’s change it again![4] Everything onwards is a collection of extra bits and bobs which may be useful for Traefik.
Custom 404 page
Given we’re now making quite a lot of use of sending mismatched users a 404 response, let’s serve them a better 404 page than what Traefik gives us by default. Assuming you have a webserver (in this case Caddy) with a 404 page URL, you can define labels to send visitors there in place of Traefik’s default 404 response:
labels:
  # Handle regular traffic to the domain without any subdomain (unrelated to 404 page)
  - traefik.http.routers.caddy.service=caddy
  - traefik.http.routers.caddy.entrypoints=public
  - traefik.http.routers.caddy.rule=Host(`toaster.dog`)
  - traefik.http.services.caddy.loadbalancer.server.port=80
  # Define middleware to redirect to the 404 page
  - traefik.http.middlewares.404-redirect.redirectregex.regex=^.*$
  - traefik.http.middlewares.404-redirect.redirectregex.replacement=https://toaster.dog/404.html # Or whatever URL you like
  # Define the catch all service for invalid requests
  - traefik.http.routers.fallback.service=fallback
  - traefik.http.routers.fallback.entrypoints=public
  - traefik.http.routers.fallback.priority=1 # Matches absolutely and *strictly* last
  - traefik.http.routers.fallback.rule=HostRegexp(`(?i)^.*.?toaster.dog$`)
  - traefik.http.routers.fallback.middlewares=404-redirect
  - traefik.http.services.fallback.loadbalancer.server.port=80
DNS challenge certificates
When you’re first starting out with Traefik and search “Traefik ACME certs”, you’re going to wind up with a config like this:
certificatesResolvers:
  letsencrypt:
    acme:
      email: ""
      storage: /ssl-certs/acme.json
      caServer: "https://acme-v02.api.letsencrypt.org/directory"
      httpChallenge:
        entrypoint: web
This is the simplest way to get TLS certificates working on your domain: the HTTP challenge lets you prove you control a given domain and thus are allowed to serve a TLS cert for it. It does have a few limitations, not necessarily problems, but things we may want to address:
- You must have port 80 open to complete the ACME challenge. This isn’t necessarily an issue, but it’s a caveat to be aware of. You may find yourself keeping port 80 open on your WAN for the sole reason of permitting certificate renewal in this fashion.
- The ACME HTTP challenge cannot be used to issue wildcard certificates; certificates always have a 1:1 relationship with the domain they match.
Not having wildcard certificates means that every subdomain on our server, blog.toaster.dog and uwu.toaster.dog for example, requires its own unique certificate to be registered. This is also the case for nested subdomains, so blog.staging.toaster.dog would be in the same boat. The issue comes from the final fallback 404 router we defined earlier. We’ve allowed this router to match on HostRegexp(`(?i)^.*.?toaster.dog$`), which allows any subdomain at all of toaster.dog. This opens us up to a potential denial of service attack, where an attacker could make repeated requests to subdomains which we do not define, but will serve a response for (think 1.toaster.dog, 2.toaster.dog). Every novel subdomain requested would require a request to Let’s Encrypt to solve a new challenge and get a new certificate. If performed at scale, this can quickly have us running up against a rate limit.
What we would like to do is be able to serve a single certificate that is valid for the entirety of our subdomains, so *.toaster.dog. And we can do exactly that using DNS challenge based certificates. The config can look something like this:
certificatesResolvers:
  prod-letsencrypt-dns:
    acme:
      email: ""
      storage: /ssl-certs/prod-acme-dns.json
      caServer: "https://acme-v02.api.letsencrypt.org/directory"
      dnsChallenge:
        provider: cloudflare
        resolvers:
          - "1.1.1.1:53"
In the case of using Cloudflare I only need to provide an email and token to Traefik using the CF_API_EMAIL and CF_DNS_API_TOKEN environment variables respectively. Setting the certificate resolver on your “front door catch all” router is now sufficient to serve this certificate for every subdomain on your homelab.
Additionally, note the last line of this config where we manually specify the DNS resolver as 1.1.1.1:53, Cloudflare’s DNS server. You don’t need to use Cloudflare’s server of course, but it’s most definitely wise to specify some external DNS server. If your homelab machine uses your internal split-horizon DNS, your domain name is going to resolve to a local IP, which completely bricks the ability for the challenge to function at all. In the best case it’ll just fail to set up the first time. Or, as in my case, you perform the initial DNS challenge before setting up your own internal DNS, giving you a nice little ticking time bomb until your certificate expires (Let’s Encrypt certificates last 90 days) without being able to renew itself.
Regardless, with the above config you should now be able to serve one certificate for every subdomain associated with your homelab. If you were to open up a nested subdomain, like *.staging.toaster.dog, this would require a new certificate however, as wildcard certificates only cover one level of subdomains. Notably too, if a nested subdomain is requested that Traefik has not already had defined, out of the box it will simply serve a self-signed certificate instead of attempting to resolve this with a DNS challenge. This effectively resolves our denial of service issue, but means end users will see an “untrusted certificate” page if they try to access nested subdomains. You could fix this by limiting the regular expression on our “front door” router to something like HostRegexp(`^[^.]+\.(?:staging\.)?toaster\.dog$`), which will now only match *.toaster.dog and *.staging.toaster.dog (note that as written it no longer matches the bare toaster.dog).
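To make the wildcard part concrete, here is a hedged sketch of how the public “front door” router might ask for the wildcard names explicitly. It leans on the router-level tls.domains option (main plus sans); my understanding is that Traefik only infers certificate names from Host() matchers and not from HostRegexp(), so treat the need for this block as an assumption to check against your own setup. The staging entry is purely illustrative.

```yaml
# traefik_dynamic_config.yml (sketch, not the config I actually run)
http:
  routers:
    catch_all_public:
      entryPoints:
        - "websecure"
      # Narrowed to a single label, optionally under staging
      rule: HostRegexp(`^[^.]+\.(?:staging\.)?toaster\.dog$`)
      service: "public_entrypoint"
      priority: 9000
      tls:
        certResolver: "prod-letsencrypt-dns"
        # Assumption: list the names explicitly, since they cannot be
        # derived from a regexp rule
        domains:
          - main: "toaster.dog"
            sans:
              - "*.toaster.dog"
          - main: "staging.toaster.dog"
            sans:
              - "*.staging.toaster.dog"
```

The Cloudflare credentials would then reach Traefik through its container environment. The service name and the use of variable substitution here are again assumptions:

```yaml
# compose file for Traefik (fragment)
services:
  traefik:
    environment:
      - CF_API_EMAIL=${CF_API_EMAIL}
      - CF_DNS_API_TOKEN=${CF_DNS_API_TOKEN}
```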
TCP passthrough
My partner (who also writes over on blog.laven.dev) has also been afflicted with the curse of homelabbing, and working under the same roof has forced me to learn a bit more of what you can do in Traefik, namely full TCP passthrough. Without using something like a VPN or Cloudflare tunnel, WAN traffic coming in on :443 can only be forwarded to a single IP; the router can’t tell the difference between their traffic and mine. Considering a scenario where my machine sits in front of theirs, I also did not want to manage their certs for them. Partially because I’d be MITM’ing their traffic, but also because they need the full experience of managing their own reverse proxy. Debugging hell builds character, after all.
Thankfully, as is becoming a trend here, Traefik does make this pretty easy to achieve, and we can start with:
tcp:
  routers:
    lavendev:
      rule: HostSNIRegexp(`^.*\.?laven\.dev$`)
      entryPoints:
        - "websecure"
      service: "lavendev_passthrough"
      tls:
        passthrough: true
  services:
    lavendev_passthrough:
      loadBalancer:
        servers:
          - address: "192.168.0.222:443" # The IP of their machine on the LAN
Of course, at this point we’re just passing through all traffic. We don’t get the “private” and “public” behaviour that we do on our main setup, and this presents the main issue here. When their box receives the traffic from my machine in front, the ClientIP is always going to be the IP of my machine, and thus traffic always appears to be private traffic from the LAN. Searching around, I tried a solution involving setting the X-Forwarded-For header, but this is unfortunately not usable in the rule: segment of the router config.
It’s a bit of a hacky config, but for now we’ve made do with my machine making the decision on whether traffic is public or private, then passing it straight to their public and private entrypoints. This is effectively the same as our original solution, but the two entrypoints need to have their ports open on their machine, and of course not everything is handled on the one machine.
Thankfully they’re an AV technician, and care a significant amount less about the specifics of the implementation and more about “does it actually work though”, so our boxes still happily use this setup.
tcp:
  routers:
    lavendev_passthrough_private:
      rule: HostSNIRegexp(`^.*\.?laven\.dev$`) && (ClientIP(`192.168.0.0/24`) || ClientIP(`192.168.1.0/24`))
      entryPoints:
        - "websecure"
      service: "lavendev_passthrough_private"
      tls:
        passthrough: true
    lavendev_passthrough_public:
      rule: HostSNIRegexp(`^.*\.?laven\.dev$`)
      entryPoints:
        - "websecure"
      service: "lavendev_passthrough_public"
      tls:
        passthrough: true
  services:
    lavendev_passthrough_private:
      loadBalancer:
        servers:
          - address: "192.168.0.222:621" # Open port on their machine
    lavendev_passthrough_public:
      loadBalancer:
        servers:
          - address: "192.168.0.222:666" # Open port on their machine
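On the receiving machine, the matching half of this might look roughly like the sketch below. I don’t have their config in front of me, so the service name, router name, hostname, and resolver name are all hypothetical; the one thing that follows directly from passthrough is that TLS now has to be terminated on their private and public entrypoints, since my machine never decrypts the traffic.

```yaml
# traefik.yml on the downstream machine (sketch)
entryPoints:
  private:
    address: ":621" # traffic my machine judged to be from the LAN
  public:
    address: ":666" # everything else

# compose file for their Traefik (fragment)
services:
  traefik:
    ports:
      - "621:621" # unlike my setup, these ARE published on their host
      - "666:666"

# compose labels on one of their services (hypothetical names)
labels:
  - traefik.enable=true
  - traefik.http.routers.someservice.entrypoints=public # or `private`
  - traefik.http.routers.someservice.rule=Host(`blog.laven.dev`)
  - traefik.http.routers.someservice.tls=true # TLS terminates here, not on my machine
  - traefik.http.routers.someservice.tls.certresolver=letsencrypt
```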
[1] This very page is still being served with Caddy!
[2] Depending on what you’re hosting you may wish to be cautious about Cloudflare ToS and potentially avoid this option. Or like me, you may just not be a fan of MITMaaS. Either which way.
[3] It also means that you can change the IP later with minimal impact. Setting up your Jellyfin for instance to point to toaster.dog:5000 means you can change the IP later in DNS without changing anything on your TV, or other client devices.
[4] This is best paired with making changes at 11PM on a work night, really keeps life interesting.