Patrick Evans

Highly Available DNS for Home Network - 11/23/2023

A recent project I worked on was improving the fault tolerance of my home network, specifically DNS. Previously, I was running a single instance of Pi-hole, which filtered out unwanted DNS queries and forwarded the rest to my upstream Windows Domain Controllers with integrated DNS. From there, queries go out to a public resolver.

This approach had a few drawbacks. Several issues stemmed from the fact that the Pi-hole instance was running bare-metal on a Raspberry Pi, which, while usually reliable, was not tolerant of hardware issues. Patching the Raspberry Pi or rebooting it for other reasons would also cause a DNS service outage, which was undesirable. The Raspberry Pi was also being used for other services, which occasionally introduced undesirable system load. Another issue I often encountered: if the upstream Windows DNS servers stopped responding to queries, Pi-hole cached the failed lookups, and they persisted even after the issue with the upstream Windows servers was resolved, requiring a service restart to clear.

The solution I designed is pictured below. DNS queries are now sent to a single IP address (192.168.1.2), provided via DHCP, which is a load-balanced virtual IP on my ADC (now NetScaler again) VPX appliance, pointed at two Pi-hole instances.

NetScaler Configuration

Getting a NetScaler instance up and running is actually pretty easy, since as of v12.1, Citrix offers a Freemium licensing option, which is bandwidth restricted to 20 Mbps and doesn't provide access to certain features like GSLB or Citrix Gateway, but neither limitation is an issue for this use case. Configuring a simple load balancer for servers on a NetScaler isn't particularly difficult and many general guides exist. At a high level, you need to:

- Define the servers that will provide the DNS service.

- Define a Load Balancing Service Group containing those servers.

- Define a Load Balancing Virtual Server, with a Virtual IP listening at the IP address you'll be pointing clients to, and bind the above Service Group.
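
The three steps above map onto a handful of NetScaler CLI commands. As a rough sketch, the shell function below only prints those commands so they can be reviewed and pasted into the NetScaler CLI; it does not connect to the appliance. The object names (pihole1, sg_pihole, vs_dns) and the two Pi-hole node addresses (192.168.1.11, 192.168.1.12) are placeholders assumed for illustration; only the 192.168.1.2 virtual IP comes from this setup.

```shell
#!/bin/sh
# Print NetScaler CLI commands for a basic DNS load balancer.
# Usage: ns_dns_lb_commands VIP NODE_IP...
# Object names (piholeN, sg_pihole, vs_dns) are hypothetical placeholders.
ns_dns_lb_commands() {
    vip=$1
    shift

    # Step 1: define the servers that provide the DNS service
    i=1
    for node_ip in "$@"; do
        echo "add server pihole$i $node_ip"
        i=$((i + 1))
    done

    # Step 2: define a service group of type DNS (UDP) and bind the servers on port 53
    echo "add serviceGroup sg_pihole DNS"
    i=1
    for node_ip in "$@"; do
        echo "bind serviceGroup sg_pihole pihole$i 53"
        i=$((i + 1))
    done

    # Step 3: define the virtual server on the client-facing IP and bind the group
    echo "add lb vserver vs_dns DNS $vip 53"
    echo "bind lb vserver vs_dns sg_pihole"
}

# Node addresses here are assumed examples, not the real ones from this network
ns_dns_lb_commands 192.168.1.2 192.168.1.11 192.168.1.12
```

The DNS service type covers UDP only. If clients also need DNS over TCP, a parallel service group and virtual server of type DNS_TCP would likely be needed as well.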

Additionally, you can bind a monitor for the Service Group to ensure DNS lookups function properly, rather than servers just responding to pings or other simple health checks. I configured a DNS monitor with the parameters shown on the right, specifically to query for my local domain name, and ensure it resolves to one of the IP addresses of my domain controllers. Multiple IP addresses can be added to the list to be considered a valid response. Don't forget to save your changes since they won't persist through reboot otherwise!
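
For reference, the monitor described above can be sketched in NetScaler CLI form. As before, the shell function only prints commands for review rather than applying them. The monitor name mon_dns_ad and the service group name sg_pihole are assumed placeholders; domain.lan, 192.168.1.5, and 192.168.1.6 are taken from the Compose file in this post, on the assumption that those upstream addresses are the domain controllers.

```shell
#!/bin/sh
# Print NetScaler CLI commands for a DNS monitor that expects specific A records.
# Usage: ns_dns_monitor_commands DOMAIN VALID_IP...
# mon_dns_ad and sg_pihole are hypothetical names.
ns_dns_monitor_commands() {
    domain=$1
    shift
    # -IPAddress lists every answer that counts as a healthy response
    echo "add lb monitor mon_dns_ad DNS -query $domain -queryType Address -IPAddress $*"
    # Attach the monitor to the service group holding the Pi-hole servers
    echo "bind serviceGroup sg_pihole -monitorName mon_dns_ad"
    # Persist the running configuration so it survives a reboot
    echo "save ns config"
}

ns_dns_monitor_commands domain.lan 192.168.1.5 192.168.1.6
```

Because the monitor's query passes through Pi-hole to the domain controllers, a failure at either layer should mark that Pi-hole instance as down.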

Pi-hole Container Setup

The upstream Pi-hole instances are defined with Docker Compose, deployed as containers on a Docker Swarm cluster, and managed via Portainer. I opted for Docker Swarm over a more complex tool like Kubernetes given the relatively simple requirements of this project, though I may follow up by migrating these containers to Kubernetes in the future. Creating a Docker Swarm and joining nodes to it is fairly straightforward, and Docker's own documentation is pretty great for those steps (link). Managing these Pi-hole containers via Docker Compose and deploying them to the cluster was more complex, since not much reference documentation existed. The Docker Compose YAML used for this is included under Example Compose File. A couple of things to note about the Compose file:

This is running in replicated mode with the intent to be deployed to two specific nodes. This is handled via the settings under "deploy". Note in particular the placement constraint "node.labels.pihole==true", which requires the target nodes to carry the label "pihole=true". The label can be set via command line from a Swarm manager node: "docker node update --label-add pihole=true <node id>"
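
If you deploy from the command line rather than through Portainer, the labeling and deployment steps could look roughly like the sketch below. The node names, Compose file name, and stack name are placeholders I've assumed; the DOCKER override is only there so the commands can be previewed without touching a real cluster.

```shell
#!/bin/sh
# Label Swarm nodes for Pi-hole and deploy the stack.
# Set DOCKER=echo to preview the commands instead of running them.
DOCKER=${DOCKER:-docker}

# Add the pihole=true label to each node given as an argument
label_pihole_nodes() {
    for node in "$@"; do
        "$DOCKER" node update --label-add pihole=true "$node"
    done
}

# $1: path to the Compose file, $2: stack name
deploy_pihole_stack() {
    "$DOCKER" stack deploy --compose-file "$1" "$2"
}

# Example, run on a Swarm manager (all names below are hypothetical):
#   label_pihole_nodes node1 node2
#   deploy_pihole_stack pihole-stack.yml pihole
#   docker service ps pihole_pihole   # shows where the replicas were scheduled
```

With two labeled nodes, "replicas: 2", and "max_replicas_per_node: 1", the scheduler should place exactly one Pi-hole container on each labeled node.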

I'm publishing the container's ports directly to the corresponding ports on each host ("mode: host") rather than through Swarm's ingress routing mesh. Storage uses named Docker volumes on the nodes, rather than bind mounts.

Most of the Pi-hole settings are configurable via the Compose file. However, some are not, including custom Allowlist/Blocklist entries and Client Group Management, among others. For these settings, I recommend exporting and importing via the Teleporter backup feature under the Settings page. These will be stored in the "pihole.etc" volume.

Example Compose File

version: '3.8'
services:
  pihole:
    image: pihole/pihole:latest
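    deploy:
      mode: replicated
      replicas: 2
      update_config:
        # Pause between update batches so both instances are not restarted together
        delay: 30s
      placement:
        # At most one replica per node, and only on nodes labeled pihole=true
        max_replicas_per_node: 1
        constraints: [node.labels.pihole==true]
      restart_policy:
        condition: on-failure
        max_attempts: 3
        delay: 30s
        window: 120s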
    ports:
      - target: 53
        published: 53
        protocol: tcp
        mode: host
      - target: 53
        published: 53
        protocol: udp
        mode: host
      - target: 80
        published: 80
        protocol: tcp
        mode: host
    environment:
      DHCP_ACTIVE: 'false'
      DNSMASQ_LISTENING: 'all'
      DNS_BOGUS_PRIV: 'true'
      DNS_FQDN_REQUIRED: 'true'
      PIHOLE_DNS_: '192.168.1.5;192.168.1.6;fe80::b1f2:c67d:5464:e10f;fe80::f576:da56:d322:4dc'
      REV_SERVER: 'true'
      REV_SERVER_CIDR: '192.168.0.0/16'
      REV_SERVER_TARGET: '192.168.1.5'
      REV_SERVER_DOMAIN: 'domain.lan'
      TZ: 'America/Chicago'
      WEBTHEME: 'default-dark'
    volumes:
      - pihole.etc:/etc/pihole/
      - pihole.dnsmasqd:/etc/dnsmasq.d/
    networks:
      - host
networks:
  host:
    external: true
volumes:
  pihole.etc:
  pihole.dnsmasqd:
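
Once the stack is running, one way to spot-check each resolver by hand is to mimic what the load balancer's DNS monitor verifies: query the local domain and confirm the answer is one of the expected addresses. The sketch below assumes dig is installed; the function names are my own, and the Pi-hole node addresses in the usage comment are hypothetical.

```shell
#!/bin/sh
# answer_is_valid ANSWERS VALID_IPS
# Succeeds if any line of ANSWERS exactly matches one of the space-separated VALID_IPS.
answer_is_valid() {
    answers=$1
    valid_ips=$2
    for ip in $valid_ips; do
        if printf '%s\n' "$answers" | grep -Fxq "$ip"; then
            return 0
        fi
    done
    return 1
}

# check_resolver RESOLVER NAME VALID_IPS
# Asks RESOLVER for the A records of NAME and validates the answer.
check_resolver() {
    resolver=$1
    name=$2
    valid_ips=$3
    answers=$(dig +short +time=2 +tries=1 "@$resolver" "$name" A) || return 1
    answer_is_valid "$answers" "$valid_ips"
}

# Example (192.168.1.11 and 192.168.1.12 are assumed node addresses):
#   for r in 192.168.1.2 192.168.1.11 192.168.1.12; do
#       check_resolver "$r" domain.lan "192.168.1.5 192.168.1.6" && echo "$r ok" || echo "$r FAILED"
#   done
```

Checking the nodes individually as well as the 192.168.1.2 virtual IP helps distinguish a single unhealthy Pi-hole from a problem on the load balancer itself.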

Takeaways

Advantages:

- A single IP to point to for DNS queries reduces network complexity

- When two DNS servers are defined on clients, clients typically only fail over to the secondary one if the first is unavailable. Load balancing instead keeps DNS queries consistently balanced between both Pi-hole nodes, which reduces system load on each.

- The custom DNS monitor ensures my upstream Windows servers are answering domain queries with healthy responses.

- Defining the Pi-hole containers via YAML increases flexibility to deploy additional nodes if needed.

Disadvantages:

- The primary disadvantage of this setup is that a single point of failure remains, with the NetScaler node hosting the listening IP for DNS queries. I found this risk tolerable, though, since I don't use my NetScaler VPX for other purposes, and it's significantly more stable than the DNS servers themselves.

In the future, I'd like to further investigate keeping the Pi-hole Docker nodes synchronized. I plan to do this with the handy gravity-sync tool by vmstan, with shared storage for the volumes used by the Docker nodes, or both.