redundancy | .random

A few weeks ago I decided to explore High Availability for my home network for routing. Initially I thought I’d put a couple Mikrotik’s between my Opnsense and my ISP’s hardware. However after thinking about it, decided to just run a secondary Opnsense as a VM since the ISP hands out CGNAT addresses to all devices connected to their hardware. It turns out, the VM is even faster than my hardware router, so I might even migrate the primary to be virtualized too.

First I thought it would be pretty simple, just turn on HA on the primary router and be done with it it. No, it’s not that simple…it’s never that simple. Because of the way I have my IPv6 setup, the VPN configuration from the primary gets copied over and goes connects from the secondary as well. As far as I can tell there’s no way to prevent this. So I whipped out my .plan file and went to work.

The first thing I did was create a new VPN tunnel for the second machine. Next I created new VXLANs between the routers and the ipv6 cloud endpoint and bridged them together so they all act as if they are on the same network. The ipv6 cloud node then routes traffic to a CARP address that’s shared with the routers.

For ipv4, I changed the primary router’s IP address and added the old default gateway address as a CARP address.

Individual ipv4/ipv6 TCP connections fail, for example, my SSH sessions disconnect, but otherwise the rest of the family doesn’t notice a thing now when I failover the routers.

This all really came in handy not to long ago when my physical router froze up while I was away. No one noticed that it had happened and cheerfully went along doing their school, play, and memeing. Once I was back home, I rebooted the hardware and no one noticed anything.

Other things I noticed…

When first bridging the VXLANs together, pinging the other router would go thru the ipv6 cloud node. Fixing this required enabling STP in netplan, setting the router’s port id lower, and adding the vxlan interfaces as auto edge. Big note here, enabling STP for an interface in netplan also makes updating the interface impossible. The only way I found to enable updates to the netplan is to reboot the machine.

OPNSense will sync things like firewall rules, but the interfaces must have the same “internal identifier”. The easiest way I found to ensure this is to download a backup of the configuration and rename the interfaces. Note that this is not the same as the name you give an interface. Internal interface for names follow the pattern “optN” where N is the +1 of the last interface created. To simplify things, I changed the internal names to match the name I had given the port (eg vxlan99, br1, etc). This really tripped me up at first as the rules were being applied to the wrong interfaces. This made syncing the HA stuff a nightmare as the link kept failing after because the firewall suddenly stopped accepting traffic from the other router on the syncing interface.

Things I wish were different

As alluded to earlier, I thought OPNSense HA would just simply sync settings and the secondary would only be enabled if the primary failed. That is not the case and I’m not sure why they let you sync things that would have to be specific to each router like vpn profiles or interface settings. I could be missing something though.

It would be nice to have some ability to map things to stuff on the other side, have a finer grain control of what can get synced, or some sort of ability to change a setting for the secondary. For example CARP settings can be synced…but what’s the point if the settings are the same and each node is taking over the IP all the time? It would be nice to be able to have some settings like the advertisement interval be changed or automatically assigned so that the secondary node will always be second.

I’ve been finding that running DNS on my “NAS” isn’t the best of ideas and I’d like to have a “highly available” DNS system. Nothing’s quite worse than getting a call from the Mrs complaining that the internet isn’t working while you’re in the middle of a system update or because the cat stepped on a power switch. (Racking everything is another long term goal.)

I have been using Technitium’s DNS Server for my network. It feels light and snappy, lets me do unholy things with DNS, has really good filtering abilities. The problem I have with it though is there is no ability to configure one and it’s peers update from that too. There’s no replication going on (though according the github, it’s in the works).

Which is why I like ADDS DNS, records are replicated to all nodes and it just works(tm). So I light up some Server Cores and setup my own little forest.

The resulting DNS structure is very simple. On each of my Proxmox servers, I run a copy of Technitium and a ADDS/DNS server. The ADDS/DNS is pointed at each of the Technitium for forward lookups which in turn look to Quad9 and Google (I know I know, but they seem to run a pretty decent DNS resolver) for their own forward lookups.

Going forward, I’d like to get the DHCP integrated into the DNS servers so hostnames are updated. My network is fairly small so manually adding records for the important things isn’t too much of a hassle. If I had a Proxmox cluster and Ceph pool setup, I’d probably have forgone this, but I don’t have either….yet.

Post draft update

We lost power this last week and the main Proxmox server is out of action which means that I’ve lost half my DNS servers now. Except for the little hiccup in that I had forgotten to add Technitium2 to the forwarding list on ADDS/DSN2, this thing has worked perfectly.

.random

Tag: redundancy

root# net install ha

Other things I noticed…

Things I wish were different

HA DNS in the Home

Post draft update