A few weeks ago I decided to explore High Availability for my home network for routing. Initially I thought I’d put a couple Mikrotik’s between my Opnsense and my ISP’s hardware. However after thinking about it, decided to just run a secondary Opnsense as a VM since the ISP hands out CGNAT addresses to all devices connected to their hardware. It turns out, the VM is even faster than my hardware router, so I might even migrate the primary to be virtualized too.
First I thought it would be pretty simple, just turn on HA on the primary router and be done with it it. No, it’s not that simple…it’s never that simple. Because of the way I have my IPv6 setup, the VPN configuration from the primary gets copied over and goes connects from the secondary as well. As far as I can tell there’s no way to prevent this. So I whipped out my .plan file and went to work.
The first thing I did was create a new VPN tunnel for the second machine. Next I created new VXLANs between the routers and the ipv6 cloud endpoint and bridged them together so they all act as if they are on the same network. The ipv6 cloud node then routes traffic to a CARP address that’s shared with the routers.
For ipv4, I changed the primary router’s IP address and added the old default gateway address as a CARP address.
Individual ipv4/ipv6 TCP connections fail, for example, my SSH sessions disconnect, but otherwise the rest of the family doesn’t notice a thing now when I failover the routers.
This all really came in handy not to long ago when my physical router froze up while I was away. No one noticed that it had happened and cheerfully went along doing their school, play, and memeing. Once I was back home, I rebooted the hardware and no one noticed anything.
Other things I noticed…
When first bridging the VXLANs together, pinging the other router would go thru the ipv6 cloud node. Fixing this required enabling STP in netplan, setting the router’s port id lower, and adding the vxlan interfaces as auto edge. Big note here, enabling STP for an interface in netplan also makes updating the interface impossible. The only way I found to enable updates to the netplan is to reboot the machine.
OPNSense will sync things like firewall rules, but the interfaces must have the same “internal identifier”. The easiest way I found to ensure this is to download a backup of the configuration and rename the interfaces. Note that this is not the same as the name you give an interface. Internal interface for names follow the pattern “optN” where N is the +1 of the last interface created. To simplify things, I changed the internal names to match the name I had given the port (eg vxlan99, br1, etc). This really tripped me up at first as the rules were being applied to the wrong interfaces. This made syncing the HA stuff a nightmare as the link kept failing after because the firewall suddenly stopped accepting traffic from the other router on the syncing interface.
Things I wish were different
As alluded to earlier, I thought OPNSense HA would just simply sync settings and the secondary would only be enabled if the primary failed. That is not the case and I’m not sure why they let you sync things that would have to be specific to each router like vpn profiles or interface settings. I could be missing something though.
It would be nice to have some ability to map things to stuff on the other side, have a finer grain control of what can get synced, or some sort of ability to change a setting for the secondary. For example CARP settings can be synced…but what’s the point if the settings are the same and each node is taking over the IP all the time? It would be nice to be able to have some settings like the advertisement interval be changed or automatically assigned so that the secondary node will always be second.