root# net install ha

A few weeks ago I decided to explore High Availability for routing on my home network. Initially I thought I'd put a couple of MikroTiks between my OPNsense box and my ISP's hardware. However, after thinking about it, I decided to just run a secondary OPNsense as a VM, since the ISP hands out CGNAT addresses to all devices connected to their hardware. It turns out the VM is even faster than my hardware router, so I might migrate the primary to be virtualized too.

First I thought it would be pretty simple: just turn on HA on the primary router and be done with it. No, it's not that simple…it's never that simple. Because of the way I have my IPv6 set up, the VPN configuration from the primary gets copied over and connects from the secondary as well. As far as I can tell there's no way to prevent this. So I whipped out my .plan file and went to work.

The first thing I did was create a new VPN tunnel for the second machine. Next I created new VXLANs between the routers and the IPv6 cloud endpoint and bridged them together so they all act as if they're on the same network. The IPv6 cloud node then routes traffic to a CARP address that's shared by the routers.
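
To give a rough idea of the shape on the cloud node, here's a minimal sketch with made-up names and addresses (not my exact config): one VXLAN per router, both members of the same bridge, plus a route for the delegated prefix pointed at the shared CARP address.

#sketch only: vxlan201/vxlan202, vxbr0, and the addresses are placeholders
network:
  version: 2
  tunnels:
    vxlan201: # to the primary router
      mode: vxlan
      id: 201
      local: a.a.a.1 # cloud node's wg address
      remote: a.a.a.2 # primary router's wg address
      port: 4789
    vxlan202: # to the secondary router
      mode: vxlan
      id: 202
      local: a.a.a.1
      remote: a.a.a.3 # secondary router's wg address
      port: 4789
  bridges:
    vxbr0:
      interfaces: [vxlan201, vxlan202]
      addresses:
        - dead:beef:2::1/64
      routes:
        - to: dead:beef:3::/48
          via: dead:beef:2::100 # the CARP address shared by the routers
          on-link: true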

For IPv4, I changed the primary router's IP address and added the old default gateway address as a CARP address.

Individual IPv4/IPv6 TCP connections fail (for example, my SSH sessions disconnect), but otherwise the rest of the family doesn't notice a thing now when I fail over the routers.

This all really came in handy not too long ago when my physical router froze up while I was away. No one noticed it had happened and everyone cheerfully went along with their school, play, and memeing. Once I was back home, I rebooted the hardware and, again, no one noticed a thing.

Other things I noticed…

When first bridging the VXLANs together, pings to the other router would go through the IPv6 cloud node. Fixing this required enabling STP in netplan, setting the router's port ID lower, and adding the VXLAN interfaces as auto edge ports. Big note here: enabling STP for an interface in netplan also makes updating that interface impossible. The only way I found to get netplan updates applied again is to reboot the machine.
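
For what it's worth, the STP toggle on the netplan side is just a couple of lines under the bridge's parameters. A sketch using the same placeholder names as above (the port ID and auto-edge tweaks aren't shown here):

network:
  version: 2
  bridges:
    vxbr0:
      interfaces: [vxlan201, vxlan202]
      parameters:
        stp: true # spanning tree then runs on the member vxlan ports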

OPNsense will sync things like firewall rules, but the interfaces must have the same "internal identifier". The easiest way I found to ensure this is to download a backup of the configuration and rename the interfaces there. Note that this is not the same as the name you give an interface. Internal interface names follow the pattern "optN", where N is one more than that of the last interface created. To simplify things, I changed the internal names to match the names I had given the ports (e.g. vxlan99, br1, etc). This really tripped me up at first, as the rules were being applied to the wrong interfaces. It made syncing the HA stuff a nightmare, because the link kept failing when the firewall suddenly stopped accepting traffic from the other router on the syncing interface.
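
Roughly what the rename looks like in the backup XML, as a heavily trimmed sketch (the names here are illustrative, and the same identifier has to be updated everywhere rules or other settings reference it):

<interfaces>
  <vxlan99> <!-- was <opt5>; this tag is the "internal identifier" -->
    <if>vxlan1</if> <!-- the actual OS device -->
    <descr>VXLAN99</descr>
    <enable>1</enable>
  </vxlan99>
</interfaces>
<filter>
  <rule>
    <interface>vxlan99</interface> <!-- firewall rules reference the same identifier -->
    <type>pass</type>
  </rule>
</filter>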

Things I wish were different

As alluded to earlier, I thought OPNsense HA would simply sync settings and the secondary would only be enabled if the primary failed. That is not the case, and I'm not sure why they let you sync things that have to be specific to each router, like VPN profiles or interface settings. I could be missing something, though.

It would be nice to have some ability to map things to their counterparts on the other side, finer-grained control over what gets synced, or some way to override a setting on the secondary. For example, CARP settings can be synced…but what's the point if the settings are the same and each node keeps taking over the IP? It would be nice if settings like the advertisement interval could be changed or automatically assigned so that the secondary node will always be second.

Routing ‘Round

I sit behind an IPv4-only CGNAT, which is very annoying for self-hosting stuff or using TunnelBroker for IPv6. Sure, there are options like Cloudflare's Tunnel for the self-hosted services, but that's a huge ask in terms of trust. It also doesn't address the lower-level networking needed for 6in4/6to4/6rd/etc.

An important detail, and something I initially overlooked, was choosing a VPS host that has low latency to the TB service. The lower the ping time from the VPS to TB, the lower the impact on your IPv6 internet experience. For example, if your ping time from home to the VPS is 50ms and the ping time from the VPS to TB is 30ms, you're going to have a total IPv6 ping of at least 80ms from your home.

I like Wireguard and I already have a v4 Wireguard network up and running, so I'm using that as the starting point. I've tried pushing all my router's traffic through Wireguard before without much success (likely a skill issue). Plus I find the whole "allowed-ips" thing a huge hassle. So why not just overlay a VXLAN on top? That way I can treat the thing like a direct link and route any traffic over it without needing to worry about those dang "allowed-ips" settings.

#/etc/netplan/100-muhplan.yml
network:
  version: 2
  tunnels:
    # he-ipv6 configuration is provided by tunnel broker
    # routes: is changed slightly to use a custom routing table.
    # routing-policy is added to do source-based routing so we don't interfere with the host's own ipv6 connection
    he-ipv6:
      mode: sit
      remote: x.x.x.x
      local: y.y.y.y
      addresses:
        - dead:beef:1::2 # this endpoint's ipv6 address
      routes:
        - to: default
          via: dead:beef:1::1 #the other end of the tunnel's ipv6 address
          table: 205 #choose any table id you want, doesn't really matter as long as it's not already used.
          on-link: True
      routing-policy:
        - from: dead:beef:1::2/128 #same as this endpoint
          table: 205
        - from: dead:beef:2::/64 # the routed /64 network.
          table: 205
        - from: dead:beef:3::/48 # the routed /48 if you choose to use it.
          table: 205
        - from: dead:beef:3::/48 # put /48-to-/48 traffic into the main table (or whatever table you want)
          to: dead:beef:3::/48
          table: 254 # the main table
          priority: 10 # lower number = matched first, keeps this rule "above" the others
    # setup a simple vxlan
    # lets us skip the routing/firewall nightmare that wireguard can add to this mess
    vxlan101:
      mode: vxlan
      id: 101
      local:  a.a.a.1 #local wg address
      remote: a.a.a.2 #remote wg address (home router)
      port: 4789
  bridges:
    vxbr0:
      interfaces: [vxlan101]
      addresses:
        - dead:beef:2::1/64 #could be anything ipv6, but for mine I used the routed /64 network.
      routes:
        - to: dead:beef:3::/48
          via: dead:beef:2::2 # home router
          on-link: true
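
The nice part is that the Wireguard config underneath stays dumb. On the VPS side it's something like this (a sketch; keys and addresses are placeholders): AllowedIPs only has to cover the peer's Wireguard address, because everything else, IPv6 included, rides inside the VXLAN's UDP packets between a.a.a.1 and a.a.a.2.

#/etc/wireguard/wg0.conf on the VPS (sketch)
[Interface]
Address = a.a.a.1/24
PrivateKey = <vps private key>
ListenPort = 51820

[Peer]
#home router
PublicKey = <router public key>
AllowedIPs = a.a.a.2/32 # just the peer's wg address; no per-prefix bookkeeping needed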

Figuring out where things were failing was tricky, as I wasn't sure if the issue was with my home firewall or with the VPS. Pings to and from the VPS were working, but nothing going through it worked. I thought I had the right ip6tables rules in place, but evidently I did not. Hmm.

Reviewing my steps, I found I had forgotten all about net.ipv6.conf.all.forwarding! A quick sysctl got it working, and a conf file in /etc/sysctl.d/ makes it survive reboots.
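
For the record, that's just the following (run as root; the file name under /etc/sysctl.d/ is whatever you like):

sysctl -w net.ipv6.conf.all.forwarding=1
echo "net.ipv6.conf.all.forwarding = 1" > /etc/sysctl.d/99-ipv6-forwarding.conf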

Ping traffic was flowing both ways, but trying to visit a v6 website like ip6.me would fail. I grabbed a copy of tshark and watched the traffic, which showed me the issue was on the VPS. Thankfully UFW has some logging that helped confirm the problem was indeed with ip6tables, and the log output helped me write the necessary rules to allow traffic through.
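
Something like the following is enough to see where packets stop showing up (interface names as in the netplan above; the capture filter is just an example):

tshark -i he-ipv6 -f "ip6" # tunnel broker side
tshark -i vxbr0 -f "ip6" # home/VXLAN side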

ip6tables -I FORWARD 1 -i he-ipv6 -d dead:beef:3::/48 -j ACCEPT;
ip6tables -I FORWARD 1 -o he-ipv6 -s dead:beef:3::/48 -j ACCEPT;

That allowed the traffic to flow in and out of the home network. However, reboots are a problem. Thankfully UFW allows us to easily add the above rules so they survive a reboot.

ufw route allow in on he-ipv6 to dead:beef:3::/48
ufw route allow out on he-ipv6 from dead:beef:3::/48

#these don't seem to be needed? Default UFW firewall has ESTABLISHED,RELATED set.

Of course, after I started setting static addresses and updating my local DNS, I realized I should have used a ULA prefix and NPTv6 to make future migrations easier.

Improvements?

  • I'd like to revisit using Wireguard without VXLANs. It's another layer that can go wrong, may not actually be needed, and it cuts down the maximum MTU.
  • If I ever got a second home internet connection, I’d like to aggregate traffic so that my effective home internet speed improves.
  • Migrate NPTv6 up to the VPS.

HA DNS in the Home

I've been finding that running DNS on my "NAS" isn't the best of ideas, and I'd like to have a "highly available" DNS system. Nothing's quite worse than getting a call from the Mrs complaining that the internet isn't working while you're in the middle of a system update or because the cat stepped on a power switch. (Racking everything is another long-term goal.)

I have been using Technitium's DNS Server for my network. It feels light and snappy, lets me do unholy things with DNS, and has really good filtering abilities. The problem I have with it, though, is that there's no way to configure one server and have its peers update from it. There's no replication going on (though according to the GitHub repo, it's in the works).

Which is why I like ADDS DNS: records are replicated to all nodes and it just works(tm). So I lit up some Server Core installs and set up my own little forest.

The resulting DNS structure is very simple. On each of my Proxmox servers, I run a copy of Technitium and an ADDS/DNS server. The ADDS/DNS servers are pointed at each of the Technitium instances for forward lookups, which in turn look to Quad9 and Google (I know, I know, but they seem to run a pretty decent DNS resolver) for their own forward lookups.

Going forward, I'd like to get DHCP integrated with the DNS servers so hostnames are updated automatically. My network is fairly small, so manually adding records for the important things isn't too much of a hassle. If I had a Proxmox cluster and Ceph pool set up, I'd probably have forgone this, but I don't have either….yet.

Post draft update

We lost power this last week and the main Proxmox server is out of action, which means I've lost half my DNS servers. Except for the little hiccup where I had forgotten to add Technitium2 to the forwarding list on ADDS/DNS2, this thing has worked perfectly.

    RegreSSHion & ProgreSSHion

With the recent news of OpenSSH getting haxed…again, I was wondering if it would be possible to marry Wireguard's not-so-chatty traffic model with ssh.

Then it hit me…why not just listen only on (or allow traffic only from) Wireguard interfaces? So I whipped up a test Ubuntu 24 instance and started banging rocks together.

First, let's get Wireguard installed.

$ sudo apt install -y wireguard wireguard-tools

Second, let's make a new Wireguard conf for the server (saved as /etc/wireguard/wgsshd0.conf, so the interface name matches the service below) and one for ourselves.

#The server
[Interface]
Address = fd80:892b:9b39::1/64
PrivateKey = lolLmaoEVEN==
ListenPort = 51820

[Peer]
PublicKey = EWdRFVVrfaE9PsRaKIX9a8h3BpS/EaUr/F0sxT09+UI=
AllowedIPs = fd80:892b:9b39::2/64

#The client
[Interface]
Address = fd80:892b:9b39::2/64
PrivateKey = lolLmaoEVEN==
ListenPort = 51820

[Peer]
PublicKey = HagDqXuHxulbxKvGgPLtWy7LCv1IGwAJb1wLB40ligk=
AllowedIPs = fd80:892b:9b39::1/64
Endpoint = myserver.example.com:51820

Third, let's enable the service, add a firewall rule to allow SSH traffic in on our new interface, bring up the interface, and test the port.

$ sudo systemctl enable wg-quick@wgsshd0.service
$ sudo ufw allow in on wgsshd0 to any port 22
$ sudo wg-quick up wgsshd0
$ sudo screen -S testing bash -c "ufw delete allow 22/tcp; sleep 120; ufw allow 22/tcp"

Now let's test it from another machine.

$ ssh abc@myserver.example.com # should fail.
$ ssh abc@[fd80:892b:9b39::1] # should succeed.

Finally, if everything works, delete any other SSH port rules you may have.
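
As for the "only listen on" variant mentioned earlier, sshd itself can also be pinned to the Wireguard address instead of (or on top of) the firewall rule. A sketch using the address above; note that on recent Ubuntu releases ssh may be socket-activated, in which case it's the ListenStream in ssh.socket that needs changing rather than sshd_config.

#/etc/ssh/sshd_config.d/99-wg-only.conf
ListenAddress fd80:892b:9b39::1

$ sudo systemctl restart ssh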

Final Thoughts

Probably the biggest issue I can see is that this requires another service to be running before remote management of the machine is possible. Though most popular hosts have a feature that lets you access the console through other means, so this might be a non-issue for most. It might still be prudent to keep some firewall rules letting traffic from a limited set of addresses through to the sshd on the server's public IP address.

The second biggest is that every admin would need a Wireguard interface set up on every machine they use. While we'd need to get their SSH keys configured anyway, it's another thing to keep track of.