Troubleshooting High Availability DHCP Failover

  • The system clocks on both cluster nodes must be within 90 seconds of each other. If the difference is larger, the DHCP daemon processes will not communicate.
  • The interfaces must be assigned identically on both nodes, for example: wan=WAN, lan=LAN, opt1=Sync, opt2=DMZ. Check the contents of config.xml directly to confirm that the assignments match.
  • Check the pool status section at Status > DHCP Leases. Every defined pool (often one per interface) is listed there. If any pool is in a state other than “normal”, troubleshoot that pool before moving on.
  • Stop and restart the DHCP daemon from Status > Services on both nodes, then check the pool status again after a few moments.
  • Check the CARP VIP configuration for VIPs on interfaces used for DHCP failover. The primary node must have an Advertising Frequency Skew value below 20, and the secondary node must have an Advertising Frequency Skew value above 20.
  • Both nodes must run the same version of AZTCO-FW software. If the versions do not match, update both nodes to the newest available stable release. Older versions may have problems with various aspects of DHCP failover that have since been corrected.
  • If all else fails, stop the DHCP daemon on both nodes, delete the DHCP lease database at /var/dhcpd/var/db/dhcpd.leases, then start the daemons again.
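The 90-second clock requirement comes down to a simple comparison. The sketch below assumes you have collected a Unix timestamp from each node at roughly the same moment (for example with `date +%s`); the function name and the choice to treat exactly 90 seconds as a failure are illustrative assumptions, not part of AZTCO-FW.

```python
# Hypothetical helper: compare Unix timestamps collected on the two cluster nodes.
MAX_OFFSET_SECONDS = 90  # limit stated in the troubleshooting list


def clock_offset_ok(primary_epoch: int, secondary_epoch: int,
                    limit: int = MAX_OFFSET_SECONDS) -> bool:
    """Return True when the two clocks differ by less than `limit` seconds.

    A difference of exactly `limit` is treated as a failure to stay on the
    safe side (an assumption; the exact boundary is not documented here).
    """
    return abs(primary_epoch - secondary_epoch) < limit


if __name__ == "__main__":
    # Example values, e.g. the output of `date +%s` on each node. The delay
    # between running the two commands adds to the measured difference.
    primary, secondary = 1_700_000_000, 1_700_000_042
    print("clock offset OK:", clock_offset_ok(primary, secondary))
```

Because the two readings are never taken at exactly the same instant, a result close to the limit is worth re-measuring before drawing conclusions.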
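To compare the interface assignments in config.xml without reading both files by eye, a short script can diff them. This is a minimal sketch that assumes each assignment is a child element of `<interfaces>` (such as `<wan>` or `<opt1>`) with an optional `<descr>` child holding the name shown in the GUI. That layout, the helper names, the fallback for a missing description, and the sample data are all assumptions to verify against a real config.xml from each node.

```python
import xml.etree.ElementTree as ET


def interface_assignments(config_xml: str) -> dict:
    """Map internal interface names (wan, lan, opt1, ...) to their descriptions.

    Assumes an <interfaces> element directly under the root whose children are
    the assignments. The root element name itself is not checked.
    """
    root = ET.fromstring(config_xml)
    interfaces = root.find("interfaces")
    if interfaces is None:
        return {}
    assignments = {}
    for iface in interfaces:
        descr = (iface.findtext("descr") or "").strip()
        # Hypothetical fallback: use the upper-cased internal name if no <descr>.
        assignments[iface.tag] = descr or iface.tag.upper()
    return assignments


def assignment_mismatches(primary_xml: str, secondary_xml: str) -> list:
    """List every internal name whose description differs between the nodes."""
    a = interface_assignments(primary_xml)
    b = interface_assignments(secondary_xml)
    return [
        f"{name}: primary={a.get(name)!r} secondary={b.get(name)!r}"
        for name in sorted(a.keys() | b.keys())
        if a.get(name) != b.get(name)
    ]


# Hypothetical sample data; real files come from each node.
PRIMARY = """<config><interfaces>
  <wan><if>em0</if><descr>WAN</descr></wan>
  <lan><if>em1</if><descr>LAN</descr></lan>
  <opt1><if>em2</if><descr>Sync</descr></opt1>
  <opt2><if>em3</if><descr>DMZ</descr></opt2>
</interfaces></config>"""

# Secondary with opt1 and opt2 swapped, which should be reported.
SECONDARY = """<config><interfaces>
  <wan><if>em0</if><descr>WAN</descr></wan>
  <lan><if>em1</if><descr>LAN</descr></lan>
  <opt1><if>em2</if><descr>DMZ</descr></opt1>
  <opt2><if>em3</if><descr>Sync</descr></opt2>
</interfaces></config>"""

if __name__ == "__main__":
    for line in assignment_mismatches(PRIMARY, SECONDARY):
        print(line)
```

Only the descriptions are compared, because the physical port names under `<if>` may legitimately differ if the two nodes use different hardware.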
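If the DHCP daemon is ISC dhcpd (an assumption; this section does not name it), its lease database also records each failover peer's state in `failover peer "NAME" state {...}` records. The sketch below extracts the most recent “my state” and “partner state” per peer and flags any peer where either side is not “normal”. The regular expressions, helper names, peer names, and sample records are hypothetical; on a node you would read the lease database at /var/dhcpd/var/db/dhcpd.leases instead of the sample text.

```python
import re
from typing import Dict, Optional, Tuple

# Matches one failover state record, e.g.
#   failover peer "lan-pool" state {
#     my state normal at 2 2024/01/02 03:04:05;
#     partner state normal at 2 2024/01/02 03:04:05;
#   }
STANZA_RE = re.compile(r'failover\s+peer\s+"(?P<peer>[^"]+)"\s+state\s*\{(?P<body>[^}]*)\}')
MY_RE = re.compile(r"\bmy\s+state\s+([\w-]+)")
# Assumed to be written as "partner state"; "peer state" is accepted as well.
PARTNER_RE = re.compile(r"\b(?:partner|peer)\s+state\s+([\w-]+)")

State = Tuple[Optional[str], Optional[str]]


def failover_states(leases_text: str) -> Dict[str, State]:
    """Return (my_state, partner_state) for each failover peer.

    New records are appended to the lease file, so a peer can appear several
    times; later records overwrite earlier ones here.
    """
    states: Dict[str, State] = {}
    for m in STANZA_RE.finditer(leases_text):
        body = m.group("body")
        mine = MY_RE.search(body)
        partner = PARTNER_RE.search(body)
        states[m.group("peer")] = (
            mine.group(1) if mine else None,
            partner.group(1) if partner else None,
        )
    return states


def abnormal_peers(leases_text: str) -> Dict[str, State]:
    """Peers where either side reports something other than "normal"."""
    return {p: s for p, s in failover_states(leases_text).items()
            if s != ("normal", "normal")}


# Hypothetical sample records.
SAMPLE = """
failover peer "lan-pool" state {
  my state communications-interrupted at 2 2024/01/02 03:04:05;
  partner state normal at 2 2024/01/02 03:00:00;
}
failover peer "lan-pool" state {
  my state normal at 2 2024/01/02 03:10:00;
  partner state normal at 2 2024/01/02 03:10:00;
}
failover peer "dmz-pool" state {
  my state partner-down at 2 2024/01/02 03:05:00;
  partner state communications-interrupted at 2 2024/01/02 03:01:00;
}
"""

if __name__ == "__main__":
    for peer, (mine, partner) in abnormal_peers(SAMPLE).items():
        print(f"{peer}: my state={mine}, partner state={partner}")
```

Run against both nodes, this gives the same kind of per-pool view as Status > DHCP Leases, which can help when the GUI is unavailable.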
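The CARP skew requirement in the list above can also be checked against each node's config.xml. This sketch assumes CARP VIPs are stored as `<vip>` elements with `<mode>`, `<interface>`, and `<advskew>` children, where `advskew` holds the Advertising Frequency Skew. Those tag names, the helper names, the default of 0 for a missing skew, and the sample data are assumptions. A skew of exactly 20 fails for both roles, matching the “below 20” and “above 20” wording.

```python
import xml.etree.ElementTree as ET

SKEW_LIMIT = 20  # primary must be below this value, secondary above it


def carp_skews(config_xml: str) -> dict:
    """Map interface name -> list of advskew values for CARP VIPs on it.

    Assumed layout: <vip> elements with <mode>, <interface>, <advskew> children.
    """
    root = ET.fromstring(config_xml)
    skews = {}
    for vip in root.iter("vip"):
        if (vip.findtext("mode") or "").strip() != "carp":
            continue  # ignore non-CARP VIPs such as IP aliases
        iface = (vip.findtext("interface") or "").strip()
        skew_text = (vip.findtext("advskew") or "").strip()
        # Hypothetical default: treat a missing advskew as 0.
        skews.setdefault(iface, []).append(int(skew_text or 0))
    return skews


def skew_problems(config_xml: str, role: str, dhcp_interfaces: set) -> list:
    """Report CARP skew values that contradict the node's failover role."""
    if role not in ("primary", "secondary"):
        raise ValueError("role must be 'primary' or 'secondary'")
    skews = carp_skews(config_xml)
    problems = []
    for iface in sorted(dhcp_interfaces):
        if iface not in skews:
            problems.append(f"{iface}: no CARP VIP found")
            continue
        for skew in skews[iface]:
            ok = skew < SKEW_LIMIT if role == "primary" else skew > SKEW_LIMIT
            if not ok:
                problems.append(f"{iface}: advskew {skew} is wrong for the {role} node")
    return problems


# Hypothetical secondary-node data: the opt2 VIP has a primary-style skew.
SECONDARY_CONFIG = """<config><virtualip>
  <vip><mode>carp</mode><interface>lan</interface><advskew>100</advskew></vip>
  <vip><mode>carp</mode><interface>opt2</interface><advskew>0</advskew></vip>
  <vip><mode>ipalias</mode><interface>wan</interface></vip>
</virtualip></config>"""

if __name__ == "__main__":
    for problem in skew_problems(SECONDARY_CONFIG, "secondary", {"lan", "opt2"}):
        print(problem)
```

The set of DHCP failover interfaces is passed in explicitly because this sketch does not try to infer which interfaces have failover enabled.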