This week I had to deal with a setup in which I needed to distribute additional static network routes using DHCP.
The setup is easy but there are some caveats to take into account. Also, DHCP clients might not behave as one would expect.
The starting situation is a working DHCP client/server deployment. Some standard virtual machines would request their network setup over the network. Nothing new. The DHCP server is dnsmasq, and the daemon is running under Openstack control, but this has nothing to do with the DHCP problem itself.
By default, it seems dnsmasq sends to clients the
Routers (code 3) option,
which usually contains the default gateway for clients to use.
My situation required to distribute one additional static route for another
subnet. My idea was for DHCP clients to end with this simple routing table:
user@dhcpclient:~$ ip r default via 10.0.0.1 dev eth0 10.0.0.0/24 dev eth0 proto kernel scope link src 10.0.0.100 172.16.0.0/21 via 10.0.0.253 dev eth0 ^^^ extra static route
To distribute this extra static route, you only need to edit the dnsmasq config file and add a line like this:
For the initial configuratoin tests I was simply refreshing the lease from the client DHCP side.
This got my new static route online. But, and here comes the interesting part, in the case of a
reboot, the DHCP client would not add the default route to the local configuration. The different
behaviour is documented in
dhclient-script(8). So, apparently refreshing the lease and doing a
full machine reboot are completely different situations from the DHCP client point of view.
To try something similar to a reboot situation, I had to use this command:
user@dhcpclient:~$ sudo ifup --force eth0 Internet Systems Consortium DHCP Client 4.3.1 Copyright 2004-2014 Internet Systems Consortium. All rights reserved. For info, please visit https://www.isc.org/software/dhcp/ Listening on LPF/eth0/xx:xx:xx:xx:xx:xx Sending on LPF/eth0/xx:xx:xx:xx:xx:xx Sending on Socket/fallback DHCPREQUEST on eth0 to 255.255.255.255 port 67 DHCPACK from 10.0.0.1 RTNETLINK answers: File exists bound to 10.0.0.100 -- renewal in 20284 seconds.
Anyway the root problem is the same: Why would the DHCP client not install the default route?
This was really surprissing at first, and led me to debug DHCP packets using
TIME: 2018-09-11 18:06:03.496 IP: 10.0.0.1 (xx:xx:xx:xx:xx:xx) > 10.0.0.100 (xx:xx:xx:xx:xx:xx) OP: 2 (BOOTPREPLY) HTYPE: 1 (Ethernet) HLEN: 6 HOPS: 0 XID: xxxxxxxx SECS: 8 FLAGS: 0 CIADDR: 0.0.0.0 YIADDR: 10.0.0.100 SIADDR: xx.xx.xx.x GIADDR: 0.0.0.0 CHADDR: xx:xx:xx:xx:xx:xx:00:00:00:00:00:00:00:00:00:00 OPTION: 53 ( 1) DHCP message type 2 (DHCPOFFER) OPTION: 54 ( 4) Server identifier 10.0.0.1 OPTION: 51 ( 4) IP address leasetime 43200 (12h) OPTION: 58 ( 4) T1 21600 (6h) OPTION: 59 ( 4) T2 37800 (10h30m) OPTION: 1 ( 4) Subnet mask 255.255.255.0 OPTION: 28 ( 4) Broadcast address 10.0.0.255 OPTION: 15 ( 13) Domainname xxxxxxxx OPTION: 12 ( 21) Host name xxxxxxxx OPTION: 3 ( 4) Routers 10.0.0.1 OPTION: 121 ( 8) Classless Static Route xxxxxxxxxxxxxx .....D.. [...] ---------------------------------------------------------------------------
(you can use this handy command both in server and client side)
So, the DHCP server was sending both the
Routers (code 3) and the
Classless Static Route (code 121) options to the clients. So, why would
fail the client to install both routes?
I obtained some help from folks on IRC and they pointed me towards RFC3442:
DHCP Client Behavior [...] If the DHCP server returns both a Classless Static Routes option and a Router option, the DHCP client MUST ignore the Router option.
So, clients are supposed to ignore the
Routers (code 3) option if they get
an additional static route. This is very counter-intuitive, but can be easily
workarounded by just distributing the default gateway route as another
classless static route:
dhcp-option=option:classless-static-route,0.0.0.0/0,10.0.0.1,172.16.0.0/21,10.0.0.253 # ^^ default route ^^ extra static route
Obviously this was the first time in my career dealing with this setup and situation. My conclussion is that even old-enough protocols like DHCP can sometimes behave in a counter-intuitive way. Reading RFCs is not always funny, but can help understand what’s going on. I wonder why the protocol is defined this way. I bet there are valid reasons for it. Or not? Who knows.
You can read the original issue in Wikimedia Foundation’s Phabricator ticket T202636, including all the back-and-forth work I did. Yes, is open to the public ;-)