When we started working on Tanium Provision, one of the things we needed to build was a mechanism to do network booting. It’s fine to boot from a USB key (and necessary in some scenarios, something to discuss later), but powering on a machine and pressing a couple of keys is much easier. So, we built the Tanium PXE service.
For the rest of this blog post, I want to focus on the technology needed for network booting. And I specifically don’t say “PXE booting” in that context, because we can do more than that. So let’s dig in.
How does PXE boot work?
When a device with an Ethernet connection boots up, it will process the entries in the boot order. Most of the time, that doesn’t include network booting (certainly not before booting from the hard drive, as that would prevent the locally-installed OS from running automatically). So, you have to press a key to choose a temporary boot device. (On VMs, this can get tricky, so what I typically do is create a new empty VM with the hard drive first and network device second, then create a checkpoint with the VM off in that state. So it will PXE boot when it starts up, install an OS, and then continue to run that OS from the hard drive, at least until I want to do it all over again by reverting to the checkpoint.) The key that you press depends on the hardware (e.g. Dell uses F12), but once you see the menu, you can choose the PXE option. In the image below, I’m using Proxmox/QEMU (pressing Esc during startup and then navigating to the boot menu), where I can choose PXEv4 (PXE over IPv4):
So after you press enter to select that option, what happens? Here’s a Wireshark capture that shows the conversation:
The device sends out a DHCP Discover request, asking a DHCP server for an IP address. In that request, it indicates via the “Vendor class identifier” (option 60) that it wants to do a PXE boot by specifying the “PXEClient” string:
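If you want to see what "checking option 60" amounts to, here is a minimal Python sketch that walks the DHCP options list (the type-length-value bytes that follow the magic cookie) and looks for the vendor class identifier. The function names and the sample bytes are made up for illustration; this is not how the Tanium PXE service is implemented.

```python
OPT_PAD = 0                # one-byte padding option, no length field
OPT_END = 255              # marks the end of the options list
OPT_VENDOR_CLASS_ID = 60   # "Vendor class identifier"


def parse_dhcp_options(options: bytes) -> dict:
    """Walk the code/length/value triples in a DHCP options field."""
    parsed = {}
    i = 0
    while i < len(options):
        code = options[i]
        if code == OPT_END:
            break
        if code == OPT_PAD:
            i += 1
            continue
        if i + 1 >= len(options):
            break  # truncated option, stop parsing
        length = options[i + 1]
        parsed[code] = options[i + 2 : i + 2 + length]
        i += 2 + length
    return parsed


def is_pxe_client(options: bytes) -> bool:
    """True when option 60 starts with the "PXEClient" string."""
    vendor_class = parse_dhcp_options(options).get(OPT_VENDOR_CLASS_ID, b"")
    return vendor_class.startswith(b"PXEClient")


# Hypothetical options: message type 1 (Discover) plus a vendor class string.
vendor = b"PXEClient:Arch:00007:UNDI:003016"
sample = bytes([53, 1, 1]) + bytes([60, len(vendor)]) + vendor + bytes([OPT_END])
print(is_pxe_client(sample))
```

A prefix match is used because firmware typically appends architecture details after the "PXEClient" string.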
The device then gets two different responses to that DHCP Discover request. The first comes from the DHCP server (10.102.1.1) and offers an IP address that the device can use (10.102.1.124):
The second response comes from the PXE server, which saw the initial DHCP Discover request. (You need to have your network configured appropriately for both the DHCP server and PXE server to see the broadcast packet. If those servers are on a different network segment, that means BOOTP forwarders/DHCP helpers. And while in theory you could run DHCP and PXE on the same server, that changes the process, so don’t do that.) That second response doesn’t offer an IP address, but rather just indicates that “yes, I am available to help you PXE boot”:
The device will then send a DHCP Request packet to officially request the offered IP address from the DHCP server, and the DHCP server will respond with a DHCP ACK packet to confirm that the IP address has been reserved and can be used by the device.
Right after that, the device will send a proxyDHCP request directly to the PXE server (it kept track of that server’s IP address from the second DHCP Offer packet, the one with the PXEClient identifier):
And the PXE server will respond with a proxyDHCP Offer directly to the device, specifying the TFTP server and file name that the device should use to boot:
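To make "specifying the TFTP server and file name" a little more concrete, here is a rough Python sketch of the fixed 236-byte BOOTP header that every DHCP packet starts with. It fills in the server address (`siaddr`) and boot file (`file`) fields and reads them back. The server IP, MAC address, and transaction ID are invented, and a real server may carry the same information in DHCP options 66 and 67 instead of (or as well as) these fields, so treat this as an illustration of the packet layout only.

```python
import socket
import struct

# op, htype, hlen, hops, xid, secs, flags,
# ciaddr, yiaddr, siaddr, giaddr, chaddr, sname, file
BOOTP_FORMAT = "!BBBBIHH4s4s4s4s16s64s128s"
BOOTP_SIZE = struct.calcsize(BOOTP_FORMAT)  # 236 bytes

NO_ADDRESS = b"\x00" * 4


def build_boot_reply(xid: int, client_mac: bytes, server_ip: str, boot_file: str) -> bytes:
    """Build the fixed header of a reply that names a boot server and file."""
    return struct.pack(
        BOOTP_FORMAT,
        2,                            # op: BOOTREPLY
        1,                            # htype: Ethernet
        6,                            # hlen: MAC address length
        0,                            # hops
        xid,                          # transaction ID copied from the request
        0,                            # secs
        0,                            # flags
        NO_ADDRESS,                   # ciaddr
        NO_ADDRESS,                   # yiaddr: no IP address is being offered
        socket.inet_aton(server_ip),  # siaddr: server to fetch the file from
        NO_ADDRESS,                   # giaddr
        client_mac,                   # chaddr (struct pads it to 16 bytes)
        b"",                          # sname (left empty here)
        boot_file.encode("ascii"),    # file: boot file name, NUL padded
    )


def read_boot_info(packet: bytes):
    """Return (server_ip, boot_file) from the fixed header of a reply."""
    fields = struct.unpack(BOOTP_FORMAT, packet[:BOOTP_SIZE])
    server_ip = socket.inet_ntoa(fields[9])
    boot_file = fields[13].split(b"\x00", 1)[0].decode("ascii")
    return server_ip, boot_file


# Hypothetical values for illustration.
reply = build_boot_reply(0x12345678, bytes.fromhex("bc2411000001"),
                         "10.102.1.10", "shimx64.efi")
print(read_boot_info(reply))
```

The DHCP options (including the magic cookie) would follow this header; they are left out to keep the sketch short.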
The device will then do a TFTP file transfer (using UDP):
Notice that the device said “I can support a 1468 block size” and for whatever reason it initially aborts the transfer before making the actual read request with the block size and a window size of 4. But the PXE service responded back only with the block size agreement, with no confirmation on the window size. So, the device then is forced to send an ACK for each (1468-byte) block of data. (Our PXE server can do sliding windows to support these larger window sizes, but that’s disabled by default.) The extra ACKs are no big deal for small files on fast networks.
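The option negotiation described above can be sketched in a few lines of Python. A TFTP read request carries its options as NUL-terminated name/value strings after the file name and mode, and the server's option acknowledgment (OACK) lists only the options it accepts. Anything it leaves out falls back to the protocol default, which is why an unacknowledged window size means one ACK per block. The helper names are mine, and the packets are hand-built examples rather than bytes taken from the capture.

```python
import struct

OP_RRQ = 1    # read request
OP_OACK = 6   # option acknowledgment

# Behavior when an option is not acknowledged by the server.
DEFAULTS = {"blksize": 512, "windowsize": 1}


def build_rrq(filename: str, options: dict) -> bytes:
    """Encode a read request in octet mode with optional name/value pairs."""
    packet = struct.pack("!H", OP_RRQ)
    packet += filename.encode("ascii") + b"\x00" + b"octet\x00"
    for name, value in options.items():
        packet += name.encode("ascii") + b"\x00"
        packet += str(value).encode("ascii") + b"\x00"
    return packet


def parse_oack(packet: bytes) -> dict:
    """Decode the name/value pairs the server agreed to."""
    (opcode,) = struct.unpack("!H", packet[:2])
    if opcode != OP_OACK:
        raise ValueError("not an OACK packet")
    body = packet[2:]
    parts = body[:-1].split(b"\x00") if body else []
    names = [p.decode("ascii").lower() for p in parts[0::2]]
    values = [p.decode("ascii") for p in parts[1::2]]
    return dict(zip(names, values))


def negotiated_settings(acked: dict) -> dict:
    """Apply acknowledged options on top of the protocol defaults."""
    settings = dict(DEFAULTS)
    for name in settings:
        if name in acked:
            settings[name] = int(acked[name])
    return settings


rrq = build_rrq("shimx64.efi", {"blksize": 1468, "windowsize": 4})
oack = b"\x00\x06blksize\x001468\x00"   # server agrees to the block size only
print(negotiated_settings(parse_oack(oack)))
```

With the window size left at 1, the client acknowledges every 1468-byte block individually, which matches what the capture shows.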
The initially-transferred file will always be a .EFI file (UEFI binary) on a UEFI system, and it will be loaded and executed as soon as the transfer completes. It will be provided with the details of where it was retrieved from, so it can then do TFTP transfers of any additional files that it requires to complete the PXE boot process. In the case of Tanium Provision, we load shimx64.efi, which then loads grubx64.efi, and that attempts to fetch some configuration files:
(It only succeeds on grub.cfg, but you can see it is trying to find a bunch of other “more specific” files before getting that general, default file.) For efficiency, everything from that point forward (e.g. the pre-boot OS that we use to drive the Tanium Provision process) is retrieved via HTTP (hence why the window size doesn’t really matter — we’re only transferring a couple of MB of data over TFTP), and we’re then ready to deploy the “real” OS:
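The "more specific files first" lookup can be pictured with a small Python sketch. The exact file names GRUB asked for are not listed in this post, so the naming pattern below is an assumption on my part, modeled on the scheme described in the GRUB manual for network boot: a MAC-based name, then the client IP address in hex with one character trimmed at a time, then the plain default. The MAC address is invented; the IP address is the one the DHCP server handed out earlier.

```python
def grub_config_candidates(mac: str, ipv4: str, base: str = "grub.cfg") -> list:
    """List config file names from most specific to the general default.

    Assumed pattern: <base>-01-<mac>, <base>-<HEXIP> (shortened one hex
    digit at a time), then <base>.
    """
    candidates = [f"{base}-01-{mac.lower().replace(':', '-')}"]
    hex_ip = "".join(f"{int(octet):02X}" for octet in ipv4.split("."))
    for length in range(len(hex_ip), 0, -1):
        candidates.append(f"{base}-{hex_ip[:length]}")
    candidates.append(base)
    return candidates


def first_available(candidates: list, available: set):
    """Return the first candidate the server can actually serve, if any."""
    for name in candidates:
        if name in available:
            return name
    return None


# Hypothetical MAC address; only the default file exists on the server.
names = grub_config_candidates("BC:24:11:00:00:01", "10.102.1.124")
for name in names:
    print(name)
print("served:", first_available(names, {"grub.cfg"}))
```

The point of the ordering is that a server can hand one device, or one subnet, its own configuration simply by creating a file with the right name, while everything else falls through to the default.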
Simple enough. This process goes back to standards created in the late 1990s, with the last significant change coming in 2015 with the addition of sliding window support. But with ubiquitous support, it remains popular.
But wait, there’s more
With the introduction of UEFI and its growing popularity (helped along by Microsoft requiring that it be used for devices that ship with Windows 10 preinstalled), there was an opportunity to add “more stuff” that would have been exceedingly hard to do on top of the initial BIOS firmware — including a full network stack that supported HTTP directly. Intel talked about that back in 2015, leveraging that network stack to support an HTTP boot protocol:
But as with all firmware enhancements, it can take some time before it shows up on “real” devices. Fortunately, using Proxmox/QEMU, you can always drop in the latest-and-greatest UEFI firmware from Intel’s open source EDK2 project and try it out in a virtual machine. With the most recent Proxmox release, that’s easier than ever as the HTTP boot support is there by default. Notice the additional option:
So what happens when you choose that option? It starts out the same with the DHCP requests:
But that first DHCP Discover request from the device looks a little different:
Instead of saying “PXEClient” it says “HTTPClient”. Our Tanium PXE service responds back to that request saying “yes, we can support HTTP boot”:
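From the server's side, the decision comes down to looking at the vendor class string and answering with the right kind of boot file: a plain file name to fetch over TFTP for a PXE client, or a full URL for an HTTP boot client. Here is a hedged Python sketch of that branching. The function names, the server address, and the URL layout are assumptions for illustration and say nothing about how the Tanium PXE service lays out its files.

```python
def classify_boot_request(vendor_class: bytes) -> str:
    """Map the option 60 value to the kind of network boot requested."""
    if vendor_class.startswith(b"HTTPClient"):
        return "http"
    if vendor_class.startswith(b"PXEClient"):
        return "pxe"
    return "none"


def boot_file_for(vendor_class: bytes, server_ip: str, file_name: str = "shimx64.efi"):
    """Pick the boot file value to send back for this client, if any."""
    mode = classify_boot_request(vendor_class)
    if mode == "http":
        # HTTP boot clients expect a URL they can fetch directly.
        return f"http://{server_ip}/{file_name}"
    if mode == "pxe":
        # PXE clients get a file name to request from the TFTP server.
        return file_name
    return None  # an ordinary DHCP client; stay quiet


# Hypothetical server address and vendor class strings.
print(boot_file_for(b"HTTPClient:Arch:00016:UNDI:003001", "10.102.1.10"))
print(boot_file_for(b"PXEClient:Arch:00007:UNDI:003016", "10.102.1.10"))
```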
And then instead of doing a TFTP transfer, the device will make HTTP requests:
The end result is the same; the only real difference is the transport protocol for the files being transferred. And since HTTP, as a TCP-based protocol, is more efficient than TFTP, the process can be faster. And for those of you who have network caching appliances, this HTTP traffic could be cached at the network edges, possibly even enabling a single centralized server that offers up its services to devices on the other end of a WAN link (not that I’ve actually tried that — yet).
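To put a rough number on the efficiency difference, here is a back-of-the-envelope Python sketch. With TFTP, the sender has to wait for an ACK after every window of blocks, so the time spent just waiting is roughly the number of ACK round trips multiplied by the round-trip time. The file size and round-trip times below are made-up examples, and the model ignores bandwidth, packet loss, and retransmissions entirely.

```python
import math


def tftp_block_count(file_size: int, block_size: int) -> int:
    """Number of data blocks needed to send the file.

    A transfer ends with a block shorter than block_size, so a file that is
    an exact multiple of block_size needs one extra, empty block.
    """
    return file_size // block_size + 1


def tftp_ack_round_trips(file_size: int, block_size: int, window_size: int = 1) -> int:
    """Number of times the sender must stop and wait for an ACK."""
    return math.ceil(tftp_block_count(file_size, block_size) / window_size)


def ack_wait_seconds(file_size: int, block_size: int, window_size: int, rtt_seconds: float) -> float:
    """Lower bound on time spent waiting for ACKs (ignores bandwidth)."""
    return tftp_ack_round_trips(file_size, block_size, window_size) * rtt_seconds


two_mib = 2 * 1024 * 1024  # hypothetical size, "a couple of MB"

print(tftp_ack_round_trips(two_mib, 1468, window_size=1))  # 1429
print(tftp_ack_round_trips(two_mib, 1468, window_size=4))  # 358

# Hypothetical round-trip times: 0.5 ms on a LAN, 50 ms across a WAN link.
for label, rtt in (("LAN", 0.0005), ("WAN", 0.05)):
    print(label, round(ack_wait_seconds(two_mib, 1468, 1, rtt), 2), "seconds waiting")
```

On a LAN the waiting is well under a second, which is why the lock-step ACKs don't matter for small files. Across a high-latency link the same transfer spends over a minute waiting, and that is where TCP's ability to keep much more data in flight pays off.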
Alright, that works with Proxmox, but what about real devices? When we started working on this, I asked a few of the OEMs if they had devices that supported HTTP boot and got back answers that indicated that they didn’t. I think that’s partly a misunderstanding, given that the feature is referred to as “HTTP(s) boot,” indicating that it supports both HTTP and HTTPS. We don’t use HTTPS because the SSL/TLS cert used by the server needs to be trusted *by the UEFI firmware*, and that’s a bit of a pain to configure. But using HTTP for this non-sensitive boot content on an internal network is fine in this scenario (no worse than TFTP).
As an example, I have a Dell OptiPlex 7090. When I press F12 to get the boot menu, HTTPs boot is an option:
And when I choose that option, it has a nice UI that displays the progress of the HTTP boot process:
Very nice. Do other OEMs support this? My guess is that they do, but you would have to confirm with them. Hopefully you’ll have better luck than I did. (I believe my Lenovo ThinkStation P620 supports this too, but since that computer is hosting my servers, it’s pretty hard to reboot it to find out.)
I know there have been folks who have benchmarked the networking stack in the UEFI firmware to see what kind of HTTP file transfer throughput it can get, but again for our implementation that doesn’t matter as we’re not transferring too much using the firmware; our “bootstrap OS” retrieves most of itself using the OS’s networking stack, which works very well.
It’s also quite feasible for the UEFI firmware to support this over wireless network connections — let me know if you find any devices that support this, as I’d love to try it…
After a number of years where PXE was the only option, having HTTP boot as another choice is welcome. Using protocols from the 1990s works, but it is nice to have something a little more modern.