Geeking Out

Connect the dots: Reverse-engineering an Autopilot hash

I previously posted details on breaking down an Autopilot hash (really, an OAv3 hash as these aren’t Autopilot-specific), leveraging the OA3TOOL.EXE utility in the ADK to convert the hash into a printable XML format. But how do you go from that 4000-byte (exactly) string of letters to something intelligible? With a little bit of reverse engineering, you can figure it out.

It’s first useful to recognize the format of what you’re looking at. As I mentioned previously, the hash (which isn’t a hash at all) is encoded in base64. How can you tell? You just learn to recognize what base64 looks like. It uses 64 printable characters to represent binary data: every 3 bytes of data (3*8=24 bits) are transformed into 4 printable characters (4*6=24 bits). So if you see only those 64 printable characters (A-Z, a-z, 0-9, “+” and “/”, with “=” as optional padding at the end), it’s almost certainly base64.
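To make the 3-bytes-to-4-characters mapping concrete, here is a minimal Python sketch. The looks_like_base64 helper is a name I made up for illustration, and it is only a heuristic (plenty of plain text would pass it too):

```python
import base64
import string

# The 64 characters of the standard base64 alphabet ("=" is only used as padding).
B64_ALPHABET = set(string.ascii_letters + string.digits + "+/")

def looks_like_base64(text: str) -> bool:
    """Heuristic: length is a multiple of 4 and only alphabet characters appear."""
    body = text.rstrip("=")
    return len(text) % 4 == 0 and all(ch in B64_ALPHABET for ch in body)

# Three bytes (24 bits) become four printable characters (4 * 6 bits).
encoded = base64.b64encode(b"OA\x00").decode("ascii")
print(encoded, len(encoded))        # T0EA 4
print(looks_like_base64(encoded))   # True
```

The three sample bytes start with “OA”, which is why the encoded text starts with “T0E”.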

Alright, so how do you decode base64 into binary data? There are plenty of websites that will do it, if you don’t mind someone, umm, “processing” your data. Since I have a MacBook Pro, I used that: There is a “base64” utility included in the OS that you can use, e.g. “cat blah.txt | base64 -D -o blah.bin” which takes the input file and writes out the binary result. You could use PowerShell as well, e.g. “$binary = [Convert]::FromBase64String($base64EncodedText)”.

So then you have binary data, and in this case it’s very binary — you can look at the file and make out some strings, but that’s not a very good way to go about deciphering all of the data. So you need a good way to display the binary data in a format that is useful, i.e. hexadecimal. My personal preference for doing this is HxD, a freeware utility that has been around for quite some time, although you could probably use the “Format-Hex” PowerShell cmdlet if you wanted something that achieves the goal, without helping you interpret what you are seeing.
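If you don’t have a hex editor handy, a rough equivalent of that display can be sketched in a few lines of Python. The hex_dump function below is my own illustrative helper, not part of any tool mentioned here, and the sample bytes are made up:

```python
def hex_dump(data: bytes, width: int = 16) -> str:
    """Render bytes as offset, hex columns, and printable ASCII, one row per line."""
    pad = width * 3 - 1  # width of a full row of "XX " groups, minus the last space
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02X}" for b in chunk)
        # Show printable ASCII as-is and everything else as a dot.
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08X}  {hex_part:<{pad}}  {text_part}")
    return "\n".join(lines)

# Made-up sample bytes, just to show the layout.
print(hex_dump(b"OA\xaa\x03\x01\x00\x0c\x00"))
```

Like Format-Hex, this only displays the bytes; it does nothing to help interpret them.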

So 4000 characters of base64 text turn into 3000 bytes of binary data, but I only care about the first 1K or so (it’s variable length, so it depends on the machine). All of those “A” characters at the end of the base64 string represent binary zeroes (four A’s = three bytes of binary 0’s).
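The arithmetic is easy to check in Python. This is a small sketch with a made-up sample; decode_hash and count_zero_padding are hypothetical helper names, and counting trailing zeroes is only a way to see the padding (the length fields in the data are what really tell you where it ends):

```python
import base64

def decode_hash(b64_text: str) -> bytes:
    """Decode the base64 text into the raw bytes."""
    return base64.b64decode(b64_text.strip())

def count_zero_padding(raw: bytes) -> int:
    """Count the zero bytes at the very end of the decoded data."""
    return len(raw) - len(raw.rstrip(b"\x00"))

# Four "A" characters decode to three zero bytes, so 4000 characters give 3000 bytes.
print(base64.b64decode("AAAA"))           # b'\x00\x00\x00'
print(len(base64.b64decode("A" * 4000)))  # 3000

# Made-up sample: six non-zero bytes followed by six bytes of zero padding.
sample = base64.b64encode(b"OA\x06\x00\x01\x02" + bytes(6)).decode("ascii")
print(sample[-8:], count_zero_padding(decode_hash(sample)))
```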

So, you can see everything pretty squished together. This isn’t unusual for binary data, which is generally encoded as a series of items, each specifying three pieces:

  • The type or ID of the current item.
  • The length of the current item.
  • The value of the current item.

Those might be juggled around, but in this case that is exactly the structure: A series of these. Just to throw you off a little bit, the file starts with a two-character header, “OA”, indicating the file type (in this case, created by the OA3TOOL or equivalent). After that, there are two bytes that represent the total length of the data (almost, more on that later). If I select those two bytes in HxD, it tells me that the 16-bit value is 938, so that’s how many bytes of data there are in the file.
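Reading that header programmatically could look something like the following Python sketch. It assumes the 16-bit length is stored little-endian (low byte first), which is consistent with the type bytes shown in the list further down; parse_header is a name I made up:

```python
import struct

def parse_header(raw: bytes) -> int:
    """Check the "OA" marker and return the 16-bit total length that follows it."""
    # "<2sH" = little-endian: a 2-byte string, then an unsigned 16-bit integer.
    # Assumption: the length is little-endian.
    magic, total_length = struct.unpack_from("<2sH", raw, 0)
    if magic != b"OA":
        raise ValueError(f"unexpected header {magic!r}")
    return total_length

# The bytes AA 03 read as a little-endian 16-bit value are 0x03AA = 938.
print(parse_header(b"OA\xaa\x03"))  # 938
```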

After that, we start to see a pattern:

  • A two-byte type (where typically only the first byte is set, as there aren’t enough types to require more than 8 bits yet).
  • A two-byte length (where the second byte is only sometimes needed).
  • The data.
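That pattern can be walked with a short loop. The Python sketch below is an illustration under two assumptions that I label in the comments: both fields are little-endian, and the length counts the four type/length bytes as well as the data (which matches the 36-byte checksum item described later in this post). The sample bytes are made up.

```python
import struct
from typing import Iterator, Optional, Tuple

ITEM_HEADER_SIZE = 4  # two-byte type + two-byte length

def iter_items(raw: bytes, start: int = 4,
               end: Optional[int] = None) -> Iterator[Tuple[int, bytes]]:
    """Yield (type, value) pairs, starting after the 4-byte "OA" header."""
    end = len(raw) if end is None else end
    offset = start
    while offset + ITEM_HEADER_SIZE <= end:
        # Assumption: both fields are little-endian unsigned 16-bit values.
        item_type, item_length = struct.unpack_from("<HH", raw, offset)
        # Assumption: the length includes the 4 header bytes as well as the value.
        if item_length < ITEM_HEADER_SIZE or offset + item_length > end:
            raise ValueError(f"bad length {item_length} at offset {offset}")
        yield item_type, raw[offset + ITEM_HEADER_SIZE:offset + item_length]
        offset += item_length

# Made-up sample: "OA", total length 19, then one item of type 3 and one of type 10.
sample = b"OA\x13\x00" + b"\x03\x00\x09\x00Intel" + b"\x0a\x00\x06\x00\x01\x02"
for item_type, value in iter_items(sample, end=19):
    print(item_type, value)
```

Passing the total length from the header as end keeps the loop from running into whatever follows the items.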

The format of the data varies based on the type, but we can walk through the content:

So we can see a variety of different types (30 or so) with lengths varying by type. That’s when the fun starts though — figuring out what each of those represents. Sometimes that’s obvious from looking at the data, other times it requires comparing values between multiple hashes, but sometimes it’s just a mystery. At least in this case we’ve got something to compare it against: the output of OA3TOOL.EXE /DECODEHWHASH.

Here’s my deciphering of each type:

  • 1 (01 00) = Tool Build, OS Build
  • 2 (02 00) = Processor packages, cores, etc.
  • 3 (03 00) = Processor manufacturer
  • 4 (04 00) = Processor model
  • 5 (05 00) = RAM info
  • 6 (06 00) = Disk capacity
  • 7 (07 00) = Disk serial number
  • 8 (08 00) = Network adapter info
  • 9 (09 00) = Display resolution
  • 10 (0A 00) = System enclosure stuff (chassis type, power platform role)
  • 11 (0B 00) = Offline Device Id
  • 12 (0C 00) = Smbios Uuid
  • 13 (0D 00) = TPM Version
  • 14 (0E 00) = Smbios System Serial Number
  • 15 (0F 00) = Smbios Firmware Vendor
  • 16 (10 00) = Smbios System Manufacturer
  • 17 (11 00) = Smbios System Product Name
  • 18 (12 00) = Smbios SKU Number
  • 19 (13 00) = Smbios System Family
  • 20 (14 00) = Smbios Firmware Vendor
  • 21 (15 00) = Smbios Board Product
  • 22 (16 00) = Smbios Board Version
  • 23 (17 00) = Smbios System Version
  • 24 (18 00) = ProductKeyID
  • 25 (19 00) = TPM EkPub
  • 26 (1A 00) = Product Key PkPn
  • 27 (1B 00) = Four bytes of FF’s or 00’s?
  • 28 (1C 00) = Disk SSN Kernel
  • 29 (1D 00) = Video details (memory, etc.)
  • 30 (1E 00) = Video card info

With a little more work to decode each of these, you can create a PowerShell cmdlet that does the full(-ish) decoding:

If you look at the “total length” (bytes 2-3, starting from 0), you’ll see that there is some extra data after the “end” of the values. This data starts with “CS” and then specifies the total length of this extra item, which is always 36. Since the type and length always take 4 bytes, that means there are 32 bytes of “real data.” So what is this real data, and why does it exist outside of the “total length”? Simple, and it also explains the “CS”: This is a checksum of all the “real data.” If you don’t immediately recognize which checksums are 32 bytes long, the HxD app will let you select the full chunk of data and then calculate various checksums on it, a nice feature. In the end, that just confirms that it’s a SHA-256 hash, which is pretty common.
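The same check can be sketched in Python instead of HxD. Treat this as an illustration rather than a reference implementation: it assumes the SHA-256 digest covers the first “total length” bytes of the file (including the “OA” header), that the lengths are little-endian, and that the 36-byte “CS” item starts right at the “total length” offset. The verify_checksum name and the sample data are made up.

```python
import hashlib
import struct

def verify_checksum(raw: bytes) -> bool:
    """Compare the stored 32-byte digest with a SHA-256 of the data before it."""
    magic, total_length = struct.unpack_from("<2sH", raw, 0)
    if magic != b"OA":
        raise ValueError(f"unexpected header {magic!r}")
    # Assumption: the "CS" item begins exactly at the total-length offset.
    tag, cs_length = struct.unpack_from("<2sH", raw, total_length)
    if tag != b"CS" or cs_length != 36:
        raise ValueError("checksum item not found where expected")
    stored = raw[total_length + 4:total_length + cs_length]
    # Assumption: the digest covers bytes 0 .. total_length - 1.
    return hashlib.sha256(raw[:total_length]).digest() == stored

# Made-up sample: header, one item, the checksum item, then zero padding.
body = b"OA\x0d\x00" + b"\x03\x00\x09\x00Intel"
blob = body + b"CS\x24\x00" + hashlib.sha256(body).digest() + bytes(9)
print(verify_checksum(blob))
```

If the digest doesn’t match on a real hash, the first thing I would revisit is the assumption about exactly which bytes are covered.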

Some quick points from this:

  • There are some items in the hash (that isn’t a hash) that OA3TOOL doesn’t decode, e.g. the processor manufacturer. So this shows more data than that tool.
  • The “offline device ID” and “offline device ID type” need some additional exploration. It appears that if the OS has a functional TPM driver (which it will any time the TPM is enabled in the firmware) the type will be “TPM_EK,” but if the TPM isn’t enabled it gets a value via a UEFI variable.
  • Interestingly, Hyper-V VMs don’t appear to have “real” disk serial numbers; I’m not sure why. Perhaps it has to do with the way the VHDX was initially created (e.g. converting a WIM to VHD).

But what’s the point of all this, other than increasing our understanding of what’s in a hardware hash (that isn’t a hash at all)? Well, the next step in all of this is to generate our own hashes. I hope that’s the last post in this series…
