Windows Autopilot

Troubleshooting Windows Autopilot, a reference

I’ve written a variety of blogs on troubleshooting Windows Autopilot, which you can read for historical reference.

And there are troubleshooting notes in several other blogs as well.  And yet, reading back through these I realize a few things:

  • I’ve learned a lot about troubleshooting Windows Autopilot, MDM, Azure AD, etc. since then.
  • There’s always more to learn.
  • The overall troubleshooting process is still too difficult.  I would go as far as saying it sucks to a certain degree – even after looking at hundreds of sets of logs, it’s still easy to get lost.  (Yes, we want to implement improvements to make it easier overall – those are on our list.)

So I wanted to take a step back and talk about a suggested methodology to use for troubleshooting.  Let’s start with the basics.  If you run into an issue, use the Intune portal to open a support case via the “Help and Support” node.  (They are free support cases, included with your Intune subscription, so take advantage of them.)  You should be prepared to provide a few things:

  • A useful description of the problem.  While this seems obvious, you’d be surprised at the number of “it didn’t work” descriptions.  How far did you get in the process?  What error did you see?  Did it work before and is now failing?  What scenario are you using (user-driven, self-deploying, white glove; AAD vs. Hybrid AAD)?
  • A screenshot or photo of the error.  I’ve seen page-long descriptions of the problem encountered, and still am no closer to understanding the situation.  A screenshot goes a long way.
  • A full set of logs.  We’ll always ask you to run the MDMDiagnosticsTool to create a cab file (which you will need to upload somewhere, or enclose it in a zip file before sending it, as O365 will strip CAB files – you’ll slow down the process if it takes two days for us to get a CAB file).

Let’s focus in on that last point for a bit.  You should run MDMDiagnosticsTool on Windows 10 1809 and above to collect logs.  (If you are still using older Windows 10 versions, you really should move forward – the MDM and Autopilot capabilities are much better.  On older versions, you would need to use the older LicensingDiag.exe.  That’s tied back to Autopilot’s initial implementation that was built on top of the Windows 10 licensing and activation stack.  But with newer versions, you’ll get much better info from MDMDiagnosticsTool.)  You might see two different variations on the command, based on the scenario that you are performing:

  • MDMDiagnosticsTool.exe -area Autopilot -cab c:\Autopilot.cab
    • This gathers most of the available logs related to Windows Autopilot, OOBE, MDM, Azure AD, etc.  It also gathers the hardware details (via the hardware hash), registry information, and much more.  We’ll go through that in detail in a moment.
  • MDMDiagnosticsTool.exe -area Autopilot;TPM -cab c:\autopilot.cab
    • This adds additional TPM diagnostics to the previous set of logs.  Those details would be needed when you are having issues with a self-deploying or white glove scenario.  There’s no harm in gathering them in other scenarios, but there are cases where the additional TPM info can generate error messages or popups, so to avoid people thinking that’s somehow related to their issue, you can leave it off for other scenarios.
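The two command variations, plus the "enclose it in a zip file" step, can be scripted.  What follows is only a minimal Python sketch, not an official tool: the function names are made up for illustration, and it assumes MDMDiagnosticsTool.exe is on the path and that the prompt is elevated.  Passing the arguments as a list also sidesteps shells such as PowerShell, where a bare semicolon in "Autopilot;TPM" would end the command.

```python
import subprocess
import zipfile
from pathlib import Path


def build_command(cab_path, include_tpm=False):
    """Return the MDMDiagnosticsTool argument list for the two variations above."""
    areas = "Autopilot;TPM" if include_tpm else "Autopilot"
    return ["MDMDiagnosticsTool.exe", "-area", areas, "-cab", str(cab_path)]


def wrap_in_zip(cab_path):
    """Enclose the CAB in a zip file next to it so it survives being mailed."""
    cab_path = Path(cab_path)
    zip_path = cab_path.with_suffix(".zip")
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.write(cab_path, arcname=cab_path.name)
    return zip_path


def collect_logs(cab_path=r"c:\Autopilot.cab", include_tpm=False):
    """Run the tool (Windows only; elevation is an assumption), then zip the CAB."""
    subprocess.run(build_command(cab_path, include_tpm), check=True)
    return wrap_in_zip(cab_path)
```

For a self-deploying or white glove problem you would call collect_logs(include_tpm=True); for everything else the default is enough.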

After you run the command, you can copy off the resulting CAB file to a USB key, a file share, or wherever you want.  The support personnel should be able to provide details on how to get the file to them.  Once you’ve done that, others on the team are able to look at those logs to help diagnose the issue. 

Alright, but what if you wanted to try to figure it out yourself?  First, it’s useful to understand what’s inside that CAB file (focusing on Windows 10 1903 – earlier versions will have fewer files), so let’s go through a list.

File name | Usefulness | Comments
CloudExperienceHostOobe.etl.* | Low | ETL trace files.
LicensingDiag.cab | Low | If you’re running into Windows activation issues, you might care about this, but otherwise, it’s not useful for Autopilot troubleshooting.
AgentExecutor.log | Low | This is picked up from the Intune Management Extensions log folder (C:\ProgramData\Microsoft\IntuneManagementExtension\Logs) but I’ve never found anything useful in it.
AutopilotConciergeFile.json | Low | At this point, this file is not used.
AutopilotDDSZTDFile.json | High | This file contains the Autopilot profile settings being used for the device.
CertReq_enrollaik_Output.txt | High | This file only exists when the TPM area is included.  It provides a simulation of the TPM attestation process and logs the results, so it’s useful to see why the “real” TPM attestation might be failing.
CertUtil_tpminfo_Output.txt | Medium | This file only exists when the TPM area is included.  It provides more details about the TPM chip or firmware used in the device.
DeviceHash_*.csv | High | This contains the serial number and full hardware hash for the device.  While that hash might not look useful to you, it tells us a lot about the device, including the version of Windows 10, patches that are installed, TPM firmware version, and a lot more stuff.
DiagnosticLogCSP_Collector_Autopilot.etl | Low | ETL trace files.
DiagnosticLogCSP_Collector_Autopilot.etl.merged | Low | ETL trace files.
DiagnosticLogCSP_Collector_DeviceEnrollment.etl | Low | ETL trace files.
DiagnosticLogCSP_Collector_DeviceProvisioning.etl | Low | ETL trace files.
IntuneManagementExtension.log | High | This log will capture excruciating detail about the installation of Win32 apps being deployed via Intune.  (Use one of the ConfigMgr log viewing tools, e.g. CMTrace.exe, to view this.)
LicensingDiag_Output.log | Low | This captures the output of the LicensingDiag.exe command that generated the previously-mentioned LicensingDiag.cab.
MDMDiagHtmlReport.html | Medium | This is the same report you can get from the Settings app that provides more details on all the MDM policies that have been applied to the device.
MdmDiagLogMetadata.json | Low | This records the areas that were specified on the MDMDiagnosticsTool command line (or those added automatically).
MDMDiagReport.xml | Medium | This is a machine-readable XML version of the HTML report above.
MdmDiagReport_RegistryDump.reg | Medium | This dumps the contents of a variety of registry keys that are useful for determining the state of the machine, including MDM enrollment details, Autopilot details, and related info.  Support technicians may use this to find related information in Intune.
MdmLogCollectorFootPrint.txt | Low | This shows everything that MDMDiagnosticsTool tried to collect and put into the CAB file.
microsoft-windows-aad-operational.evtx | High | This event log shows Azure AD join and Hybrid Azure AD Join-related info.
microsoft-windows-appxdeploymentserver-operational.evtx | Low | This event log shows details from UWP app installations.
microsoft-windows-assignedaccess-admin.evtx | Low | This event log contains events related to kiosk configuration.
microsoft-windows-assignedaccessbroker-admin.evtx | Low | This event log contains even more events related to kiosk configuration.
microsoft-windows-assignedaccessbroker-debug.evtx | Low | This event log contains even more events related to kiosk configuration.
microsoft-windows-assignedaccessbroker-operational.evtx | Low | This event log contains even more events related to kiosk configuration.
microsoft-windows-assignedaccess-operational.evtx | Low | This event log contains even more events related to kiosk configuration.
microsoft-windows-devicemanagement-enterprise-diagnostics-provider-admin.evtx | High | This event log covers MDM enrollment (including failure reasons) and other pertinent MDM activities.
microsoft-windows-devicemanagement-enterprise-diagnostics-provider-debug.evtx | Low | This event log is usually empty.
microsoft-windows-devicemanagement-enterprise-diagnostics-provider-operational.evtx | Low | This event log has lots of MDM-related activity in it, but I’ve never found any of it to be of any value.
microsoft-windows-moderndeployment-diagnostics-provider-autopilot.evtx | High | This is the key event log used by Autopilot, and one that you’ll almost always want to look at.
microsoft-windows-moderndeployment-diagnostics-provider-managementservice.evtx | Low | This event log has some Autopilot-related activity in it, but this is more “housekeeping” stuff that isn’t typically useful.
microsoft-windows-provisioning-diagnostics-provider-admin.evtx | Low | This event log contains events related to the application of provisioning packages (PPKGs), which are used to configure some Windows default settings.  Typically you can ignore this one.
microsoft-windows-shell-core-operational.evtx | Medium | This is the event log that the shell uses for most things, including tracking the OOBE process, registering apps when a user signs in, etc.
microsoft-windows-user device registration-admin.evtx | Medium | This event log shows details around Hello for Business and related configuration details.
setupact.log | Medium | If you are familiar with the logs created by Windows Setup, you’ll recognize this one.  This logs all the stuff going on in OOBE, and can be useful for troubleshooting any OOBE weirdness.
TpmHliInfo_Output.txt | High | This log (which is created even when not specifying the TPM area) contains basic details about the TPM in the device: the manufacturer, the firmware level of that TPM, whether it has a required EK cert, etc.

While that might seem intimidating, if we focus initially on only the high priority ones (those rated High above), it’s a little easier to follow.  Let’s walk through this somewhat methodically.
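If you have already extracted the CAB to a folder, picking out the High-rated files can be automated.  This is a rough Python sketch; the pattern list is simply copied from the High rows of the list above, and the function names are hypothetical.

```python
from fnmatch import fnmatchcase
from pathlib import Path

# File names (or wildcards) rated "High" in the list above.
HIGH_VALUE_PATTERNS = [
    "AutopilotDDSZTDFile.json",
    "CertReq_enrollaik_Output.txt",
    "DeviceHash_*.csv",
    "IntuneManagementExtension.log",
    "microsoft-windows-aad-operational.evtx",
    "microsoft-windows-devicemanagement-enterprise-diagnostics-provider-admin.evtx",
    "microsoft-windows-moderndeployment-diagnostics-provider-autopilot.evtx",
    "TpmHliInfo_Output.txt",
]


def high_value_files(names):
    """Return the names that match a High-rated pattern, ignoring case."""
    patterns = [pattern.lower() for pattern in HIGH_VALUE_PATTERNS]
    return sorted(
        name for name in names
        if any(fnmatchcase(name.lower(), pattern) for pattern in patterns)
    )


def high_value_in_folder(folder):
    """Apply the same filter to the files of an already-extracted CAB folder."""
    return high_value_files(p.name for p in Path(folder).iterdir() if p.is_file())
```

Files that only exist when the TPM area was requested will simply be absent from the result.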

First, look at the AutopilotDDSZTDFile.json and check these settings:

  • CloudAssignedDomainJoinMethod.  If this is 0, the device has been configured to join Azure AD.  If it is 1, the device has been configured to join AD (ODJ, Hybrid Azure AD Join).
  • DeploymentProfileName.  This will tell you the name of the Autopilot profile that was assigned to this device.
  • CloudAssignedOobeConfig.  This is a set of flags (described here) that specify how OOBE should behave (e.g. which pages should be skipped).  For user-driven scenarios, you’ll typically see a value of 28 (user should not be an admin) or 30 (user should be an admin), although that could change as new settings are added.  For self-deploying, bits 5 (32), 6 (64), and 7 (128) will typically be set, so the value will be bigger. (Note that you won’t see a value for white glove, as that is checked service-side, not on the client.)
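Those three checks are easy to script.  Here is a minimal Python sketch under a couple of assumptions: the file is plain JSON readable as UTF-8 (with or without a byte-order mark), and "all three of bits 5, 6, and 7 set" is used as a heuristic for self-deploying, since the text only says they are typically set.  The function names are made up for illustration.

```python
import json

# Bits 5 (32), 6 (64), and 7 (128), which are typically set for self-deploying.
SELF_DEPLOYING_BITS = 32 | 64 | 128

JOIN_METHODS = {
    0: "Azure AD join",
    1: "AD join (ODJ, Hybrid Azure AD Join)",
}


def load_profile(path):
    """Read AutopilotDDSZTDFile.json; the utf-8-sig encoding is an assumption."""
    with open(path, encoding="utf-8-sig") as handle:
        return json.load(handle)


def summarize_profile(profile):
    """Pull out the three settings discussed above from the parsed profile."""
    join = profile.get("CloudAssignedDomainJoinMethod")
    oobe = int(profile.get("CloudAssignedOobeConfig", 0))
    return {
        "profile_name": profile.get("DeploymentProfileName"),
        "join_method": JOIN_METHODS.get(join, f"unknown ({join})"),
        "oobe_config": oobe,
        # Individual flag values that are set, e.g. 28 -> [4, 8, 16].
        "set_flags": [1 << i for i in range(oobe.bit_length()) if oobe & (1 << i)],
        # Heuristic only: True when bits 5, 6, and 7 are all set.
        "looks_self_deploying": (oobe & SELF_DEPLOYING_BITS) == SELF_DEPLOYING_BITS,
    }
```

Calling summarize_profile(load_profile("AutopilotDDSZTDFile.json")) gives you the profile name, join method, and a breakdown of the OOBE flags in one place.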

So now you should know which scenario (user-driven vs. self-deploying) and join method (AAD vs. AD/ODJ/Hybrid AADJ) are in use.  Next, open the Autopilot event log (microsoft-windows-moderndeployment-diagnostics-provider-autopilot.evtx) and look for errors.  If you see any errors like this:

AutopilotManager reported that MSA TPM is not configured for hardware TPM attestation even though the profile indicates it is required. Autopilot cannot proceed

Then you know the device is trying to use white glove or self-deploying mode, and TPM attestation is failing.  Check this blog for more info on that.  As a quick summary, there are a few other items you can look at:

  • TpmHliInfo_Output.txt.  If you see an error that there is a missing EK cert, that’s a problem.
  • CertReq_enrollaik_Output.txt.  Look for errors communicating with the test TPM attestation service, or processing the resulting certificate response.  (This can indicate a problem with the EK cert, fixed in a later 1903 cumulative update – see the known issues list – or a problem with the date/time on the device.)

If you know that TPM attestation has succeeded, but you’re still seeing an error in the Autopilot event log with error 0xC1036501:

Autopilot discovery failed to find a valid MDM.  Confirm that the AAD tenant is properly provisioned and licensed for exactly one MDM. 

Then make sure there is only one MDM app defined in Azure AD (as described in my previous blog). 

If you received any “something went wrong” errors with an 8018* error code, check the microsoft-windows-devicemanagement-enterprise-diagnostics-provider-admin.evtx to see what caused the MDM enrollment error.  You should find events like this one:

[Screenshot: an MDM enrollment failure event from the admin event log]

The reason text is what you’re looking for.
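If you would rather not click through Event Viewer, the built-in wevtutil command can dump the error events from a saved .evtx file as text.  This is a small Python sketch that wraps it; the function names are hypothetical, it only runs on Windows, and it assumes the events you care about are logged at the Critical or Error level.

```python
import subprocess


def build_error_query(evtx_path, max_events=20):
    """Build a wevtutil command that lists recent error events from a saved log."""
    return [
        "wevtutil", "qe", str(evtx_path),
        "/lf:true",                            # the path is a log file, not a live channel
        "/q:*[System[(Level=1 or Level=2)]]",  # Level 1 = Critical, Level 2 = Error
        "/f:text",                             # human-readable output
        "/rd:true",                            # newest events first
        f"/c:{max_events}",                    # cap the number of events returned
    ]


def read_errors(evtx_path, max_events=20):
    """Run the query and return the text output (Windows only)."""
    result = subprocess.run(
        build_error_query(evtx_path, max_events),
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Point it at the admin .evtx file from the CAB and scan the output for the reason text.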

If you received any “something went wrong” errors with an 801C* error code, check the microsoft-windows-aad-operational.evtx and microsoft-windows-shell-core-operational.evtx event logs to see what caused the Azure AD Join failure.

If you get an error like “OOBEIDPS” check the microsoft-windows-shell-core-operational.evtx event log and setupact.log to see if they reported any network-related errors.
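The error-to-log mapping above is essentially a lookup table, so here it is as one.  This Python sketch only encodes the cases described in this post; the function name and the hint wording are mine, and anything it doesn’t recognize just falls through.

```python
AUTOPILOT_LOG = "microsoft-windows-moderndeployment-diagnostics-provider-autopilot.evtx"
MDM_ADMIN_LOG = "microsoft-windows-devicemanagement-enterprise-diagnostics-provider-admin.evtx"
AAD_LOG = "microsoft-windows-aad-operational.evtx"
SHELL_LOG = "microsoft-windows-shell-core-operational.evtx"


def where_to_look(error):
    """Map an error code or name shown in OOBE to the logs worth opening first."""
    key = error.strip().upper()
    if key.startswith("0X"):
        key = key[2:]  # accept both 0x8018... and 8018...
    if key == "C1036501":
        return {"logs": [AUTOPILOT_LOG],
                "hint": "confirm only one MDM app is defined in Azure AD"}
    if key.startswith("8018"):
        return {"logs": [MDM_ADMIN_LOG],
                "hint": "read the reason text of the MDM enrollment failure"}
    if key.startswith("801C"):
        return {"logs": [AAD_LOG, SHELL_LOG],
                "hint": "find what caused the Azure AD Join failure"}
    if "OOBEIDPS" in key:
        return {"logs": [SHELL_LOG, "setupact.log"],
                "hint": "look for network-related errors"}
    return {"logs": [], "hint": "not covered by this sketch"}
```

For example, where_to_look("OOBEIDPS") points you at the shell-core event log and setupact.log.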

If you get a timeout, things get really interesting:

  • If there is a timeout during ESP and you’re using conditional access, check the IntuneManagementExtension.log to see if there are any AAD conditional access-related errors.
  • If there is a timeout during user ESP and it’s a hybrid AADJ scenario, check the microsoft-windows-aad-operational.evtx event log to see if the device registration process completed.  You’ll have to do this by omission: You’ll see an event that says “Device is not cloud domain joined: 0xC00484B2” (event 1089) every few minutes until the device registration process completes, which can take up to 30 minutes (as AAD Connect only syncs every 30 minutes).  When that event stops, the device has been registered.  (Then you may see events about the user not having an AAD user token…)
  • If you’ve added or changed an app recently and now you’re getting ESP timeouts, the app may not be configured properly. Check the IntuneManagementExtension.log to see if any apps failed to install.  Look for entries like this:

    [Win32App] Got InstanceID: Win32App_aaaaaaaa-70f7-4570-8394-92909f0f9919_1, installationStateString: 2

    State “2” means the install failed, state “3” means the install was successful.  Right now, if an app install fails, the failure is not reported by ESP, so it will sit until the timeout happens.
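Scanning a large IntuneManagementExtension.log for those entries by eye is tedious, so here is a minimal Python sketch that does it with a regular expression.  The pattern is based only on the sample entry above, so treat it as an assumption that real entries always look the same; the function names are hypothetical.

```python
import re

# Matches entries such as:
# [Win32App] Got InstanceID: Win32App_<guid>_1, installationStateString: 2
INSTALL_STATE = re.compile(
    r"\[Win32App\] Got InstanceID: (?P<instance>[^,\s]+), "
    r"installationStateString: (?P<state>\d+)"
)


def install_states(lines):
    """Return the most recent state seen for each app instance ID."""
    latest = {}
    for line in lines:
        match = INSTALL_STATE.search(line)
        if match:
            latest[match.group("instance")] = match.group("state")
    return latest


def failed_installs(lines):
    """Instance IDs whose latest state is "2", which means the install failed."""
    return sorted(i for i, state in install_states(lines).items() if state == "2")
```

Open the log with open(path, errors="replace") and pass the file object straight to failed_installs; any instance ID it returns is a candidate for the ESP timeout.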

There are probably 101 other possible failure causes, which may get into more logs or even the ETL trace files (which you can try to use with Windows Performance Analyzer, Microsoft Message Analyzer, TRACEFMT, or similar tools – more challenging).  At some point, you may need to admit defeat and open a support case.  But hopefully you can make an initial stab at it.


8 replies »

  1. Hi Michael, great article as usual.
    Besides the errors we encounter with the AutoPilot process, I think one of the biggest issues is the time it takes to finish the process. In my testing with the Hybrid Domain Join it takes more than one hour to finish. Although I do understand why some steps take a long time, from the User perspective it is not very good. And I am even questioning how this is better than traditional imaging. Again from the User perspective, I do see the improvements for IT.

    I was wondering if Microsoft is working on improving the time it takes to finish this process. I feel we need to keep it around the 10 minute mark to really be able to use it in production mode.

    regards,

    Jeroen Dijkman


  2. When you are targeting apps to the device I usually use a dynamic group and target the enrollment profile used so I can target to specific personas and apps. I cannot see the devices in White Glove adding themselves to an enrollment profile, is there a way to pick up these devices as WhiteGlove enrollment profiles or how else can you group these devices to target applications?


    • I’m not sure I follow the question. Are you saying you are using “enrollmentProfileName” in a dynamic group query? I’d have to check if that gets set during the initial white glove enrollment, but even if it did the group might not be updated fast enough for white glove to see the targeted apps.


  3. Hi Michael, would there be any benefit of running “MDMDiagnosticsTool.exe -area DeviceProvisioning” as well? I have seen the “DeviceProvisioning” area in a couple of other blogs, but I am not quite sure what it captures and how useful that is. Thanks!


    • If you specify Autopilot, it will always include DeviceProvisioning and DeviceEnrollment. You can see that in the MdmLogCollectorFootprint.txt file:

      AreaName : deviceenrollment;deviceprovisioning;autopilot;


    • Hi Jason, the configurations for the different options available seem to be stored in the registry:
      “Computer\HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\MdmDiagnostics\Area”
