Windows Event Log Analysis - Troubleshooting the BSOD WHEA_UNCORRECTABLE_ERROR
Have you ever run into a disturbing BSOD (Blue Screen Of Death) on your Windows machine? If you take a closer look, the screen names the specific error that Windows ran into.
Recently, I faced this after the SSD in my ultrabook was replaced. This short blog post describes how I tackled the issue and how I found the root cause by analysing the Windows event logs.
The worst thing about an SSD is that it can die without any warning. One day my laptop suddenly complained that it found no bootable device and halted the boot process.
So I rushed to the service center, got the SSD replaced, restored my backup, and was back to work.
After a couple of days, the laptop started giving BSODs with a specific error message - "WHEA_UNCORRECTABLE_ERROR".
Looking at the error message, I started searching for related issues that could be the culprit behind the BSOD. I came across several articles online, including support articles from Microsoft. The possible causes I found were:
A hardware failure.
Failing hard drive.
Improperly seated CPU.
The suggested fixes (which didn't work for me):
Update Windows to the latest version.
Update your drivers / BIOS.
Check that the components in the laptop are seated properly.
On my end, all of the above was in place, but the issue still came up randomly when I booted the laptop.
I was frustrated and decided to dig deeper.
First, I tried to locate the crash dump file on the machine but, unfortunately, no dump was being saved after any of the BSODs. So no clue here.
Next, my favourite part - log analysis.
Enter log analysis - Windows event logs.
Logs are like sentries on a watchtower who see everything. The sad part is that very few people care to look at them. Remember that logs contain tons of valuable information, which can be turned into actionable intelligence if properly analysed.
Noting the timeline of the crashes, I started looking at the event logs and was surprised by the event IDs and the Critical and Error level messages logged while the system was booting.
Under Windows Logs --> System, Event ID 41 was dominant, and following it, other Critical and Error events were fired, causing havoc for Windows during boot.
The event IDs that caught my attention were 41, 161, 6008 and 29, and I investigated them further.
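For anyone who prefers the command line over clicking through Event Viewer, here is a minimal sketch of the same filter using the built-in wevtutil tool, driven from Python. The helper names (build_xpath, build_command) and the limit of 20 events are my own choices for illustration, not something from the original investigation.

```python
import subprocess
import sys

# Event IDs called out in this post.
EVENT_IDS = (41, 161, 6008, 29)


def build_xpath(event_ids):
    """Build an XPath filter matching any of the given Event IDs."""
    clause = " or ".join(f"EventID={eid}" for eid in event_ids)
    return f"*[System[({clause})]]"


def build_command(event_ids, count=20):
    """Return the wevtutil argument list for the newest `count` matches."""
    return [
        "wevtutil", "qe", "System",        # query events from the System log
        f"/q:{build_xpath(event_ids)}",    # XPath filter on Event ID
        f"/c:{count}",                     # limit the number of events
        "/rd:true",                        # newest events first
        "/f:text",                         # human-readable output
    ]


if __name__ == "__main__":
    cmd = build_command(EVENT_IDS)
    if sys.platform == "win32":
        result = subprocess.run(cmd, capture_output=True, text=True)
        print(result.stdout or result.stderr)
    else:
        # wevtutil only exists on Windows; just show what would be run.
        print("Not on Windows; the command would be:", cmd)
```

Building the command separately from running it makes it easy to print and check the filter before pointing it at a real log.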
From Microsoft's documentation on these event IDs and from other external sources, it narrowed down to issues with 'Kernel Power'.
The sequence of events that kept cycling through -
Event ID 6008 --> The previous system shutdown at ......... was unexpected
Event ID 41 --> The system rebooted without cleanly shutting down first. This error could be caused if the system stopped responding, crashed, or lost power unexpectedly.
Event ID 161 --> Dump file creation failed due to error during dump creation. This gave a rough idea of what was going on with the system.
Image1 - Event ID 41: Windows System log showing Kernel-Power issues.
Image2 - Event ID 161: failure to create the dump file.
Image3 - Event ID 6008: unexpected system shutdown.
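To see a sequence like the one above as a single timeline, the events can be exported as XML (for example with `wevtutil qe System /f:xml /e:Events`) and sorted by timestamp. The sketch below assumes that export shape. The SAMPLE data is made up purely to show the parsing: its timestamps and ordering are illustrative and are not taken from my laptop's logs.

```python
import xml.etree.ElementTree as ET

# Namespace used by Windows event XML.
NS = {"e": "http://schemas.microsoft.com/win/2004/08/events/event"}

# Hypothetical sample in the shape of `wevtutil ... /f:xml /e:Events` output.
SAMPLE = """<Events>
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event"><System>
<Provider Name="EventLog"/><EventID>6008</EventID>
<TimeCreated SystemTime="2020-01-01T08:00:12.000Z"/></System></Event>
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event"><System>
<Provider Name="Microsoft-Windows-Kernel-Power"/><EventID>41</EventID>
<TimeCreated SystemTime="2020-01-01T08:00:05.000Z"/></System></Event>
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event"><System>
<Provider Name="volmgr"/><EventID>161</EventID>
<TimeCreated SystemTime="2020-01-01T08:00:09.000Z"/></System></Event>
</Events>"""


def timeline(xml_text):
    """Return (time, event_id, provider) tuples, oldest first."""
    root = ET.fromstring(xml_text)
    rows = []
    for event in root.findall("e:Event", NS):
        system = event.find("e:System", NS)
        rows.append((
            system.find("e:TimeCreated", NS).get("SystemTime"),
            int(system.find("e:EventID", NS).text),
            system.find("e:Provider", NS).get("Name"),
        ))
    # ISO 8601 UTC timestamps in the same format sort correctly as strings.
    return sorted(rows)


if __name__ == "__main__":
    for when, event_id, provider in timeline(SAMPLE):
        print(when, event_id, provider)
```

Lining events up by time like this makes it easier to spot which ones always show up together around a crash.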
My assumption at this point was that the power input to the laptop was faulty, which could have given rise to the other issues or even damaged my SSD, and that this was why the dump file was not getting created.
So I checked the charger and the battery for power issues; everything was working as expected.
The culprit was found.
After some more digging around the internet, I came across an article about a Windows 10 feature called Fast Startup Mode.
What is Fast Startup Mode?
It is a sort of midway between hibernating the machine and powering it off: on shutdown, Windows closes the user sessions but saves the kernel state to disk instead of shutting it down completely.
This mode was designed to reduce the time Windows takes to boot. The catch is that it prevents the machine from performing a regular, full shutdown, which can cause issues and, in the long run, could potentially damage your hardware (that is my experience, as my SSD died recently while this feature was active).
Disable Fast Startup - now my laptop is working like a charm, without any ugly BSODs.
It is fairly straightforward to disable this feature. Press Win + R to open the Run dialog and type powercfg.cpl. On the left side of the window, click "Choose what the power buttons do".
Scroll down to "Shutdown settings" and uncheck the box for "Turn on fast startup".
Note: If these options are greyed out, you may need to click "Change settings that are currently unavailable".
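To double-check the setting afterwards without opening the Control Panel, the state can be read from the registry. This is a minimal, read-only sketch and rests on an assumption: as far as I know, the "Turn on fast startup" checkbox maps to the HiberbootEnabled value under the Session Manager\Power key. The function names are hypothetical.

```python
import sys

# Assumed registry location of the Fast Startup switch.
KEY_PATH = r"SYSTEM\CurrentControlSet\Control\Session Manager\Power"
VALUE_NAME = "HiberbootEnabled"


def describe(value):
    """Translate the raw DWORD into a readable state."""
    if value is None:
        return "unknown"
    return "enabled" if value == 1 else "disabled"


def read_fast_startup():
    """Return the HiberbootEnabled DWORD, or None if it cannot be read."""
    if sys.platform != "win32":
        return None
    import winreg  # Windows-only module, so import it lazily

    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, KEY_PATH) as key:
            value, _value_type = winreg.QueryValueEx(key, VALUE_NAME)
            return value
    except OSError:
        # Key or value missing, or access denied.
        return None


if __name__ == "__main__":
    print("Fast Startup is", describe(read_fast_startup()))
```

The script only reads the value; changing the setting is still done through the checkbox described above.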
It is very important to keep an eye on the errors we encounter in day-to-day use. While many issues do get resolved by a simple reboot or by updating the OS and related packages, some issues are correlated and need careful examination before they can be resolved. For a proper investigation, log analysis plays a vital role in pinpointing issues and even getting to the root cause.
As I pointed out earlier, logs are mostly neglected, and I have seen system owners and administrators turn their backs on them. This should change: all of us tech-savvy guys should keep log analysis skills in our toolkit and be better equipped.