PC Rebooting While Idling, No Info in Event Viewer - How to Diagnose?

Got a PC that’s a hodgepodge build of components from a few different upgrades over the last several years. It’s been pretty stable throughout that span, after I replaced a flaky power supply a couple of years ago. However, for the last couple of weeks I’ve had a recurring reboot issue.

Seemingly out of the blue, about once per day, while idling, the PC reboots itself and lands back on the login screen. There are no instigating events I can find in Windows Event Viewer, just an “unexpected power off just occurred” complaint logged after it’s done rebooting. In one instance, the PC died while it was processing some TRIM operations on hard drives late at night, but as far as I can tell it hasn’t been doing that during the others.

It only ever happens when the PC has been left idle long enough for the monitors to turn off – never during use, including heavy use, like when I was streaming some games at 1440p a while back. That usually translates to “after I go to bed, between 2 and 9 AM,” but it rebooted while I was taking a midday nap today, for instance, so there’s no real consistency there.

There haven’t been any major hardware changes leading up to these reboots, nor any major software changes that I’m aware of.

How the heck do I diagnose a thing that isn’t leaving traces of its cause?

Hmmm, I was able to get WinDbg to read through the most recent memory dump – a 1.4 GB horror show, hah.

Microsoft (R) Windows Debugger Version 10.0.19041.685 AMD64
Copyright (c) Microsoft Corporation. All rights reserved.


Loading Dump File [C:\Windows\MEMORY.DMP]
Kernel Bitmap Dump File: Kernel address space is available, User address space may not be available.

Symbol search path is: srv*
Executable search path is: 
Windows 10 Kernel Version 19041 MP (16 procs) Free x64
Product: WinNt, suite: TerminalServer SingleUserTS
Built by: 19041.1.amd64fre.vb_release.191206-1406
Machine Name:
Kernel base = 0xfffff807`32c00000 PsLoadedModuleList = 0xfffff807`3382a230
Debug session time: Tue Sep 28 15:31:57.334 2021 (UTC - 4:00)
System Uptime: 0 days 16:29:19.965
Loading Kernel Symbols
...............................................................
.............Page 1c1f29 not present in the dump file. Type ".hh dbgerr004" for details
...................................................
................................................................
..............
Loading User Symbols
PEB is paged out (Peb.Ldr = 00000000`00464018).  Type ".hh dbgerr001" for details
Loading unloaded module list
..................................................
For analysis of this file, run !analyze -v
8: kd> !analyze -v
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

IRQL_NOT_LESS_OR_EQUAL (a)
An attempt was made to access a pageable (or completely invalid) address at an
interrupt request level (IRQL) that is too high.  This is usually
caused by drivers using improper addresses.
If a kernel debugger is available get the stack backtrace.
Arguments:
Arg1: 0000000000000008, memory referenced
Arg2: 0000000000000002, IRQL
Arg3: 0000000000000000, bitfield :
	bit 0 : value 0 = read operation, 1 = write operation
	bit 3 : value 0 = not an execute operation, 1 = execute operation (only on chips which support this level of status)
Arg4: fffff80732e09f1c, address which referenced memory

Debugging Details:
------------------


KEY_VALUES_STRING: 1

    Key  : Analysis.CPU.Sec
    Value: 3

    Key  : Analysis.DebugAnalysisProvider.CPP
    Value: Create: 8007007e on THE-ACCORD

    Key  : Analysis.DebugData
    Value: CreateObject

    Key  : Analysis.DebugModel
    Value: CreateObject

    Key  : Analysis.Elapsed.Sec
    Value: 11

    Key  : Analysis.Memory.CommitPeak.Mb
    Value: 80

    Key  : Analysis.System
    Value: CreateObject


BUGCHECK_CODE:  a

BUGCHECK_P1: 8

BUGCHECK_P2: 2

BUGCHECK_P3: 0

BUGCHECK_P4: fffff80732e09f1c

READ_ADDRESS:  0000000000000008 

BLACKBOXBSD: 1 (!blackboxbsd)


BLACKBOXNTFS: 1 (!blackboxntfs)


BLACKBOXPNP: 1 (!blackboxpnp)


BLACKBOXWINLOGON: 1

PROCESS_NAME:  vpnui.exe

TRAP_FRAME:  ffffcf0dd50bc000 -- (.trap 0xffffcf0dd50bc000)
NOTE: The trap frame does not contain all registers.
Some register values may be zeroed or incorrect.
rax=0000000000000000 rbx=0000000000000000 rcx=0000000000000001
rdx=ffffe18d8225e690 rsi=0000000000000000 rdi=0000000000000000
rip=fffff80732e09f1c rsp=ffffcf0dd50bc190 rbp=ffffcf0dd50bc1f9
 r8=0000000000000000  r9=0000000000000001 r10=ffffe18d8225e5e0
r11=ffffe18d7f007d90 r12=0000000000000000 r13=0000000000000000
r14=0000000000000000 r15=0000000000000000
iopl=0         nv up ei pl nz na pe nc
nt!ExpAcquireResourceExclusiveLite+0x41c:
fffff807`32e09f1c 8b4708          mov     eax,dword ptr [rdi+8] ds:00000000`00000008=????????
Resetting default scope

STACK_TEXT:  
ffffcf0d`d50bbeb8 fffff807`33009169 : 00000000`0000000a 00000000`00000008 00000000`00000002 00000000`00000000 : nt!KeBugCheckEx
ffffcf0d`d50bbec0 fffff807`33005469 : 00000000`00000000 ffffcf0d`d50bc0b1 ffffe18d`85a94080 fffff807`32e0b743 : nt!KiBugCheckDispatch+0x69
ffffcf0d`d50bc000 fffff807`32e09f1c : ffffe18d`7f007df0 ffffe18d`7f007d90 ffffcf0d`d50bc1f9 fffff807`32f5fe20 : nt!KiPageFault+0x469
ffffcf0d`d50bc190 fffff807`32e0902c : fffff807`32f63170 00000000`00000001 00000000`00000001 00000000`00000000 : nt!ExpAcquireResourceExclusiveLite+0x41c
ffffcf0d`d50bc260 ffffcacb`4023b56a : 00000000`00000000 00000000`00989680 00000000`00000001 ffffe18d`859d4520 : nt!ExEnterCriticalRegionAndAcquireResourceExclusive+0x3c
ffffcf0d`d50bc2a0 ffffcacb`40f37823 : ffffca92`05b0f900 ffffca92`05b0f920 00000000`00000001 00000000`00000000 : win32kbase!EnterCrit+0x8a
ffffcf0d`d50bc3c0 ffffcacb`40e4bc9f : ffffca92`05b0f920 ffffca92`05b0f920 00000000`00000000 00000000`1000bc00 : win32kfull!LeaveEnterCrit::~LeaveEnterCrit+0x17
ffffcf0d`d50bc3f0 ffffcacb`40e4b865 : ffff731a`ef46e200 ffffca92`05b00000 00000000`00000001 ffffcacb`00000000 : win32kfull!xxxRealSleepThread+0x36f
ffffcf0d`d50bc510 ffffcacb`40e49ded : ffffcf0d`d50bcb80 00000000`00000000 ffffcf0d`d50bca78 ffffca92`05b0f920 : win32kfull!xxxSleepThread2+0xb5
ffffcf0d`d50bc560 ffffcacb`40e48eb2 : ffffcf0d`d50bca78 00000000`0066e400 00000000`00000000 00000000`00000000 : win32kfull!xxxRealInternalGetMessage+0xcfd
ffffcf0d`d50bca30 ffffcacb`405d6275 : ffffe18d`85a94080 00007ffe`d2e33970 00000000`00000020 00000000`00466000 : win32kfull!NtUserGetMessage+0x92
ffffcf0d`d50bcac0 fffff807`33008bb8 : 00000000`0000000a ffffcf0d`d50bcb80 ffffcf0d`00000000 ffffcacb`405d6c88 : win32k!NtUserGetMessage+0x15
ffffcf0d`d50bcb00 00007ffe`d2e40014 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KiSystemServiceCopyEnd+0x28
00000000`0066e3f8 00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : 0x00007ffe`d2e40014


SYMBOL_NAME:  win32kbase!EnterCrit+8a

MODULE_NAME: win32kbase

IMAGE_NAME:  win32kbase.sys

STACK_COMMAND:  .thread ; .cxr ; kb

BUCKET_ID_FUNC_OFFSET:  8a

FAILURE_BUCKET_ID:  AV_win32kbase!EnterCrit

OS_VERSION:  10.0.19041.1

BUILDLAB_STR:  vb_release

OSPLATFORM_TYPE:  x64

OSNAME:  Windows 10

FAILURE_ID_HASH:  {d56d29ce-2532-972b-bc6d-31106c6c0955}

Followup:     MachineOwner
---------

So something tried to access RAM that didn’t exist?
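Not quite, going by the dump itself. The four arguments can be decoded mechanically. Below is a minimal sketch, assuming the argument layout that !analyze printed above (Arg1 = address referenced, Arg2 = IRQL, Arg3 = bitfield with bit 0 = write and bit 3 = execute, Arg4 = faulting instruction address). The `likely_null_deref` flag is a rule of thumb added for illustration, not something WinDbg reports. Read this way, the crash was a read of address 0x8 at DISPATCH_LEVEL, where page faults cannot be serviced. That matches the trap frame’s `mov eax,dword ptr [rdi+8]` with rdi=0, i.e. a NULL pointer plus an 8-byte offset. It is not an access to RAM that doesn’t exist, although a pointer zeroed by flaky memory could still be the underlying cause.

```python
from typing import Dict, Union

# Names for the low IRQL values; anything higher is a device/clock/high level.
IRQL_NAMES = {0: "PASSIVE_LEVEL", 1: "APC_LEVEL", 2: "DISPATCH_LEVEL"}


def decode_irql_not_less_or_equal(
    arg1: int, arg2: int, arg3: int, arg4: int
) -> Dict[str, Union[str, bool]]:
    """Decode the four arguments of bugcheck 0xA (IRQL_NOT_LESS_OR_EQUAL)."""
    return {
        "address": hex(arg1),  # memory that was referenced
        "irql": IRQL_NAMES.get(arg2, f"above DISPATCH_LEVEL ({arg2})"),
        "operation": "write" if arg3 & 0x1 else "read",  # bit 0
        "execute": bool(arg3 & 0x8),  # bit 3
        "faulting_ip": hex(arg4),  # instruction that made the access
        # Heuristic (assumption): an address inside the first 4 KB page is
        # usually a NULL pointer plus a small structure-field offset.
        "likely_null_deref": arg1 < 0x1000,
    }


if __name__ == "__main__":
    # Values copied from the dump above.
    print(decode_irql_not_less_or_equal(0x8, 0x2, 0x0, 0xFFFFF80732E09F1C))
```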

I’d turn off automatic restart after system failure; then, in theory, you should get to see the exact Windows error on screen the next time it happens. ;)

Oh, you know why it’s crashing, lol. I just saw your second post.

Here I googled that fail code:

Oh damn, I didn’t even realize that was a thing! Nice. Okay. Feeling better prepared already.

The first few times this happened late at night, it was in the same nebulous “partner and I are both sleeping” window in which our brand-new Samsung microwave keeps turning on its over-stove light. I was worried we were getting some kinda weird power surge that was killing the PC and fucking with the microwave but leaving clocks intact. After I disproved that (the PC rebooted at a different time than the microwave light turned itself on), it didn’t even occur to me that I should be able to see the damn bluescreens.

Do you not provide UPS backup for the precious?

BlueScreenView can read those too.

Got another PSU to test with? Able to “bread box” it to troubleshoot?

Sadly not. I’ve been thinking of getting one.

Currently no, sadly. May have a friend who has one, but I think he already tossed his old system, ugh.

Some interesting tidbits here. Wonder if maybe there was a graphics card update I clicked through without thinking X days ago?

I’m trying some of the other things like filesystem repairs/scans, just in case. If it keeps up tonight, will dink around in graphics drivers.

Er, it should be “bread boarding” not “boxing”

Revert the drivers at the least.

In my experience, IRQL_NOT_LESS_OR_EQUAL errors have always had bad RAM as the root issue, though of course there can be other causes. Personally I’d throw a memtest at it first before ripping apart my computer. Time consuming? Sure. But the physical effort is minimal.

Sigh, yeah, that’s on the list. It didn’t reboot last night, mercifully.

Fuckin microwave light still came on, though.

That’s the type of thing that’s usually a failing power supply.

If you didn’t change anything, then yes, I would guess a bad PSU. But run a memtest first, as that’s easy to do.

^This right here - every time I’ve had random reboots it’s been one of two things: failing PSU or a bad stick of memory.

You can also pull a stick out and see if it happens again, and if so, put that back in and pull another and repeat until you’ve either ruled out the memory or found the culprit.
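The pull-one-stick-at-a-time procedure is effectively a linear search. Below is a minimal sketch in which the hypothetical `crashes_without` callback stands in for “run the PC with that stick removed long enough to trust the result.” With roughly one crash a day, that probably means several days per stick. The sketch assumes there is only one bad stick. With two bad sticks, every single removal still crashes, and it would wrongly clear the memory.

```python
from typing import Callable, Optional, Sequence


def find_bad_stick(
    sticks: Sequence[str], crashes_without: Callable[[str], bool]
) -> Optional[str]:
    """Pull one stick at a time; return the first whose removal stops the crashes."""
    for stick in sticks:
        # In practice: power down, pull `stick`, then idle the PC for days.
        if not crashes_without(stick):
            return stick  # crashes stopped with it out: prime suspect
        # Still crashing without it: reinstall this stick and try the next one.
    return None  # every removal still crashed: memory probably isn't the cause


if __name__ == "__main__":
    # Simulated run where only the stick in slot "B1" is bad (hypothetical labels).
    print(find_bad_stick(["A1", "A2", "B1", "B2"], lambda s: s != "B1"))
```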

When I started getting random reboots a couple years ago it turned out to be a secondary spinny hard drive that was about to fail. Might want to run a check just to eliminate that from the list of possibilities.

I used CrystalDiskInfo to diagnose, but there are probably dozens or hundreds of similar tools.
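CrystalDiskInfo works by reading each drive’s SMART attributes. For a scriptable check, here is a minimal sketch that assumes smartmontools’ JSON output (`smartctl -a -j <device>`) with the `ata_smart_attributes.table` layout used below. Treating any nonzero raw count of attributes 5, 197, or 198 as a warning sign is a common rule of thumb, not a vendor threshold.

```python
from typing import Any, Dict, List

# SMART attribute IDs that commonly precede drive failure (rule of thumb).
WATCHED = {
    5: "Reallocated_Sector_Ct",
    197: "Current_Pending_Sector",
    198: "Offline_Uncorrectable",
}


def failing_attributes(report: Dict[str, Any]) -> List[str]:
    """Return 'name=raw' for watched attributes with a nonzero raw count.

    `report` is assumed to be the parsed JSON from `smartctl -a -j`.
    """
    table = report.get("ata_smart_attributes", {}).get("table", [])
    return [
        f'{row["name"]}={row["raw"]["value"]}'
        for row in table
        if row.get("id") in WATCHED and row.get("raw", {}).get("value", 0) > 0
    ]
```

Usage would be along the lines of `failing_attributes(json.loads(output))`, where `output` is the text smartctl prints for one drive. An empty list only means none of these three counters has tripped, not that the drive is healthy.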

That’s pretty unusual. With a failing hard drive, you usually get weird freezes, not crashes/reboots. But anything is possible.

Yeah, I agree, but the evidence was pretty clear:

  1. PC starts randomly rebooting at various times
  2. Check lots of things, including hard drives, find that the 3TB spinny disk is throwing errors
  3. Replace HD
  4. No random reboots since

Maybe the faulty drive messed up 12V power delivery or something as it progressively kicked the bucket. That could do it. I wouldn’t think 3.3V would cause reboots. Maybe 5V.
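One way to check whether a rail is sagging is to compare sensor readings against the ATX regulation window, which is roughly ±5% for +12V, +5V, and +3.3V. Below is a minimal sketch. The readings dictionary is a hypothetical input you would fill from a monitoring tool or the BIOS.

```python
from typing import Dict, Tuple

# Nominal voltages for the main ATX rails.
NOMINAL = {"+12V": 12.0, "+5V": 5.0, "+3.3V": 3.3}
# ATX allows about +/-5% regulation on these rails.
TOLERANCE = 0.05


def out_of_spec(readings: Dict[str, float]) -> Dict[str, Tuple[float, float, float]]:
    """Return {rail: (reading, low_limit, high_limit)} for rails outside tolerance."""
    bad = {}
    for rail, volts in readings.items():
        nominal = NOMINAL[rail]
        low = nominal * (1 - TOLERANCE)
        high = nominal * (1 + TOLERANCE)
        if not (low <= volts <= high):
            bad[rail] = (volts, round(low, 3), round(high, 3))
    return bad
```

Motherboard sensor readings are approximate, and a brief sag during an idle-state transition at 3 AM won’t show up in a one-off snapshot. A clean result therefore doesn’t clear the PSU, while a clearly out-of-range rail is a strong hint.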

I mean, I do have some spinning-disc hard drives from the early 00s in here. I actually disabled one in Windows cuz it was failing and causing long hangs whenever anything tried to access it.

One other piece of weird behavior: Netflix and Twitch videos occasionally freeze for 5 seconds or so while the audio continues, then skip forward to catch up. I figured that maybe having 80 open tabs across 3 browsers was finally catching up with my 16 GB of RAM, but even after knocking that down to 20 or fewer, it’s still happening, and it has been going on for about as long as the reboots.

Speaking of which, it’s been more than 24 hours now without a reboot after doing a few of the software fixes suggested in one of the links above. Man, it would be super neat if that were it. I really, really, really don’t wanna replace a power supply, ugh.