My SFF gaming PC had been having some minor problems in the past few days, mostly CTDs when saving in Baldur’s Gate 3. I’d assumed it was a BG3 problem.
Last night, though, it started BSOD-ing either during the boot sequence, or within about 30 seconds of logging into Windows. The BSODs were not consistent. Among the ones I got:
Critical process died
IRQL not less or equal
Unexpected kernel mode trap
KMODE exception not handled
Given the variety of BSODs, I’d assumed either corrupt Windows files or a memory error. But trying to reinstall Windows, and removing one DIMM at a time, didn’t help. (See full list of troubleshooting steps tried at the end of this post.)
All I can figure at this point is that it’s a failing power supply, given the random nature and timing of the crashes.
Can anyone think of any other likely candidates for causing this?
TIA!
System:
Cooler Master NR200P Max case (includes CPU/case cooler and power supply)
Troubleshooting steps tried:
Removed video card and booted with integrated Intel graphics
Tried removing one DIMM at a time and booting with just one
Ran memory test (no issues reported)
Deleted existing partitions and attempted to reinstall Windows on each of the two SN850X drives (BSOD, “KMODE Exception Not Handled” at differing points in the installs)
Checked all cables/DIMMs for firm insertion
Checked temps in BIOS
Updated BIOS to latest version
Turned off XMP on the RAM and ran at 4800 MHz 1.25V
But what the heck. Since I’m going in the case with a screwdriver anyway, I was planning on adding a cheap PCIe 3.0 M.2 SSD in the one open slot at some point, and the Intel 670p was $64 with same-day delivery, so… We’ll eliminate the SSDs from the equation as well!
I just went through this, and this is my guess as well. Unless it’s a PC you carry around and bang up and down, loose or failing memory probably isn’t the issue. It’s very rare. Your SSD, however, could easily be the cause of a number of different issues that show up differently. BG3, at least prior to the patch, was especially known for its large save files, and if you play like me, you save constantly. This could exacerbate a problem if it was SSD related.
Back up as much as you can. When you get the new SSD, hopefully you’ll be up and running again without issue.
One thing that helped me was loading HWiNFO64 and monitoring both my M.2 slots for heat. The drive that failed was in the slot closest to the bottom of my video card, and based on the high-water marks for temperature on the new one, I’m pretty sure the old one failed due to heat over time. I plan on adding an extra fan to hopefully bring those temps down, but there is no way I’ll get them as low as my (thankfully) boot drive, which is in the other M.2 slot. All I lost was my game data/Steam drive.
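If your monitoring tool can log its sensor readings to a CSV file (I believe HWiNFO64 can, but check your version), here is a rough sketch for pulling out the high-water mark per temperature sensor. It assumes one column per sensor and that temperature columns have "°C" in the header. The column names in the sample are made up for illustration, and a real log may need a different file encoding.

```python
import csv
import io


def temperature_high_water_marks(csv_text, unit_marker="°C"):
    """Return {column_name: highest_value} for columns whose header contains unit_marker."""
    reader = csv.DictReader(io.StringIO(csv_text))
    peaks = {}
    for row in reader:
        for name, raw in row.items():
            # Skip overflow fields (name is None) and non-temperature columns.
            if name is None or unit_marker not in name:
                continue
            try:
                value = float(raw)
            except (TypeError, ValueError):
                continue  # blank or non-numeric cell
            if name not in peaks or value > peaks[name]:
                peaks[name] = value
    return peaks


# Hypothetical sample log; real sensor names will differ.
sample = """Time,Drive A Temperature [°C],Drive B Temperature [°C],CPU Usage [%]
12:00:01,41,55,12
12:00:03,43,71,80
12:00:05,42,68,35
"""

for name, peak in temperature_high_water_marks(sample).items():
    print(f"{name}: {peak:.0f}")
```

Run it on a log captured while gaming, not at idle, since the drive under the video card will look fine until the GPU heats up.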
Idle temp in BIOS for the CPU is 32 and it looks like the cooler is running.
Tried swapping out RAM from another PC. Same issue.
Tried a replacement power supply. Same issues.
I’m really at a loss. Going to swap out the SSD next – just got the cheap Intel drive. But there are two identical SSDs in there and the issue happened no matter which one I installed Windows on, so I doubt that’s it.
We’ll know once my frustration level subsides a bit and I have the patience to go back in the case and put in the new SSD if y’all are right!
Kind of leaves the motherboard or CPU by process of elimination, if the SSD swap doesn’t help.
This is where I saw my M.2 temps but note that you can also check quite a bit about the motherboard if you look at those sections (for me that’s the ASRock Z490)
System is too unstable to see event viewer, logs, etc. Was BSODing during load, so I decided to wipe Windows and reinstall. Best I’ve done is made it through the first part of the reinstall.
Gonna try a different ITX motherboard and see how that goes. Hardware’s still under warranty, luckily.
Any chance the cpu temp reading is faulty and this is as simple as needing to reinstall the cooler? I guess you are swapping mobo anyway, so maybe moot…
I think it’s got to be a failing component rather than something as simple as CPU cooling. The PC hasn’t moved or had anything changed since I first set it up.
My money is on a failing power supply. You can double-check the connections from the PSU to the motherboard though. You could also try underclocking the CPU to see if the system becomes more stable, at least in the short term.
I hate to suggest it, but a dramatic fault coming on suddenly could also be a latent build error like a loose screw caught between motherboard and case, or an extra or misplaced standoff, or a pinched cable?
If you try just the motherboard, CPU/cooler, minimum RAM, SSD, and PSU sitting on cardboard on a table, and it’s still unstable, then it’s got to be the motherboard or CPU.
You’ve done great isolation troubleshooting, now it’s just up to having new hardware to test from there. To come on so suddenly though, something is wrong, for sure. I’m wishing you luck, man. Nothing worse than your game system being out of action on a 3-day weekend. Especially so these days since there aren’t any places to go get components at retail, or at least that’s the case around here.
I’ve been having a similar crisis over the last week or so. Computer crashed for the first time last week (since I replaced the video card that was DOA). Then it began to freeze, reboot or bluescreen at increasing intervals. I’ve never even overclocked it, just left it at the settings it came with from the factory. Tried a bunch of stuff including memtest, which I thought had nailed it since it did come up with one error on the first pass. I uninstalled one chip, and it still crashed. Put the first chip back and removed the other, and then it booted and was stable for a while. Replaced both chips with new ones and things seemed fine for a bit too. Gamed for a few hours, but then I noticed later when the computer was sitting idle it froze up again. Rebooted and it worked fine nearly all day, through some hours of Starfield, then crashed again and is now stuck in a crash-on-boot cycle. Sometimes it boots, but freezes or blue screens shortly after. I’ve seen just about every blue screen error code in the course of it. It won’t even boot Windows off a USB drive, so I don’t think it’s the boot drive. I don’t have a PSU to swap in. I’m pretty much out of troubleshooting ideas and may have to take the dreaded step of RMAing the whole machine to CyberpowerPC.
I figure it’s got to be the MB or the CPU at this point. It’s an i9-13900K in an ASUS Z790-P.
Swapped the motherboard, RAM, power supply, and SSD, and took out the video card, still crashing.
Then I realized… I’m still using the original power supply’s cables.
So, it’s currently in pieces and I’m going to try new power supply cables.
Fingers crossed.
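For anyone keeping score, the process of elimination at this point boils down to something like the sketch below. The parts list is just my reading of the thread, with the PSU cables and CPU cooler split out as separate items for illustration.

```python
# Parts in the build, as described in the thread (the split is an assumption).
parts = [
    "motherboard", "CPU", "CPU cooler", "RAM",
    "SSD", "video card", "PSU", "PSU cables",
]

# Swapped for a known-different part, or removed entirely, with crashes continuing.
ruled_out = {"motherboard", "RAM", "PSU", "SSD", "video card"}

# Whatever has not been swapped or removed is still a suspect.
remaining = [p for p in parts if p not in ruled_out]
print(remaining)  # ['CPU', 'CPU cooler', 'PSU cables']
```

The cables are the cheapest of the three to test, so trying them first makes sense.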
Sorry you’re dealing with that, @Thrag. At least warranty is there as a last-ditch possibility. I wish I could send this to someone else at this point!