The clean driver install didn’t solve my problem. It just rebooted in the middle of Skyrim again.

Here’s my reliability chart! Keep in mind that I just restored the system, so this actually started on October 5th, not the 29th.

Some of the entries give me stuff like:

Except I don’t see any MEMORY.DMP file in the Windows directory.

Most of them just say “Windows was not shut down properly”. The Event Viewer is full of critical errors that just say “Event 41, Kernel-Power”.


Well, you could try submitting the dump files to Microsoft. There’s a forum where you can do that; I found it while googling the other day.

The other option is to bite the bullet and do a true clean install after nuking the partition.

If the problem persists after that, you can be pretty certain it’s a hardware issue.


That particular stop error (0x116) appears to be associated with a graphics driver crash.

Event 41 (Kernel-Power) always shows up after an unexpected shutdown; there’s no useful information to be found there.


It could certainly be the video card, then. The stop errors I’m seeing here are 0x116, 0x119, and 0x10E, all of which are graphics-related.
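For anyone reading these codes later: stop codes are hex numbers, and Microsoft’s bug check code reference gives each one a symbolic name. Here’s a minimal Python sketch that maps the three codes above, plus 0x141 (as far as I know, the same numbering Reliability Monitor uses for its LiveKernelEvent code 141), to those names. The table is hand-written and covers only these four codes; it isn’t a complete or authoritative list.

```python
from typing import Union

# Display-related stop (bug check) codes mentioned in this thread, with the
# symbolic names used in Microsoft's bug check code reference.
VIDEO_STOP_CODES = {
    0x10E: "VIDEO_MEMORY_MANAGEMENT_INTERNAL",
    0x116: "VIDEO_TDR_FAILURE",
    0x119: "VIDEO_SCHEDULER_INTERNAL_ERROR",
    0x141: "VIDEO_ENGINE_TIMEOUT_DETECTED",
}


def describe_stop_code(code: Union[int, str]) -> str:
    """Return the symbolic name of a stop code.

    Accepts an int or a hex string with or without the "0x" prefix,
    e.g. "116", "0x116" or "10e".
    """
    if isinstance(code, str):
        # int(..., 16) accepts an optional "0x" prefix and either letter case.
        code = int(code, 16)
    return VIDEO_STOP_CODES.get(code, f"unknown (0x{code:X})")


for c in ("116", "119", "10e"):
    print(c, "->", describe_stop_code(c))
```

Anything not in the table comes back as “unknown”, which is the honest answer for a four-entry lookup.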


Try underclocking your card, reducing its voltage, and/or increasing the fan speed. Use MSI Afterburner.


Alright, I ran Reliability Monitor. It simply lists a hardware error, with no other information. Is there some other way I can figure out what happened? The report only goes back to 10/6/16, and that day has a similar entry; it’s the only other day with a hardware error report. On each of those two days, multiple instances were logged at exactly the same time (the same time within a day, not the same time on both days).

I just found a bit more detail, if it means anything to anyone:

Hardware error

11/3/2016 4:33 PM

Not reported

A problem with your hardware caused Windows to stop working correctly.

Problem signature
Problem Event Name: LiveKernelEvent
Code: 141
Parameter 1: ffffbd846cb59010
Parameter 2: fffff806ebafd15c
Parameter 3: 0
Parameter 4: 4
OS version: 10_0_14393
Service Pack: 0_0
Product: 256_1
OS Version: 10.0.14393.
Locale ID: 1033


Well, it runs FurMark without incident (it gets up to about 80°C with the fan climbing to only 53%), so I figure the GPU itself is fine. I suspected the video memory, so I looked for GPU memory tester programs, but the two I found didn’t report any errors.

However, I downloaded EVGA’s OC Scanner tool, and with the “Furry E (GPU memory burner::3072MB)” test I get a repeatable reboot once usage climbs to about 3500 MB during the loading phase. Does this point to the problem, or is it a normal side effect of the GTX 970’s 3.5 GB + 0.5 GB memory issue?

Is the purpose of that to induce a crash or prevent one? It’s definitely not heat- or fan-related, because it often crashes when launching a game while the card is idling at 40°C.


I’ve been doing some research and found an application called WhoCrashed. In a way I’m pleased, because it suggests my main hardware is fine, but my relatively new NVIDIA card may not be :(

On Thu 11/3/2016 4:33:44 PM your computer crashed
crash dump file: C:\WINDOWS\Minidump\110316-25234-01.dmp
This was probably caused by the following module: nvlddmkm.sys (nvlddmkm+0x962130)
Bugcheck code: 0x116 (0xFFFFBD84676544A0, 0xFFFFF806EC302130, 0xFFFFFFFFC000009A, 0x4)
file path: C:\WINDOWS\System32\DriverStore\FileRepository\nv_dispi.inf_amd64_848dea456d3c865e\nvlddmkm.sys
product: NVIDIA Windows Kernel Mode Driver, Version 375.70
company: NVIDIA Corporation
description: NVIDIA Windows Kernel Mode Driver, Version 375.70
Bug check description: This indicates that an attempt to reset the display driver and recover from a timeout failed.
A third party driver was identified as the probable root cause of this system error. It is suggested you look for an update for the following driver: nvlddmkm.sys (NVIDIA Windows Kernel Mode Driver, Version 375.70 , NVIDIA Corporation).
Google query: NVIDIA Corporation VIDEO_TDR_ERROR
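The numbers in that bugcheck line can be decoded a bit further. As I understand Microsoft’s reference for bug check 0x116, the third argument is the error code of the last failed operation; on 64-bit Windows that 32-bit NTSTATUS shows up sign-extended, so the meaningful part is the low 32 bits. Here’s a minimal Python sketch that parses a WhoCrashed-style “Bugcheck code:” line and looks up that argument. The regex is written against the single line quoted above (an assumption about the format), and the NTSTATUS table holds only the one value that appears in this thread.

```python
import re
from typing import List, Tuple

# Matches lines like "Bugcheck code: 0x116 (0x..., 0x..., 0x..., 0x4)".
# Written against the single WhoCrashed line quoted above; other tools may format differently.
BUGCHECK_RE = re.compile(r"Bugcheck code:\s*(0x[0-9A-Fa-f]+)\s*\(([^)]*)\)")

# Only the one status value that appears in this thread.
NTSTATUS_NAMES = {0xC000009A: "STATUS_INSUFFICIENT_RESOURCES"}


def parse_bugcheck_line(line: str) -> Tuple[int, List[int]]:
    """Return (stop code, [arg1..arg4]) parsed from a 'Bugcheck code:' line."""
    m = BUGCHECK_RE.search(line)
    if m is None:
        raise ValueError("not a 'Bugcheck code:' line")
    code = int(m.group(1), 16)
    args = [int(a.strip(), 16) for a in m.group(2).split(",")]
    return code, args


def ntstatus_from_arg(arg: int) -> int:
    """Keep the low 32 bits of a sign-extended 64-bit argument (the NTSTATUS itself)."""
    return arg & 0xFFFFFFFF


line = ("Bugcheck code: 0x116 (0xFFFFBD84676544A0, 0xFFFFF806EC302130, "
        "0xFFFFFFFFC000009A, 0x4)")
code, args = parse_bugcheck_line(line)
status = ntstatus_from_arg(args[2])  # third argument for 0x116
print(hex(code), hex(status), NTSTATUS_NAMES.get(status, "unknown"))
# prints: 0x116 0xc000009a STATUS_INSUFFICIENT_RESOURCES
```

If that reading is right, the failed recovery reported insufficient resources; on its own that doesn’t say whether the driver or the card is to blame.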

Now, what do I do about it? It’s odd, though, since my computer never has issues once it has booted up.


Try the things I suggested and see if that fixes it.


I underclocked the card as far as the tool allowed and fixed the fan speed at 75%, and… it still crashed at exactly the same point. There doesn’t seem to be an option to lower the core voltage, only to increase it.

So I guess the interesting thing here is that, while this crash is completely repeatable and certainly looks like a video memory issue, I’m not sure it explains the other crashes. There’s no way that, say, Portal is using 3.5 GB of video memory during start-up (or ever). Maybe it’s just happening to use the “bad” memory addresses? But if that’s the case, why do the 1 GB and 2 GB stress tests never fail, and only the 3 GB test does?

At least the card is still under warranty, so if I can rule out a driver issue, I can send it back for a replacement.


I loaded up GPU-Z, had it create a sensor log, and ran Skyrim until it crashed, but the log doesn’t seem to show anything. It was at 77°C, 52% fan speed, and 2444 MB of video memory in use, but it had been at those levels or higher for some time before the reboot.
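Rather than eyeballing the whole sensor log, it can help to look only at the last few dozen samples before the reboot. Here’s a minimal Python sketch, assuming the log is comma-separated with a single header row (check your own file; I’m not certain of GPU-Z’s exact layout). It reports the maximum of each named column over the last N rows. The column names and sample rows are made-up placeholders, loosely based on the readings above, not copied from a real log; substitute the headers from your own file.

```python
import csv
from typing import Dict, Iterable, List


def peak_before_crash(lines: Iterable[str], columns: List[str], last_n: int = 60) -> Dict[str, float]:
    """Return the max of each named column over the last `last_n` data rows.

    Assumes a comma-separated log with a single header row; surrounding
    whitespace in headers and cells is ignored.
    """
    reader = csv.reader(lines)
    header = [h.strip() for h in next(reader)]
    # list.index raises ValueError if a requested column is missing.
    idx = {name: header.index(name) for name in columns}
    # Skip blank lines, then keep only the final `last_n` samples before the log stops.
    rows = [r for r in reader if any(cell.strip() for cell in r)]
    tail = rows[-last_n:]
    peaks: Dict[str, float] = {}
    for name, i in idx.items():
        values = [float(r[i]) for r in tail if i < len(r) and r[i].strip()]
        peaks[name] = max(values) if values else float("nan")
    return peaks


# Made-up rows with hypothetical column names; replace with your own log's header.
sample = """Date , GPU Temperature [C] , Fan Speed [%] , Memory Used [MB] ,
2016-11-04 20:00:01 , 76.0 , 52 , 2401 ,
2016-11-04 20:00:02 , 77.0 , 52 , 2444 ,
"""
peaks = peak_before_crash(sample.splitlines(), ["GPU Temperature [C]", "Memory Used [MB]"])
print(peaks)
```

For a real log, pass an open file handle instead of the sample lines; the `csv` module documentation recommends opening the file with `newline=""`.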


Stupid question, since you’ve probably already done this: have you unseated and reseated the card, and unplugged and replugged its power cables? Perhaps one of the 12 V cables isn’t firmly in place.


Yep, I’ve done that. I think I’ll track down an NVIDIA driver from earlier this year, when I wasn’t having issues, and see if installing it does anything.


