Help with diagnosing consistent hardlock

I’m so over this computer upgrade holy shit.

I upgraded from a 3570k to a Ryzen 5 1600X, mostly for better virtualization support and better multicore performance. Since it’s a complete socket and generation change, I upgraded my motherboard and RAM at the same time (DDR4 2600MHz). The graphics card stayed the same, a Sapphire Nitro R9 Fury.

Since I built this machine, the Unigine Heaven benchmark has consistently locked up the whole computer, almost always in the same spot (about 2 minutes after starting the benchmark). When this happens I lose all video through my Fury (screen completely black) and I can tell the OS is no longer working. There are no Windows event logs apart from complaints about unclean shutdowns.

Running the benchmark on low and uncapped framerate did not seem to trigger it.

Other games (like Grim Dawn) would trigger it after about 15 minutes or so of gametime. I could not get it to happen with the 3DMark stress tests for whatever reason.

I found out my Ryzen had the compile segfault bug, so I did an RMA with AMD hoping that would fix it. I got my replacement and verified it no longer has the compile segfault bug, but the same thing keeps happening.

It’s not thermal, as running the 3DMark stress tests shows the CPU never getting above 50C and the GPU never getting above 55C.

I swapped my Fury for the RX 550 I was about to send back, and lo and behold everything works perfectly and I have not (yet) been able to get a hardlock like before. Unfortunately we are talking about a benchmarked FPS difference of 76 vs 21. It’s also a 2GB card vs 4GB of HBM.

This seems to me like either a power supply issue (it can’t supply enough juice for both the Ryzen and the Fury) or my GPU is now bad.

The power supply is a 730W PSU bought back in 2013. I’d be shocked if that’s not enough power for this build, but the CPU TDP is now 95W compared to the 3570k’s 77W.
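For a rough sanity check on that, here is a minimal back-of-the-envelope sketch. It uses the 95W TDP above and the 248W Fury draw that 3DMark reports later in this thread. The 75W allowance for the rest of the system and the 0.8 derating factor for an aging PSU are assumptions picked for illustration, not measurements, and TDP is not the same thing as actual draw.

```python
def psu_headroom(psu_watts, loads_watts, derate=1.0):
    """Return (total_load, usable_capacity, headroom), all in watts."""
    total = sum(loads_watts.values())
    usable = psu_watts * derate
    return total, usable, usable - total


loads = {
    "cpu_tdp": 95,         # Ryzen 5 1600X TDP, from the post
    "gpu_peak": 248,       # Fury draw reported by 3DMark later in the thread
    "rest_of_system": 75,  # ASSUMPTION: board, RAM, drives, fans
}

# ASSUMPTION: treat the 2013 PSU as good for only 80% of its 730W label.
total, usable, headroom = psu_headroom(730, loads, derate=0.8)
print(f"load {total}W, usable {usable:.0f}W, headroom {headroom:.0f}W")
```

Even with those pessimistic guesses there is headroom on paper. A total like this says nothing about short load spikes or how well an old unit holds its 12V rail, so it does not rule the PSU out.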

The GPU was working perfectly fine in the old build before the swap, but I guess anything can happen in the process of a new build.

At this point I’m at a total loss as to what’s going on, and I don’t have a good way to diagnose it. I could totally dismantle this build and put the 3570k back in, but that’s a royal pain in the ass that I would like to avoid.

Can anyone think of any way to figure out what’s going on?

I’m inclined to think it’s your power supply. I just swapped out my 980 for a 1080 and my computer wouldn’t start, even though my total TDP had gone down significantly when I changed from an AMD 9370 to an Intel 7700k a month earlier. After the motherboard and CPU upgrade everything worked fine until I changed GPUs. I even put the old GPU back in and it still didn’t work. Most of the advice I got from here was that it was a bad power supply. So I got a new PSU and everything’s been working great. My PSU was about as old as yours is, so I would get another one and see if that’s your problem.

This isn’t even remotely helpful, but I did not think the last part of that title said “lock”

This sounds like a bad stick of RAM to me. Make a boot disk with Memtest and see if you can identify a bad stick of RAM.

https://www.memtest86.com/

Run memtest overnight, then run prime95 overnight. That is table stakes to even begin troubleshooting.

Memtest ran for 7 hours, no errors.

Stupid idea, but is there an update available for your motherboard?

Already on the latest bios :-/

I still think you should pull one stick of ram out until it happens again, and then swap sticks and test again.

EDIT - Sorry, got on the phone with someone, didn’t get to finish my thought - I had this happen to me several years back. I’d come back to my PC and it would be locked up. It took me months of troubleshooting before I finally pulled a stick of RAM out, and after a few entire weeks of no problems I just threw the stick out and bought a new one. No problems since (my son is using that PC now and it continues to have no problems).

Some observations…

  • Your temperatures seem a little low. I wouldn’t trust 3DMark to report those properly, especially the CPU.
  • On the subject of your GPU being bad, you were pairing it with an older CPU & motherboard. It’s possible this is the first time you’ve been able to stress it fully. Try FurMark for stress testing it.
  • PSU performance will degrade over time regardless. And it’s a bit more complicated than total wattage. Your results are quite consistent though? If it couldn’t provide enough power, then you’d expect to learn that immediately. And if it couldn’t deliver power consistently, then your results might be more random. Having said that, the PSU is one of the cheaper items to replace.

What’s a reliable tool to monitor temperatures? Previously I had always used OpenHardwareMonitor but it apparently can’t see temperature sensors for my Ryzen.

HWiNFO
https://www.hwinfo.com/

See a doctor if it’s been consistent for more than four hours …

What about prime95?

I have not had a chance to do an extended Prime95 run, but unless two different CPUs are bad in the exact same way I’m not expecting Prime95 to show anything.

Doesn’t matter. You need to do it, otherwise you are not testing right.

Yeah my R9 Fury must be shot :(

FurMark instantly brings the whole machine down (it doesn’t even render the first frame). HWiNFO logging shows my CPU and GPU never get above 52C and 65C respectively before failure, yet the 3DMark Time Spy benchmark gets temps up to 61C and 68C respectively without crashing, so it’s not stopping due to heat-related issues. 3DMark is also showing a 248W pull from the Fury, whereas the last logged event I got from the Unigine Heaven benchmark is a 235W pull, so if the PSU were not providing enough power I’d expect both to fail.

FurMark works perfectly fine with my RX 550. So some very specific operation seems to trigger the R9 Fury to just shut down. I’m going to try to bring my Fury to work tomorrow so my friend can try running the benchmarks in his machine to at least get another data point.

If anyone is curious, raw HWiNFO data is here. Note that the Tctl reading can be ignored (or subtract 20C from it), as apparently the 1600X, 1700X and 1800X all report that value higher than it really is for some reason.
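For anyone who wants to pull the same numbers out of a sensor log, here is a minimal sketch of the idea: take 20C off Tctl, then report the peaks and the last GPU power reading before the log stops. The column names and the three sample rows are made up for illustration. They are not taken from the real log, so adjust the names to whatever headers your own CSV export actually has.

```python
import csv
import io

TCTL_OFFSET_C = 20.0  # offset to subtract on the 1600X/1700X/1800X, per the post


def summarize_log(csv_text,
                  tctl_col="CPU (Tctl) [C]",        # hypothetical header
                  gpu_col="GPU Temperature [C]",    # hypothetical header
                  power_col="GPU Power [W]"):       # hypothetical header
    """Return peak temps/power and the last logged GPU power from CSV text."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows:
        raise ValueError("log has no data rows")
    cpu = [float(r[tctl_col]) - TCTL_OFFSET_C for r in rows]
    gpu = [float(r[gpu_col]) for r in rows]
    power = [float(r[power_col]) for r in rows]
    return {
        "cpu_peak_c": max(cpu),
        "gpu_peak_c": max(gpu),
        "gpu_power_peak_w": max(power),
        "last_gpu_power_w": power[-1],  # final sample before the log ends
    }


# Made-up sample rows, NOT real data from this thread.
SAMPLE = """CPU (Tctl) [C],GPU Temperature [C],GPU Power [W]
65.0,50.0,120.0
70.5,61.0,210.0
72.0,65.0,235.0
"""

summary = summarize_log(SAMPLE)
print(summary)
```

Comparing the last reading before a crash against the peak from a run that survived is the same comparison made above between Heaven and 3DMark.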

I would try a different GPU. Get a 1060 or something. And run Prime95 overnight.

I have an RX 550 that is working flawlessly. I thought I mentioned that. Prime95 is running now.

Brought my R9 Fury into work and put it in my coworker’s Alienware Graphics Amplifier. That thing is pretty impressive actually.

But the FurMark and Heaven benchmarks ran flawlessly at impressive FPS (considering it’s a U processor, even at 1440p). So I’m back to either power supply wonkiness or maybe the motherboard being the issue. I don’t know anymore sigh…

Edit: and yes, an all-night Prime95 run passed.