Vista x64, 8 GB, system purchased 2009.

Recent symptoms:

Corruption in data:

Also seen a few corrupted web pages - it’s as if some brackets are missing, so HTML code ends up as plain text
Most noticeable in 100+ MB executables (through firefox)
Further tests with downloading with Impulse showed failed MD5

Tests I’ve done:

Changed ethernet cable and port on switch
Downloaded stuff through a laptop on same place, seemed ok
Replaced onboard with new Intel NIC
Ran Vista’s memory test overnight (no errors reported)
chkdsk /r did not report errors.

What else can I test? Luckily I do most of my work on a VM machine but if I can’t track it down I’ll have to build a new machine to work from, which requires many hours of annoying customization.

I am thinking of some way to isolate where the error is. If I copied 300 GB of files from one hard drive to another and somehow ran some sort of MD5, I could remove ethernet from the equation? Maybe WinRAR/7-Zip has built-in CRC checks, but I don’t know if they trip often enough.

Why do you think it’s an Ethernet error? I’ve never seen such a thing. Have you tried a different physical medium, such as wireless?

If it’s possible to remove your router/switch from the equation and connect directly to whatever your ISP provides, try that. It’s possible that packets are getting corrupted as they’re being routed, as that’s one of the vulnerable points where it can evade the link-level checksums. (There’s a TCP-level checksum as well, but it’s pretty weak, so corruption could potentially slip through if there’s enough of it going on.)
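To illustrate why the TCP-level checksum is considered weak, here is a minimal sketch of a 16-bit ones' complement checksum in the style of RFC 1071. The function name and the sample payload are made up for illustration, and the real TCP checksum also covers a pseudo-header, which is left out here. Because the checksum is just a sum of 16-bit words, swapping two words changes the data but not the checksum:

```python
def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement checksum in the style of RFC 1071."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # add each big-endian 16-bit word
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return ~total & 0xFFFF


original = b"<html><body>"
# Swap the first two 16-bit words ("<h" and "tm" trade places).
reordered = original[2:4] + original[0:2] + original[4:]

print(internet_checksum(original) == internet_checksum(reordered))  # True
print(original == reordered)                                        # False
```

A stronger hash such as MD5 over the whole file would catch this kind of change, which is why comparing file hashes is a more reliable test than trusting the transport.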

Otherwise, you could try more thorough memory tests like memtest86 or Prime95, or like you suggested, copying a lot of data around in archives (the checksums they use are pretty strong).
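For the archive idea, a rough sketch using Python's standard `zipfile` module is below. It only covers .zip files (for .7z archives, 7-Zip's own `7z t` test command is the equivalent), and the function name is just for illustration. `ZipFile.testzip()` reads every member and checks its stored CRC-32:

```python
import zipfile


def first_corrupt_member(archive):
    """Return the name of the first member that fails its CRC-32 check,
    or None if every member reads back cleanly.

    `archive` may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(archive) as zf:
        return zf.testzip()
```

Something like `first_corrupt_member("copied_backup.zip")` (hypothetical file name) after each copy would do. Note that a badly damaged archive may fail to open at all and raise an exception instead of returning a name, which is also a sign of corruption.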

Thanks guys,

I suspected the wiring, since the cursory tests on all other options seemed okay. There’s been a lot of remodeling, a couple of rodents, and a big storm the last couple of weeks.

Good call on the Prime95. It’s been running torture test with no errors yet.

I did the DSL modem test a few days ago - it kept disconnecting (3 times within 5 minutes). Verizon claimed they found a switch issue somewhere upstream (big storms recently), so I’ll have to repeat that after hours. Note that the error occurred on an entirely different PC.

It would be funny if there were actually errors both on my desktop and the DSL connection.

Is this what people use Ethereal for?

Ethereal is superseded by Wireshark. If you suspect ethernet issues then check your NIC statistics and see if there are dropped packets or any of the counters for problematic conditions.

software firewall? ram? cpu?

The only time I’ve seen issues like this was due to a bad router. Ethernet cable problems result in flaky on/off (or nonexistent) connections. It’d be rare for actual data corruption.

I’d be surprised if it’s a problem with your layer 2 (ethernet) network but this isn’t a bad methodology. If you zip up a large file, transfer it across and then unzip it, 7Zip should tell you straight away whether or not there’s a problem with the file. Any data corruption or problem of that nature with the file will result in 7Zip (or any zip program) being unable to extract the archived files. Look at this as well - I haven’t used it but if you run it against your file before and after transfer then you should get the same MD5 sum. Different sum == corrupt data.
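If you would rather script the before/after comparison than install a tool, a minimal sketch with Python's standard `hashlib` looks like this. The function names are my own, and the file is read in chunks so a multi-gigabyte image does not have to fit in memory:

```python
import hashlib


def file_md5(path, chunk_size=1024 * 1024):
    """Return the MD5 hex digest of the file at `path`, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_contents(path_a, path_b):
    """True if both files hash to the same MD5 digest."""
    return file_md5(path_a) == file_md5(path_b)
```

Run `file_md5` on the source before the transfer and on the copy afterwards, or call `same_contents(source, copy)` if both are reachable from one machine.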

I’m leaning towards a problem with your router. Specifically, I’m thinking that since your router is examining and altering every packet that enters its WAN port (it has to, not only is it routing packets but it’s also performing NAT and thus altering the headers), for some reason it’s dropping some as well - possibly due to some dodgy memory or something. What kind of router is it?

Previous tests: chkdsk /r and basic Vista memory test clean last night.

Equipment:

Verizon-provided DSL modem
D-Link DGL-4100 “GamerLounge” router. It’s seen reliable use for many years.
Main switch is a 24-port D-Link DES-1026G (Gigabit, unmanaged)
Switch downstairs is a Netgear GS108 (Gigabit, unmanaged)

Router statistics (reset a few days ago)


LAN Statistics

Sent:                11635046    Received:            7691442
TX Packets Dropped:  0           RX Packets Dropped:  0
Collisions:          0           Errors:              41

WAN Statistics

Sent:                6989335     Received:            12217772
TX Packets Dropped:  231         RX Packets Dropped:  0
Collisions:          0           Errors:              0

Are those errors too high? Note I am not an IT professional, just a hobbyist stuck fixing stuff.

Current test - I am doing all tests within the LAN to take DSL out of the equation.

I plugged a laptop upstairs and I’m 7zipping files back and forth.
Prime95 torture. I am thinking this test does not involve the router in any way, it should go from my NIC, through 2 switches, to laptop.

I am also downloading big files upstairs to see where problem could be.

Still have not retested a direct DSL connection; hope I can get it done tonight (for the intermittency I had before). Of course some lame tropical storm is coming, so who knows.

P.S. I appreciate all the feedback, it makes everything less overwhelming.

It’d be nice to know what kind of errors they are, but still that error rate is pretty low. I don’t think it’d be the source of your problems. And the drops on the WAN side are pretty low as well.

Is there another computer available to do tests with? Download the same files with the same cable and see if the problem lives in the machine or not?

41 errors in almost 20 million LAN packets. I wouldn’t call that worrisome.
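For anyone who wants to see the arithmetic, here it is worked out from the router counters posted earlier. One assumption: the router page does not say whether "Errors" counts sent or received packets, so the LAN error count is divided by the combined total, and the WAN drops are divided by WAN packets sent since they are TX drops.

```python
# Counters copied from the router statistics posted in the thread.
lan_sent, lan_received, lan_errors = 11_635_046, 7_691_442, 41
wan_sent, wan_tx_dropped = 6_989_335, 231

lan_total = lan_sent + lan_received        # all LAN packets, both directions
lan_error_rate = lan_errors / lan_total    # fraction of LAN packets in error
wan_drop_rate = wan_tx_dropped / wan_sent  # fraction of WAN sends dropped

print(f"LAN packets: {lan_total:,}")
print(f"LAN errors per million packets: {lan_error_rate * 1e6:.2f}")
print(f"WAN TX drops per million sent: {wan_drop_rate * 1e6:.2f}")
```

That works out to roughly 2 errors per million LAN packets and about 33 drops per million WAN packets sent, both far too low to explain a download that fails every time.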

This is a pretty handy tool I usually install for getting file hashes: HashTab (for Windows or Mac)

It’s much more likely for the server to be sending you corrupt data than an ethernet issue. Each packet has a checksum, each page has a content size. You could use fiddler to check and see that the content size matches. It’s also odd that you mention this happens in executables. You might want to run a virus scan.

It is either this or (more likely) I agree with dermot, et al. that something along the route is abruptly closing connections, either your router or some infrastructure upstream.

Hence checking the http page size. From the page header:

Content-Length: the length of the body in octets (8-bit bytes). Example: Content-Length: 348
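If you don't want to set up Fiddler, the same size check can be sketched with Python's standard library. The function names here are hypothetical. Keep in mind that this only catches truncated transfers: a download with flipped bytes but the right length would still pass, so it complements a hash check rather than replacing it.

```python
import http.client
import urllib.request


def length_matches(declared, received_bytes):
    """Compare a Content-Length header value with the bytes actually received.

    Returns None when the server sent no Content-Length (e.g. chunked transfer),
    because then there is nothing to compare against.
    """
    if declared is None:
        return None
    return int(declared) == received_bytes


def check_download(url):
    """Fetch `url` and return (declared, received_bytes, verdict)."""
    with urllib.request.urlopen(url) as resp:
        declared = resp.headers.get("Content-Length")
        try:
            body = resp.read()
        except http.client.IncompleteRead as exc:
            body = exc.partial  # connection closed before the declared length
    return declared, len(body), length_matches(declared, len(body))
```

A verdict of False means the connection was cut short somewhere along the route; True means the size is right and any corruption is inside the data itself.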

This could be a lot of things but it sounds like you already isolated the network as NOT being the cause in your troubleshooting, unless I misread you. You changed cables and NIC on the bad PC without fixing the problem. You tested successfully with a laptop on the same port? Meaning, the Vista machine is the only one having the issue, and only on larger downloads? If I’m summing that up correctly, I would take a first guess at one of the following:

  1. Memory failure or hard drive corruption.
  2. Virus.
  3. A mangled tcp stack due to VPN install/uninstall.
  4. Incorrect MTU/RWIN settings modified from default on the PC, namely the former.
  5. Installation of anything that directly affects the network on the PC.

Most of those are easy to check. If what I read is not correct, and a secondary machine shows the same symptoms, let me know.

EDIT: Adding TCP Chimney Offload, aka TCP Offload Engine (TOE), to that list. Although it is unlikely to be the issue given your version of Windows, people are known to sometimes turn things on that really don’t need to be on.

memory or cpu is my wager, having been through this before. are you overclocking at all?

There was one error copying LAN files. This threw me off, but I can’t replicate. I’m going to toss a 100 GB image later, if it works, I won’t think about it anymore.

Pending repeat tests (my internet is slow), I am 98% certain it’s just this desktop (yay!)

Summary:

Desktop A successfully moves files to several other computers on the LAN
Desktop A successfully copies many files from own hard drive to own drive (this writes off CPU/memory issues I think)
Desktop A unsuccessfully downloads files from the internet. (Linux ISO failed MD5 all 5 times, different hashes each time.)
Laptop B successfully downloads same files from the internet.
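The "different hashes each time" detail is a useful clue, and the reasoning can be sketched as a small helper. The function name and the category labels are invented for illustration, and the interpretations in the comments are rules of thumb, not certainties:

```python
def classify_downloads(expected, observed):
    """Classify repeated download digests against the published digest.

    expected: the published hex digest.
    observed: list of hex digests, one per download attempt.
    """
    bad = [d for d in observed if d.lower() != expected.lower()]
    if not bad:
        return "all downloads match"
    if len(bad) < len(observed):
        return "intermittent corruption"  # some attempts were clean
    if len(set(bad)) == 1:
        # Same wrong digest every time: suspect the published hash or the
        # file on the server rather than random damage in transit.
        return "consistent mismatch"
    # A different wrong digest on every attempt points at random corruption
    # somewhere on the receiving path.
    return "varying mismatch"
```

Five failures with five different digests would come back as "varying mismatch", and since Laptop B gets the same files cleanly over the same connection, that points at Desktop A rather than the server or the line.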

  1. Prime 95 has not complained (5 days so far)
  2. Virus scans done, but comprehensive test (pulling drive into another PC) not done yet.
  3. Mangled stack could be possible.

This #3 option is very interesting. In the last month I installed two sets of VPN software (different locations). I also have VMware Server and vSphere Client installed.

*Uninstall both VPNs
*Run netsh winsock reset
*Test connection
*Install VPNs one at a time, testing in between.

After many, many, many tests it was fixed with a winsock 2 reset: “netsh winsock reset”. The automated Vista diagnose/repair command didn’t help.

The good thing about this mess is I finally got around to checking Ubuntu out (I used it to check the hardware) and I am now confident in the robustness of the LAN cabling. Thank you all for the help.