RAM damaged or not?

#1

Hi :slight_smile:

I have a mainboard that has been acting up for a long time, and at last :o I have found the money to replace it.

When I use the computer 24/7, the system crashes randomly after a variable amount of time. I was never able to find the real culprit, but every time the system crashes, running memtest after a forced reboot gives a ton of errors. Sometimes I noticed a problem before a BSOD or a crash, and when I rebooted the machine to run memtest I got errors every time.

The strangest thing is that if I shut down the PC and wait about half an hour, memtest gives no errors at all :eek:

Then the machine can run 24/7 again for some time (the longest run without a crash was 7 days, but most of the time a crash comes after 3-4 days; sometimes after less than 1 day :doh:).

I changed the RAM three times on this mainboard (a Gigabyte P35-DQ6); I also changed the PSU, but the problem is still there.

So I’m assuming that the culprit is a faulty mainboard.

My question is: if I get a new mainboard, can I assume that the RAM is still good, or should I get new RAM too? Can I be sure that the RAM is not damaged even though I get errors every time the machine crashes, but the errors are gone after half an hour?

Because I sometimes need to leave the machine running 24/7, are there things I should check before buying a new mainboard? Any suggestions about a specific brand/model, or is any desktop board good enough to run 24/7?

What about the PSU?

Is it possible that I damaged the RAM or the mainboard itself by running the machine 24/7 for too long?

Any suggestion is welcome :bow:

Thanks :slight_smile:


#2

So a few things to try. First update the BIOS and set the factory defaults. Try memtest86+ again and see how it goes. The next thing is to check the RAM settings, both voltage and timings; the board may be setting something too aggressively. For example, if the voltage is set to ‘auto’ it may end up too high, so try setting it manually. Start at a low voltage on the RAM and keep testing with memtest86+ until it’s stable. Another thing to try is cleaning the memory sockets with an old toothbrush (make sure it’s fully dry and clean :wink: ); also make sure you unplug the PSU when doing this.

edit: since it’s DDR2, try setting the timings manually to 5-5-5-15.


#3

Thanks for the answer :slight_smile:

I already did all these things. The BIOS is already the latest version (I updated it as the first thing when I installed the mainboard, and no new version has been released since).

RAM voltages are set correctly (thanks to Dee, who helped me with this :bow:), and the system is not overclocked at all.

I also cleaned the RAM slots and swapped the RAM between slots, but the random crashes are still here :doh:

Is there a way I can check whether the RAM is damaged or not? I’m pretty sure that the problem is in the mainboard, and I’m ready to buy another one, but I’m not sure if I have to get a pair of RAM sticks too (I’d like to save €100 if possible :iagree:)


#4

Use just one stick of RAM at a time and try again with Memtest86+; be sure you’re using the program from this site: Link. Then try different slots, one stick at a time; maybe it’s just the first or second slot that’s bad.
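The one-stick-at-a-time approach can be written down as a checklist so that no stick/slot combination gets skipped. A minimal Python sketch, assuming two sticks and four slots (the labels and the function name are made up for illustration; adjust them to the real hardware):

```python
from itertools import product

def single_stick_plan(sticks, slots):
    """Return every (stick, slot) pair to test with only that stick installed."""
    return list(product(sticks, slots))

# Hypothetical labels: two RAM sticks, four DIMM slots.
plan = single_stick_plan(
    ["stick A", "stick B"],
    ["slot 1", "slot 2", "slot 3", "slot 4"],
)

for stick, slot in plan:
    print(f"[ ] {stick} alone in {slot}: run Memtest86+ and note any errors")
```

If the errors follow one stick from slot to slot, that stick is the likely suspect; if they show up in one slot no matter which stick is in it, the slot or the board is more likely.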


#5

The memtest I’m using is the one provided on the Ubuntu live CD; I think it is reliable :slight_smile:

I already tried all the RAM slots, and memtest is always error free, except right after one of the random crashes :doh:


#6

Try removing the Northbridge heatsink, then clean it and re-apply thermal paste, just a small dab in the middle, then squish it down.

It’s probably the MB, and it’s up to you if you want to try and check the components feeding power to the memory slots. Also check for cracked solder joints in the area surrounding the memory slots.


#7

I’m not sure that I’ll be able to remove the heatsink without damaging something :o

I can check for cracked solder joints the next time the PC crashes.

I forgot to mention that I checked for overheating by opening the case just after a crash, and neither the chipset heatsink nor the RAM heatsink was hot. My case has a 20cm side fan, and it keeps all the hardware pretty cool :slight_smile:

And also with a good dose of dust :bigsmile:


#8

The thing with heatsinks is that if the chip is not making good contact with the heatsink, the heatsink will not get hot :slight_smile:

Is this one of those super big heatsinks that’s connected to the CPU’s power supply? It depends on how the manufacturer decided to attach the sticky parts; some come off very easily and others use a glue-like adhesive. If you find the northbridge heatsink is difficult to un-stick then I would just leave it on.

Another thing to try would be to run Prime95 with the torture test that doesn’t stress the memory; this way you might be able to rule out the CPU.

So were you saying that Memtest86+ is always error free? If you can run it for an hour or so and there are no errors, then the problem is not the memory. You might want to download the version that I linked to, just to make sure you have the latest version. There’s a version that will let you make a bootable CD.


#9

memtest gives errors only just after a crash. If I wait about half an hour and run memtest again, there are no errors at all. It is very strange indeed :doh:

I’ll check the heatsink asap, but I think I’ll get a new mainboard anyway, just to be safe :eek:


#10

Try bumping the voltage for the RAM by ~0.1 V and run memtest86+ (not the one in Linux) for at least two full passes.


#11

[QUOTE=geno888;2503475]I’m not sure that I’ll be able to remove the heatsink without damaging something :o[/QUOTE]
Don’t remove it then. It’s not worth it.

[QUOTE]I can check for cracked soldering points next time the PC will crash.[/QUOTE]
I doubt you will find any.

[QUOTE]I forgot to mention that I checked for overheating problems opening the case just after a crash, and both the chipset heatsink and RAM heatsink are not hot.[/QUOTE]
But are they warm enough?

[QUOTE]And also with a good dose of dust[/QUOTE]
Too much dust means no airflow. No airflow means overheating, even if the coolers are lukewarm.

Get some pressurised air (or an air compressor), test the force of the air pressure and then remove all the dust using the compressed air. Don’t use your Axe Deodorant. :slight_smile:


#12

The heatsinks are not hot; even though I don’t have a suitable thermometer, I can say that the temperature is not higher than 35-40C because the components are not hotter than my fingers.

I periodically clean the fans, CPU heatsink and case internals with compressed air (about once every two months). There are dust filters over the fans, so the amount of dust inside is not really huge :slight_smile:

I already found a replacement mainboard. I’ll run memtest again using the latest version just to be sure that the RAM is working properly :slight_smile:

Thanks for the answers :bow:


#13

Before you buy try what I just mentioned :wink:


#14

[QUOTE=eric93se;2503898]Before you buy try what I just mentioned ;)[/QUOTE]

I did it yesterday… ehm actually today (I ran tests until 2 AM :o).

No errors on the RAM. According to CPU-Z, both sticks are running at the correct voltages (based on the specifications published on the manufacturer’s website [Crucial]).

I still wonder what can be the cause of the random crashes :doh:

The only components I have not changed in this machine are the HDDs and the VGA card. And the mainboard, of course. All other components were swapped to run tests.

I can assume that neither the CPU itself :eek: nor CPU overheating is the culprit, for three main reasons:

  1. if the CPU were damaged, the machine would not start at all

  2. there is no overclock set in the BIOS, and I installed an oversized CPU cooler (Thermalright Ultima-90)

  3. every time I was able to see a crash (i.e. when a crash happened with me at the keyboard) the CPU was not running at 100%

Just as a side note: I installed an oversized CPU cooler because this way I don’t need to use a noisy fan. Even though I never tried it, with that cooler I could probably run the CPU (an Intel E8400) with completely passive cooling :eek:


#15

Is the HDD old? You can try an error scan with HD Tune.


#16

The HDDs are all recent (less than 1 year old), all WD drives :slight_smile:


#17

Did you say the machine is Prime95 stable?


#18

Uhm… Aren’t all these tests a bit backwards? Since you do get memory errors, there’s something going on; we just need to figure out what, obviously. :slight_smile:

First of all grab a fresh memtest86+ ISO and burn it:
http://www.memtest.org/

Disconnect everything that isn’t necessary, which means that you’ll end up with video card, CPU, memory and an optical drive. Since the computer experiences BSODs I wouldn’t recommend upgrading the BIOS, even though it may cure some issues. First clear CMOS to reset the BIOS. Do not make any adjustments regarding RAM etc. unless you use some non-standard stuff. It’s probably not the RAM sticks, unless you’ve replaced them with the same model every time, in which case it may be incompatibility related. Pull out all the sticks and keep one in slot 1.

Run memtest86+ for, let’s say, 6 hours or so. Does it pass OK?
Then run a LiveCD with something CPU intensive for 6 hours, and if it passes, insert another stick of RAM and start from the beginning.

Continue until you’ve tried all your memory sticks, running both memtest86+ and something CPU intensive. The reason why I’m suggesting such a long testing time is that it may be heat related, and it takes some time before those issues show up.
//Danne
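For the "something CPU intensive" step, the useful kind of load is one that also checks its own results, so that a silent miscalculation is caught and not only a hard crash. As a rough stand-in that runs on any live CD with Python 3, here is a minimal sketch. The function names are made up for illustration, it loads a single core only, and it is far lighter than a dedicated tool such as Prime95, so treat it as an illustration of the idea and not as a replacement.

```python
import hashlib
import time

def chained_digest(rounds):
    """Hash a fixed seed repeatedly; the result is fully deterministic."""
    data = b"stability-check"
    for _ in range(rounds):
        data = hashlib.sha256(data).digest()
    return data.hex()

def stress(duration_s, rounds=20000):
    """Repeat the same computation until the time is up.

    Returns (iterations, mismatches). On healthy hardware the digest
    never changes, so any mismatch points at a CPU or memory fault.
    """
    expected = chained_digest(rounds)
    deadline = time.monotonic() + duration_s
    iterations = 0
    mismatches = 0
    while time.monotonic() < deadline:
        if chained_digest(rounds) != expected:
            mismatches += 1
        iterations += 1
    return iterations, mismatches

# Short demo run; for a real test pass something like 6 * 3600 seconds.
iterations, mismatches = stress(0.5)
print(f"{iterations} iterations, {mismatches} mismatches")
```

To load every core, one copy of the script can be started per core. A crash or any nonzero mismatch count during the run counts as a failed test for the stick currently installed.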


#19

Thanks for jumping in, Danne; buying a new MB would be giving up too soon IMO :wink:


#20

It’s kinda pointless since it may be caused by something else…
//Danne