voltage stress and margin test of system stability


Another interesting experience with my gigabyte-ga-ma78gm-s2hp mobo
and amd phenom x3 8650:

The system has locked up twice in the last four days, and I am
contemplating trying some accelerated testing to find out what is
going on.

So far I have tried several programs from the Ultimate Boot CD
(UBCD), such as memtest86+, cpuburn, and the Mersenne prime test, and
all have passed several hours of testing (7+ hours in the memtest86+
case).

Nevertheless, the system has locked up again, so I am wondering what I
might do to provoke the failure again for debug purposes.

In chip testing, it is common to "margin" the chip by essentially
turning down the supply voltage until the chip starts failing in an
obvious and frequent manner.

My question is: What is the collective experience with such tests at
the system and motherboard levels? One of the problems I run into is
that the BIOS only seems to permit turning the voltages (cpu,
memory, ...) UP rather than down.

I suppose I could also try to stress the system by overclocking it,
but somehow I'd feel more convinced if I could do some voltage margin
testing.

Any ideas or experiences that pertain to this matter?
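
For what it's worth, the harness I have in mind looks roughly like
the Python sketch below. set_core_voltage and run_stress_test are
placeholders I invented; there is no generic OS interface for Vcore
control, so they would have to wrap a vendor tool, BIOS scripting,
or SMBus access:

    # Placeholder: Vcore control is platform-specific.
    def set_core_voltage(mv):
        raise NotImplementedError("wrap a vendor tool or SMBus access")

    # Placeholder: e.g. launch a stress tool and report pass/fail.
    def run_stress_test(seconds):
        raise NotImplementedError("platform-specific")

    def margin_sweep(start_mv=1300, floor_mv=1100, step_mv=25,
                     dwell_s=600):
        # Walk Vcore downward until the stress test fails; the gap
        # between the first failing voltage and nominal is the margin.
        for mv in range(start_mv, floor_mv - 1, -step_mv):
            set_core_voltage(mv)
            if not run_stress_test(dwell_s):
                print("first failure at %d mV" % mv)
                return mv
            print("%d mV: passed %d s" % (mv, dwell_s))
        return None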

Re: voltage stress and margin test of system stability

On Mon, 2 Feb 2009 23:40:58 -0800 (PST), reikred@gmail.com wrote:


What operating system?  A complete lockup (as opposed to a Windows
bluescreen or crash) normally points to something low-level like a
driver, or to a hardware problem such as a failing PSU or motherboard
capacitor (power) instability.

Is there an OS log that shows a fault, like Windows Event Viewer?
Event Viewer will show lots of errors in general, so the one to look
for is something coinciding with the moment of the lockup.


The stress tests passing makes a power problem seem less likely and a
driver problem more likely.


Perhaps, but if a chip has a voltage spec, there is no guarantee it
will work below that spec.  I suggest the opposite: if your board has
options to increase voltage, do so by a small amount (within chip and
cooling margins) and see if that reduces the failure rate.


See the comment above: although the BIOS limitation is restrictive,
you can still effectively check this by increasing voltage instead.


Personally, I do test by overclocking (I overclock anything with a
reasonable margin to do so).  I overclock until I find the threshold
of instability at a particular bus speed, set of timings, or voltage
(whatever appears to be the limiting factor).  After finding that
maximum threshold, for normal use I reduce the speed or increase the
voltage as applicable, to retain a margin between the maximum stable
settings and the target operating parameters.  Barring rare
exceptions (a motherboard or PSU with bad capacitors, or an
unattended system that didn't get dust cleaned out in a timely
fashion), this strategy has worked well.


Lockups tend to come from hardware failure or drivers.  The more
common items to fail are the video card, motherboard, and PSU.  Try
to isolate each if you have spare parts, or check temperatures,
voltages, and that fans are functioning.  As for drivers, update them
if they are not current.

Try to find a commonality in these freezes: whether particular
applications are running or particular system functions are in use.
Note the interval between freezes and whether it depends on time
(for example, whether the system had sat idle long enough to enter a
lower-power managed state).
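
One cheap way to pin down the moment of a freeze is a heartbeat
logger; after the lockup and reboot, the last line in the file bounds
when the system died.  A minimal sketch in Python (the filename and
period are arbitrary choices):

    import os, time

    def heartbeat(path="heartbeat.log", period_s=5):
        # Append a timestamp every few seconds and force it to disk;
        # after a lockup, the last line bounds when the system froze.
        with open(path, "a") as f:
            while True:
                f.write(time.strftime("%Y-%m-%d %H:%M:%S") + "\n")
                f.flush()
                os.fsync(f.fileno())
                time.sleep(period_s)

    heartbeat()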

Re: voltage stress and margin test of system stability

reikred@gmail.com wrote:

Creating the voltage versus frequency curve is what overclockers
(or underclockers) do. For example, on my latest purchase, I know
that an extra 0.1V on Vcore allows a 33% overclock. By proceeding in
small steps of frequency, and adjusting Vcore for the same level of
stability at each test point, you can produce your own voltage versus
frequency curve. On an older processor (Northwood), I got to see the
"brick wall" pattern, where at a certain point all the extra (safe)
voltage that could be applied didn't allow any higher overclock.
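
To make the bookkeeping concrete, here is a small Python sketch. The
numbers are purely illustrative (loosely scaled to the 0.1V / 33%
example above); the point is that the mV-per-MHz slope blows up when
you hit the brick wall:

    # (core MHz, minimum stable Vcore in volts) - illustrative only
    points = [
        (2000, 1.20),
        (2200, 1.23),
        (2400, 1.26),
        (2660, 1.30),
        (2700, 1.45),   # "brick wall": big Vcore step, tiny MHz gain
    ]

    for (f0, v0), (f1, v1) in zip(points, points[1:]):
        slope = (v1 - v0) * 1000 / (f1 - f0)   # mV per MHz
        print("%d-%d MHz: %.2f mV/MHz" % (f0, f1, slope))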

In terms of features, AMD has Cool'n'Quiet (CNQ) and Intel has
Enhanced SpeedStep (EIST). If these features are enabled, the voltage
and frequency are changed dynamically with OS loading, at up to 30
times per second. The multiplier might vary between 6X and 9X, say,
with some small difference in Vcore applied to those two conditions,
according to the manufacturer's declaration of what is enough to make
it work.
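
If you want to watch the switching happen, CPU-Z shows the live clock
on Windows, and on Linux the cpufreq interface exposes it in sysfs.
A small polling sketch (Linux-only, assumes the cpufreq driver is
loaded):

    import time

    # Linux cpufreq interface; value is in kHz.
    PATH = "/sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq"

    seen = set()
    for _ in range(200):           # sample for about 10 seconds
        with open(PATH) as f:
            khz = int(f.read())
        if khz not in seen:
            seen.add(khz)
            print(khz // 1000, "MHz")
        time.sleep(0.05)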

So if you are having stability issues, your first step is to disable
CNQ or EIST. The purpose of doing that is not to blame those features
for the stability issue (they're not likely to be the problem), but
to make the test conditions a stable, known quantity. You want just
one frequency involved when running a test case, since you're
attempting to set up a controlled experiment.
On my processor, I believe the Vcore setting is policed by the
processor. My Core2 has VID bits to drive the Vcore regulator, and by
using tools that can control the multiplier setting and drive out new
Vcore values while the system is running, I've found the processor
seems to enforce an upper limit on what bit pattern it will allow to
be passed on the VID bits. That prevents any useful level of
overvolting on my newest system. Previous generations of systems used
things like overclock controller chips to allow "in-band" VID
changes.

On some motherboards, you may notice the nomenclature "+0.1V" for a
Vcore setting, rather than a more direct "1.300V" setting in the
BIOS. I interpret this to mean the motherboard design has a feature
to bump Vcore independent of the VID bits, so the "+0.1V" notation
implies an offset applied in the Vcore regulator. I had to do
something similar to my motherboard with a soldering iron: I now have
a socket where I can fit a 1/4W resistor, and by varying the value I
get a voltage boost. My motherboard, unlike some other brands, does
not offer any out-of-band voltage boost feature, so I had to
implement my own, using instructions from other users who did the
analysis before me. You likely won't have to go through this. I'm
explaining it in case you cannot reconcile what is happening while
you're testing (the setting says one thing, the measured value is
something else). If the set value and the measured value don't match,
part of the difference is due to "droop", and part can be a boost
applied independent of the VID bits.
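
As a worked example of how the pieces add up (all numbers invented
for illustration):

    # Illustrative numbers only.
    vid_request  = 1.300   # what the CPU's VID bits request
    board_offset = 0.100   # out-of-band "+0.1V" boost from the board
    vdroop_load  = 0.040   # regulator droop under heavy load

    print("idle reading ~ %.3f V" % (vid_request + board_offset))
    print("load reading ~ %.3f V" %
          (vid_request + board_offset - vdroop_load))

So a 1.300V setting that measures around 1.360V under load isn't a
mystery; it's the offset minus the droop.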

As Kony says, a driver could be responsible for the problem. The
Mersenne Prime95 test is pretty good at finding bad RAM, and since
you've run that for a few hours, that helps to eliminate bad memory.
Prime95 can only test memory outside the portion used by the OS,
though, so it is possible some areas of the RAM have not been tested
as thoroughly.
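
Any user-space tester has that same limitation: it can only touch
pages the OS hands it. A toy pattern test in Python shows the idea
(real testers like memtest86+ run bare-metal precisely so they can
get at all of the RAM):

    def test_block(size_mb=64):
        # Fill a block with each pattern, then re-read and verify.
        # Only exercises pages the OS gives this process.
        n = size_mb * 1024 * 1024
        buf = bytearray(n)
        failed = []
        for pattern in (0x00, 0xFF, 0x55, 0xAA):
            buf[:] = bytes([pattern]) * n
            if buf.count(pattern) != n:
                failed.append(hex(pattern))
        return failed

    print(test_block() or "all patterns passed")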

Other things that might cause a freeze include a misadjusted bus
multiplier, like the one used for HyperTransport between the
processor and Northbridge, or a SATA or IDE clock that is too far
from nominal. Clock signals to other hardware parts in your system
could give a freezing symptom: data or some transaction to the
processor could be frozen while the processor itself is still
running.
Another comment - I've noticed on my older overclocking
test projects, that the processor would crash on an
error. My current Core2 system tends to freeze, rather than
giving an old-fashioned blue screen. So there can be
some differences from one generation to another, as
to what part of the processor is failing, and whether
the system runs long enough to splatter something
across the screen.


Re: voltage stress and margin test of system stability

reikred@gmail.com wrote:


I've seen many memory modules pass MemTest86+ but fail MemTest86.
Similarly, many modules passed GoldMemory ver. 6.92 but failed ver.
5.07.  OTOH every module I've tested that failed GM ver. 5.07
eventually failed MT86 ver. 3.xx, and vice-versa.

Re: voltage stress and margin test of system stability


That is interesting information. I have a somewhat superficial
knowledge of memory testing, pattern sensitivities and such. I wonder
what the difference between the programs is, especially considering
that the NEWER versions appear to be less stressful than the older
versions in some cases.

On a related note, memory errors are sometimes (perhaps often?)
transient. Do any of the programs keep and save a bad-address-list so
that one can go back and retest the specific address (or regions)
where the failure occurred? At least some of the programs run very
much standalone with little OS support...
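
The bookkeeping itself seems trivial; something like this Python
sketch (all names invented) is all I mean by a bad-address list:

    import json

    def save_bad_addresses(new_addrs, path="bad_addrs.json"):
        # Merge new failures into the saved list, so transient
        # errors accumulate across runs instead of being forgotten.
        try:
            with open(path) as f:
                known = set(json.load(f))
        except FileNotFoundError:
            known = set()
        known.update(new_addrs)
        with open(path, "w") as f:
            json.dump(sorted(known), f)

    def load_bad_addresses(path="bad_addrs.json"):
        with open(path) as f:
            return json.load(f)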

Re: voltage stress and margin test of system stability

reikred@gmail.com wrote:

A wise mentor once told me the secret to debugging.  I'll pass it on
to you if you promise to keep it a secret.

Every time you type a question mark, stop and ponder the question.
EXACTLY what are you going to do with the answer?
Plot yourself a decision tree, if only in your head.
If the answer is yes, I'm gonna do this.
If it's no, I'll do that.
If it's > 3.4, I'm gonna do the other thing.

After you've done this for a while, it will become obvious
that most questions (tests done) don't need to be asked.
If you're gonna do the same thing no matter what the answer,
skip it and move on.

Another thing that happens is that most of the branches lead
nowhere.  If you can't hypothesize a set of results leading to
something you can actually fix, you need a new plan.
If a set of answers leads nowhere, you don't need any of the
intermediate results.
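
To make the pruning rule concrete, a toy sketch in Python (every
name here is invented): if all answers to a test lead to the same
action, the test tells you nothing.

    # Toy decision tree: {question: {answer: action}}
    tree = {
        "fails with onboard video too?": {
            "yes": "suspect motherboard/PSU",
            "no":  "suspect video card",
        },
        "is the case dusty?": {
            "yes": "clean it and retest",
            "no":  "clean it and retest",   # same action either way
        },
    }

    for question, answers in tree.items():
        # A test whose answers all lead to the same action is not
        # worth running.
        useless = len(set(answers.values())) == 1
        print(question, "-> skip it" if useless else "-> worth running")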

Pondering the range of possible answers to your question
leads you to much more efficient debugging.
This is a process that will give you better than average debugging
results... but you won't find the cure for cancer with this method.

So, back to your question...
You turn down the volts and it fails.
Now what?
Can you be sure it's the same failure?
How much lower is enough lower?
And what are you going to do to fix it?

Some questions can't be answered with technology you can afford.
Even if it is a voltage problem, you won't be able to measure it with
a voltmeter.  You'll need a VERY fast digital storage scope and a set
of probes you'd have to mortgage your house to buy.  It'll be a
voltage droop during a DMA transfer while the disk is seeking and the
video memory crosses a certain memory address and all the address
lines change at once... on Tuesday when the moon is full.

You do this kind of debugging on prototypes and subsystems.  For failed
customer units, you throw them away.

Re: voltage stress and margin test of system stability

On Fri, 06 Feb 2009 06:41:59 -0800, spamme0 wrote:


Considering the impedance and capacitance in the circuits involved,
a relatively slow, cheap scope would find such a problem; even a good
multimeter with a min/max hold feature probably would.
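
Even without any instrument, most boards expose their monitor rails
to software. On Linux the hwmon sysfs files report voltages in
millivolts, and a polling loop can hold min/max the way a multimeter
would (coarse sampling, so it only catches slow sags):

    import glob, time

    def poll_minmax(duration_s=60, period_s=0.5):
        # Linux hwmon voltage inputs, reported in millivolts.
        files = glob.glob("/sys/class/hwmon/hwmon*/in*_input")
        lo, hi = {}, {}
        end = time.time() + duration_s
        while time.time() < end:
            for path in files:
                try:
                    with open(path) as f:
                        mv = int(f.read())
                except OSError:
                    continue
                lo[path] = min(lo.get(path, mv), mv)
                hi[path] = max(hi.get(path, mv), mv)
            time.sleep(period_s)
        for path in files:
            if path in lo:
                print(path, "min", lo[path], "max", hi[path], "mV")

    poll_minmax()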

Re: voltage stress and margin test of system stability


Point taken, in my case I was trying to determine whether it really
was bad hardware or (as some suggested) a software bug. Right now I am
leaning toward the latter, as the system has not frozen since I
upgraded the video driver (knock on wood).

Re: voltage stress and margin test of system stability

On Feb 6, 12:14 pm, reikred@gmail.com wrote:

Well, the knock on wood was not enough. It froze twice Friday night
and later overnight, then has been running ok since then.
