Computer Chess Club Archives



Subject: Re: Some Crafty 16.19 results on my XP 2.44GHz

Author: Keith Evans

Date: 20:38:06 02/19/03


On February 19, 2003 at 18:12:03, Robert Hyatt wrote:

>On February 19, 2003 at 17:29:28, Aaron Gordon wrote:
>
>>On February 19, 2003 at 16:30:14, Robert Hyatt wrote:
>>
>>>Sure there are.  And they take _years_ to run.  You have to run all four
>>>billion possible values thru every possible instruction, in every possible
>>>sequence, with every possible pipeline delay, at every possible temperature,
>>>with every possible voltage variance with...
>>>
>>>You get the idea...
>>
>>No one tests this way, not AMD, Intel, IBM, etc.
>
>
>Of course not.  The engineers look at the gate delays, add them up, and set
>the clock rate just longer than the longest max delay they found.
>
>You are doing the opposite, and you have to test it in a different way to see
>if _your_ processor has faster gates.

I totally agree with Hyatt about overclocking. For something simple like a
standard cell ASIC, the vendor typically builds a lot of margin into their
libraries. Say you pick a delay cell - it could easily vary from 1 ns to 3 ns.
You design with a well-characterized library and you perform static timing
analysis (see PrimeTime) on the design to guarantee that it will work at a
given frequency across a range of PVT (process, voltage, temperature). The
ASIC vendor builds special process monitor circuits onto the die and uses
those to monitor the process - if it falls outside the acceptable range then
they just trash the chips. If not, then they typically run some scan vectors
to test all of the gates (this is not a functional test) and then some special
tests for the PLLs, I/O pins,...
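
As a rough sketch of what that static timing check boils down to (all the cell
delays, paths, and clock periods below are made up for illustration - a real
flow uses a fully characterized library and a tool like PrimeTime), you sum
the worst-case delays along each register-to-register path and compare the
total against the clock period at each PVT corner:

# Toy static-timing-analysis illustration.  All numbers are invented.
# Per-corner delay of each library cell in nanoseconds.
CELL_DELAY_NS = {
    "NAND2":        {"slow": 0.30, "typ": 0.18, "fast": 0.10},
    "MUX2":         {"slow": 0.45, "typ": 0.28, "fast": 0.16},
    "DFF_CLK_TO_Q": {"slow": 0.50, "typ": 0.35, "fast": 0.22},
    "DFF_SETUP":    {"slow": 0.25, "typ": 0.18, "fast": 0.12},
}

# A path is just the list of cells between a launching and a capturing flop.
PATHS = {
    "alu_carry_chain": ["DFF_CLK_TO_Q"] + ["NAND2"] * 12 + ["MUX2", "DFF_SETUP"],
    "control_decode":  ["DFF_CLK_TO_Q"] + ["NAND2"] * 5 + ["DFF_SETUP"],
}

def path_delay_ns(cells, corner):
    return sum(CELL_DELAY_NS[cell][corner] for cell in cells)

def check_timing(clock_period_ns, corner):
    for name, cells in PATHS.items():
        delay = path_delay_ns(cells, corner)
        slack = clock_period_ns - delay
        status = "MEETS" if slack >= 0 else "VIOLATES"
        print(f"{corner:>4} corner  {name:<16} delay {delay:5.2f} ns  "
              f"slack {slack:+6.2f} ns  {status}")

# Sign-off happens at the slow corner: at a 5.0 ns clock the worst path still
# has positive slack, so every in-spec die works at 200 MHz.
check_timing(clock_period_ns=5.0, corner="slow")

# At a 4.0 ns clock the worst path fails the slow corner but passes the fast
# one - which is exactly the bet an overclocker is making: that *this* die
# happens to sit at the fast end of the characterized range.
check_timing(clock_period_ns=4.0, corner="slow")
check_timing(clock_period_ns=4.0, corner="fast")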

Now with these ASICs you're most likely not seeing parts at one end of the
extreme all of the time, so you can probably overclock them. But you're not
able to do a static timing verification - you depend on dynamic verification
in a system, which is problematic. It's very difficult to guarantee that you
cover all of the paths when running dynamic tests. That's why ASIC designers
like to do a static timing analysis - basically follow all of the paths
through all of the gates between flip-flops and calculate the delays. Most
ASIC designers couldn't do a decent dynamic verification - hell, they usually
can't even exercise all of the gates, which is why many chips sacrifice 20% of
the die area for scan test. (See fault grading,...)
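
To put a rough number on why dynamic tests have such a hard time covering all
of the paths (this is just back-of-the-envelope arithmetic, not a real
verification flow), suppose some rarely sensitized path is exercised by any
given random functional vector with a small probability p. The chance it is
never hit after N vectors is (1 - p)^N:

# Back-of-the-envelope path-coverage arithmetic.  The hit probability below is
# an invented illustrative number, not a measured figure.
def never_hit_probability(p_hit_per_vector, n_vectors):
    """Probability a specific path is never exercised by n random vectors."""
    return (1.0 - p_hit_per_vector) ** n_vectors

# Say a nasty path is only sensitized by about one input pattern in a billion.
for n in (10**6, 10**8, 10**9, 10**10):
    miss = never_hit_probability(1e-9, n)
    print(f"{n:>14,} random vectors -> chance the path was never hit: {miss:.3e}")

Even a billion random vectors leaves roughly a one-in-three chance that such a
path was never toggled at all, while a static analysis simply walks every path
and adds up the delays.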

That previous example covers parts which don't need to be speed binned. I'm
not sure what the state of the art is for testing full custom parts which do
need to be speed binned, but I'm pretty sure that they don't test them at
speed for extended periods of time. They might do something like put NAND
trees that circle around the I/O pads and use those as a measure of the
inherent chip speed, and then run scan vectors plus some simple functional
tests. Maybe with an expensive chip you could spend a lot of time on the
tester, but it would be quite a trick.
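
If I had to guess at what a binning flow looks like, it's something along
these lines (the monitor delays, thresholds, and bin frequencies below are
invented purely for illustration - the real tester flow is vendor-specific):

# Hypothetical speed-binning sketch: use the delay through an on-die monitor
# structure (e.g. a NAND tree or ring oscillator) as a proxy for how fast this
# particular die's transistors are, and assign a shipping frequency bin.
def assign_speed_bin(nand_tree_delay_ns):
    if nand_tree_delay_ns <= 10.0:      # fast silicon
        return "3.06 GHz"
    elif nand_tree_delay_ns <= 11.5:    # typical silicon
        return "2.8 GHz"
    elif nand_tree_delay_ns <= 13.0:    # slow but still in-spec silicon
        return "2.4 GHz"
    else:
        return "REJECT"                 # outside the characterized process range

for measured in (9.4, 10.8, 12.7, 14.1):
    print(f"monitor delay {measured:4.1f} ns -> bin {assign_speed_bin(measured)}")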

>
>> It would be a waste of time to
>>do so, too. You make the silicon and see what range it can do and you make the
>>chips at or below the minimum clock speeds attainable by those chips.
>
>I don't believe anybody does this.  I believe they _know_ before the first
>die is cut into chips, how fast the thing should be able to run.  Because
>they know very accurately how fast each of the gates can switch, and how many
>there are in a single path.
>
>
>> For
>>stability testing a Prime95/Memtest86 combo is all that's needed. If you want to
>>take videocards into account (high AGP speed, etc) you just run 3DMark2001 or
>>2003. Newer Nforce2 boards lock the AGP speed at 66mhz (or have it adjustable)
>>so you don't have to worry about it (same with the PCI speed).
>
>Fine, but suppose there is _another_ instruction with a longer gate delay and
>prime95 is not using it.  BSR for example.  Then all your testing shows that
>the thing works but it fails for Crafty.
>
>That has happened...
>
>Prime95 doesn't test _all_ instructions, with exceptions thrown in at random
>points to further stress things...
>
>>
>>>Possibly, but that's "business".  But they weren't producing 2.8's that
>>>could run reliably at 3.06, which is the topic here...
>>
>>They could, in fact. They were clocking up to 3.2GHz consistently. I doubt
>>we'll even see 3.2GHz for a while due to the heat those chips put off. As I
>>mentioned before a P4-3.06GHz is 110 watts.
>>
>>
>>Also, I'm not sure if you're aware but temperature/voltage helps a lot with
>>overclockability. If you get the CPU cold enough (-120C to be exact) you could
>>effectively run 2x your max ambient air temp and it would NOT be considered
>>overclocking. Here's a small graph I grabbed from Kryotech (makes Freon cpu
>>coolers). ftp://speedycpu.dyndns.org/pub/cmoscool.gif
>
>You are mixing apples and oranges.  One result of overclocking is having to
>ramp up the supply voltage to shorten the switching times, which produces
>heat that has to be removed or the chip will be damaged.  Another result is
>cutting the base clock cycle below the base settling time for some odd
>pathway, so that particular operation doesn't always produce correct results.
>Two different issues...
>
>Just because you can cool it _still_ doesn't mean it can switch fast enough.
>
>
>
>>
>>Also when overclocking you need to use a bit of common sense. Lets say 1.6GHz is
>>stable at 1.6 volts, 2.0GHz is stable at 1.75v and perhaps the upper limit of
>>the theoretical chip I'm speaking of is, say.. 2.2GHz at 2.00v. If you test at
>>2.00v and it failed in prime95 after 1 hour and they drop the MHz down to
>>2.1GHz, you don't think it'll be completely stable? Of course it would be. I've
>>seen some servers fail at prime95, NON overclocked. It's that sensitive. If you
>>drop the clock speed THAT much (100mhz) from an already almost fully stable
>>setup it will be completely stable. It's still overclocked, yes, but being so
>>doesn't automatically warrant diepeveen hand-waving. :)
>
>No, but it is a risk.  As I said my first bad experience was a couple of
>years ago with a machine that passed all the overclocker tests around, but
>failed after 8-10 hours on a particular data mining application...  When we
>dropped the clock back to specification, the problem did _not_ appear.


