Computer Chess Club Archives


Subject: Re: Some Crafty 16.19 results on my XP 2.44GHz

Author: Robert Hyatt

Date: 14:40:10 02/22/03

On February 22, 2003 at 01:18:07, enrico carrisco wrote:

>On February 22, 2003 at 00:46:39, Robert Hyatt wrote:
>
>>On February 21, 2003 at 22:44:22, enrico carrisco wrote:
>>
>>>On February 21, 2003 at 09:55:56, Robert Hyatt wrote:
>>>
>>>>On February 21, 2003 at 06:47:48, enrico carrisco wrote:
>>>>
>>>>>On February 20, 2003 at 11:55:42, Robert Hyatt wrote:
>>>>>
>>>>>>On February 20, 2003 at 09:36:24, Jeremiah Penery wrote:
>>>>>>
>>>>>>>Prime95 is a real-world application.  It does very intense mathematical
>>>>>>>calculation, testing several-million-digit numbers for primality.  I don't
>>>>>>>believe there's another program that will detect CPU problems faster.
>>>>>>>
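
A toy sketch of the Lucas-Lehmer recurrence that Mersenne-hunting programs like
Prime95 are built around -- restricted here to exponents small enough for 64-bit
arithmetic, where the real program squares multi-million-digit numbers with FFTs:

#include <stdio.h>
#include <stdint.h>

/* Toy Lucas-Lehmer test for M_p = 2^p - 1, p an odd prime.
 * s_0 = 4, s_{k+1} = s_k*s_k - 2 (mod M_p); M_p is prime iff s_{p-2} == 0.
 * Only p <= 31 is handled so the squaring fits in 64-bit arithmetic. */
static int mersenne_is_prime(unsigned p)
{
    uint64_t m = (1ULL << p) - 1;
    uint64_t s = 4;
    for (unsigned i = 0; i < p - 2; i++)
        s = (s * s + m - 2) % m;   /* adding m keeps the subtraction non-negative */
    return s == 0;
}

int main(void)
{
    /* odd prime exponents up to 31 */
    unsigned exps[] = { 3, 5, 7, 11, 13, 17, 19, 23, 29, 31 };
    for (unsigned i = 0; i < sizeof exps / sizeof exps[0]; i++)
        if (mersenne_is_prime(exps[i]))
            printf("2^%u - 1 is prime\n", exps[i]);
    return 0;
}
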
>>>>>>
>>>>>>The problem is that it won't detect _any_ floating point problems.  Nor problems
>>>>>>with unlikely instructions such as BSF/BSR, or fiddling with O/S issues like
>>>>>>cache flushing, fiddling with the memory type and range registers, and so forth.
>>>>>>
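
As an illustration of such an untouched path, here is a hypothetical stand-alone
stress loop (not Crafty's code) that hammers the compiler's BSF-style bit-scan
intrinsic and checks it against a slow reference -- the kind of instruction a
pure arithmetic burn-in may never execute:

#include <stdio.h>
#include <stdint.h>

/* Compare the BSF-style intrinsic against a bit-by-bit reference.
 * A marginal part that mis-scans bitboards would show up here even
 * if it passes an arithmetic-only stress test. */
static int slow_lsb(uint64_t b)
{
    for (int i = 0; i < 64; i++)
        if (b & (1ULL << i))
            return i;
    return -1;
}

int main(void)
{
    uint64_t b = 0x9E3779B97F4A7C15ULL;   /* arbitrary starting pattern */
    for (long n = 0; n < 100000000L; n++) {
        b = b * 6364136223846793005ULL + 1442695040888963407ULL; /* 64-bit LCG */
        if (b == 0)
            continue;                      /* intrinsic is undefined for 0 */
        int fast = __builtin_ctzll(b);     /* typically compiles to BSF/TZCNT */
        if (fast != slow_lsb(b)) {
            printf("bit-scan mismatch on 0x%016llx\n", (unsigned long long)b);
            return 1;
        }
    }
    puts("bit-scan loop completed with no mismatches");
    return 0;
}
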
>>>>>>There is a _lot_ of the chip that such an application simply doesn't touch,
>>>>>>and when you use such a test to say "it works" it is like flipping a coin.
>>>>>>If all you do is use the same instructions, you may well have a winner.  But
>>>>>>if you use something that your test didn't exercise, who knows?
>>>>>>
>>>>>>I don't have time for those kinds of random problems.  If you do, then choosing
>>>>>>to overclock is certainly up to you.
>>>>>>
>>>>>>
>>>>>>>I overclocked my CPU for a while, and it appeared to be completely stable.  I
>>>>>>>could run Crafty for days with no problems, and I never had a crash or bug in
>>>>>>>any other application.  I ran Prime95 for a while, where a calculation error was
>>>>>>>soon detected.  Of course, when I clocked back to the normal level, the error
>>>>>>>went away.
>>>>>>
>>>>>>Unfortunately your testing is backward.  You assumed it was good because it ran
>>>>>>without "crashing".  But are you _sure_ crafty never computed a bad score?  Or
>>>>>>hosed the hash signature?  Or generated a bogus move?  No way to know.  And if
>>>>>>prime95 runs with no errors, are you _sure_ all the floating point stuff works?
>>>>>>MMX stuff works?  Oddball things like bsf/bsr?
>>>>>>
>>>>>>That's the flaw in this...
>>>>>>
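
One way an engine can at least make that kind of silent corruption visible is to
store the position key XORed with the entry data and re-check it on every probe.
A minimal sketch with an illustrative entry layout, not Crafty's actual one:

#include <stdint.h>
#include <stdio.h>

/* Self-checking hash entry: the check word is the key XORed with the data,
 * so a flipped bit from a marginal CPU or memory turns into a probe miss
 * instead of a silently bogus score or move.  Field layout is illustrative. */
struct hash_entry {
    uint64_t check;   /* key ^ data */
    uint64_t data;    /* packed score, depth, best move, ... */
};

static void hash_store(struct hash_entry *e, uint64_t key, uint64_t data)
{
    e->data  = data;
    e->check = key ^ data;
}

static int hash_probe(const struct hash_entry *e, uint64_t key, uint64_t *data)
{
    if ((e->check ^ e->data) != key)   /* corrupted, or a different position */
        return 0;
    *data = e->data;
    return 1;
}

int main(void)
{
    struct hash_entry e;
    uint64_t out;
    hash_store(&e, 0x123456789ABCDEF0ULL, 42);
    printf("clean probe: %d\n", hash_probe(&e, 0x123456789ABCDEF0ULL, &out));
    e.data ^= 1;                       /* simulate a single flipped bit */
    printf("after bit flip: %d\n", hash_probe(&e, 0x123456789ABCDEF0ULL, &out));
    return 0;
}

A mismatched check word is simply reported as a miss, so a flipped bit costs a
little search efficiency instead of a bad score or move.
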
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>>
>>>>>>>I was running at somewhere near the maximum rated speed for that particular
>>>>>>>core, which had about zero headroom to begin with, so the errors weren't all
>>>>>>>that surprising to me.  Had I bought a slower chip, I could have overclocked it
>>>>>>>to the speed of my current chip very safely, as the core obviously has the
>>>>>>>ability to run at that speed.  Overclocking becomes particularly unsafe when one
>>>>>>>tries to run at a speed above the normal ability of the core.  Otherwise, it's
>>>>>>>not much more than what the manufacturers do by taking chips from the same
>>>>>>>silicon wafer and splitting them into different CPU speed bins, as those chips
>>>>>>>should be theoretically _identical_.
>>>>>>
>>>>>>Note that we are not talking about buying 2.0ghz xeons and overclocking to 2.4.
>>>>>>We are talking about buying the fastest chips made and overclocking _those_.
>>>>>>That is a completely different issue, and that is what is being done in the
>>>>>>cases being discussed...
>>>>>
>>>>>How do you know what the maximum _planned_ speed of a certain core is?  Until
>>>>>you know that, the whole discussion is an endless loop.
>>>>>
>>>>>-elc.
>>>>
>>>>
>>>>When I did TTL design years ago, I simply took the published gate delays for
>>>>every circuit I used.  NAND gates, NOR gates, 16-1 mux, 1-16 demux, an ALU,
>>>>you name it.  I added up the gate delays, plus the published tolerances, and
>>>>started testing somewhere longer than that and shortened the clock to the
>>>>actual number computed by the longest-path analysis.
>>>>
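
A toy version of that longest-path bookkeeping, with made-up part names and delay
figures rather than real datasheet numbers:

#include <stdio.h>

/* Sum published propagation delays plus tolerances along the longest path,
 * then derive the maximum safe clock.  The stages and nanosecond figures
 * below are illustrative only. */
struct stage {
    const char *part;
    double typical_ns;    /* published typical propagation delay */
    double tolerance_ns;  /* worst-case margin added on top */
};

int main(void)
{
    struct stage path[] = {
        { "register clock-to-Q", 2.0, 0.5 },
        { "16-1 mux",            4.5, 1.0 },
        { "ALU carry chain",     9.0, 2.0 },
        { "NAND decode",         1.5, 0.5 },
        { "register setup",      2.5, 0.5 },
    };
    double total = 0.0;
    for (unsigned i = 0; i < sizeof path / sizeof path[0]; i++)
        total += path[i].typical_ns + path[i].tolerance_ns;

    printf("worst-case path: %.1f ns -> max clock about %.1f MHz\n",
           total, 1000.0 / total);
    return 0;
}
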
>>>>The engineers _know_ what the max speed is.  I hope you don't think they lay
>>>>the thing out, build it, then see how fast it will run?
>>>
>>>I am not speaking for the engineers at all.  I'm simply stating that you can't
>>>support your statement that overclocking "the fastest chips available by both
>>>manufacturers" is unreliable and leads to trouble until you know the limits of
>>>that respective core.
>>
>>The point is, that limit _is known_.  It is the speed stamped on the thing
>>based on engineering specifications.  :)
>
><sigh>  Same line, same core...
>>

So you think that if a line produces a processor at 2.4ghz today, and at
2.8ghz in 6 months, that the 2.4ghz parts will _also_ run at 2.8ghz?

There are stepping changes.  There are fab changes.  There are process
improvements that change neither.



>>
>>>
>>>Hence, without knowing those limits we can debate this all the way up until a
>>>new core is produced.
>>>
>>>_Main point_:  If I overclock a P4 3.06GHz to 3.3GHz and Intel releases the 3.2
>>>and 3.3GHz parts a few months later -- do you retract your statement that my
>>>overclock could have been unreliable?  There's really no way to know.  There's no
>>>way to know running that chip at the posted speed is reliable either -- if you're
>>>talking about such minute failures that are undetectable by real world testing
>>>programs such as Prime95 and others.
>>
>>
>>No I wouldn't retract it.  I would say that any 3.06 chips produced on that
>>same fab line _after_ the 3.3 is released would probably overclock to 3.3
>>just fine, _maybe_.  Remember that there will _always_ be surface defects in
>>a chip that may or may not affect the clock frequency it is stable at.  So
>>even if a chip comes off the 3.3 line, it might not work reliably over 3.0,
>>which happens.  And as the 3.3 process is refined, that non-working percentage
>>will probably drop.  And eventually to get parts at lower clocks, they might
>>just pull 'em off the 3.3 line and stamp them 3.0.
>>
>>I don't buy "testing to see how fast it is reliable."  I buy "let the engineers
>>and the silicon compiler add up the gate delays and _compute_ rather than
>>_guess_ how fast it should run."
>>
>>
>>
>>
>>>
>>>This is beginning to become a hair-splitting contest...
>>>
>>>-elc.
>>
>>No, it is becoming an urban-legend type of discussion, where someone "heard
>>it was done like this..." and the legend is born.  But ask the engineers at
>>Intel or HP or whatever.  They know what the parts should run at before the
>>mask is made and fab starts.  EE is an _exact_ science.  Not smoke, mirrors,
>>guesswork, and trial and error...
>
>Perhaps I should ask you how much testing you think is done on every individual
>chip prior to its sale?  Seriously, a problem chip is what a warranty is for.



Not a lot, because the engineers know what the chips will run at reliably
before they roll off the line.  They might do quick speed tests for QC,
but that's all; otherwise the cost would be too high.




>I would be willing to bet that if I ran Prime95 on a new chip for 2 days straight
>I'd be giving it a more rigorous test than _every_individual_chip_ coming off
>the production line receives.



Wouldn't argue.  And I'd bet it would not fail a single time either.  Until
you push the clock beyond what the engineers set the limit at.



>
>I, too, would like to see some engineer input on this discussion because I think
>you are greatly overestimating the extent of testing that goes into each
>individual chip.  The type of testing that each chip endures according to your
>account would cause there to be a two month waiting list to receive a CPU (or
>longer.)



What "testing" have I suggested goes on?  I don't think there is much
testing at all _after_ the chip has been routed, laid out, masks made,
process verified, and the prototypes tested to exhaustion to make sure
every known regression test has been done.  Once the line comes up to
normal production, they run quick go/nogo tests, and they pick random chips
and run much more exhaustive testing, and that is it.  The engineers do the
work _before_ fab, not after, which has been my point here.  They know before
the first chip rolls off how fast the things should clock.



>
>Advice:  Next time you see the beginning of an Intel propaganda TV commercial,
>dash for the remote and turn the TV off, and repeat the following sentence three
>times:


I don't watch intel commercials.  But I have multiple long-term friends that
have worked for various chip manufacturers as EEs.  PhD in EE types.



>
>"Resistance is not futile."
>
>Mass production is mass production.  They can't test every individual chip and
>you are placing far too much emphasis on marked performance and reliability.
>The "smart" overclocker tests the completion of one of his new speed settings
>far more rigorously than each individual chip receives leaving the production
>line.
>
>-elc.

Absolutely not.  The testing on the prototypes is a near-exhaustive testing
of everything, prior to the first production run.  Then it is "xerox-time"
and off they come.  I don't see where I have suggested that they do
production line testing that is significant.  They know the clock constraints
for the chip before the first one is ever made.







