Computer Chess Club Archives



Subject: Re: Multiple processors on one chip...

Author: Eugene Nalimov

Date: 20:36:58 03/04/00


On March 04, 2000 at 23:17:45, Robert Hyatt wrote:

>On March 04, 2000 at 21:43:04, Tom Kerrigan wrote:
>
>>On March 04, 2000 at 20:27:38, Robert Hyatt wrote:
>>
>>>On March 04, 2000 at 15:48:16, Tom Kerrigan wrote:
>>>
>>>>On March 04, 2000 at 09:34:13, Robert Hyatt wrote:
>>>>
>>>>>>So it makes me wonder... if you made the Pentium's L2 cache as fast as the
>>>>>>PII's, would it achieve parity again? Seems likely to me.
>>>>>It would help...  but without register renaming, it becomes difficult to feed
>>>>>two pipes for long sequences of instructions.  I think the p6 would still keep
>>>>>a significant edge, but better cache would narrow the gap...
>>>>
>>>>Is there a section of Crafty that will run in 16k?
>>>>
>>>>You could do some comparisons with that.
>>>>
>>>>-Tom
>>>
>>>
>>>not that I can think of.  IE even the MakeMove() loop in perft requires a good
>>>bit of data...
>>
>>In that case, I don't think it's possible to use Crafty to compare the processor
>>cores. The TSCP benchmarks give much more accurate data in that regard.
>>
>>-Tom
>
>
>Only for small programs.  What about programs with larger cache footprints?
>
>IE I don't think _either_ TSCP or Crafty is the right benchmark.  The _right_
>benchmark is the program that is important to that buyer...  For simple
>programs, the old P5 core runs well if both pipes can be fed by the compiler.
>Which means no register jams occur in the program.  For more complex programs,
>the renaming logic in the P6 avoids many register jams/spills and does much
>better keeping both pipes filled.
>
>I am surprised any program is faster on a P5 than on a P6, equal clock speeds,
>however.

P6 doesn't like:
(1) 16-bit code: loading a new value into a segment register is *very* slow.
(2) Playing with halves of the registers (e.g. when you are trying to use AL and
AH simultaneously). When it sees the second instruction before the first one has
retired, it stalls the entire pipeline and restarts it, losing ~10 CPU clocks.

There may be more; that is just from memory. But either of those would be
sufficient for some programs to run slower on P6 than on P5.
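
To make (2) concrete, here is a rough C sketch of the two coding styles. The
function names are made up for illustration, and whether a given compiler
actually emits AL/AH writes followed by an AX read for the first version is an
assumption; hand-written 16-bit-era assembly is where this pattern was most
common. Both functions compute the same result, so the difference is only in
how the value might be built up in registers.

```c
#include <stdint.h>

/* Style that maps naturally onto partial-register writes: update the low
   half, then the high half, then use the whole 16-bit value. In assembly
   this is often "mov al, lo / mov ah, hi / use ax". (Hypothetical name.) */
uint16_t pack_via_halves(uint8_t lo, uint8_t hi)
{
    uint16_t w = 0;
    w = (uint16_t)((w & 0xFF00u) | lo);                     /* write low byte  */
    w = (uint16_t)((w & 0x00FFu) | ((unsigned)hi << 8));    /* write high byte */
    return w;                                               /* read full value */
}

/* Style that builds the value with full-width arithmetic only: widen both
   inputs, shift, and OR, never touching half of a register on its own.
   (Hypothetical name.) */
uint16_t pack_via_shift(uint8_t lo, uint8_t hi)
{
    return (uint16_t)(((unsigned)hi << 8) | (unsigned)lo);
}
```

If a hot loop in the first style turns out slower on a P6 than on a P5,
rewriting it in the second style (or using zero-extending loads into full
registers in assembly) is the usual thing to try.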

Eugene




Last modified: Thu, 15 Apr 21 08:11:13 -0700
