Computer Chess Club Archives


Subject: Re: Since the CPU is what really count for Chess !

Author: Robert Hyatt

Date: 15:22:57 03/18/03


On March 18, 2003 at 17:02:22, Aaron Gordon wrote:

>On March 18, 2003 at 16:41:03, Robert Hyatt wrote:
>
>>On March 18, 2003 at 15:42:56, Aaron Gordon wrote:
>>
>>>On March 18, 2003 at 10:12:56, Robert Hyatt wrote:
>>>
>>>>On March 18, 2003 at 00:24:01, Aaron Gordon wrote:
>>>>
>>>>>On March 18, 2003 at 00:01:44, Robert Hyatt wrote:
>>>>>
>>>>>>On March 17, 2003 at 22:59:30, Aaron Gordon wrote:
>>>>>>
>>>>>>>On March 17, 2003 at 18:47:27, Eugene Nalimov wrote:
>>>>>>>
>>>>>>>>I just ran the experiment. I used 2 otherwise identical 64-bit systems, one with
>>>>>>>>3MB of L3 cache, the other with 1.5MB. The machine with the bigger cache ran
>>>>>>>>Crafty's "bench" command 12% faster (1 CPU).
>>>>>>>>
>>>>>>>>That means that
>>>>>>>>(1) Crafty's working set doesn't fit into 1.5MB,
>>>>>>>>(2) For systems with 1.5MB of cache or less (i.e. for almost all x86 systems),
>>>>>>>>memory speed matters for Crafty.
>>>>>>>>
>>>>>>>>Thanks,
>>>>>>>>Eugene
>>>>>>>
>>>>>>>Those types of systems aren't what people normally use. Most people here have a
>>>>>>>Pentium 3, Athlon, Pentium 4, etc. Here is something I found with Crafty.
>>>>>>>
>>>>>>>Using the nForce2 chipset I'm able to run the RAM at speeds from 50% up to 200%
>>>>>>>(100% being synchronous) of the FSB speed. I tested 200MHz FSB (400DDR) with
>>>>>>>200MHz memory (400DDR) and 200MHz FSB with 100MHz memory (200DDR).
>>>>>>>The difference between ~1.6GB/s memory and ~3.2GB/s memory with Crafty's 'bench'
>>>>>>>command was 0.14%. Yes, about one seventh of one percent.
>>>>>>
>>>>>>That might well suggest _another_ bottleneck in that particular machine....
>>>>>
>>>>>Another bottleneck? What was the original one?
>>>>
>>>>
>>>>The original one was assumed to be bus speed.  That's where I entered the
>>>>discussion.  But bus speed is not the _only_ issue that can cause problems
>>>>here.
>>>>
>>>>Lack of interleaving is another.
>>>
>>>All modern single cpu computers have 4 way/4 bank memory interleaving. Even my
>>>old dual Celeron box has 4 bank/4 way interleaving...
>>
>>
>>Most do _not_ support interleaving.  I'm _specifically_ talking about four banks
>>to do four consecutive 8-byte reads at once, then you wait for the initial 120ns
>>delay, and grab the first 8 bytes, followed by the remaining 24 bytes on the
>>next 3 bus cycles.  Repeat to fill a cache line.
>>
>>I am not aware of _any_ single-cpu machines with interleaving.  You have to have
>>a machine with 4 banks, with 4 SIMMs/DIMMs/etc. as well.
>>
>>Give me a model number for your Celeron and I'll look.  But unless you have four
>>separate DIMMs in it, it ain't doing 4-way interleaving.
>
>The motherboard is an Abit BP6. There's also 4-way interleaving on the Abit KT7,
>KT7a, BH6, Be6, Be6-2, BX6, BX6-2, etc. Tons and tons of boards support 2- and
>4-way. Also, if I recall correctly, it treats 1 DIMM as "2" banks. Back in the day,
>when enabling 4-way interleave with two Kingmax PC150 DIMMs (256MB each), I saw at
>least a 20% fps increase in Quake3. I still have the KT7a if you want me to run
>any tests.


OK... that says enough.  Treating one DIMM as "two banks" simply is _not_
interleaving, as you can't read 16 bytes from a single DIMM in one cycle.  Which
you _can_ do from two DIMMs in two distinct banks...

I have no idea why they would call that "interleaving" by any definition of the
word I can think of.  The idea is to distribute consecutive addresses across
banks so that addresses N through N+7 are in bank 0, N+8 through N+15 are in the
second bank, etc.  Now you have a good reason to issue simultaneous reads to two
banks, since you want at _least_ 32 bytes for the smallest cache line in the X86
architecture, up to 128 bytes for the largest cache line I have seen in X86.
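
To make the address layout concrete, here is a minimal sketch of that kind of
mapping.  It assumes 8-byte-wide banks and a four-way interleave; the names
bank_of and banks_for_line are made up for illustration and do not describe any
particular chipset.

```python
# Sketch of low-order interleaving: consecutive 8-byte chunks go to
# consecutive banks.  BANK_WIDTH and NUM_BANKS are assumed values.
BANK_WIDTH = 8   # bytes one bank delivers per access (assumption)
NUM_BANKS = 4    # four-way interleave (assumption)


def bank_of(addr, bank_width=BANK_WIDTH, num_banks=NUM_BANKS):
    """Return the index of the bank holding the byte at address addr."""
    return (addr // bank_width) % num_banks


def banks_for_line(line_addr, line_size=32,
                   bank_width=BANK_WIDTH, num_banks=NUM_BANKS):
    """Return the bank touched by each bank_width-byte chunk of a cache line."""
    return [bank_of(line_addr + offset, bank_width, num_banks)
            for offset in range(0, line_size, bank_width)]


if __name__ == "__main__":
    # Addresses 0..7 land in bank 0, 8..15 in bank 1, and so on.
    for addr in (0, 7, 8, 15, 16, 24, 32):
        print(addr, "-> bank", bank_of(addr))
    # A 32-byte line starting at 0 touches each of the four banks once.
    print(banks_for_line(0))
```

With this layout, a 32-byte line fill hits four different banks, so all four
reads can be started together.  If the chunks all sat in the same bank, the
reads would have to be issued one after another.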



Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.