Computer Chess Club Archives


Subject: Re: But, Re: Questions re P4 3.03 with HT ??

Author: Robert Hyatt

Date: 10:19:24 12/11/02


On December 10, 2002 at 20:56:47, Matt Taylor wrote:

>On December 10, 2002 at 20:00:38, Vincent Diepeveen wrote:


>>
>>However we must take into account that at 1 point crafty is completely
>>getting toasted. it is toasting itself by doing so much memory references.
>>2 probes in 2 different hashtables. If you remove that probe to the
>>second table and consider that in a single cache line even at ddr ram
>>you already get 64 bytes, so can do 4 probes for free in a single hashtable,
>>then you will realize also with me that after that change crafty will
>>run like the sun dual also at a dual k7.

First, if this conversation is going to continue, how about grounding it in
fact rather than fiction?  If you do four probes, you had better be _sure_
that you probe to an address that is divisible by 64, or you will _not_ get
all the probes for free.  Cache doesn't work like that.  If you probe
_randomly_ to a specific entry, then you get the 64 byte cache line (I am
assuming AMD, as Intel is 32/128 byte line lengths depending on the
processor) that contains that hash table entry, but it could be the first 16
bytes in the line (assuming 16 byte hash table entry) or it could be the last
16 bytes.  With a 64 byte cache line and normal transposition table
addressing, you are going to get two probes for free; four will cost 2x as
much.
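
The alignment point can be checked with a little arithmetic.  What follows is
only a sketch, not Crafty's code: it assumes 16 byte entries, a 64 byte line,
a table whose base address sits on a line boundary, and probes to consecutive
entries.  The names lines_touched and bucket_start are made up for
illustration.

```python
ENTRY_BYTES = 16   # assumed hash entry size (the example size used above)
LINE_BYTES = 64    # assumed cache line size (the AMD case above)


def lines_touched(index, probes, entry_bytes=ENTRY_BYTES, line_bytes=LINE_BYTES):
    """Count the distinct cache lines covered by `probes` consecutive entries
    starting at entry `index`, assuming the table base is line-aligned."""
    first_byte = index * entry_bytes
    last_byte = (index + probes) * entry_bytes - 1
    return last_byte // line_bytes - first_byte // line_bytes + 1


def bucket_start(index, probes):
    """Round an entry index down to a multiple of `probes`
    (`probes` must be a power of two)."""
    return index & ~(probes - 1)


if __name__ == "__main__":
    # For each possible offset within a line, compare four probes from the
    # raw index against four probes from the rounded-down index.
    for index in range(4):
        raw = lines_touched(index, 4)
        aligned = lines_touched(bucket_start(index, 4), 4)
        print(f"entry {index}: raw={raw} line(s), aligned={aligned} line(s)")
```

Under these assumptions four probes stay inside one line only when the
starting index is a multiple of four; from any other offset they straddle two
lines.  Rounding the index down restores the single-line case, but only if
the table itself starts on a 64 byte boundary.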

>>
>>P4 duals are forever history then and left behind bigtime by the dual k7.

"history"?

:)


>
>I own an AthlonMP and use another at work. They're budget SMP compared to what I
>have heard about Intel's offerings. That's also why they're affordable, but
>price isn't a consideration when you're talking only about performance.
>
>Some of the dual-P4 systems have 8.4 GB/sec memory bandwidth. The hash table
>probe will be much faster on such a system than on either of the dual-Athlons
>that I use, particularly when the hash is interleaved across memory banks. Yes,
>RDRAM might be slower to access, but when you can access 4 locations in
>parallel, the 15 bus clocks latency diminish with respect to equivalent
>serialized access to 10 clock latency DDR modules. This assumes that such
>accesses can be made in parallel, but I'd say that's a safe assumption for a
>hash table.

Also, the E7500 chipset interleaves with DDR RAM, which is what is on my
machine here...
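
Matt's latency comparison above reduces to simple arithmetic.  A rough
sketch, assuming the four accesses overlap perfectly on the interleaved
system and are fully serialized on the other, and ignoring transfer time and
any difference in bus clock speed between the two (the function names are
illustrative only):

```python
def parallel_latency(per_access_clocks, accesses):
    """Total latency in bus clocks when all accesses overlap completely
    (an idealized assumption): one access latency, whatever the count."""
    return per_access_clocks if accesses > 0 else 0


def serialized_latency(per_access_clocks, accesses):
    """Total latency in bus clocks when each access waits for the previous one."""
    return per_access_clocks * accesses


if __name__ == "__main__":
    # Figures quoted above: 15 clocks for RDRAM, 10 clocks for DDR, 4 locations.
    print("4 overlapped accesses at 15 clocks:", parallel_latency(15, 4))
    print("4 serialized accesses at 10 clocks:", serialized_latency(10, 4))
```

Real memory systems will land somewhere between the two extremes, so treat
these numbers as bounds rather than measurements.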

>
>>I call that very bad programming. Weak programming. Especially because it
>>was already mentioned by me and others over a year ago to Hyatt.

And you _still_ don't understand the "cache line" issue based on your above
statements, so perhaps it is "weak understanding" by you rather than "weak
programming" by me.  My hash table works just fine.  That is the #1
criterion.

>>
>>Also provable is that 4 sequential probes perform better than a 2 table
>>approach. Even in ICCA journal that was already proven years ago.
>>
>>Now it is pretty bad to say that this research done was not accurate.
>>
>>I dislike such excuses. 5 tests of Nalimov proof something. 6 tests
>>of someone else do not. This is very bad science of course.


I'm full of bad science.  I only run tests at _least_ four times each and
average the results before I post numbers.  You should try some "bad science"
yourself, rather than doing what you currently do.  "xeon 2.8 is not
available".  Etc.  All wrong.  All the time.




>>
>>the 2.4Ghz Xeons which nalimov used to report a 20+% speedup or so,
>>i also tried (but single cpu) and couldn't get any speedup at all at
>>them.

Exactly what HT-enabled machine did you run on?  You just said "no processors
yet do hyper-threading" in another post.  Can you not keep up with all the wrong
statements you make and at _least_ keep them consistent with each other?




>>
>>SMT not working simply at them.
>
>You have to enable it in the BIOS.
>


Of course.  But he would have had to have actually run on such a machine to know
this.  He didn't.  So he doesn't...



>>This is legal. Intel only guarantees SMT to work at P4s above 3.0 Ghz
>>and at Xeon MPs (fastest Xeon MP released so far 2.0Ghz which is supposed
>>to run up to 8 processors in contradiction to Xeon which is only running
>>to 2 processors).
>>


And that is _still_ wrong.  In one paragraph you say you tested on a 2.4GHz
Xeon.  In another you say they don't do SMT.  I _have_ one and it _does_ do
SMT as I posted above.  I notice you didn't respond (as usual) when someone
posts _real data_ rather than garbage...




>>So intel was correct in their statement from my viewpoint. All stuff i tested
>>didn't get any speedup yet. I didn't test a 3.0Ghz P4 yet. they do not deliver
>>them yet here.
>>


Intel didn't say a thing about Xeons not supporting SMT.  They _specifically_
said that the PIV 3.06GHz would be the first _desktop_ Pentium processor to
support SMT.  However, the Xeons are another story.  And no, SMT is not only
supported in Xeon MP processors, it is supported in _all_ current Xeon
processors.




>>
>>
>>>-Matt
>
>I'm sure you'll see different results if and when you do. Intel rarely lies
>outright; they usually rely on consumer ignorance.
>
>-Matt

In the case of Vincent, that is a good bet.  :)

I think it is funny that someone who has not touched an SMT processor is
arguing with a few who actually have them in their offices (Eugene and myself
specifically).  And we are talking Xeons, not PIV with SMT, which I have
_not_ tried.  The Xeon is a different animal from the PIV.  And all I can
quote data for is the Xeon.



Last modified: Thu, 15 Apr 21 08:11:13 -0700
