Computer Chess Club Archives

Subject: Re: But, Re: Questions re P4 3.03 with HT ??

Author: Vincent Diepeveen
Date: 17:00:38 12/10/02

On December 10, 2002 at 12:29:02, Matt Taylor wrote:

>On December 10, 2002 at 09:08:10, Vincent Diepeveen wrote:
>
>>On December 09, 2002 at 16:18:48, Matt Taylor wrote:
>>
>>>On December 09, 2002 at 14:07:16, Christopher A. Morgan wrote:
>>>
>>>>Thanks for the posts.  I do know that the AMD XP line model numbering is not in
>>>>GHz, but is an attempt to be equivalent to the Intel GHz classification of their
>>>>line of P4 processors, and that bus speed is very important in overall speed of
>>>>the processor in all applications.  I must have forgotten that in my post.
>>>
>>>Actually the model number compares to the earlier Thunderbird chips. An AthlonXP
>>>1500 is theoretically equivalent to a 1.5 GHz Thunderbird. (A 1.5 GHz
>>>Thunderbird will mop up a 1.5 GHz P4.) Based on my knowledge of the processors
>>>in question, I don't think this rating system is at all accurate. (A 1.6 GHz
>>>AthlonXP 1900 is equivalent to a 1.6 GHz Thunderbird in most cases.)
>>>
>>>>That being said, the difference in speed, AMD processors being faster, is still
>>>>considerable for chess it seems.  This is in contrast to the standard bench
>>>>tests done by Tom’s hardware comparing the latest AMD XP and Intel P4
>>>>processors.  There seems to me to be a disconnect somewhere.  Why would XP be so
>>>>much faster in nps compared to P4 in a chess program, but be slower in almost
>>>>every other bench test comparisons?
>>>
>>>There is also considerable evidence that Tom's Hardware is either biased or
>>>stupid. (I've for years claimed the latter.) Most hardware sites do a poor job
>>>overall of benchmarking, mostly because the people who run them don't understand
>>>how a processor works. The best I've seen is a poor regurgitation of diagrams
>>>and schematics that Intel and AMD release.
>>>
>>>Additionally, most synthetic benchmarks show better P4 results than you get in
>>>the real world. Most benchmarks get optimized by Intel engineers. AMD as a
>>>company does some of the dumbest things, one of which is that they don't extend
>>>their hand into such matters. As a result, the benchmarks are going to show
>>>excellent P4 performance because they're optimized for P4. Most optimizations
>>>required for P4 also help the Athlon, but it is still possible to extract even
>>>better performance out of the Athlon.
>>>
>>>I would have to question the relationship between fps in Quake and nps in chess.
>>>I see none, and I fail to see how Quake demos can possibly benchmark anything
>>>other than Quake performance.
>>>
>>>In the real world, AthlonXP at a given rating is faster than the P4 at the
>>>equivalent clock speed on the same bus. That was a complicated sentence, so
>>>here's an example:
>>>
>>>AthlonXP 2800 (133 MHz FSB) is faster than P4 2.8 GHz (133 MHz FSB)
>>>AthlonXP 2800 (166 MHz FSB) is much faster than P4 2.8 GHz (133 MHz FSB)
>>>
>>>The tests -I- would like to see include the following:
>>>1. P4 3 GHz (133 MHz FSB) vs. AthlonXP 2800 (166 MHz FSB)
>>>2. P4 3 GHz w/HT (133 MHz FSB) vs. AthlonXP 2800 (166 MHz FSB)
>>>3. Dual-P4 3 GHz w/o HT (133 MHz FSB) vs. dual-AthlonMP 2400 (133 MHz FSB)
>>>4. Dual-P4 3 GHz w/HT (133 MHz FSB) vs. dual-AthlonMP 2400 (133 MHz FSB)
>>>
>>>These are all stock configurations, and they represent the best offerings from
>>>Intel and AMD. It is quite expensive to build systems with those configurations,
>>>but it should be possible to extrapolate the results given enough tests on other
>>>systems.
>>>
>>>-Matt
>>
>>Matt i don't know it for crafty or other crap products. Crafty as we
>>see in test needs less nodes when running MT=2, so is no good of a standard
>>here. Also it is doing 2 probes in 2 different hashtables which i cannot
>>do even in DIEP (too slow for me) i do 8 probes in 1 hashtable sequential
>>(so a good bandwidth is helping diep more than it is crafty for example).
>
>We're talking about chess engines in general.
>
>>In my own testing on the machines you mentioned, with the exception of the
>>3.0GHz P4, i found that for the newer generation P4s the speed
>>difference is only a factor of 1.5 for 133MHz versus 133MHz bus.
>>
>>Of course comparing 166Mhz bus is no good idea, as i do not know a single
>>dual that can run 166Mhz bus.
>
>Neither do I, but that wasn't the point. The point was to test the high-end
>offerings from both companies and make a decision. The first two tests are
>single-CPU systems to determine whether the P4 3 GHz is faster than AthlonXP
>2800. The AthlonXP 2800 does run on a 166 MHz FSB, and the tests would be biased
>if they -didn't- use it.
>
>As a scientist, I place little merit in assertions. I, too, would assert that a
>K7 is faster than a P4. However, I want to see data. I would rather know the
>truth than debate opinions.
>
>>Obviously this is without possible wins by SMT, but as we see, even a buggy
>>crafty only profits 13-16% from it. Not the 33% by nalimov. We do not know
>>when nalimov's chip gets on the market. Perhaps in 2005. He is having probably
>>beta versions. The 2.4Ghz Xeons here do not have SMT at all. His ones have,
>>so my conclusion is he has a beta version.
>>
>>He reports like 30%+ speedup or something. All tests indicate not even 15%
>>on average for the buggy crafty.
>
>Nalimov's chip came to market in 2002. The 2.8 GHz Xeon w/HT has been available
>for some time...
>
>And yes, I only see 13-16% as well, but Dr. Hyatt seems to believe some minor
>changes may increase these numbers. It is only fair to humor that idea until we
>see the resulting data.
>
>Also, I am not sure why you're calling his version of Crafty "buggy".
>
>>I conclude he has a newer SMT build in into the CPUs.
>>
>>This could be true, because the sold P4s initially didn't have any working
>>SMT at all (see my own and tests of others).
>>
>>The current 3.0Ghz P4 on paper has it and so far the testreport posted
>>with it here is the only test i have with it as i didn't get myself a
>>chance yet to test several programs at it.
>>
>>For DIEP the new P4s are a lot faster than the old ones. I concluded that
>>DDR ram mattered a lot. DDR ram as we know has a 2 times faster latency
>>than quadpumped 100Mhz RDRAM (which is sold as 400Mhz).
>>
>>Of course also 1066 RDRAM won't matter much. this is quad pumped 533/4 =
>>133 Mhz ram. At most 33% faster than the 2 times slower 100Mhz RDRAM.
>>
>>But whether it is improvements in the chip or ddr ram, the difference is
>>'only' 1.5 now.
>>
>>that means the clockspeed of a K7 you must multiply by 1.5 to get
>>the equivalent P4 for me.
>>
>>So a 3.06Ghz P4 when run single cpu will perform like a
>>3.06 / 1.5 = 2.04Ghz K7
>>
>>So you can test till you are blue and yellow. You don't need to
>>test at all.
>
>Actually I do need to test. Without data I can only make broad claims that are
>probably false. That is something that I will not do.
>
>>Even SMT giving 10% or so won't get that P4 faster. 33% faster RAM
>>won't get the cpu 33% faster.
>>
>>It is trivial that the AMDs will clock when they are 0.13 to nearly
>>the same speed like the P4s are.
>>
>>And the good thing from those AMD processors is that i can put them
>>in my dual K7 most likely (the MP versions of it), whereas for a
>>P4 dual Xeon i need to buy a completely new system.
>>
>>RDRAM 1066 is pretty expensive here.
>>Let me check:
>>  256MB PC1066 RDRAM = 239 euro at www.informatique.nl
>
>Yes. Intel CPUs are more expensive than AMD CPUs, and RDRAM is about twice as
>expensive as DDR SDRAM. The question is not price, however. The question is
>performance. I find it ridiculous to squeeze out 5% more performance for $2,000
>USD, but some people will still make the tradeoff.
>
>>Next time i buy RAM i don't want 256MB though. I want 3 GB ram.
>>
>>I do not see how much 2 x 512MB + 2 x 1 GB RDRAM dimms cost and
>>for dual Xeon i need to buy also ECC registered RDRAM i bet.
>>
>>Meaning probably a lot more than the quoted 1 euro a MB dimms.
>>
>>Now let's look to DDR ram. Even cas2 DDR ram 256MB is like 78 euro
>>here.
>>
>>Over a factor of 3 cheaper.
>
>In the US it's only about 2 times cheaper:
>pc1066 RDRAM     512 MB - $202 USD
>pc2100 DDR SDRAM 512 MB - $99 USD
>
>>Now i will not complain about a price soon, but if something is faster
>>and cheaper i know what i buy.
>>
>>For me price is not most important simply. But that being faster is.
>>
>>Doesn't take away that intel is doing better than i had thought 6
>>months ago they would do.
>>
>>The difference in performance is a lot less than it was for DIEP.
>>
>>Best regards,
>>Vincent
>
>Yes, but Diep is not representative of every chess engine out there. Also, you
>probably haven't tweaked it to run well on a P4. It is a fact that software
>without P4 optimization does not run nearly as well on a P4 as synthetic
>benchmarks suggest. The P4 loses in almost every benchmark until Intel sends
>engineers over to tweak the code.

That is not true at all.

The K7 in fact gains way more when you tweak for it. However, the
code that those Intel guys generate then runs slower on the AMD.

Plenty of examples have been posted here in the past months.

If i compile DIEP with Intel C++ 5.0 it is a lot faster on the K7 than
with Intel C++ 6.0.

On the other hand i get a huge speed boost with gcc, whose code
generation is tuned more objectively (and is in fact more pro-K7).

With a good processor like the K7 you can obviously gain more than with
the P4. The 2 weak points of the P4 are obvious: a small L1 data cache and
major penalties for mispredicted branches.

For the last 3 years, however, i have already tried to avoid branches
where i can. Of course in a lot of cases i cannot, and then the focus
is on preventing misprediction.
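
To show what avoiding a branch can look like in C, here is a minimal
sketch. The function names are made up for illustration and this is not
code from DIEP; it assumes 32-bit two's complement integers with an
arithmetic right shift, which the static_assert checks. Whether this
beats a plain if depends on the compiler and the CPU, since a compiler
may already turn a simple conditional into a conditional move.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: right-shifting a negative int copies the sign bit. */
static_assert((-1 >> 1) == -1, "sketch assumes arithmetic right shift");

/* Absolute value without a conditional jump in the source.
   mask is all ones when x < 0 and all zeros otherwise.
   Not valid for INT32_MIN, whose absolute value does not fit. */
int32_t branchless_abs(int32_t x)
{
    int32_t mask = x >> 31;
    return (x ^ mask) - mask;
}

/* Returns a when cond is nonzero, b otherwise, using a bit mask
   instead of an if/else. */
int32_t branchless_select(int cond, int32_t a, int32_t b)
{
    int32_t mask = -(int32_t)(cond != 0);   /* 0 or -1 (all ones) */
    return (a & mask) | (b & ~mask);
}

/* Maximum of two values built on the mask-based select. */
int32_t branchless_max(int32_t a, int32_t b)
{
    return branchless_select(a > b, a, b);
}
```

Something like branchless_max(alpha, score) would then replace an
if (score > alpha) update in a place where the outcome is hard to predict.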

That's what i have definitely been doing for 3 years now. Needless
to say, in the C code itself i can hardly do any more good, readable
optimizations for the P4 at all.

Simply *impossible*. Of course i could write unreadable code in
many places, which might speed it up by something like 0.5% after
a month or 2 of hard work.

I am *not* a beginner here.

>Crafty has been tweaked by Nalimov, and this is why Crafty runs as well as it
>does on a P4. I argue that this is proof that it could be tweaked to run yet
>better on a K7, but who would do it? I certainly don't claim to have Eugene's
>level of skill at optimization.

Crafty, when compiled for SPECint, is *a lot* faster on the K7 than on the P4.

Amazingly, that is with Intel C++ 5.0.

You see, it is sad of course to know that there is no special K7
compiler. You won't hear me say that the AMD guys are great
at support or at producing things like a compiler. Not at all. They
suck incredibly there.

Intel's only weapon is their own compiler. Very good programmers
do not need it, though.

I have yet to find the first chess program which is faster on a P4
than on a K7.

Even a default compile, not optimized for the K7 at all, is already
a lot faster on a K7 than on the fastest P4.

However, we must take into account that on one point Crafty is getting
completely toasted. It is toasting itself by doing so many memory
references: 2 probes in 2 different hashtables. If you remove the probe
to the second table, and consider that a single cache line fetch, even
from DDR RAM, already gives you 64 bytes, so that you can do 4 probes
for free in a single hashtable, then you will realize with me that after
that change Crafty will run like a charm on a dual K7 as well.
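
To make the cache line argument concrete, here is a minimal sketch in C
of a single hashtable with 4 entries of 16 bytes per 64-byte bucket. The
entry layout, the names Entry, Bucket, probe and store, and the
depth-based replacement rule are all assumptions for illustration; this
is not code from DIEP or Crafty.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 16-byte entry; the field layout is an assumption. */
typedef struct {
    uint64_t key;        /* full hash key, used to verify a hit */
    int16_t  score;
    uint16_t move;
    uint8_t  depth;
    uint8_t  unused[3];  /* pads the entry to 16 bytes */
} Entry;

#define ENTRIES_PER_BUCKET 4

typedef struct {
    Entry slot[ENTRIES_PER_BUCKET];
} Bucket;

static_assert(sizeof(Bucket) == 64, "one bucket should fill one 64-byte line");

/* Checks all 4 slots of one bucket. Returns the matching entry, or NULL
   on a miss. num_buckets must be a power of two. An empty slot has
   key 0, so this sketch assumes key 0 never occurs. */
const Entry *probe(const Bucket *table, size_t num_buckets, uint64_t key)
{
    const Bucket *b = &table[key & (num_buckets - 1)];
    for (int i = 0; i < ENTRIES_PER_BUCKET; i++) {
        if (b->slot[i].key == key)
            return &b->slot[i];
    }
    return NULL;
}

/* Overwrites the slot that already holds this key; otherwise replaces
   the slot with the lowest depth in the bucket. */
void store(Bucket *table, size_t num_buckets, uint64_t key,
           int16_t score, uint16_t move, uint8_t depth)
{
    Bucket *b = &table[key & (num_buckets - 1)];
    Entry *victim = &b->slot[0];
    for (int i = 0; i < ENTRIES_PER_BUCKET; i++) {
        Entry *e = &b->slot[i];
        if (e->key == key) {
            victim = e;
            break;
        }
        if (e->depth < victim->depth)
            victim = e;
    }
    victim->key = key;
    victim->score = score;
    victim->move = move;
    victim->depth = depth;
}
```

The 4 probes only stay inside one cache line if the table itself starts
on a 64-byte boundary, for example when it is allocated with C11
aligned_alloc(64, num_buckets * sizeof(Bucket)).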

P4 duals are history forever then, left behind big time by the dual K7.

I call that very bad programming. Weak programming. Especially because it
was already mentioned to Hyatt by me and others over a year ago.

It is also provable that 4 sequential probes perform better than a
2-table approach. That was already shown in the ICCA Journal years ago.

Now it is pretty bad to say that this research was not accurate.

I dislike such excuses. 5 tests of Nalimov prove something. 6 tests
of someone else do not. This is very bad science of course.

The 2.4GHz Xeons on which Nalimov reported a 20+% speedup or so,
i also tried (but single CPU) and couldn't get any speedup at all on
them.

SMT is simply not working on them.

This is legal. Intel only guarantees SMT to work on P4s above 3.0GHz
and on Xeon MPs (the fastest Xeon MP released so far is 2.0GHz, and it is
supposed to run with up to 8 processors, in contrast to the Xeon, which
only runs with up to 2 processors).

So Intel was correct in their statement from my viewpoint. Nothing i
tested got any speedup yet. I didn't test a 3.0GHz P4 yet; they are not
delivered here yet.



>-Matt
Re: But, Re: Questions re P4 3.03 with HT ?? Matt Taylor 17:56:47 12/10/02
- Re: But, Re: Questions re P4 3.03 with HT ?? Robert Hyatt 10:19:24 12/11/02