Computer Chess Club Archives


Subject: Re: needing advice on new dual processor computer

Author: Vincent Diepeveen

Date: 12:20:15 07/07/03


On July 07, 2003 at 12:59:07, Omid David Tabibi wrote:

>On July 07, 2003 at 11:34:11, Vincent Diepeveen wrote:
>
>>On July 07, 2003 at 10:36:00, Joachim Rang wrote:
>>
>>>On July 07, 2003 at 09:54:44, Vincent Diepeveen wrote:
>>>
>>>>On July 07, 2003 at 06:27:01, DANIEL JOHNSON wrote:
>>>>
>>>>>I PLAN ON BUILDING A NEW DUAL PROCESSOR COMPUTER WHEN THE NEW .09 CHIPS COME OUT
>>>>>WITH 1 MEG CACHE MEMORY AND HAS 800 MHZ FSB ABOUT 3GB MEMORY, DUAL SATA RAID 0
>>>>>HDD, SWIFTECH WATERCOOLING, BUT WHICH CPU WILL BE BEST AND FASTEST FOR CHESS
>>>>>OPTERON OR XEON
>>>>
>>>>Opteron will be trivially in 0.09
>>>>
>>>>The question will be however whether the Xeon is there first in 0.09 or Opteron.
>>>>
>>>>I bet at Xeon in 0.09 being there first.
>>>>
>>>>As you can see at specint crafty at 64 bits opteron is 1562 rated. this is just
>>>>1.8Ghz opteron.
>>>>
>>>>I expect for other programs when compiled 64 bits and with 16 GPRs and all the
>>>>other features to be equally fast (having same speed difference). Fritz is
>>>>assembly. In order to run in 64 bits at opteron Frans Morsch must now already be
>>>>working at that as for every new processor he has to rewrite nearly the entire
>>>>program. Must be hard worker that guy! I do not know whether he is busy with
>>>>that now. I can ask if you want to.
>>>>
>>>>For most of the others it is just a simple recompile though. The work i need in
>>>>diep to profit from having more bits in each register (the other advantages you
>>>>get for free or are compiler issues) is not little either. It will give a few %
>>>>though.
>>>>
>>>>When opteron is there in 0.09 it will be way faster, the chip is actually
>>>>designed to be 0.09. It is however interesting to see what speed the prescott
>>>>core will have in 0.09, i guess they will be sooner there.
>>>>
>>>>Best regards,
>>>>Vincent
>>>
>>>Vincent,
>>>
>>>will you use an opteron machine in Graz this year?
>>>
>>>regards Joachim
>>
>>I would if the government didn't provide me with Europes fastest machine for
>>computerchess available: 1440 processor TERAS machine. 1024 MIPS R14000
>>processors and 416 processor Itanium2 1.3Ghz Madisons.
>>
>>I'll be running hopefully at 500 processor partition (500 x 500Mhz = 250Ghz).
>
>What NPS do you expect to get on that machine?

I can't give out data yet for 2 reasons, one of which you can surely guess,
but it will be good. When i told a betatester of mine yesterday what i expect as
a minimum nps and what i hope to achieve, he said "HOLY SHIT".

Keep in mind diep at my dual K7 2.1Ghz gets around 180k - 220k nps (depending
upon position of course) at 2 processors. The clock speed of a K7 compares to
the much older R14000 processor. In fact the R14000 would be dead slow because
of its slow LOCAL memory, were it not that the 8MB OFF chip L2 cache makes up
a lot for local memory accesses.

The latency of this partition is among the best that supercomputers of this
size deliver (512 processors in 1 partition, of which 500 are usable !!!).

But still it is around 4 microseconds for a single random cache line. That might
sound terribly bad, but try the fastest network cards ($1500 Myrinet cards),
which take around 20 microseconds. Even the big IBM supercomputers, which do very
well (and very cheap), do not even come close. I guess they're also around 16
usec latency because they probably use the same cards.

So if you compare that then the machine is really great. Zugzwang ran, if i
remember well, at 500 processors of 500Mhz Alpha in 1999. That machine is very
similar to this one in performance (i do not know its latency, but knowing it is
a Cray T3E, which comes from the same company, SGI, it probably is doing great).
Zugzwang of course had very dubious forward pruning (Fail High Reductions) which
can be proven incorrect very easily using the hashtable. Zugzwang you can of
course compare very well with gnuchess. DIEP is doing way more in its evaluation
than Zugzwang. If i remember well Zugzwang was around 1 - 2 million nodes a
second in world champs. I was watching Zugzwang - Lambchop and it was peaking
there at 2 million nodes a second in the endgame.

So think of a simple gnuchess-like program that, when well optimized, gets
something like 500k nps at a single cpu K7 2.127Ghz, versus a knowledgeable diep.

Then take another thing into account. This is very important. Zugzwang with all
its dubious forward pruning had a good branching factor. Let's be clear about
that. I remember it in endgame starting at 13 ply after half a minute then going
on to 14 and sometimes 15.

Note that at that world champs i searched 20 ply, but that was with DIEP 1999,
which was very stupid in endgames, and i was also doing dubious pruning in that
version.

Basically DIEP searched that deep in far endgames at that world champs, running
at Bob's quad 400Mhz PII Xeon, because it had a huge hashtable (400MB with 8
probes) and a very stupid endgame eval.

Fritz at a quad PIII 500Mhz reached 17 ply in middlegame already, for the same
reason (simple eval) and Junior also had 17-20 iterations in middlegame in 1999
also at a quad 500.

Shredder was doing 14 ply back then at a PIII 550. A year later in London 2000
Shredder reached 12 ply at a K7 1Ghz. Fritz also something like that, and Junior
even in 2002 was getting 17 iterations.

In short all the programs progressed and it has become harder to search
efficiently thanks to the improved evaluations. I won't be saying that the task
of Zugzwang was an easier one than the one i have with DIEP.

I try nevertheless to do better than Zugzwang did in 1999 in all respects. That
is trivially goal 1.

A very important thing in which i try to be better is being fair and open. All
the logfiles of diep from world champs 2003 will be published. Whatever place i
get.

You can easily check at home then with a diep version so to speak.

With Zugzwang and all these supercomputers like Cilkchess and P.Conners and Cray
Blitz, all their logfiles have disappeared without a trace.

Now for Cray Blitz we can understand why, as that was like 1986 or something.
Zugzwang still played in 1999.

That's just 4 years ago. Idem cilkchess.

Not a trace they left. History should forget them soon IMHO for this reason.
Also the deep blue logfiles are world champion in not giving away statistical
data. Not a single searches a second number is visible in any of the logfiles i
saw.

That it got 125 MLN nodes a second is the latest claim, from 2001. I do not
believe that at all knowing how they calculated the 'speedup' figure.

Deep Blue's speedup is based upon extrapolation. They measured at the 100Mhz IBM
cluster 1 node with 1 hardware processor versus 1 node with a bunch of hardware
processors. Speedup there was 15% 'so the whole machine has a 15% speedup'.
Crap. The hardest part (getting something to work over a cluster with 30 slow
latency nodes) was simply hidden.

Not a single datapoint do we have from the deep blue nodes. If he would be open
and fair there, he would simply show outputs of 1 processor versus 480.

Now where IBM had major commercial considerations not to do so, for the
scientific programs there is no excuse. They burnt millions of dollars/euros
worth of system time and leave no trace!

Not a logfile even posted *ever*!

Cilkchess is first slowed down 30 times in order to claim a 'decent' speedup (i
remember a claim of a figure of 50% but that wasn't a publication). A few months
ago i emailed Don Dailey asking for some logs.

Look, this thing is from years ago. Logical that it looks bad now. Same for
deep blue and all the others.

Hyatt is not interesting to list here because he just toyed with up to 16
processors. Good for Bob, but not interesting from a mass parallelism viewpoint.
There is quite some difference between 16 processors and 500. We must not forget that
where Bob committed major fraud, that still the single cpu speed of Cray Blitz
was very good. Cray Blitz simply did well single cpu when compared to these
massively parallel supercomputers. I mean cilkchess was like 5-10k nps single
cpu with cilk and 100-200k nps without it. I saw it with my own eyes!
Zugzwang was not a hair better. I know why however: as soon as you start using
MPI you're simply lost.

Then we have the mysterious P.Conners. Trivially i cheer for the experiment in
itself, namely if you do something else than alpha-beta then you normally face
the jury. And the jury being the 'audience' is usually very mean. Give him
a bit of space. He's the only guy trying CNS2!

Regrettably he's also the only guy who can read his own papers. My emails to him,
when i tried implementing CNS for a month or 2, were answered very poorly. Not a
single result from P.Conners did i get from him, and except for some very
outdated testset which he tried to solve with 200 processors or so, we do not
have anything of it left. The only thing i remember is the games i played, and
my memory of the games i played is not so bad.

It was getting around 1 million nodes a second at 180 cpu's against DIEP.
Note that this is without hashtables at a cluster with a 16 usec worst case
random latency to get an integer (so the one way pingpong from the farthest spot
at the machine is like 8 usec).

So that is 4 times slower in latency than what i toy with.
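
As a rough sketch of the latency comparison above, the Python snippet below
just divides the figures quoted in this post by the 4 usec of the TERAS
partition. The dictionary labels and the helper name slowdown_vs are made up
for illustration; only the numbers come from the post.

```python
# Latency figures as quoted in the post, in microseconds.
# The labels are illustrative; only the numbers come from the text.
LATENCY_USEC = {
    "TERAS partition, random cache line": 4.0,
    "P.Conners cluster, worst case random integer": 16.0,
    "fast network cards": 20.0,
}


def slowdown_vs(baseline_usec: float, other_usec: float) -> float:
    """Return how many times slower other_usec is than baseline_usec."""
    if baseline_usec <= 0:
        raise ValueError("baseline latency must be positive")
    return other_usec / baseline_usec


baseline = LATENCY_USEC["TERAS partition, random cache line"]
ratios = {name: slowdown_vs(baseline, usec) for name, usec in LATENCY_USEC.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.1f}x the TERAS latency")
```

The 16 usec cluster comes out at 4 times the TERAS figure, which is the
"4 times slower" mentioned above.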

I have forgotten now which CPUs were in it the last time i played P.Conners, but
i can't be far off if i say they were PIII 500Mhz cpu's. Getting 1 million NPS
without a global hashtable and without YBW is however quite a bit easier, let
me tell you that.

So we speak about around 1 mln / 180 = 5.6K nps a cpu which is good in itself if
you consider the circumstances.
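
The per-cpu figure above is a plain division. A minimal Python sketch, using
only numbers quoted in this post (1 million nps at 180 cpu's for P.Conners,
180k - 220k nps at 2 processors for diep on the dual K7); the function name
nps_per_cpu is just an illustrative choice.

```python
def nps_per_cpu(total_nps: float, cpus: int) -> float:
    """Average nodes per second contributed by each cpu."""
    if cpus <= 0:
        raise ValueError("cpus must be positive")
    return total_nps / cpus


# Figures quoted in the post.
pconners = nps_per_cpu(1_000_000, 180)   # P.Conners at 180 cpu's
diep_low = nps_per_cpu(180_000, 2)       # diep, dual K7 2.1Ghz, low end
diep_high = nps_per_cpu(220_000, 2)      # diep, dual K7 2.1Ghz, high end

print(f"P.Conners: {pconners / 1000:.1f}K nps per cpu")
print(f"diep dual K7: {diep_low / 1000:.0f}K - {diep_high / 1000:.0f}K nps per cpu")
```

Note the two machines are not directly comparable (different cpu's, clock
speeds and search algorithms), so this only illustrates the arithmetic.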

Further there are a few interesting attempts by other authors, but that's so
prehistoric that we will trivially forgive them, as they did the hard pioneering
work back then. Some of them are however very well documented.

The best compare trivially is with Zugzwang in this respect that it ran at
similar hardware using the same parallel basic principle (YBW).

So DIEP and Zugzwang both face the same problems of communication and getting
the thing to search.

I will however be one of the first massive parallel researchers posting all the
logfiles from DIEP. In the 400 pages of paper i have written to get this system
time i have *clearly* stated that i would do this. I definitely feel that both
SGI and especially my biggest sponsor the NWO (which owns the machine) support
me in that attempt and they should be given big credit for it.

I remember so many saying: "i get 50% speedup". Crap, if i'll get 50% speedup
it'll be because at 1 processor i cannot test with 150GB hashtables but with 500
i can. But such speedups you won't see soon. I guarantee you that. If i could
sign now for 15% speedup i would sign blindfolded. 15% from 250Ghz = 37.5Ghz.
For a program that shared 5th place in the world in 2002 and has improved a lot
since then, that'll make them sweat now already.
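
The 250Ghz and 37.5Ghz numbers can be reproduced with a few lines of Python.
This is only a sketch of the arithmetic: it treats the 15% as the fraction of
the summed clock speed that ends up as useful search, which is how the figure
is used above. The function names are hypothetical.

```python
def aggregate_ghz(processors: int, mhz_per_processor: float) -> float:
    """Summed clock speed of all processors, in Ghz."""
    return processors * mhz_per_processor / 1000.0


def effective_ghz(processors: int, mhz_per_processor: float, efficiency: float) -> float:
    """Part of the summed clock speed that remains at the given efficiency."""
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return aggregate_ghz(processors, mhz_per_processor) * efficiency


total = aggregate_ghz(500, 500)          # 500 processors of 500Mhz
useful = effective_ghz(500, 500, 0.15)   # the 15% hoped for in the post

print(f"aggregate: {total:.1f} Ghz, at 15%: {useful:.1f} Ghz")
```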

We must be realistic however that from that 250Ghz i already start losing the
vast majority in one go to communication. Then the biggest loss after that will
be YBW because getting a branching factor of around 3.0 is very important.
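
To see why the branching factor matters so much, here is a small Python sketch
under a textbook assumption that is not stated in the post: search effort grows
roughly as branching_factor ** depth, so a speedup S buys about log(S) /
log(branching_factor) extra plies. The 75x figure below is just 15% of 500
processors, taken as an illustration, not a measured result.

```python
import math


def extra_plies(speedup: float, branching_factor: float) -> float:
    """Extra search depth bought by a speedup, assuming effort ~ b ** depth."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    if branching_factor <= 1:
        raise ValueError("branching factor must be greater than 1")
    return math.log(speedup) / math.log(branching_factor)


processors = 500
efficiency = 0.15                      # the hoped-for figure from the post
effective_speedup = processors * efficiency

plies_gained = extra_plies(effective_speedup, 3.0)
print(f"about {plies_gained:.2f} extra plies at branching factor 3.0")

# A worse branching factor makes the same hardware buy less depth.
print(f"about {extra_plies(effective_speedup, 4.0):.2f} extra plies at 4.0")
```

Under these assumptions even a large machine only adds a handful of plies,
and every bit of branching factor lost to the parallel search eats into that.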

I did measure other search algorithms very well. For example the easy algorithm
to just let 16 processors run and search and only fill a common shared
hashtable, is not working so well at all.

I also combined it in july 2002 (during world champs 2002) with a 16 processor
normal search. So 44 processors just filling hashtables and 16 processors doing
a normal parallel YBW search.

That all worked horribly. In itself just 16 processors searching independently
of each other with a shared hashtable give a speedup of something like 2.2.

2 processors give for diep there 1.2 speedup.
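
Expressed as parallel efficiency (speedup divided by processor count), the two
measurements above look like this. A minimal Python sketch; the helper name
parallel_efficiency is an illustrative choice and the inputs are the figures
quoted in this post.

```python
def parallel_efficiency(speedup: float, processors: int) -> float:
    """Fraction of ideal linear speedup that was actually achieved."""
    if processors <= 0:
        raise ValueError("processors must be positive")
    return speedup / processors


# Independent searchers sharing only a hashtable, as measured in the post.
eff_16 = parallel_efficiency(2.2, 16)
eff_2 = parallel_efficiency(1.2, 2)

print(f"16 processors: {eff_16:.1%} efficiency")
print(f" 2 processors: {eff_2:.1%} efficiency")
```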

We couldn't test SOS at the machine either because spawning 59 processes each
move is pretty awful. It takes 10 minutes. If you additionally allocate 50GB
hashtables you need another 1.5 hours for that table to get attached to all the
processors. Note that i did that last experiment not with 60 processors but with
130 (so 129 tried to attach) and it was performed a few weeks ago.

To just list a few of the normal pitfalls that can happen...



>
>>
>>Thanks to the dutch government and NWO, NCF, computermanufacturer SGI, WGS,
>>IKAT, University of Maastricht and another number of organisations i forgot to
>>mention here (but not deliberately) i am sure diep will try to do its utmost
>>best in world champs 2003.
>>
>>Best regards,
>>Vincent



Last modified: Thu, 15 Apr 21 08:11:13 -0700
