Computer Chess Club Archives


Subject: Re: Zappa Hardware Upgrade

Author: Vincent Diepeveen

Date: 09:50:38 12/24/05


On December 23, 2005 at 15:28:31, Joachim Rang wrote:

Actually, Diep got a great speedup from the supercomputer in rounds 8-11
of the 2003 world champs.

In the first few rounds, for example against Shredder, it had a great speedup
during the first few moves of the game but collapsed later. That issue was
fixed within a few days.

After that, of course, the supercomputer was busy crashing and was offline for
a full day, so i had to play round 6 on a dual K7 machine, and in round 7
i still suffered from that.

Fritz was very lucky there in 2003. After that, of course, i beat it in 2004
on a quad.

Those giant monsters are delicate machines, and the bigger they get,
the less reliable they are.

The best-case performance of those machines is not in question; it is
marvelous. Chess, however, is a worst-case game. You cannot have a machine
that is down for a full day during a world championship.

Please take into account that the machine Anthony is playing with is 100% the
same as mine, only with way faster processors. As he and i have chatted a lot,
i hope he won't need to struggle to find out what works great on those
machines and what doesn't.

That's what science is for.

That's something i really missed from previous computer chess supercomputer
programs. Their authors gave close to zero relevant information on what would
work and what wouldn't, which algorithms would be great, and how to implement
them.

Diep has set a new standard there. I have very openly discussed the problems and
the advantages of big supercomputers.

Additionally, don't forget that Diep had to set a new standard. In the distant
past, when you showed up with a supercomputer, you also had the fastest single
cpu.

I ran on single cpus that were 6+ times slower than the single cpus that
Shredder, Junior and Fritz ran on.

Additionally, there is a huge hashtable problem on such machines, something
Hydra of course suffers from even more.

Despite all those problems, they only caused trouble in the first 7 games, in
some games more than others. In the first 3 games, for example, what happened
was really, really ugly.

Yet thanks to me publicly posting about this, others have been warned now and
hopefully won't make the same mistakes.

It is this kind of information sharing that other researchers never offered,
despite me sending them 100 emails with questions.

They just praised those machines, while they had first slowed down their
program by a factor of 40. If your single cpu is 6 times faster than your
opponents', then first slowing down your program by a factor of 40 in order to
run better on such a machine is not a problem. If your cpu is 6-10 times slower
than your opponents' cpus, then losing a factor of 40 is not an option.

That would mean you start out 240-400 times slower than a single one of their
cpus, before any parallel speedup.
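
To make the arithmetic explicit, here is a minimal sketch in Python. The function names and the assumption that the two factors simply multiply (or divide, when your cpu is the faster one) are for illustration only; the 40 and the 6-10 are the figures quoted above.

```python
def handicap_with_slower_cpu(parallel_slowdown, cpu_times_slower):
    """Net deficit versus one opponent cpu when your own cpu is slower.

    The program is first made `parallel_slowdown` times slower to suit the
    big machine, and each cpu is `cpu_times_slower` times slower on top of
    that, so the two factors multiply.
    """
    return parallel_slowdown * cpu_times_slower


def handicap_with_faster_cpu(parallel_slowdown, cpu_times_faster):
    """Net deficit when your own cpu is faster than the opponent's."""
    return parallel_slowdown / cpu_times_faster


# The case described above: factor 40 lost, cpu 6 to 10 times slower.
low = handicap_with_slower_cpu(40, 6)
high = handicap_with_slower_cpu(40, 10)
print(f"slower cpu: {low} to {high} times behind before any speedup")

# The older situation: factor 40 lost, but a cpu that is 6 times faster.
print(f"faster cpu: {handicap_with_faster_cpu(40, 6):.1f} times behind")
```

Under these assumptions the deficit with a faster cpu stays in the single digits, which a few hundred processors can plausibly win back, while a deficit in the hundreds is of the same order as the processor count itself.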

For example, Cilk and Zugzwang were very simplistic programs, which on a single
cpu today would get 2 million nodes a second or so.

But on a 512 processor 500MHz machine they got a few million nodes a second at
most.

Diep on a single A64 2.4GHz cpu gets 150k nps or so. But on a 512 processor
500MHz machine i hit 9.99 million nodes a second against Hydra.

Who scaled better, they or me?

I *had* to have such good scaling, and the scaling simply was really good.

One 500MHz processor could get 20+ k nps. On 460 processors i hit 5 million
to 10 million nodes a second. So Diep scaled very, very well.
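
As a rough check on these numbers, the nps speedup and the per-processor efficiency can be computed as below. This is only a sketch: the function names are made up for illustration, 20,000 nps is taken as the single-processor figure even though the text says "20+ k", and an nps speedup is not the same thing as a time-to-depth speedup.

```python
def nps_speedup(parallel_nps, single_cpu_nps):
    """How many times more nodes per second the parallel run searches."""
    return parallel_nps / single_cpu_nps


def efficiency(parallel_nps, single_cpu_nps, n_cpus):
    """nps speedup divided by the number of processors used."""
    return nps_speedup(parallel_nps, single_cpu_nps) / n_cpus


SINGLE_CPU_NPS = 20_000   # assumption: lower bound taken from "20+ k nps"
N_CPUS = 460

for total_nps in (5_000_000, 10_000_000):
    s = nps_speedup(total_nps, SINGLE_CPU_NPS)
    e = efficiency(total_nps, SINGLE_CPU_NPS, N_CPUS)
    print(f"{total_nps:>10} nps: speedup {s:.0f}x, efficiency {e:.2f}")
```

With these inputs the nps speedup comes out between 250 and 500 on 460 processors. An efficiency above 1.0 at the top end only reflects that 20k nps is a lower bound for the single processor.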

Not a single supercomputer program is better than that as of now, and i doubt
anyone will ever improve upon it.

Yet it is obvious that one cpu of a supercomputer, now and in the future, is
unlikely to be as good as a PC processor.

That is the huge difference with the past we are confronted with now.

Anthony is lucky that he now has 1.6GHz Itanium2 processors.
Diep had the same machine, but a previous generation, featuring 500MHz
washing machine chips.

Of course, in the year 2000 when this machine was built, this 500MHz processor
was ok. But when i used it at the end of 2003, nearly 2004, it was already a
very outdated processor.

Yet the real challenge then is to get a speedup out of that, and that has been
achieved. That's what matters from a scientific viewpoint.

I did start a comparison run to measure the speedup of the 512 processor
partition over a single cpu.

This comparison run was brutally stopped by the system administrators, as the
budget of 90000 hours had been exceeded, and that meant i couldn't log on
to the system at all anymore.

So accurate speedup numbers are hard to give. I recently tried a comparison.

It clearly shows that the 2003 supercomputer version of Diep was comparable
to today's quad Opteron dual core 2.4GHz for the first moves out of book;
later in the game there was no comparison, the supercomputer wins bigtime there.

Considering the great shared memory inside a quad Opteron and the HUGE
difference in speed between a MIPS washing machine chip (with OFF-chip L2 cache,
even) and a 2.4GHz Opteron, that is IMHO a formidable achievement.

Seymour Cray:
  "If you were plowing a field, which would you rather use?
  Two strong oxen or 1024 chickens?"

From an ease-of-use viewpoint Seymour Cray is right, but it is obvious that i
succeeded in showing that 512 chickens can work together very nicely.

I hope you guys realize that Anthony will have a harder job getting Zappa
to scale well on 512 cpus. The memory latency of those sports-hall-filling
machines hasn't improved much since 1999. It was 280 ns on the MIPS
architecture, which is a formidable achievement if you realize that they
already achieved that in the early 90s. The local memory latency with TLB
thrashing on those Itanium2 chips is still 280 ns.

For Diep an Itanium2 1.6GHz is equal to an Opteron 1.8GHz, as Diep is a 32-bit
program that is written very generically. Yet even if your sequential speed
gets a lot faster, at least 5 times faster than the 500MHz MIPS chips i
ran on, it is obvious that sharing the hashtable remains the same problem.

A shared hashtable is crucial. For his 512p setup Anthony will have to solve
that problem, and i wish him a lot of luck doing that. The good thing for him
is that he can test a lot on that machine. I basically lost an entire world
championship because i couldn't test on such a machine.

With the moderate setup now and quite some testing already done,
Zappa will of course do fine in Paderborn.

That will really make sure he's ready for world champs 2006 in Turino, with a
program that not only scales well but is also well tested.

Now that the current Diep version is optimized so well for a supercomputer,
i would give up my Ferrari to run on Anthony's machine at world champs
2006.

It's simply a bigtime improved version of what i ran on at world champs 2003.
A CPU that's 6 times faster than what i had!!

Vincent

p.s. Zappa won't win Turino: since the end of the 80s no supercomputer has won
a world title.

>On December 23, 2005 at 13:26:42, Zappa wrote:
>
>>So when I went to UIUC as a newly minted World Champion, they were interested
>>and after some negotiations I managed to procure some time on NCSA's Cobalt
>>supercomputer, an SGI Altix:
>>
>>http://www.ncsa.uiuc.edu/UserInfo/Resources/Hardware/SGIAltix/TechSummary/
>>
>>At Paderborn Zappa will run on 128 CPUs as a bit of a warmup; at Turino I hope
>>to use 512.  I haven't really had enough time to seriously optimize Zappa for
>>this machine, and I have been somewhat disappointed by the Itanium2 CPU, but the
>>results are still reasonably impressive.  For example:
>>
>>r1b2r2/p1q1ppk1/6p1/3p3p/7P/5P2/PPPQ2P1/2K1RB1R b - - 0 9
>>
>>1... Ra8-b8 2. g2-g4 h5xg4 3. h4-h5 Qc7-b6 4. b2-b3 Qb6-f6 5. Kc1-b1 Rf8-h8 6.
>>h5-h6 Kg7-g8 7. f3xg4 Bc8xg4 8. h6-h7 Kg8-f8 9. Qd2-h6 Kf8-e8 10. Bf1-d3 e7-e6
>> = (-0.49)      Depth: 17/45    00:03:37.11     5532448kN (25481 KN/s, 3558608
>>splits, 294631 aborts)
>>
>>r1b2rk1/pp3ppp/1nn1p3/q2pP3/2pP1P2/P1P2NPB/2PB3P/R2QK2R w KQ - 0 8
>>
>>1. Ke1-g1 Nc6-e7 2. Nf3-h4 Qa5-a4 3. Qd1-b1 Bc8-d7 4. Qb1-b2 Qa4-c6 5. Ra1-b1
>>Qc6-c7 6. Nh4-g2 Ra8-c8 7. Kg1-h1 Ne7-f5 8. Ng2-e3 Nf5xe3 9. Bd2xe3 Rf8-d8 10.
>>Rb1-a1
>> = (0.15)       Depth: 20/46    00:04:33.59     7300957kN (26686 KN/s, 6497504
>>splits, 503423 aborts)
>>
>>Single CPU Zappa on the I2 there gets about 300 knps, so that is an nps speedup
>>of 80-85.  I'll probably get a bit less in Paderborn, as this is essentially the
>>World Champs version of Zappa, and I've been busy de-optimizing it since :)
>>
>>This is really only 1-2 ply deeper than my quad, but wait until Turino when I've
>>had some time to optimize things a bit . . .
>>
>>anthony
>
>
>Hi Anthony,
>
>so that is the big news you were talking about. I already knew the rumor, so I
>was awaiting this announcement. Congratulations, this is very fascinating. I
>hope they will really give you enough time to fully utilize all the processors,
>at least in Turino, so that you can avoid the fate of Vincent in Graz, where he
>had 500 CPUs but no real speed-up.
>
>It is now clear why Hydra is avoiding Paderborn: they probably knew the rumor
>as well. Hydra is for me the disappointment of 2005; there was the boring
>match against Adams, and basically that's all you could hear about Hydra this
>year.
>
>From a competitive point of view I'm glad that in the meantime Rybka has
>appeared, so that we don't need to be annoyed about your "unfair" hardware
>advantage, which again prevents us from clinching a title. I suppose even on
>one processor Rybka will be the co-favourite in Paderborn.
>
>regards
>
>Joachim



Last modified: Thu, 15 Apr 21 08:11:13 -0700
