Computer Chess Club Archives



Subject: Re: parallel scaling

Author: Vincent Diepeveen

Date: 11:20:01 10/29/03


On October 28, 2003 at 23:21:55, Robert Hyatt wrote:

>On October 28, 2003 at 18:12:16, Vincent Diepeveen wrote:
>
>>On October 28, 2003 at 09:48:52, Robert Hyatt wrote:
>>
>>>On October 27, 2003 at 21:23:13, Vincent Diepeveen wrote:
>>>
>>>>On October 27, 2003 at 20:09:55, Eugene Nalimov wrote:
>>>>
>>>>>On October 27, 2003 at 20:00:54, Robert Hyatt wrote:
>>>>>
>>>>>>On October 27, 2003 at 19:57:12, Eugene Nalimov wrote:
>>>>>>
>>>>>>>On October 27, 2003 at 19:24:10, Peter Skinner wrote:
>>>>>>>
>>>>>>>>On October 27, 2003 at 19:06:51, Eugene Nalimov wrote:
>>>>>>>>
>>>>>>>>>
>>>>>>>>>I don't think you should be afraid. 500 CPUs is not enough -- you need
>>>>>>>>>reasonable good program to run on them.
>>>>>>>>>
>>>>>>>>>Thanks,
>>>>>>>>>Eugene
>>>>>>>>
>>>>>>>>I would bet on Crafty with 500 processors. That is for sure. I know it is quite
>>>>>>>>a capable program :)
>>>>>>>>
>>>>>>>>Peter.
>>>>>>>
>>>>>>>Efficiently utilizing 500 CPUs is *very* non-trivial task. I believe Bob can do
>>>>>>>it, but it will be nor quick nor easy.
>>>>>>>
>>>>>>>Thanks,
>>>>>>>Eugene
>>>>>>
>>>>>>
>>>>>>If the NUMA stuff doesn't swamp me.  And if your continual updates to the
>>>>>>endgame tables doesn't swamp me.  We _might_ see some progress here.  :)
>>>>>>
>>>>>>If I can just figure out how to malloc() the hash tables reasonably on your
>>>>>>NUMA platform, without wrecking everything, that will be a step...
>>>>>
>>>>>Ok, just call the memory allocation function exactly where you are calling it
>>>>>now, and then let the user issue "mt" command before "hash" and "hashp" if (s)he
>>>>>want good scaling.
>>>>>
>>>>>Thanks,
>>>>>Eugene
>>>>
>>>>That's why i'm multiprocessing. All problems solved at once :)
>>>
>>>
>>>And several added.  Duplicate code.  Duplicate LRU egtb buffers.  Threads
>>
>>Duplicate code is good. Duplicate indexation egtb tables is good too (note the
>>DIEP ones do not require 200MB for 6 men, but a few hundreds of KB only).
>>
>
>wanna compare access speeds for decompression on the fly?  If you make
>the indices smaller, you take a big speed hit.  It is a trade-off.

Not really. Compressed, I need around 500MB for all 5-men tables; Nalimov needs 7.5GB.

Which is more compact?

The same holds for 6 men. I might need more entries (5.22T in total) where Nalimov needs a bit fewer, but I would never store each entry in 2 bytes.

So the direct savings are already bigger.

I proposed to Nalimov a scheme where you store only 'mate', 'mate in 15', 'mate in 16', etc.

That is, all mates in 0..12 and -1..-12 get collapsed into a single 'mate' or '-mate' value.

The EGTBs compress a lot better then, since any engine can calculate a mate in 12 to 15 on its own without problems (DIEP can, at least).
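The effect can be sketched with a toy example (hypothetical data and a generic compressor, not DIEP's or Nalimov's actual formats): collapsing every shallow mate distance into one sentinel value shrinks the symbol alphabet, so the same table compresses noticeably better.

```python
# Toy sketch of the scheme above: collapse every "mate in <= 12" entry
# to a single sentinel byte and let the engine search out the actual
# mate itself at probe time.  Hypothetical data, not a real TB format.
import random
import zlib

random.seed(42)

# Fake DTM (depth-to-mate) table: ~8/9 draws (0), rest mates in 1..40.
dtm = [random.choice([0] * 8 + [random.randint(1, 40)])
       for _ in range(100_000)]

raw = bytes(dtm)
# Keep draws as 0 and deep mates as-is; merge all mates in 1..12
# into the single sentinel value 255 ("mate, engine works it out").
collapsed = bytes(0 if v == 0 else (255 if v <= 12 else v) for v in dtm)

raw_z = zlib.compress(raw, 9)
col_z = zlib.compress(collapsed, 9)
print(f"raw: {len(raw_z)} bytes, collapsed: {len(col_z)} bytes")
```

The trade-off is exactly the one described above: the probing engine must then find the short mate by search instead of reading it from the table.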

>
>>Everything that's done local is better of course. By starting the entire process
>>local at a cpu you have that garantuee very sure. With multithreading you never
>>know what surprise hits you :)
>>
>>>are not necessarily bad here.  We're hitting 6.75M+ nodes per second on a quad
>>
>>That's *very* good.
>>
>>>opteron at 1.8ghz.  That's not bad.  I'll post some output when everything is
>>>cleaned up and finalized, particularly allocating the hash tables.
>>
>>Please do so. Would be cool to know some speedups as well using say a 400MB
>>hashtable.
>>
>>Also which kernel do you use?
>
>Eugene ran the tests, so you _know_ which kernel he used.  Windows.

Ah, and a better compiler. That explains your speed.




>>Default linux kernel at quad opteron sucked ass when i applied latency tests to
>>it. There is a few special patched kernels. Not default patched but by certain
>>manufacturers.
>>
>>Like SGI.
>>
>>If you find a kernel that's real good NUMA keep us up to date here in CCC which
>>one.



Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.