Computer Chess Club Archives


Subject: Re: MTD(f)

Author: Vincent Diepeveen

Date: 12:44:53 07/04/03


On July 04, 2003 at 11:38:09, Andrew Williams wrote:

>On July 04, 2003 at 11:18:58, Vincent Diepeveen wrote:
>
>>On July 03, 2003 at 13:57:02, Dann Corbit wrote:
>>
>>>On July 03, 2003 at 12:28:05, Ralph Stoesser wrote:
>>>
>>>>Dear chess programmers,
>>>>
>>>>What are your personal experiences with the MTD(f) search introduced by Aske
>>>>Plaat some years ago?
>>>
>>>It does not work for me as well as it does for some others.
>>>
>>>I think success will depend very much on your particular engine.
>>>
>>>Andrew Williams has a successful implementation.
>>
>>Claims to have a successful implementation is more near the truth.
>>
>
>I am a bit surprised to read this. I sincerely hope you're not claiming that I'm
>lying about my implementation?

No, I just said that you *claim* to have a successful implementation.

I didn't say a word more or less than that. I would be the last in the world to
suggest you are a liar, as everyone knows you are honest.

I did imply, however, that I doubt MTD as you implement it would use fewer
nodes on average if all the participants of the 2003 world championship dropped
the PVS they use and switched to MTD instead.

I do know, though, that it cannot work on supercomputers because of hashtable
latencies. We all agree that with MTD, if you can't do quick lookups in the
global hashtable, you are fried for sure, because a re-search at ply 15 in the
middlegame is certainly no longer in L2 cache.
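
To illustrate why re-searches matter here, the following is a minimal sketch of the MTD(f) driver loop running on a toy tree. It is not taken from any engine in this thread: the names `alphabeta`, `mtdf`, `leaves` and `nodes_visited` are hypothetical, and the toy deliberately has no transposition table. Every pass restarts from the root with a new zero-width window, so in a real engine each pass depends on finding the previous results quickly in the hashtable.

```c
#include <assert.h>
#include <limits.h>

#define DEPTH 3                      /* toy tree: complete binary tree, 8 leaves */
static const int leaves[1 << DEPTH] = {3, 5, 6, 9, 1, 2, 0, -1};
static long nodes_visited;           /* a real engine would probe its hashtable at these nodes */

/* Plain fail-soft alpha-beta; `index` is the node's position within its level. */
static int alphabeta(int index, int depth, int alpha, int beta, int maximizing) {
    nodes_visited++;
    if (depth == 0) return leaves[index];
    int best = maximizing ? INT_MIN : INT_MAX;
    for (int child = 0; child < 2; child++) {
        int v = alphabeta(2 * index + child, depth - 1, alpha, beta, !maximizing);
        if (maximizing) {
            if (v > best) best = v;
            if (best > alpha) alpha = best;
        } else {
            if (v < best) best = v;
            if (best < beta) beta = best;
        }
        if (alpha >= beta) break;    /* cutoff */
    }
    return best;
}

/* MTD(f) driver: repeated zero-window searches from the root until the
   lower and upper bounds on the minimax value meet. */
static int mtdf(int first_guess, int *passes) {
    assert(first_guess > INT_MIN && first_guess < INT_MAX);
    int g = first_guess, lower = INT_MIN, upper = INT_MAX;
    *passes = 0;
    while (lower < upper) {
        int beta = (g == lower) ? g + 1 : g;
        g = alphabeta(0, DEPTH, beta - 1, beta, 1);   /* one full re-search */
        if (g < beta) upper = g; else lower = g;
        (*passes)++;
    }
    return g;
}
```

Calling `mtdf(0, &passes)` always needs at least two passes, because a single zero-window search can only tighten one of the two bounds. Without fast access to stored results, each of those passes repeats work.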

>>Hopefully getting a position from hashtable doesn't eat 4
>>microseconds with him.
>
>I've no idea how long it takes PM to get a position from the hashtable.

About 400 ns on a dual machine, or 280 ns on a single-CPU K7/P4 with DDR RAM
(which is the actual time it takes to do a random lookup).
It is faster if the entry is still in L2 or L1 cache, which is unlikely once
you get past 11 ply and do another re-search.

The Origin2000 that Cilkchess ran on is worse than the Origin3800.
On the Origin3800 I measure around 4 microseconds to get a single 64-bit
quadword from memory.

Note that the 460 ns figure SGI gave me in June 2002 and again in February 2003
is a rather optimistic representation of the truth.

The only one who gave me a correct indication of the latency he had measured
himself was Johan de Koning.

Whereas Hyatt, Kerrigan and other dudes here just quote the commercial
SEQUENTIAL bandwidth numbers, he actually measured the time it took to get a
random cache line from memory. He told me he measured 300 ns for a K7 (with
SDRAM at the time).

That is very close to what I measure.

But it is far off from what Hyatt & Gordon quote here: 125 ns to do a lookup in
dual P4 Xeon memory? Under 60 ns to get a random cache line from memory on the
K7?

All bullshit.

A single-CPU P4 can't even get under 280 ns here!

Yes, I wrote those tools myself. Otherwise I would still be publicly quoting the
460 ns crap and would not know how to get Diep to work on that machine.

Though I searched online for an entire day, I could find nothing that could
measure the latency for me.

The best test I found online that can measure latency on supercomputers is the
one-way ping-pong. If you multiply those results by 2, you already get a vague
idea of what the latency is.

However, that MPI stuff doesn't work too well on PCs, and I could not run it
on the 1024-processor Origin here. Also, I'm not using MPI but OpenMP,
so I was forced to waste time writing a test myself.

For those who want it, just email me.

The code is not a secret. To be precise, it measures:
  the average random memory latency using shared memory, for a buffer of size n
  and i processors.

Initially it was meant to measure supercomputers, but to my amazement it
measures PCs pretty well too.

It is an average because if something is still in L2 or L3 cache, then that
chip is plain lucky.
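
For readers who want to try this kind of measurement, here is a rough single-threaded sketch of one common way to do it: chasing pointers through a randomly linked buffer so that every read depends on the previous one. This is not the tool described above (which uses shared memory and several processors). The function names, the xorshift generator and the use of `clock()` are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

/* Small xorshift generator so the sketch does not depend on rand()'s range. */
static uint64_t rng_state = 88172645463325252ULL;
static uint64_t next_random(void) {
    rng_state ^= rng_state << 13;
    rng_state ^= rng_state >> 7;
    rng_state ^= rng_state << 17;
    return rng_state;
}

/* Fill next[] with a single random cycle through all n slots (Sattolo's
   algorithm), so following next[] visits every slot before repeating. */
static void build_random_cycle(size_t *next, size_t n) {
    assert(n >= 2);
    for (size_t i = 0; i < n; i++) next[i] = i;
    for (size_t i = n - 1; i > 0; i--) {
        size_t j = (size_t)(next_random() % i);      /* 0 <= j < i */
        size_t tmp = next[i];
        next[i] = next[j];
        next[j] = tmp;
    }
}

/* Perform `steps` dependent loads; each address comes from the previous load. */
static size_t chase(const size_t *next, size_t steps) {
    size_t pos = 0;
    for (size_t s = 0; s < steps; s++) pos = next[pos];
    return pos;
}

/* Average nanoseconds per dependent random read over a buffer of n entries.
   Returns a negative value if the buffer cannot be allocated. */
static double random_read_latency_ns(size_t n, size_t steps) {
    assert(steps > 0);
    size_t *next = malloc(n * sizeof *next);
    if (next == NULL) return -1.0;
    build_random_cycle(next, n);
    clock_t start = clock();
    volatile size_t sink = chase(next, steps);       /* keep the loop from being optimized away */
    clock_t stop = clock();
    (void)sink;
    free(next);
    return 1e9 * (double)(stop - start) / CLOCKS_PER_SEC / (double)steps;
}
```

The buffer has to be much larger than the last-level cache for the result to reflect main memory; with a small `n` the number mostly shows cache latency. Since `clock()` reports CPU time at limited resolution, `steps` should be in the millions for a stable average.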

I hope you realize that a random read latency of 280 ns means that you can
look up at most about 3.5 million integers a second in memory.

Latency on dual machines is more like 400 ns, depending on chipset and memory of
course.

That means you can look up at most 2 to 2.5 million cache lines a second on a
dual P4 Xeon.
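
The arithmetic behind these figures is simply one second divided by the latency. The sketch below assumes only one lookup is outstanding at a time, with no overlapping memory requests. The function name `max_lookups_per_second` is made up for this example.

```c
#include <assert.h>

/* Upper bound on dependent random lookups per second for a given latency,
   assuming only one memory request is in flight at a time. */
static double max_lookups_per_second(double latency_ns) {
    assert(latency_ns > 0.0);
    return 1e9 / latency_ns;         /* 1 second = 1e9 nanoseconds */
}
```

For 280 ns this gives about 3.57 million lookups per second and for 400 ns exactly 2.5 million, in line with the rounded figures above. At 4 microseconds (4000 ns) it drops to 250,000.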

So an MTD program starts out with a huge disadvantage on the latest hardware,
and it will only get bigger over time.

And on supercomputers, and Cilkchess certainly ran on several, that has already
been the case for years.

So how could they say that MTD worked well for Cilkchess?

>Andrew



Last modified: Thu, 15 Apr 21 08:11:13 -0700
