Computer Chess Club Archives

Subject: Re: MTD(f)

Author: Robert Hyatt
Date: 21:25:52 07/04/03
On July 04, 2003 at 15:44:53, Vincent Diepeveen wrote:

>On July 04, 2003 at 11:38:09, Andrew Williams wrote:
>
>>On July 04, 2003 at 11:18:58, Vincent Diepeveen wrote:
>>
>>>On July 03, 2003 at 13:57:02, Dann Corbit wrote:
>>>
>>>>On July 03, 2003 at 12:28:05, Ralph Stoesser wrote:
>>>>
>>>>>Dear chess programmers,
>>>>>
>>>>>What are your personal experiences with the MTD(f) search introduced by Aske
>>>>>Plaat some years ago?
>>>>
>>>>It does not work for me as well as it does for some others.
>>>>
>>>>I think success will depend very much on your particular engine.
>>>>
>>>>Andrew Williams has a successful implementation.
>>>
>>>"Claims to have" a successful implementation is nearer the truth.
>>>
>>
>>I am a bit surprised to read this. I sincerely hope you're not claiming that I'm
>>lying about my implementation?
>
>No, I just said that you *claim* to have a successful implementation.
>
>I didn't say a word more or less than that. I would be the last person in the
>world to suggest you are a liar, as everyone knows you are honest.

You certainly _implied_ he was a liar, however.  As usual, that which you
can't do yourself is impossible, and anyone else claiming to do it successfully
is lying.


>
>I did imply, however, that I doubt your MTD implementation would use fewer
>nodes on average if all the participants of the 2003 world championship got
>rid of the PVS they use and used MTD instead.
>
>I do know, though, that it cannot work on supercomputers because of hashtable
>latencies. We all agree about MTD that if you can't do lookups in a global
>hashtable quickly, you are fried for sure, because the entries for a re-search
>at ply 15 in the middlegame are certainly no longer in L2 cache.
>
>>>Hopefully getting a position from the hashtable doesn't eat 4
>>>microseconds for him.
>>
>>I've no idea how long it takes PM to get a position from the hashtable.
>
>About 400 ns on a dual machine or 280 ns on a single-CPU K7/P4 with DDR RAM
>(which is the actual time it takes to do a random lookup).
>Faster if it is still in L2 or L1 cache, which is unlikely when you
>reach more than 11 ply and do another re-search.
>
>The Origin2000 that Cilkchess ran on is worse than the Origin3800.
>On the Origin3800 I measure around 4 microseconds to get a single 64-bit
>quadword from memory.
>
>Note that the 460 ns figure SGI gave me in June 2002 and again in February 2003
>is a rather optimistic representation of the truth.
>
>The only one that correctly gave me an indication of what latency he measured
>himself was Johan de Koning.
>
>Whereas Hyatt and Kerrigan and other dudes here just quote the commercial
>SEQUENTIAL bandwidth numbers, he actually measured the time it took to get a
>random cache line from memory. He told me he measured 300 ns on a K7 (with
>SDRAM at the time).
>

More ignorance on your part.  I _never_ quote "sequential bandwidth" numbers.
I _always_ quote random access numbers and give the _measured_ latency for
such accesses using something like lm_bench, which reports pure random-access
latency.  I don't know why you make such broad and incorrect statements
all the time.



>That is very close to what I measure.
>
>But it is far off from what Hyatt & Gordon quote here: 125 ns to do a lookup in
>dual P4 Xeon memory? Under 60 ns to get a random cache line from memory on the
>K7?
>
>All bullshit.

I can do a hash probe in 150 ns on my dual Xeon.  _Measured_, not "guessed"
as you like to do.  Anytime you want to measure my machine, let me know.  It
is _not_ difficult to find the _exact_ amount of time required to access
128 bits of data from RAM.  Or with the current code, 3 x 128 bits, which is
a small piece of an L2 cache line.
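A minimal sketch of the kind of random-access latency measurement being argued about: a dependent pointer chase through a buffer shuffled into a single random cycle, so each load must wait for the previous one and the prefetcher cannot guess the next address. This is not lm_bench's source; `make_random_cycle`, `chase_latency_ns`, the seed, and the xorshift generator are hypothetical choices for illustration. The timing uses `clock()` (processor time, close to wall time for one busy thread), and the result includes TLB misses, as a real hash probe into a large table would.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

/* xorshift64: tiny deterministic PRNG, adequate for shuffling a buffer. */
static uint64_t xorshift64(uint64_t *state)
{
    uint64_t x = *state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    *state = x;
    return x;
}

/* Link n slots into ONE random cycle (Sattolo's algorithm): following
 * next[i] from any slot visits all n slots before returning, in an order
 * the hardware prefetcher cannot predict. Caller frees the result. */
static size_t *make_random_cycle(size_t n, uint64_t seed)
{
    assert(n > 0);
    size_t *next = malloc(n * sizeof *next);
    if (next == NULL)
        return NULL;
    for (size_t i = 0; i < n; i++)
        next[i] = i;
    uint64_t state = seed ? seed : 0x9E3779B97F4A7C15ull; /* must be nonzero */
    for (size_t i = n - 1; i > 0; i--) {
        size_t j = (size_t)(xorshift64(&state) % i);      /* j < i, never j == i */
        size_t tmp = next[i];
        next[i] = next[j];
        next[j] = tmp;
    }
    return next;
}

/* Average nanoseconds per dependent load over `steps` hops through a random
 * cycle of n slots, or a negative value if allocation fails. */
static double chase_latency_ns(size_t n, size_t steps)
{
    size_t *next = make_random_cycle(n, 12345u);
    if (next == NULL)
        return -1.0;
    size_t p = 0;
    clock_t start = clock();
    for (size_t s = 0; s < steps; s++)
        p = next[p];            /* each address depends on the previous load */
    clock_t end = clock();
    volatile size_t sink = p;   /* keep the loop from being optimized away */
    (void)sink;
    free(next);
    return (double)(end - start) / CLOCKS_PER_SEC * 1e9 / (double)steps;
}
```

For example, `chase_latency_ns((size_t)1 << 25, 20000000)` walks a 256 MB buffer (with 8-byte `size_t`), far larger than the L2 caches of the machines discussed here. Small buffers mostly measure cache hits instead of RAM.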

>
>A single-CPU P4 can't even get under 280 ns here!

Then you don't know how to measure it.  Just run lm_bench.

>
>Yes, I wrote those tools myself. Otherwise I would still be publicly quoting the
>460 ns crap and would not know how to get Diep to work on that machine.
>
>Though I searched online for an entire day, I could find nothing that would
>measure the latency for me.

lm_bench does it _perfectly_ and has been validated on hardware all over
the world, with technical journal papers to back it up.

>
>The best test I actually found online for measuring latency on supercomputers
>is one-way ping-pong. If you multiply those results by 2, you already get a
>rough idea of what the latency is.
>
>However, that MPI stuff doesn't work too well on PCs, and I could not run it
>on the 1024-processor Origin here. Also, I'm not using MPI but OpenMP,
>so I was forced to spend time writing a test myself.
>
>For those who want it, just email me.
>
>The code is not a secret. To be precise, it measures
>  the average random memory latency using shared memory, for a buffer of
>  size n and i processors.
>
>It was initially meant for supercomputers, but to my amazement it also
>measures pretty efficiently on the PC.
>
>Average, because if something is still in L2 or L3 cache, then that chip is
>just plain lucky.
>
>I hope you realize that a random read latency of 280 ns means that you can
>look up at most about 3.5 million integers a second in memory.
>
>Latency on dual machines is more like 400 ns, depending on chipset and memory
>of course.

No, it isn't.  It seems to peak at 150 ns for registered DDR RAM, and can be
faster.  My laptop with SDRAM clocks in at 125 ns.  Aaron's wildly overclocked
box clocked in at just over 60 ns, which was _way_ impressive.

>
>That means you can look up at most 2 to 2.5 million cache lines a second on a
>dual P4 Xeon.

Your numbers are wrong.  Again, run lm_bench and you will get _real_
numbers.  2.5 million x 128 bytes is about 320 MB per second, way less than a
gigabyte per second, which is way wrong.  The measured 150 ns on my dual Xeon
turns into something much faster than that, although of course not as fast as
pure sequential reading.
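The arithmetic on both sides of this exchange converts a latency into a lookup rate (1 second divided by the latency) and then into bytes per second (rate times line size). A small sketch of that conversion, assuming a 128-byte line as in the figure above and assuming each lookup is a full miss that must finish before the next one starts. Out-of-order CPUs can keep several independent misses in flight, so these numbers are a pessimistic bound, not a measurement. The function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Lookups per second if every lookup is a cache miss that must complete
 * before the next one starts (no overlap between misses). */
static double serialized_lookups_per_sec(double latency_ns)
{
    assert(latency_ns > 0.0);
    return 1e9 / latency_ns;
}

/* Bytes per second under the same assumption, one cache line per miss. */
static double serialized_bytes_per_sec(double latency_ns, double line_bytes)
{
    return serialized_lookups_per_sec(latency_ns) * line_bytes;
}

/* Print the latencies quoted in this thread, assuming 128-byte lines. */
static void print_thread_figures(void)
{
    const double latencies_ns[] = { 150.0, 280.0, 400.0 };
    for (size_t i = 0; i < sizeof latencies_ns / sizeof latencies_ns[0]; i++) {
        double ns = latencies_ns[i];
        printf("%5.0f ns: %.2f M lookups/s, %.0f MB/s\n", ns,
               serialized_lookups_per_sec(ns) / 1e6,
               serialized_bytes_per_sec(ns, 128.0) / 1e6);
    }
}
```

For 150, 280 and 400 ns this gives about 6.67, 3.57 and 2.50 million lookups per second, or roughly 853, 457 and 320 MB/s with 128-byte lines.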


>
>So an MTD program on the latest hardware has a huge disadvantage from the
>start, and it will only get bigger over time.

That makes absolutely no sense to me.  My PVS probes at every node.  MTD(f)
does the same.  There is no difference in the hash probe speed requirement
for one over the other.  Both depend on the hash probe speed equally.
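To illustrate the claim that MTD(f) probes the table once per node visited, just as a PVS search does, here is a minimal sketch following the MTD(f) driver and alpha-beta-with-memory pseudocode that Plaat published, written in negamax form. Everything else is a hypothetical stand-in: a toy fixed-depth tree with made-up leaf scores, and a direct-mapped table indexed by node id with no replacement policy or real hashing. A plain negamax is included only as a reference to check the result.

```c
#include <assert.h>

/* Hypothetical toy tree: every node has BRANCH children, leaves sit DEPTH
 * plies below the root, nodes are numbered in level order
 * (children of n are n*BRANCH+1 .. n*BRANCH+BRANCH). */
enum { BRANCH = 3, DEPTH = 6, NODES = 1093 };   /* (3^7 - 1) / 2 nodes */
#define INF 1000000

typedef struct { int lower, upper; } TTEntry;   /* stored score bounds */

static TTEntry tt[NODES];   /* direct-mapped: one slot per node id */
static long probes;         /* number of table probes made so far */

/* Made-up leaf score in [-100, 100] from the side to move's view. */
static int leaf_score(int id)
{
    unsigned h = (unsigned)id * 2654435761u;
    return (int)((h >> 24) % 201u) - 100;
}

static void tt_clear(void)
{
    for (int i = 0; i < NODES; i++) {
        tt[i].lower = -INF;   /* "no information" is the widest interval */
        tt[i].upper = INF;
    }
    probes = 0;
}

/* Fail-soft negamax alpha-beta that reads and stores bounds in the table. */
static int ab_memory(int id, int depth, int alpha, int beta)
{
    TTEntry *e = &tt[id];
    probes++;                                   /* one probe per node visited */
    if (e->lower >= beta) return e->lower;
    if (e->upper <= alpha) return e->upper;
    if (e->lower > alpha) alpha = e->lower;
    if (e->upper < beta) beta = e->upper;

    int g;
    if (depth == 0) {
        g = leaf_score(id);
    } else {
        int a = alpha;
        g = -INF;
        for (int k = 0; k < BRANCH && g < beta; k++) {
            int v = -ab_memory(id * BRANCH + k + 1, depth - 1, -beta, -a);
            if (v > g) g = v;
            if (g > a) a = g;
        }
    }
    if (g <= alpha) e->upper = g;               /* fail low: upper bound */
    else if (g >= beta) e->lower = g;           /* fail high: lower bound */
    else e->lower = e->upper = g;               /* exact value */
    return g;
}

/* MTD(f): repeated null-window searches that converge on the score. */
static int mtdf(int first_guess, int depth, int *passes)
{
    assert(first_guess > -INF && first_guess < INF);
    int g = first_guess, lower = -INF, upper = INF;
    *passes = 0;
    while (lower < upper) {
        int beta = (g == lower) ? g + 1 : g;
        g = ab_memory(0, depth, beta - 1, beta);
        ++*passes;
        if (g < beta) upper = g; else lower = g;
    }
    return g;
}

/* Reference: plain negamax without a table. */
static int negamax(int id, int depth)
{
    if (depth == 0) return leaf_score(id);
    int best = -INF;
    for (int k = 0; k < BRANCH; k++) {
        int v = -negamax(id * BRANCH + k + 1, depth - 1);
        if (v > best) best = v;
    }
    return best;
}
```

After `tt_clear()`, calling `mtdf(0, DEPTH, &passes)` and inspecting `passes` and `probes` shows how many null-window passes ran and how many table probes they made in total. A first guess close to the true score (for example, the previous iteration's score) usually means fewer passes; each re-search relies on the table to avoid redoing work, which is where the probe latency matters.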

>
>And when we talk about supercomputers, and Cilkchess certainly ran on several,
>that has already been the case for years.
>
>So how could they say that MTD worked well for Cilkchess?

Because it _did_?


>
>>Andrew
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.