Computer Chess Club Archives




Subject: Re: Source code to measure it - there is something wrong

Author: Robert Hyatt

Date: 14:48:20 07/17/03


On July 16, 2003 at 16:45:18, Vincent Diepeveen wrote:

>On July 16, 2003 at 14:41:45, Dieter Buerssner wrote:
>You can easily modify my code using the pointer idea as follows:
>fill the entire hashtable with values, where the value INSIDE each entry gives
>the next entry to read -- which of course is not a nearby entry but something
>at least 100 KB away, *and* no entry read in the past 50 MB was in the same
>cache line. You xor the value read as well.
>So you get rid of the RNG *and* the mod.
>The only thing which would be needed is a clever system to jump through the
>Yet from my viewpoint things are clear. The latencies at supercomputers are
>quite a bit slower for random lookups.

Too broad a statement.

The latency at a Cray T90 or X1 is _not_ slower for random lookups.  But they
don't have virtual memory.

The latency for a NUMA machine is variable.

The latency for a plain SMP box is static, but a random memory read can have
a variable time penalty due to the virtual-to-real address translations.

So don't say "latencies at supercomputers are quite a bit slower for
random lookups."  That's wrong.  "latencies at NUMA parallel machines
are quite a bit slower..." might be o.k.

However, there are supercomputers, and there are SUPERCOMPUTERS.  What you
are running on is the former, _not_ the latter.

>So RASML measurements will show different values, way slower than the
>manufacturer gives them.

It depends too much on circumstances.  A manufacturer probably is not
going to quote worst-case.  Nor best-case.  But "average case".  And for
the normal applications run on big machines, this is correct.  I.e., the TLB
problem on my Xeon has no effect on numerical applications that stream
through a large array.  Occasionally there is a TLB miss with an extra
cycle or two of delay, but then the next 1K reads proceed at raw memory
latency.

Hardly anything is truly random access.  A chess program certainly isn't, as
the hash probe is a tiny part of the total memory referencing done.

>At PC hardware we still talk about a bit more than a factor2, but at
>supercomputers we really talk about 10 to 20 times.

Depends on _which_ "supercomputer" you talk about.  Not all use virtual
memory.  Cray, CDC, etc. never did.

>>Just few comments about the thread.
>>An interesting test would be, to do lmbench type linked list test with Vincent's
>>idea of real random access. I may try it out later. No PRNG calls will be
>>needed. The linked list will be initialized "pseudo randomly". In this case, it
>>would mean that it is not too close to truly random, because in one cycle
>>every memory address will be read once.  (This could easily happen anyway with
>>not-so-decent PRNGs.)
>>A perhaps interesting comment from the lmbench source:
>>       /*
>>        * First create a list of pointers.
>>        *
>>        * This used to go forwards, we want to go backwards to try and defeat
>>        * HP's fetch ahead.
>>        *
>>        * We really need to do a random pattern once we are doing one hit per
>>        * page.
>>        */
>>So, the authors did not seem too confident with the sequential like access? Or
>>did I misunderstand.
>>The PRNG Vincent uses is fine. I will do some tests on it. Lagged Fibonacci type
>>generators don't have problems with mod (often rand() uses a linear congruential
>>generator, which can have severe problems, especially when used with mod). Anyway,
>>for this sort of test, I think even very bad PRNGs would do well. There is no
>>way the hardware can guess the access pattern.
>A requirement is a big hashtable, though. Therefore the precalculated jumping
>through the hashtable would need to maximize distance from the previous read.
