Computer Chess Club Archives



Subject: Re: Sempron vs. Athlon 64: Proof that Crafty's working set is < 256k

Author: Robert Hyatt

Date: 19:44:16 08/22/04

On August 22, 2004 at 21:13:49, Tom Kerrigan wrote:

>On August 22, 2004 at 18:15:55, Robert Hyatt wrote:
>
>>On August 22, 2004 at 17:12:05, Tom Kerrigan wrote:
>>
>>>On August 22, 2004 at 11:10:33, Robert Hyatt wrote:
>>>
>>>>Simple.  I access a batch of random attack entries.  I then access a _lot_ of
>>>>other stuff before I come back to the attack entries.  Your 256K cache has 4K
>>>>lines.  I don't know what the set associativity is, as AMD has lots of options
>>>>there in recent history.  But that further reduces the number of "buckets" to
>>>>stuff stuff in.  My xeon claims 16-way set associativity, with 128 byte lines.
>>>>That turns into a paultry 256 sets.  It is very easy to get a bad physical
>>>>memory layout where you don't even use all of those sets, and where some sets
>>>>get badly overloaded.
>>>
>>>This is a bunch of nonsense. You make it sound like associativity somehow
>>>decreases the amount of cache you have. Really, associativity has no place in
>>>this discussion, except maybe to note that it reduces the behavior that you're
>>>complaining about, namely random accesses evicting important data from the
>>>cache.
>>
>>Then you don't understand set associativity.  It _does_ influence what goes
>>where.  Physical memory maps directly to a set.  A program doesn't necessarily
>>use _every_ set in cache due to poor physical memory layout decisions by the
>>O/S.
>
>The hardware textbooks I have say that the lower order bits of an address
>determine which set the cache line goes to, so if you use a certain amount of
>contiguous data (e.g., 64KB for an Opteron), it WILL make use of every set,
>regardless of where the OS has located your program.

Wrong.  The low-order bits of the _physical_ address.  There is a _big_
difference.  Look up "page coloring memory allocation" for details on why this
is an issue.  I wrote a patch for Linux 2.2 a couple of years back to solve
this.
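
To make that concrete, here is a minimal C sketch of the mapping, using the
illustrative geometry from this thread (128-byte lines, 256 sets, 2K pages);
the constants and helper names are mine, not anything from Crafty or that
kernel patch:

    #include <stdio.h>

    /* Illustrative geometry (numbers from this thread, not any real chip). */
    #define LINE_BITS 7                  /* log2(128-byte line)              */
    #define SET_BITS  8                  /* log2(256 sets)                   */
    #define PAGE_BITS 11                 /* log2(2K page)                    */
    #define COLOR_BITS (SET_BITS + LINE_BITS - PAGE_BITS)  /* 4 -> 16 colors */

    /* The cache indexes on _physical_ address bits just above the line
       offset.  The low PAGE_BITS are identical in virtual and physical
       addresses, but the bits above them come from whichever physical
       frame the OS handed out.                                            */
    static unsigned long cache_set(unsigned long phys_addr) {
        return (phys_addr >> LINE_BITS) & ((1UL << SET_BITS) - 1);
    }

    /* Pages with the same "color" compete for the same 1/16th of the
       sets, no matter how far apart their _virtual_ addresses are.       */
    static unsigned long page_color(unsigned long frame_number) {
        return frame_number & ((1UL << COLOR_BITS) - 1);
    }

    int main(void) {
        unsigned long frame_a = 0x12340, frame_b = 0x56780; /* hypothetical */
        printf("frame A color %lu, frame B color %lu\n",
               page_color(frame_a), page_color(frame_b));   /* both 0      */
        printf("first line of frame A lands in set %lu\n",
               cache_set(frame_a << PAGE_BITS));
        return 0;
    }

Two different frames, same color: those two pages fight over the same small
slice of the cache even though their virtual addresses could be anywhere.
That collision is what the reboot-and-retry ritual below is unknowingly
dodging.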

Have you ever been to a chess event where people run their program, note the
NPS, say "too low", then reboot, run some odd application, then run their
program again, and repeat until they get the NPS they expect?  They are
trusting luck to land them in a physical memory layout that maps onto the
cache optimally.

With 2K pages, you lop off the rightmost 11 bits of the address and throw them
away.  The next N bits (N=8 if you have 256 sets) map to the set.  How many
pages are there?  Let's take a 2-gig box: one million.  How many pages map to
the same set?  One million divided by 256: about four thousand.  Think it is
possible to grab the _wrong_ 4000 pages and use just _one_ set in the cache?
Of course it is.
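
As a sanity check on that arithmetic, a tiny C program using the same
illustrative numbers (and the same simplified page-to-set model):

    #include <stdio.h>

    int main(void) {
        unsigned long ram   = 2UL * 1024 * 1024 * 1024;  /* the 2-gig box   */
        unsigned long page  = 2048;                      /* 2K pages        */
        unsigned long sets  = 256;                       /* sets in cache   */
        unsigned long pages = ram / page;                /* physical pages  */

        /* In the simplified model above, this many physical pages map to
           each set; grab the wrong ones and cache coverage collapses.     */
        printf("%lu pages total, %lu pages per set\n", pages, pages / sets);
        return 0;
    }

It prints 1048576 pages total and 4096 pages per set, i.e. the "one million
divided by 256, about four thousand" above.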

Next topic please...


>
>Do you have reason to believe this isn't the case for the processors in
>question?

I _know_ it isn't the case.  Again, look up "page coloring" (and it has nothing
to do with Crayola).



>
>>>But let's say you do randomly access your working set. How about you explain how
>>>performance isn't increased going from 256k to 512k cache?
>>
>>I believe I already answered.  I _can't_ explain it because I _can't_ reproduce
>>it.
>
>Why do you have to reproduce it to be able to explain it? How does reproducing
>it get you any closer to an explanation anyway? I didn't even produce the data
>in the first place (Anandtech did) and I explained it.


Because unlike you, I'm not clairvoyant.  If I can't reproduce it, and I have
data that is completely contradictory, then I conclude something is wrong with
the data _I_ didn't produce.  Until I am aware of an issue, I see no reason to
try to form hypotheses to explain the behavior, when I can't even produce the
behavior in the first place.





>
>>>large percentage of your maximum possible working set... Windows reports that it
>>>only allocates 5MB for Crafty, including hash tables, tables that never get
>>
>>Strange number.  Linux reports 20M.
>>How can it use 5mb when the default hash sizes total 4 megs?
>>I.e., start Crafty and type "hash" and "hashp".  You get almost 4 megs for those.
>>There are 128 TREE blocks also, but they get malloc()'ed.  That is way over a
>>meg total...
>>5mb has to be wrong...
>
>I have Volker Pittlik's build of Crafty 19.7. It's on the web. Download it, run
>it, bring up task manager, and look at the number in the column on the right.
>
>Declaring that it "has to be wrong" just makes you look more like an idiot when
>you figure out that it's actually right.

Again, do you believe that the default code, with 4 megs of hash/phash, _really_
runs in 5M of RAM?  I don't.  I did run it under Linux and got something that
seems more reasonable, namely 20M.  My executable file is over 1MB, and that
doesn't include all of the dynamically allocated RAM.

Who looks like an idiot???

The one who really _knows_ the program or the one who makes wild guesses about
the program???

BTW, when you tested that, did you _run_ the program?  It defers some mallocs
until it actually does a _search_...

Aha.  OK.  I just ran a test with (a) no endgame tables and (b) no compiled-in
threads, and I get a running size of about 5.75MB, with 3.75MB for the
hash/hashp tables; that leaves 2.0MB for Crafty and data.  How much is unused?
I don't know.  But then I don't run a stripped-down no-thread version either, so
perhaps we are not comparing apples to apples...
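
For what it's worth, here is a minimal Linux-side sketch of one way to watch
that happen from outside the engine: print the VmRSS line from
/proc/<pid>/status before and after the program searches.  The helper name and
the pid-as-argument interface are mine; VmRSS counts only pages the process
has actually touched, so memory whose allocation is deferred until a search
(or that is malloc()'ed but never used) will not show up in the first reading:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* Print the resident-set-size line from /proc/<pid>/status. */
    static void print_rss(int pid) {
        char path[64], line[256];
        snprintf(path, sizeof(path), "/proc/%d/status", pid);
        FILE *f = fopen(path, "r");
        if (!f) { perror(path); return; }
        while (fgets(line, sizeof(line), f))
            if (strncmp(line, "VmRSS:", 6) == 0)
                fputs(line, stdout);
        fclose(f);
    }

    int main(int argc, char **argv) {
        if (argc != 2) {
            fprintf(stderr, "usage: %s <pid>\n", argv[0]);
            return 1;
        }
        print_rss(atoi(argv[1]));
        return 0;
    }

Run it once right after the engine starts and once mid-search; the difference
is roughly the deferred allocations plus whatever the search has touched.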



>
>-Tom


