Computer Chess Club Archives



Subject: Re: Sempron vs. Athlon 64: Proof that Crafty's working set is < 256k

Author: Sune Fischer

Date: 01:46:00 08/21/04

On August 20, 2004 at 21:36:57, Robert Hyatt wrote:

>On August 20, 2004 at 19:09:20, Sune Fischer wrote:
>
>>On August 20, 2004 at 13:03:28, Robert Hyatt wrote:
>>
>>>On August 20, 2004 at 12:05:59, Tom Kerrigan wrote:
>>>
>>>>On August 20, 2004 at 05:10:10, Tony Werten wrote:
>>>>
>>>>>On August 20, 2004 at 04:33:07, Tom Kerrigan wrote:
>>>>>
>>>>>>Now that AMD is selling two processors that are identical other than L2 cache
>>>>>>size (Sempron has 256k, Athlon 64 has 512k) we have proof of Crafty's working
>>>>>>set size:
>>>>>
>>>>>Or that it's >> 512
>>>>>
>>>>>Basically, this only proves that it's not between 256 and 512.
>>>>
>>>>That's not how cache works. By that rationale, you might as well not have any
>>>>cache at all. See my reply to Bob above.
>>>>
>>>>-Tom
>>>
>>>
>>>Tom, that is _exactly_ how cache works for some reference patterns.
>>
>>I don't believe this pattern is an accurate model for a chess engine.
>
>Believe what you want.  I spent a fair amount of time making this so in Crafty
>to improve cache utilization.  I.e. I _want_ to access memory serially to take
>advantage of pre-fetching through cache-line fills.  And I re-ordered chunks of
>data to make this happen.
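
(Bob's serial-access point can be sketched with a small traversal function.
This is my code, not Crafty's; the sizes and names are mine.  The idea: a
sequential sweep lets the hardware prefetcher stream cache lines ahead of the
loads, while a large-stride sweep over the same data lands each load on a
fresh line and defeats the prefetcher, so identical work runs much slower on
a big array.)

```c
#include <stddef.h>

/* Sum n ints, visiting every element exactly once, in hops of `stride`.
 * With stride == 1 the pattern is serial and prefetch-friendly; with a
 * large stride each load touches a fresh cache line, so on an array much
 * bigger than L2 the same summation runs far slower.  Sketch only. */
long strided_sum(const int *a, int n, int stride) {
    long sum = 0;
    for (int start = 0; start < stride; start++)
        for (int i = start; i < n; i += stride)
            sum += a[i];
    return sum;
}
```

Timing `strided_sum(a, 1 << 24, 1)` against `strided_sum(a, 1 << 24, 4096)`
on the same buffer shows the effect; both return the same sum, only the
access order differs.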

But you tend to go over the same nodes again and again, i.e. when you are at the
leaf and do a two-ply search, you will have a lot of evaluations of almost the
same position, and your tree stack will be going back and forth over the same
tiny dataset.

>>Then let's design a way :)
>>
>>For instance, say Crafty has a working set of 200 kB, try and add a dummy table
>>of 1 kB to your working set by looping through this at every node.
>>
>>Now run the test again with increasing table sizes until you get hit with a big
>>slowdown.  That's when serious cache thrashing starts.
>>
>>Of course this assumes there's no thrashing to begin with, but since Crafty is a
>>pretty fast searcher I don't think there are any big cache issues.
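
(A minimal sketch of the dummy-table probe described above.  The code, the
table limit, the 64-byte-line assumption, and the function names are mine,
not Sune's or Crafty's.  The engine would call touch_dummy() once per node;
one then reruns a fixed benchmark position with growing K until
nodes/second collapses.)

```c
#include <stddef.h>

#define MAX_DUMMY (512 * 1024)     /* probe up to the Athlon 64's L2 size */

static volatile unsigned char dummy[MAX_DUMMY]; /* volatile: keep the loads */
static size_t dummy_bytes;                      /* current probe size K */

void set_dummy_size(size_t k) { dummy_bytes = k <= MAX_DUMMY ? k : MAX_DUMMY; }

/* Called once per node: pull every cache line of the first K bytes into
 * cache (64-byte lines assumed).  Returns the number of lines touched,
 * which is handy for sanity checks. */
size_t touch_dummy(void) {
    size_t lines = 0;
    for (size_t i = 0; i < dummy_bytes; i += 64) {
        (void)dummy[i];
        lines++;
    }
    return lines;
}
```

The K at which nodes/second drops sharply is where the real working set plus
K first exceeds L2, so the working set is roughly the L2 size minus that K.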
>
>
>"You don't think there are any big issues..."  "I _know_ there are".  I've
>actually measured it, analyzed it, and worked on it both with folks at Microsoft
>through Eugene, and through folks at AMD...

There is thrashing and then there is thrashing.

I think it matters whether you do it at every node or only at every 10000th node.

>/* move generation tables */
>extern signed char directions[64][64];
>extern BITBOARD w_pawn_attacks[64];
>extern BITBOARD b_pawn_attacks[64];
>extern BITBOARD knight_attacks[64];
>extern BITBOARD bishop_attacks[64];
>extern BITBOARD bishop_attacks_rl45[64][64];
>extern BITBOARD bishop_attacks_rr45[64][64];
>extern BITBOARD rook_attacks_r0[64][64];
>extern BITBOARD rook_attacks_rl90[64][64];
>extern int bishop_mobility_rl45[64][64];
>extern int bishop_mobility_rr45[64][64];
>extern unsigned char bishop_shift_rl45[64];
>extern unsigned char bishop_shift_rr45[64];
>extern int rook_mobility_r0[64][64];
>extern int rook_mobility_rl90[64][64];
>extern BITBOARD rook_attacks[64];
>extern POSITION display;
>extern BITBOARD queen_attacks[64];
>extern BITBOARD king_attacks[64];
>extern BITBOARD king_attacks_1[64];
>extern BITBOARD king_attacks_2[64];
>extern BITBOARD obstructed[64][64];
>extern BITBOARD w_pawn_random[64];
>extern BITBOARD b_pawn_random[64];
>extern BITBOARD w_knight_random[64];
>extern BITBOARD b_knight_random[64];
>extern BITBOARD w_bishop_random[64];
>extern BITBOARD b_bishop_random[64];
>extern BITBOARD w_rook_random[64];
>extern BITBOARD b_rook_random[64];
>extern BITBOARD w_queen_random[64];
>extern BITBOARD b_queen_random[64];
>extern BITBOARD w_king_random[64];
>extern BITBOARD b_king_random[64];
>extern BITBOARD stalemate_sqs[64];
>extern BITBOARD edge_moves[64];
>extern BITBOARD enpassant_random[65];
>extern BITBOARD castle_random_w[2];
>extern BITBOARD castle_random_b[2];
>extern BITBOARD wtm_random[2];
>extern BITBOARD clear_mask[65];
>extern BITBOARD clear_mask_rl90[65];
>extern BITBOARD clear_mask_rl45[65];
>extern BITBOARD clear_mask_rr45[65];
>extern BITBOARD set_mask[65];
>extern BITBOARD set_mask_rl90[65];
>extern BITBOARD set_mask_rl45[65];
>extern BITBOARD set_mask_rr45[65];
>
>
>Those are just a few random tables,

They aren't random at all; that's my point.

Take for instance b_knight_random: I bet you can easily go 10000 nodes without
having to access b_knight_random[0]. And when the knights are off the board you
don't need them at all.
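
(The *_random tables in the quoted list are Zobrist keys, and the sparse
access pattern follows from how such keys are used: a make-move XORs in only
the from- and to-square keys of the moving piece.  A sketch, with my names;
the key-filling function is a deterministic stand-in, real engines use
random keys.)

```c
#include <stdint.h>

typedef uint64_t BITBOARD;

static BITBOARD b_knight_random[64];

/* Stand-in for startup initialization -- xorshift64 just produces
 * deterministic, distinct values for this sketch. */
void init_knight_keys(void) {
    BITBOARD x = 0x9E3779B97F4A7C15ULL;
    for (int sq = 0; sq < 64; sq++) {
        x ^= x << 13; x ^= x >> 7; x ^= x << 17;
        b_knight_random[sq] = x;
    }
}

/* Moving a black knight touches exactly two of the 64 entries: the key
 * for the from-square is XORed out, the to-square key is XORed in.  XOR
 * is its own inverse, so unmake is the same call with squares swapped. */
BITBOARD knight_move_hash(BITBOARD key, int from, int to) {
    return key ^ b_knight_random[from] ^ b_knight_random[to];
}
```

So in any stretch of the search where no black knight visits square 0, the
entry b_knight_random[0] is never loaded at all.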

> not counting the actual stuff in my TREE
>structure that gets "swept over" many times in a search..

I guess it doesn't matter much, as long as you don't do it at an ultra-high
frequency. Maybe there is a cache miss when going from ply 20 back to ply 0, but
the search doesn't do that very often; most of the time is spent going between
ply 15 and ply 20 or some such.

>bitboard programs are not very small.  But the disadvantages of size are offset
>in other ways, particularly as cache gets bigger and bigger.  This ignores
>hashing and repetitions and everything else.

You're ignoring the facts here: it seems that larger caches (today) have no
influence on performance, even for Crafty.

>testing says bigger hashes help.  At least for the testing I reported, and for
>the test results Eugene reported.

You ran a test, but you did not isolate the problem first, so what you saw is
not easy to interpret.

I think at some point you begin to cache part of the hash table. Since you
revisit the same nodes again and again in iterative deepening, if you could
cache some of that (not just hash it but also cache it!) it might give you a
measurable speedup.

With caches that large there is some effect from this, I'm sure; how big it is I
have no idea, but it makes it hard to say what you're measuring.

>I can't say more but I can say what I have actually _measured_.

You saw what you saw, yes, but it's your interpretation of the experiment that I
question.

-S.




Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.