Computer Chess Club Archives



Subject: the usual cache line length discussion

Author: Vincent Diepeveen

Date: 10:10:20 04/01/03


On March 31, 2003 at 13:46:09, Robert Hyatt wrote:

>On March 31, 2003 at 12:28:48, Vincent Diepeveen wrote:
>
>>On March 31, 2003 at 11:46:19, Robert Hyatt wrote:
>>
>>>On March 31, 2003 at 09:57:34, Vincent Diepeveen wrote:
>>>
>>>>On March 30, 2003 at 22:03:05, Robert Hyatt wrote:
>>>>
>>>>Hello,
>>>>
>>>>Apart from node count and search depths it is appreciated to post entire
>>>>log.* files.
>>>>
>>>>Without trying Crafty yet at anything (precondition) i'll pick
>>>>a few middlegame positions (in endgames things are clear enough
>>>>anyway in favour of hashtable):
>>>>
>>>>2rq1rk1/pb3ppp/1p2pn2/4N3/1b1PPB2/4R1P1/P4PBP/R2Q2K1 w - - 0 2 d5 pos.27 bs2830
>>>>d4d5
>>>>2r1r3/p3bk1p/1pnqpppB/3n4/3P2Q1/PB3N2/1P3PPP/3RR1K1 w - - 0 2 Rxe6 pos.5 bs2830
>>>>e1e6
>>>>rqr3k1/pp1bppbp/2np1np1/8/2BNP1P1/2N1BP2/PPPQ4/2KR3R w - - pos.0 Uri Blass corr
>>>>a1a1
>>>>r1bqr1k1/pp1p1ppp/5n2/1N2n3/2P1PB2/6P1/PP1Q1K1P/4RB1R b - - 0 1 (d7d6) Uri Blass
>>>>d7d6
>>>>
>>>>The last position is to verify things. It should matter there. 4 positions
>>>>will do here as the result is deterministic.
>>>>
>>>>With just 4 positions it is possible to calculate next:
>>>>a graph hashtable speedups versus searchtime.
>>>>
>>>>For the 4th position assumption is it will give big speedup as there is
>>>>more transposition possibilities and more king side checks there than
>>>>in the other positions. On the other hand there is other problems
>>>>there as well.
>>>>
>>>>As verification it is interesting to run the same 4 positions with 2 or 4 search
>>>>threads to compare.
>>>
>>>I am running the test now.  But I am _not_ going to run 2 or 4 threads.  That is
>>>not
>>>the point of this discussion.  It is about "how much RAM is needed for hash
>>
>>You are afraid of course that we see your speedup at the average complex
>>middlegame :)
>
>I'm not afraid of anything.  If you want to compare speedups with a set of
>positions, I'll pick N, you pick N, and lets send our engines to a 3rd party to
>do the tests?  I don't mind at all.
>
>However, in _this_ thread, parallel performance is _not_ being discussed, only
>"how big does the hash table need to be?"
>
>As usual, you tend to obfuscate knowledge rather than add to it, for reasons
>only
>you can understand.  Hash efficiency and parallel search are _not_ related to
>each
>other.  Because you can have a hash table _without_ having a parallel search,
>and
>you can have a parallel search _without_ having a hash table...
>
>
>
>>
>>>table in
>>>a current chess engine?"  Nothing more, nothing less.  I gave a pretty simple
>>>formula
>>>as a first approximation, and I gave some data to support that.  I'll run your
>>>positions
>>>but I suspect the hash results are going to be exactly the same.  I'll post
>>>_both_ again
>>>to see if "my positions were bad" as you claim...
>>
>>>-
>>>
>>>> My own findings is that a big hashtable size is very
>>>>important for parallel search whereas at 3 minutes a move or so it doesn't
>>>>matter much for a single cpu search (diep using 8 probes).
>>>
>>>My own finding is that hash table size is important for either sequential or
>>>parallel searching, with no particular advantage for either.  It's easy to
>>>verify
>>>of course.
>>
>>Hashtable is only important if either it is way too small the replacement
>>algorithm sucks or if too little probes are getting done. At your beloved P4 you
>>can do in the 128 byte cache lines like 8 probes without getting penalized.
>
>If there was a performance gain, I might.  But what I have works just fine and
>so
>I don't.  Simple story, really...  I tried multiple probes early on, because I
>did it that
>way in Cray Blitz.  The Belle approach was about as efficient and didn't trash
>memory
>as badly.
>
>And note that the PIV only has 128 bytes in a L2 line, not L1.  And since I
>don't write code
>for a specific processor, I hardly want to write for 128 byte lines when the PIV
>is the only
>box that is using that in the X86 world.
>

Remember that the slow part of a hashtable probe is getting the bytes out of RAM,
not out of the L2 or L1 cache. Those caches are very fast on the P4.

Actually, in DIEP I assume cache lines of 64 to 128 bytes; for the hashtable I use 128.

Even then, betting on 64 bytes is safe, as that is what DDR RAM delivers on the K7.

Currently you use 32 bytes.

Within a few years all manufacturers will trivially move up to cache lines of at
least 128 bytes, simply because that gives more bandwidth. And although for 99%
of all applications bandwidth is not the biggest issue (latency matters more, of
course), the market only knows the word 'bandwidth', so big bandwidth is what it
demands.
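The multi-probe scheme being described above (several entries packed into one
cache-line-sized bucket, so that once the line is fetched from RAM the remaining
probes are effectively free) can be sketched roughly like this. The entry
layout, sizes, and naive replacement rule are illustrative assumptions, not
DIEP's or Crafty's actual format:

```c
#include <stdint.h>
#include <stdlib.h>

/* One bucket = 8 entries * 16 bytes = 128 bytes, i.e. one P4 L2 cache
 * line.  All sizes here are assumptions for illustration. */
typedef struct {
    uint64_t key;    /* position signature */
    uint64_t data;   /* packed move, score, depth, bound type */
} hash_entry_t;      /* 16 bytes */

#define ENTRIES_PER_BUCKET 8

typedef struct {
    hash_entry_t e[ENTRIES_PER_BUCKET];
} hash_bucket_t;     /* 128 bytes */

static hash_bucket_t *table;
static uint64_t nbuckets;        /* must be a power of two */

void hash_init(uint64_t n) {
    nbuckets = n;
    table = calloc(nbuckets, sizeof(hash_bucket_t));
}

/* Naive replacement: reuse an empty or matching slot, else clobber
 * slot 0.  A real engine would prefer depth/age-based replacement. */
void hash_store(uint64_t key, uint64_t data) {
    hash_bucket_t *b = &table[key & (nbuckets - 1)];
    for (int i = 0; i < ENTRIES_PER_BUCKET; i++)
        if (b->e[i].key == 0 || b->e[i].key == key) {
            b->e[i].key = key; b->e[i].data = data; return;
        }
    b->e[0].key = key; b->e[0].data = data;
}

/* Probe all 8 slots of one bucket.  The RAM latency is paid once, on
 * the first touch; the other probes hit the already-loaded line. */
hash_entry_t *hash_probe(uint64_t key) {
    hash_bucket_t *b = &table[key & (nbuckets - 1)];
    for (int i = 0; i < ENTRIES_PER_BUCKET; i++)
        if (b->e[i].key == key)
            return &b->e[i];
    return NULL;
}
```

The point of grouping the probes this way is exactly the one made above: the
expensive part is the single trip to RAM, so the number of probes per line is
nearly free.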

>
>>
>>Of course there is other influences as well. Like move ordering too much
>>dependant upon hashtable move. In diep move ordering at cutoffrate improves in
>>general exactly 1% when i compare move orderings done without hashtable versus
>>move orderings done with hashtable.
>>
>>If your results indicate a major difference with my findings, then the move
>>ordering quality is most likely the best explanation.
>
>Or the testing approach...
>
>
>
>
>>
>>>
>>>
>>>>
>>>>>On March 30, 2003 at 15:09:19, Vincent Diepeveen wrote:
>>>>>
>>>>>>Bob of course didn't proof anything as usual and all significant counts are not
>>>>>>shown.
>>>>>
>>>>>_nobody_ "proofs" anything by running test positions and giving the
>>>>>results.  That is a "demonstration" and _not_ a "proof".  However, the
>>>>>"significant counts" _are_ given.  Total time to run the test, total
>>>>>nodes searched, and the hash size.  What else do you want?
>>>>>
>>>>>
>>>>>> To start with also the solutions of positions at BT2630 are found within
>>>>>>a second or so, so are no good positions to use for such hashtable tests.
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>>The solutions don't matter.  What matters is the search space, which is large
>>>>>whether it finds the correct move or not.   And hashing helps whether the right
>>>>>move is found or not.
>>>>>
>>>>>
>>>>>>I need to note however that where when DIEP runs single cpu my results are very
>>>>>>similar to what you presented here, when you are parallel, things change a lot.
>>>>>>
>>>>>>For programs that search in parallel it is very important to have a big
>>>>>>hashtable. I see clear differences there in speedup when using big hashtable
>>>>>>versus small hashtable.
>>>>>>
>>>>>>That difference is likely to explain the assumption from Bob versus your
>>>>>>'proof'.
>>>>>>
>>>>>>Trivially the testpositions you used are better than the first 6
>>>>>>positions shown below:
>>>>>>
>>>>>>C:\engine\fen>type bt2630.fen
>>>>>>rq2r1k1/5pp1/p7/4bNP1/1p2P2P/5Q2/PP4K1/5R1R/w
>>>>>>f5g7
>>>>>>6k1/2b2p1p/ppP3p1/4p3/PP1B4/5PP1/7P/7K/w
>>>>>>d4b6
>>>>>>5r1k/p1q2pp1/1pb4p/n3R1NQ/7P/3B1P2/2P3P1/7K/w
>>>>>>e5e6
>>>>>>5r1k/1P4pp/3P1p2/4p3/1P5P/3q2P1/Q2b2K1/B3R3/w
>>>>>>a2f7
>>>>>>3B4/8/2B5/1K6/8/8/3p4/3k4/w
>>>>>>b5a6
>>>>>>1k1r4/1pp4p/2n5/P6R/2R1p1r1/2P2p2/1PP2B1P/4K3/b
>>>>>>e4e3
>>>>>>
>>>>>>Note that i am always amazed at how Bob picks his testpositions.
>>>>>
>>>>>How could you be "amazed" when you have no idea how I do it?  I just
>>>>>picked those as I had used them earlier and had the files set up to
>>>>>run them.
>>>>>
>>>>>Feel free to suggest any 6 positions and I'll run 'em to show that _they_
>>>>>behave exactly the same for Crafty...
>>>>>
>>>>>You like to pick on test positions.  You always claim yours are better.
>>>>>When I run them (the deep blue positions come to mind) and get results
>>>>>similar to results I get on my own positions, you run and hide and ignore
>>>>>the results.
>>>>>
>>>>>So pick 6, post the FEN, and I'll run em again, same way.  One thread,
>>>>>with hash sizes from 48K to 384M, and post the same numbers again.  Total
>>>>>time used, total nodes searched, hash size used...
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>>
>>>>>>Best regards,
>>>>>>Vincent
>>>>>>
>>>>>>On March 30, 2003 at 14:09:33, Tom Kerrigan wrote:
>>>>>>>On March 30, 2003 at 10:50:44, Robert Hyatt wrote:
>>>>>>>
>>>>>>>>Now I hope you will choose to dump that "this disproves the hyatt claim"
>>>>>>>>stuff, you clearly didn't disprove _anything_...
>>>>>>>
>>>>>>>I was of course referring to:
>>>>>>>
>>>>>>>"The simple rule is that the hash table needs to be at _least_ large enough to
>>>>>>>hold the entire tree."
>>>>>>>
>>>>>>>Don't you think the word "need" is a little strong in this situation? I mean,
>>>>>>>chess programs work fine without huge hash tables, so maybe they don't "need"
>>>>>>>them.
>>>>>>>
>>>>>>>I notice that you didn't present any data on how much of the search tree was
>>>>>>>being stored in the hash tables, and without that data you obviously can't point
>>>>>>>to a significant performance increase when the entire table is stored, so I
>>>>>>>don't see that you even touched on the issue, much less proved it or disproved
>>>>>>>it.
>>>>>>>
>>>>>>>-Tom




Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.