Author: Robert Hyatt
Date: 07:58:34 09/04/03
On September 03, 2003 at 23:56:50, Matthew Hull wrote:

>On September 03, 2003 at 19:57:23, Robert Hyatt wrote:
>
>>On September 03, 2003 at 18:57:06, Jeremiah Penery wrote:
>>
>>>On September 03, 2003 at 13:06:34, Robert Hyatt wrote:
>>>
>>>>The point for the "Crafty algorithm" is that I rarely share things among
>>>>_all_ processors, except for the transposition/refutation table and pawn
>>>>hash table.
>>>>
>>>>Split blocks are shared, but explaining the idea is not so easy. But to
>>>>try:
>>>>
>>>>When a single processor is searching, and notices that there are idle
>>>>processors, it takes its own split block, and copies the data to N new
>>>>split blocks, one per processor. For all normal searching, each processor
>>>>uses only its own split block, except at the position where the split
>>>>occurred. There the parent split block is accessed by all threads to get
>>>>the next move to search. That is not a very frequent access. And there,
>>>>there will be penalties that are acceptable. But for the _rest_ of the
>>>>work each processor does, I used a local split block for each so that they
>>>>ran at max speed. That was the main change...
>>>>
>>>>Without that "fix" it ran very poorly. There was so much non-local memory
>>>>traffic that performance was simply bad. With the fix, things worked much
>>>>better.
>>>
>>>That's how I assumed it always worked anyway, with each processor using only its
>>>own split block, so that there wouldn't be very many non-local accesses. From
>>>that perspective, there are very few non-local accesses (as you say), and NUMA
>>>doesn't cause many problems.
>>
>>It _could_ work that way. I.e., right now I have split blocks that are in a
>>big array. They don't have to be. They could be allocated locally on each
>>processor, so that the first N are local to processor 0, the next N are local
>>to processor 1, etc. Then the problem goes away. Unfortunately I didn't
>>design it like that, but the change is not very difficult to do... But there
>>is no real benefit until I get my hands on a real NUMA box (again) to play
>>with...
>>
>>
>>>
>>>I guess my assumption was wrong about that, and I've been arguing from that
>>>position. Thanks for the explanation.
>
>
>Would this also help people with dual Athlon boxes to get better speedup?
>
>MH

I don't believe so. The Athlon still uses the same bus approach as all the
Intel processors...
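
For anyone trying to picture the layout discussed in the quoted text, here is a rough C sketch. It is not Crafty's actual code: the struct fields, the function names (init_pools, alloc_block, copy_to_child, next_move_at_split) and the constants are all made up for illustration, and a C11 atomic counter stands in for whatever locking the real program uses at the split point. It only shows the shape of the idea: one pool of split blocks per processor instead of one big array, a copy of the parent's data into a block owned by each helper, and a single shared access to the parent to fetch the next move.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>

#define NCPUS 4           /* hypothetical processor count */
#define BLOCKS_PER_CPU 8  /* the "N" blocks reserved per processor */
#define MAX_MOVES 64

typedef struct split_block {
  struct split_block *parent; /* split point this copy came from, or NULL */
  int owner;                  /* processor whose pool holds this block */
  int in_use;
  int n_moves;
  int moves[MAX_MOVES];       /* move list at the split position */
  atomic_int next_move;       /* only contended on the parent block */
} split_block;

/* One pool per processor rather than one big shared array. On a NUMA
   machine each pool would also have to be placed in that processor's
   local memory; plain calloc() does not guarantee that. */
static split_block *pool[NCPUS];

int init_pools(void) {
  for (int cpu = 0; cpu < NCPUS; cpu++) {
    pool[cpu] = calloc(BLOCKS_PER_CPU, sizeof(split_block));
    if (pool[cpu] == NULL) return -1;
    for (int i = 0; i < BLOCKS_PER_CPU; i++) {
      pool[cpu][i].owner = cpu;
      atomic_init(&pool[cpu][i].next_move, 0);
    }
  }
  return 0;
}

/* Take a free block from the given processor's own pool.
   Not thread-safe as written; a real program would guard this. */
split_block *alloc_block(int cpu) {
  assert(cpu >= 0 && cpu < NCPUS);
  for (int i = 0; i < BLOCKS_PER_CPU; i++) {
    if (!pool[cpu][i].in_use) {
      pool[cpu][i].in_use = 1;
      return &pool[cpu][i];
    }
  }
  return NULL; /* pool exhausted */
}

/* Copy the parent's data into a block local to `cpu`, so that the
   helper's normal searching never touches the parent's memory. */
split_block *copy_to_child(split_block *parent, int cpu) {
  split_block *child = alloc_block(cpu);
  if (child == NULL) return NULL;
  child->parent = parent;
  child->n_moves = parent->n_moves;
  memcpy(child->moves, parent->moves, sizeof(parent->moves));
  atomic_store(&child->next_move, 0);
  return child;
}

/* The one shared (possibly non-local) access: fetch the next unsearched
   move at the split point. Returns -1 when the list is exhausted. */
int next_move_at_split(split_block *child) {
  split_block *p = child->parent;
  int i = atomic_fetch_add(&p->next_move, 1);
  return (i < p->n_moves) ? p->moves[i] : -1;
}
```

A caller would run init_pools() once, then on a split call copy_to_child(parent, cpu) for each idle processor and have each helper loop on next_move_at_split() until it returns -1. Everything else a helper does stays inside its own block, which is the point of the change.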
Last modified: Thu, 15 Apr 21 08:11:13 -0700