Author: Robert Hyatt
Date: 14:35:09 08/25/03
On August 25, 2003 at 17:22:52, Sune Fischer wrote:

>On August 25, 2003 at 16:50:08, Dan Andersson wrote:
>
>> The issue is the same. Because you can't guarantee that copying will be in
>>cache. And you can't guarantee that other data structures won't be close or
>>aligned in such a way that it won't thrash the cache. The impact might not be
>>great but it will be there. So the net cache bandwidth will be lower or even
>>much lower than the simple linear relationship. Thus the slow main memory
>>bottleneck will appear.
>
>As far as the whole picture goes, probably yes, but it is not easy to figure
>that out since there are many factors.
>
>I know this is down to hair splitting now, but IMO the reason that unmaking is
>faster than uncopying isn't the one Bob gave, and I quote:
>
>"
>>>I was thinking more about how silly it is to copy the empty bitboards for each
>>>ply. If you update the boards that are active, they will stay in the cache.
>>>Those that are not used might drop out, unless they are copied once every
>>>microsecond.
>>>
>>
>>That is a reasonable rate for a program that searches 1M nodes per second. I'm
>>going at 2.4M so make that about once every 400 nanoseconds. :) Suddenly it
>>begins to add up in a big way. :)"
>
>As though the 2.4 Mnps was the reason.

No. The 2.4M nps simply gives a rate, roughly one node every 400ns, which is
_my_ program's rate on my dual 2.8GHz box. That gives me a _specific_ time per
node, and it is pretty easy to estimate that copy/make is going to be a
significant part of that...

I was not saying that 2.4M nodes per second is the reason it fails for me,
particularly. I simply said that I search a node per 400+ ns, which means I
have to do a copy/make every 400+ ns. That's a lot of bandwidth that the PC
doesn't really have. The dual actually makes this worse than a single CPU, as I
said, because there are two caches, each snooping the other's writes and
invalidating lines in its own cache that the other processor just modified.
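The arithmetic behind the "once every 400 nanoseconds" figure can be sketched as below. This is only an illustration: the function names are made up for this example, and `state_bytes` is a hypothetical per-position size, since the thread does not say how many bytes Crafty copied per node.

```python
def ns_per_node(nodes_per_second):
    """Average time budget per node, in nanoseconds."""
    return 1e9 / nodes_per_second


def copy_traffic_bytes_per_second(nodes_per_second, state_bytes):
    """Rough memory traffic from copying the position once per node.

    Each copy reads state_bytes and writes state_bytes, hence the factor 2.
    state_bytes is an assumed value, not a number taken from the thread.
    """
    return 2 * state_bytes * nodes_per_second


if __name__ == "__main__":
    for nps in (1_000_000, 2_400_000):
        print(f"{nps} nps: {ns_per_node(nps):.1f} ns per node")
    # 256 bytes is purely a placeholder size for the copied state.
    print(copy_traffic_bytes_per_second(2_400_000, 256), "bytes/s")
```

At 1M nps the budget is 1000 ns per node; at 2.4M nps it is about 417 ns, which is the "400+ ns" above. Whatever the real state size is, the copy has to fit inside that budget along with everything else done at the node.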
>The reason is that double stacks increase memory traffic _between CPUs_, but that
>is _not_ what he said if you follow the thread, and this thread wasn't about SMP
>at all, so if that was his point I'm not sure how it related to the discussion.

It was about Crafty and copy/make. As I said, if I run at 2.4M nodes per
second, I have to do a copy/make every 400 ns. Whether I have one processor or
1024 processors, that won't change. And, in fact, on the dual it is harder to
do that than on a single because of snooping.

>
>He gave numbers of 25%, nobody can confirm those numbers (I get ~10%), but I
>figure now that he was talking 25% in SMP search, or what?

No. Crafty version 9 was not SMP. The first SMP version was 15.0. The
copy/make was dropped in version 9, and it produced a 25% speedup, no more, no
less. For Crafty specifically. That's all I can say with any certainty. And I
don't claim it is 25% for _other_ programs. Only that it was 25% for Crafty,
and that was the _only_ change to the program: going from copy/make to
make/unmake.

>
>Suddenly the whole thing is rather confusing because his numbers don't compare
>with non-SMP numbers, and I believe Johan was talking strictly non-SMP.

I don't think there are _any_ differences between SMP and non-SMP in this
regard, other than that SMP is possibly worse, rather than better as you
suggested (dual caches, etc).

>
>But anyway, this is getting silly. ;-)
>
>-S.
>>MvH Dan Andersson
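For readers unfamiliar with the two schemes being compared, here is a minimal sketch. It is not Crafty's code: the 64-entry list standing in for the position, the `(from, to)` move tuple, and all function names are assumptions made for illustration, and a real engine would also have to save and restore castling rights, en passant, hash keys and so on.

```python
def make(board, move):
    """Apply a move in place and return what was captured (0 if nothing)."""
    frm, to = move
    captured = board[to]
    board[to] = board[frm]
    board[frm] = 0
    return captured


def unmake(board, move, captured):
    """Undo a move made by make(), restoring the captured piece."""
    frm, to = move
    board[frm] = board[to]
    board[to] = captured


def copy_make(board, move):
    """Copy/make: duplicate the whole position, then make the move on the copy.

    The parent is left untouched, so "unmaking" is just dropping the copy,
    but every node pays for copying the full state.
    """
    child = list(board)
    make(child, move)
    return child


if __name__ == "__main__":
    board = [0] * 64
    board[12], board[28] = 5, -3      # arbitrary piece codes
    move = (12, 28)

    child = copy_make(board, move)    # parent unchanged, child has the move

    captured = make(board, move)      # in place: only two squares touched
    unmake(board, move, captured)     # parent restored
```

Copy/make touches the whole state at every node, while make/unmake only touches the squares (or bitboards) the move actually changes. That difference in memory traffic is what the 25% figure for Crafty is about.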
Last modified: Thu, 15 Apr 21 08:11:13 -0700