Author: Robert Hyatt
Date: 11:01:42 08/25/04
On August 25, 2004 at 13:43:52, Tom Kerrigan wrote:

>On August 25, 2004 at 10:52:23, Robert Hyatt wrote:
>
>>My issue was _never_ "cache misses". It was _always_ "how big does cache have
>>to be to contain enough of Crafty to not have many misses?"
>
>Sure, that makes sense. Your issue was cache misses, not cache misses.

Real conceptual problems with you, it seems. To make it _real_ simple: the issue is _not_ the _number_ of cache misses. The issue is "how big does cache have to be to eliminate the cache misses completely?" Because _that_ represents the size of cache necessary to contain the "working set" of the program, _by definition_. So whether there are one million misses per second or 50 per second doesn't matter. As long as there are misses, there are things being used that are not in cache, hence part of the program's working set is not there. Is that _really_ so hard to understand???

>>The answer is obvious from _either_ set of data. There are definite "points"
>>where cache size increases for either instructions or data stop reducing the
>>misses. That is the cache size that contains nearly everything, and it gives a
>>pretty good benchmark for the instruction or data working set sizes.
>
>Now we're getting into a semantic argument. If that's how you want to define
>working set, fine. I'm defining it as the point where enough of Crafty is in
>cache that main memory performance doesn't make a significant (measurable?)
>difference.

I define working set the way every textbook I have defines it. It is usually used in reference to demand paging, but it fits here as well:

"The working set of a program is defined as the set of pages that have to be in main memory to avoid any paging activity."

"The resident set of a program is defined as the set of pages that are actually in memory at any time."

The optimal resident set size for a program is its working set size. If you want to use another definition, then fine. Define it and we can go from there.
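To make the definition concrete, here is a toy sketch of the measurement idea being argued about: sweep a simulated fully-associative LRU cache over an address trace and find the smallest capacity at which misses stop falling. The trace and sizes below are synthetic illustrations, not Crafty data.

```python
# Hypothetical sketch: estimate working-set size from a miss-vs-capacity
# sweep. The trace is synthetic; a real test would replay a recorded
# address stream from the program being measured.
from collections import OrderedDict

def miss_count(trace, capacity):
    """Count misses of a fully-associative LRU cache of `capacity` lines."""
    cache = OrderedDict()
    misses = 0
    for addr in trace:
        if addr in cache:
            cache.move_to_end(addr)       # refresh LRU position on a hit
        else:
            misses += 1
            cache[addr] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least-recently-used line
    return misses

# Synthetic program that repeatedly touches 300 distinct lines,
# i.e. a working set of exactly 300 lines.
trace = [i % 300 for i in range(10_000)]

for cap in (100, 200, 300, 400):
    print(cap, miss_count(trace, cap))
# 100 and 200 lines: 10000 misses (cache smaller than the working set,
# LRU thrashes on the cyclic pattern). 300 and 400 lines: 300 cold
# misses only -- the curve flattens once the cache holds the working set.
```

The knee of that curve is the point Hyatt is describing: growing the cache past it buys nothing, because the whole working set is already resident.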
But without an * behind the name, I tend to use the normal definition of a term. I.e., Vincent's incorrect "memory latency" number. Latency is the time to read/write a word of memory, independent of any TLB/cache issues. Access time factors in TLB misses/hits and so forth. But he scrambles the terms and then argues when someone disagrees with his unique definition that is unrecognized anywhere else.

If you want to try to define "what is the optimal cache size for Crafty?" that is an interesting question. If you want to define "what is the cache footprint for Crafty?" that is also interesting. And different. I am talking about the cache footprint at the moment. Clearly, once you have a cache large enough to hold everything the program uses, making it bigger will _not_ make the program run any faster. Once it is all in L2, memory is out of the picture and more L2 will have no effect. I showed that effect for instructions as being 256K. For data, it is bigger, although there are marked drops at certain points, and I would not be willing to pay the extra cost for L2 beyond that which provides acceptable performance gains. A total of 1024K looks pretty good. A total of 2048K looks a bit better, although it would not be worth doubling the processor price IMHO.

>>I _knew_ that you would be unable to see the forest for the trees. And that
>>posting the numbers would only serve to divert the conversation to a _different_
>>topic, which it has.
>
>Yeah, the fact that you have a new, contradictory set of data throws a wrench
>into the whole "drawing conclusions from data" thing, doesn't it? Seriously,
>your new data shows miss rates that are different by factors of > 4 and a
>completely different trend between cache sizes. Sorry if this kinda thing
>"cramps your style."

I don't have any "new and contradictory data." I gave old cache data. Eugene gave more. I just finished producing even more. None of it contradicts anything, except for your 512/1024 test, which is different.
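The latency-versus-access-time distinction above can be shown with a toy calculation. All the numbers here are assumed round figures for illustration, not measurements of any particular machine: raw latency is a property of the memory itself, while average access time folds in the cache and TLB hit rates.

```python
# Illustrative numbers only (assumptions, not measured values).
T_L2 = 10        # ns, assumed L2 hit latency
T_MEM = 100      # ns, assumed raw main-memory latency
T_TLB_MISS = 50  # ns, assumed page-walk penalty on a TLB miss

def avg_access_ns(l2_hit_rate, tlb_hit_rate):
    """Average access time: blend of cache/memory latency plus TLB penalty."""
    mem_time = T_L2 * l2_hit_rate + T_MEM * (1 - l2_hit_rate)
    return mem_time + T_TLB_MISS * (1 - tlb_hit_rate)

print(round(avg_access_ns(0.98, 0.99), 2))  # -> 12.3
```

With a 98% L2 hit rate and 99% TLB hit rate, the average access comes out near 12 ns even though the raw memory latency is 100 ns, which is exactly why conflating the two terms produces nonsense numbers.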
You originally claimed my working set was < 512K. I don't see how you can continue to claim that after the data I have provided from testing it two different ways. If the working set were less than 512K, then any cache size larger than 512K would not help performance at all, yet it clearly does. As it did a couple of years back, when I did the financial analysis to decide which CPU version to buy for my quad Xeon when I did the upgrade.

In the absence of more data or information, it would seem that this particular argument has reached the end of the road. If you think that the WS size is < 512K, even when 512K of data cache still produces cache misses, fine by me... And that ignores the approximately 128K-256K of instructions that are also needed in cache.

>
>-Tom
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.