Computer Chess Club Archives

Subject: Re: working set is < 384k

Author: Robert Hyatt
Date: 19:59:50 08/22/04
On August 22, 2004 at 18:40:40, Vincent Diepeveen wrote:

>On August 22, 2004 at 18:00:33, Robert Hyatt wrote:
>
>>On August 22, 2004 at 12:48:12, Vincent Diepeveen wrote:
>>
>>>On August 21, 2004 at 17:29:32, Robert Hyatt wrote:
>>>
>>>>On August 21, 2004 at 08:41:27, Vincent Diepeveen wrote:
>>>>
>>>>>On August 20, 2004 at 11:51:27, Tom Kerrigan wrote:
>>>>>
>>>>>>On August 20, 2004 at 10:51:50, Robert Hyatt wrote:
>>>>>>
>>>>>>>On August 20, 2004 at 04:33:07, Tom Kerrigan wrote:
>>>>>>>
>>>>>>>>Now that AMD is selling two processors that are identical other than L2 cache
>>>>>>>>size (Sempron has 256k, Athlon 64 has 512k) we have proof of Crafty's working
>>>>>>>>set size:
>>>>>>>>
>>>>>>>>Sempron:    1,080,020 NPS
>>>>>>>>Athlon 64:  1,080,230 NPS
>>>>>
>>>>>Did you test in 32-bit mode or so?
>>>>>
>>>>>In 64-bit mode the instructions are slightly bigger, but you need
>>>>>fewer instructions, so the stress is less on the processor then and more on the
>>>>>L2/ main memory.
>>>>>
>>>>>>>>http://www.anandtech.com/linux/showdoc.aspx?i=2170&p=3
>>>>>>>>
>>>>>>>>This should prove once and for all that Crafty's working set is < 256k and
>>>>>>>>therefore that size of L2 cache has no effect on its performance (as long as
>>>>>>>>it's >= 256k) and that main memory speed likely plays a trivial role
>>>>>>>>performance-wise.
>>>>>>>>
>>>>>>>>I bring this up because of all of the long debates that have occurred in the
>>>>>>>>past about the value of L2 cache, the speed of memory, and the working set size
>>>>>>>>of chess programs.
>>>>>>>>
>>>>>>>>I have no doubt that Crafty uses a bunch of memory, but obviously not with
>>>>>>>>enough temporal locality for it to matter one iota.
>>>>>>>>
>>>>>>>>-Tom
>>>>>>>
>>>>>>>
>>>>>>>Your interpretation _could_ be seriously flawed.  IE suppose its working set is
>>>>>>>2mb?  You can't conclude anything if that is true as both the 256K and 512K
>>>>>>>would be thrashing equally.
>>>>>>
>>>>>>How is it they would be thrashing equally? Let's say a cache access takes 5ns
>>>>>>and main memory takes 50ns. Average access times for 2MB working set:
>>>>>>256k cache: (256k/2MB)*5ns + ((2MB-256k)/2MB)*50ns = 44.38ns
>>>>>
>>>>>A number of things Tom.
>>>>>
>>>>>a1) First of all you need to prove that the L2 caches of both CPUs deliver
>>>>>data in the same number of cycles
>>>>>
>>>>>a2) what type of memory is used with the processors? If the 256KB processor has
>>>>>400MHz CL2 memory and the other one has 266MHz CL3 memory then the comparison
>>>>>would look odd.
>>>>>
>>>>>b) You forgot to add the L1 cache
>>>>>  128 + 256 = 384KB cache
>>>>>
>>>>>I assume your claim is working set size < 384 KB
>>>>>
>>>>>c) how big did you set the hashtable size to? Some SPECint2000 comparison where
>>>>>only 2 MB or something similar gets used is not really interesting. We want 400MB
>>>>>memory or so.
>>>>>
>>>>>d) which version of Crafty did you use? Current versions do only sequential
>>>>>lookups in 1 table, while the old SPECint one does 2 lookups in 2 different tables.
>>>>>
>>>>>e) when using a big hashtable the fastest you can fetch hash entries is around
>>>>>91 ns, and that assumes 400MHz CL2 memory. On dual Opterons, for example, it is
>>>>>hard to get under 133 ns.
>>>>>
>>>>>f) you really should do 64 bits tests, we've had enough 32 bits tests already.
>>>>>
>>>>>This all doesn't take away that i agree with the conclusion that for a
>>>>>chessprogram when running SINGLE cpu it doesn't matter whether the L2 cache is
>>>>>256 or 512 or 2048 KB. The real important things are the L1 cache size, the L2
>>>>>cache random lookup SPEED and how fast the main memory can randomly access
>>>>>hashtables.
>>>>>
>>>>>>512k cache: (512k/2MB)*5ns + ((2MB-512k)/2MB)*50ns = 38.75ns
>>>>>
>>>>>>That's 15% faster. You'd think a difference that big would show up in the
>>>>>>benchmark score but it doesn't. Or are you going to claim that Crafty always
>>>>>>uses memory that it hasn't used for the last ~512k?
>>>>>
>>>>>I agree with the math when the L2 cache speed is the same. However you
>>>>>must *prove* that first. There are big differences even from processor line to
>>>>>processor line.
>>>>>
>>>>>I remember that the Northwood P4 was a lot faster for DIEP than the P4 generation
>>>>>before it, and yet everyone acts as if it were the same processor. For DIEP
>>>>>there was a 20% difference between P3 and P2 speed, for Fritz it didn't matter
>>>>>at all. And so on.
>>>>>
>>>>>>>Only _real_ way is to test with larger sizes as well.  I've done that up to 2mb
>>>>>>>and saw improvement from 512 to 1024K and from 1024K to 2048K, on older xeons.
>>>>>>
>>>>>>Sure. And if other people run those tests and don't see a difference, you'd say
>>>>>>a chip needed a 4MB/8MB cache for anybody to be sure.
>>>>>
>>>>>You are correct here. That's how Bob works.
>>>>
>>>>Nah.  That is how _you_ work.  Give one example and claim it 'proofs' your
>>>>point.  One example doesn't prove something.  It can disprove a claim easily
>>>>enough however.  You've been burned enough times that way to understand...
>>>
>>>Try to disprove that Crafty's working set size is under 384KB.
>>
>>
>>
>>Already did.  Increasing cache size beyond working set won't improve
>>performance.  Yet my results had faster performance in going from 512 to 1024 to
>>2048K.  And Eugene's results produced faster performance with 3M than with 1.5M.
>
>Eugene i do not believe at all and he just posts: "i saw a difference" without
>proof.

What proof is needed?  He gave a performance number for the improvement of the
larger cache.  As to whether you believe him or not, that's totally unimportant.
Who has the most credibility here, you or him?

Think _carefully_ about the answer.


> Let him show proof. Probably the difference in Eugene's tests was caused either
>by bad testing or by differences between the 2 CPUs, like the clock speed of the
>CPU. It is logical IMHO that an Itanium CPU of 1.5GHz@3MB L3 cache is faster
>than a 1.3GHz@1.5MB L3 cache. And the Crafty version and hashtable sizes are
>missing.
>
>See www.aceshardware.com, where Johan de Gelas has proven for DIEP, in a very
>accurate and PUBLIC way, that there is zero difference between 512KB P4's
>and 2048KB P4's.

Fine.  I ran myself on 512K, 1024K and 2048K Xeons and for Crafty there _was_ a
difference.


>
>Johan de Gelas concluded very clearly that there is zero difference in speed,
>with a very high confidence.
>
>The proof from Johan is:
>  a) very accurately done (repeated the runs many times)
>  b) very TRANSPARENTLY done
>  c) very clear
>  d) very professional
>  e) results are posted
>  f) the hardware description is very clear and the circumstances how the
>     test was done was very clear and very fair
>
>All those 6 points Eugene doesn't have. He only 'remembers' something vague from
>the past. Show hard proof here i'd say Eugene! Not words.
>
>Results. Hardware descriptions, RERUNS. using SAME executables produced by the
>SAME compiler with the SAME options. And doing it on the SAME chips. No
>extrapolated results.


That is what I did.  Exactly.  When trying to decide if I wanted to pay the
extra price.  Later Intel dropped the 512K processors when I upgraded to my 700
MHz Xeons, and my speedup was more than the clock rate suggested, most likely
attributable to the 2x larger L2 cache as the 700's only came in 1M and 2M
flavors.


>
>So Eugene we must completely disbelieve here until all details are posted. I
>have seen more of such stuff of him here. Like crafty on a 32 processor machine.
>Zero logfiles posted...
>

So?  What log files do _you_ ever post?  Yet you expect us to believe _your_
crap anyway?  I post my log files.  Once I do, you dry up and blow away of
course...



>>That's a trivial proof...
>
>Where is your proof?
>
>I just see words. And i see hard disprove of everything you post here.
>Especially www.aceshardware.com shows very clearly what is the truth.


Yes.  You certainly proved my 3.1 speedup was wrong.  You certainly proved that
I get absolutely no speedup on a dual.  You proved that Intel could not cache
physical ram beyond 256MB.  The list goes on and on.  Great proofs.  Shot down
by actual raw data...

I _never_ see you post anything but words.  No logs.  No numbers.  No speedups.
No nothing but lots of hot air.




>
>By the way, talking about the word games you play in other postings here. I consider
>it very bad sportsmanship from you to quote English misspellings from me, in order
>to make me look stupid. I speak roughly 5 languages, 3 of them fluently
>(Dutch, German, English), but I wasn't born in the USA. I live in The Netherlands. If
>you want to, we can discuss in Dutch too. You won't be able to catch me on many
>spelling mistakes there. How about you?


Learn the word "prove" then.  It has been pointed out to you _many_ times.

But I certainly don't have to point out a spelling error to make you look
stupid.  There are plenty of things that cause that to happen...




>
>>
>>
>>
>>>
>>>>
>>>>
>>>>>
>>>>>>I have easy access to 2GHz Athlon 64s with 512k and 1MB cache... if somebody can
>>>>>>point me to a Windows Crafty executable and tell me what to type, I'll happily
>>>>>>run the test.
>>>>>
>>>>>>-Tom
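
The two positions in the quoted exchange can be checked numerically. The sketch below is only an illustration and not code from anyone in the thread. The function `avg_access_ns` reproduces Tom's estimate using his 5ns cache and 50ns memory figures, which assumes every part of the working set is equally likely to be touched, so the hit rate is simply cache size divided by working set size. The function `lru_hit_rate_cyclic` is a hypothetical counterexample chosen for this note: a fully associative LRU cache under a repeated sequential sweep, which is one access pattern where a cache smaller than the working set never hits regardless of its size. Both function names and the sweep pattern are assumptions made here; real L2 caches are set-associative and Crafty's actual access pattern is not described in the thread.

```python
from collections import OrderedDict


def avg_access_ns(cache_kb, working_set_kb, cache_ns=5.0, memory_ns=50.0):
    """Average access time if accesses are spread uniformly over the working set."""
    hit_rate = min(1.0, cache_kb / working_set_kb)
    return hit_rate * cache_ns + (1.0 - hit_rate) * memory_ns


def lru_hit_rate_cyclic(cache_lines, working_set_lines, sweeps=4):
    """Hit rate of a fully associative LRU cache under repeated sequential sweeps."""
    cache = OrderedDict()  # keys are line numbers, ordered oldest to newest
    hits = 0
    total = 0
    for _ in range(sweeps):
        for line in range(working_set_lines):
            total += 1
            if line in cache:
                hits += 1
                cache.move_to_end(line)  # mark as most recently used
            else:
                cache[line] = True
                if len(cache) > cache_lines:
                    cache.popitem(last=False)  # evict least recently used
    return hits / total


if __name__ == "__main__":
    small = avg_access_ns(256, 2048)
    large = avg_access_ns(512, 2048)
    print(f"uniform model: 256k {small:.3f} ns, 512k {large:.3f} ns, "
          f"ratio {small / large:.3f}")
    # Scaled-down sweep: the working set is 16 lines, caches hold 4 or 8 lines.
    print("cyclic sweep hit rates:",
          lru_hit_rate_cyclic(4, 16), lru_hit_rate_cyclic(8, 16))
```

Under the uniform model the 512k cache comes out roughly 15% faster, which is Tom's point. Under the cyclic sweep both cache sizes miss on every access, which is the kind of case where a 256k versus 512k comparison alone says nothing about the working set. Which model is closer to a real chess program is exactly what the thread disputes.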
Last modified: Thu, 15 Apr 21 08:11:13 -0700