Author: Aaron Gordon
Date: 17:07:13 03/18/03
Here are some tests that I've run on one of my machines. It's an AthlonXP 1900+ @ 1.6GHz (non-overclocked), 133 FSB, on an Abit KT7A (KT133A) motherboard. The RAM is regular SDRAM (non-DDR). Three DIMMs are used, one 256MB and two 128MB. So 3 slots in use, 6 banks in use total (0-5).

4-way interleave
  Sisoft memory test, ALU w/ SSE: 1002 MB/s
  Sisoft memory test, ALU w/o SSE or MMX: 577 MB/s
  CraftyK7 19.3: 1,046,116 nodes/sec

2-way interleave
  Sisoft memory test, ALU w/ SSE: 1002 MB/s
  Sisoft memory test, ALU w/o SSE or MMX: 547 MB/s
  CraftyK7 19.3: 1,046,116 nodes/sec

No interleaving
  Sisoft memory test, ALU w/ SSE: 993 MB/s
  Sisoft memory test, ALU w/o SSE or MMX: 517 MB/s
  CraftyK7 19.3: 1,046,116 nodes/sec

As you can see, interleaving did nothing for Crafty. Doubling RAM bandwidth while keeping all other hardware, memory timings, etc. identical also produced no increase (unless you want to count 0.14% as something outside the benchmark's margin of error).

Now, before you claim my interleaving is 'broken', I should mention my Quake3 scores. By enabling 4-way interleaving instead of none, I was able to increase my frames per second on that system by around 20%. Quake3 is extremely memory speed/bandwidth/latency bound. Modify ANYTHING memory related and you get drastic framerate changes. To give you some idea: if you upgrade your CPU from an AthlonXP 1500+ (1.33GHz) to a 1900+ (1.6GHz), you get about a 9-10% boost in framerate in Quake3. That's 9-10% from a 20% boost in clock speed. Again, just by enabling 4-way interleaving it jumped up 20% without changing anything else. So, without a doubt, interleaving DOES work on that board. Ah, almost forgot to mention: even my old FIC 503+ socket-7 motherboard (VIA MVP3 chipset) supports 4-way interleaving.

One thing I did find while testing much older CPUs was how L1 and L2 cache sizes affect Crafty NPS. I don't have the numbers on hand, but I'll find them if you'd like. Anyway, a 486 with 8K of L1 was about half as fast in Crafty as a 486 with 16K of L1, at the same MHz.
Adding L2 cache further increased performance (something on the order of 60-80%). This was with extremely small L1/L2 cache sizes. On today's modern chips I believe you'll see next to no difference between CPUs of the same type with different cache configurations. Take the Duron (128K L1, 64K L2) vs. a Thunderbird (128K L1, 256K L2), for example. They get identical NPS in Crafty.

Also, of course, I'm speaking about normal end-user systems. Not $60,000 Itaniums or multi-million dollar Cray machines, just normal 32-bit x86 computers. So before stating that you get some odd percentage increase with such & such cache or such & such memory timings, it's best to tell people, "On such & such Itanium it got x% increase in nodes/sec, YMMV on a 32-bit x86 however" or something similar.

The point being: someone may go out and get an entirely new board and RAM, keeping the same CPU, trying to increase memory bandwidth drastically. To their surprise, when they run the Crafty benchmark all they'll see are the same numbers (or something extremely close). All this leads to is disappointment and partially wasted money (partially, because they may run something other than Crafty that does get a boost from the increased bandwidth).
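
For anyone who wants to check the percentages themselves, here is a minimal Python sketch (not part of the original tests) that compares the change in measured memory bandwidth against the change in Crafty nodes/sec for each interleave setting. The figures are copied from the results listed above; the names `results` and `pct_change` are made up for illustration.

```python
# Figures copied from the post: Sisoft bandwidth in MB/s, Crafty in nodes/sec.
# The dictionary layout and key names are illustrative assumptions.
results = {
    "none":  {"sse": 993,  "no_sse": 517, "crafty_nps": 1_046_116},
    "2-way": {"sse": 1002, "no_sse": 547, "crafty_nps": 1_046_116},
    "4-way": {"sse": 1002, "no_sse": 577, "crafty_nps": 1_046_116},
}


def pct_change(old, new):
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0


baseline = results["none"]
for name, r in results.items():
    sse = pct_change(baseline["sse"], r["sse"])
    no_sse = pct_change(baseline["no_sse"], r["no_sse"])
    nps = pct_change(baseline["crafty_nps"], r["crafty_nps"])
    print(f"{name:>5}: SSE {sse:+.1f}%, non-SSE {no_sse:+.1f}%, Crafty NPS {nps:+.2f}%")

# Clock-speed step quoted for the Quake3 comparison: 1.33 GHz -> 1.6 GHz.
clock_gain = pct_change(1.33, 1.6)
print(f"clock speed: {clock_gain:+.1f}%")
```

With the figures as listed, the non-SSE bandwidth goes up by roughly 5.8% (2-way) and 11.6% (4-way) over no interleaving, while the Crafty column does not move at all.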
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.