Author: Robert Hyatt
Date: 10:19:24 12/11/02
On December 10, 2002 at 20:56:47, Matt Taylor wrote:

>On December 10, 2002 at 20:00:38, Vincent Diepeveen wrote:
>>
>>However we must take into account that at 1 point crafty is completely
>>getting toasted. it is toasting itself by doing so much memory references.
>>2 probes in 2 different hashtables. If you remove that probe to the
>>second table and consider that in a single cache line even at ddr ram
>>you already get 64 bytes, so can do 4 probes for free in a single hashtable,
>>then you will realize also with me that after that change crafty will
>>run like the sun dual also at a dual k7.

First, if this conversation is going to continue, how about grounding it in fact rather than fiction? If you do four probes, either you had better be _sure_ that you probe to an address that is divisible by 64, or you will _not_ get all the probes for free. Cache doesn't work like that. If you probe _randomly_ to a specific entry, then you get the 64 byte cache line (I am assuming AMD, as Intel is 32/128 byte line lengths depending on the processor) that contains that hash table entry, but it could be the first 16 bytes in the line (assuming 16 byte hash table entry) or it could be the last 16 bytes. With a 64 byte cache line and normal transposition table addressing, you are going to get two probes for free; four will cost 2x as much.

>>
>>P4 duals are forever history then and left behind bigtime by the dual k7.

"history"? :)

>
>I own an AthlonMP and use another at work. They're budget SMP compared to what I
>have heard about Intel's offerings. That's also why they're affordable, but
>price isn't a consideration when you're talking only about performance.
>
>Some of the dual-P4 systems have 8.4 GB/sec memory bandwidth. The hash table
>probe will be much faster on such a system than on either of the dual-Athlons
>that I use, particularly when the hash is interleaved across memory banks. Yes,
>RDRAM might be slower to access, but when you can access 4 locations in
>parallel, the 15 bus clocks latency diminish with respect to equivalent
>serialized access to 10 clock latency DDR modules. This assumes that such
>accesses can be made in parallel, but I'd say that's a safe assumption for a
>hash table.

Also the E7500 chipset interleaves with DDR RAM, which is what is on my machine here...

>
>>I call that very bad programming. Weak programming. Especially because it
>>was already mentionned by me and others over a year ago to Hyatt.

And you _still_ don't understand the "cache line" issue based on your above statements, so perhaps it is "weak understanding" by you rather than "weak programming" by me. My hash table works just fine. That is the #1 criterion.

>>
>>Also provable is that 4 sequential probes perform better than a 2 table
>>approach. Even in ICCA journal that was already proven years ago.
>>
>>Now it is pretty bad to say that this research done was not accurate.
>>
>>I dislike such excuses. 5 tests of Nalimov proof something. 6 tests
>>of someone else do not. This is very bad science of course.

I'm full of bad science. I only run tests at _least_ four times each and average the results before I post numbers. You should try some "bad science" yourself, rather than doing what you currently do. "xeon 2.8 is not available". Etc. All wrong. All the time.

>>
>>the 2.4Ghz Xeons which nalimov used to report a 20+% speedup or so,
>>i also tried (but single cpu) and couldn't get any speedup at all at
>>them.

Exactly what HT-enabled machine did you run on? You just said "no processors yet do hyper-threading" in another post. Can you not keep up with all the wrong statements you make and at _least_ keep them consistent with each other?

>>
>>SMT not working simply at them.
>
>You have to enable it in the BIOS.
>

Of course. But he would have had to have actually run on such a machine to know this. He didn't. So he doesn't...

>>This is legal. Intel only garantuees SMT to work at P4s above 3.0 Ghz
>>and at Xeon MPs (fastest Xeon MP released so far 2.0Ghz which is supposed
>>to run up to 8 processors in contradiction to Xeon which is only running
>>to 2 processors).
>>

And that is _still_ wrong. In one paragraph you say you tested on a 2.4GHz xeon. In another you say they don't do SMT. I _have_ one and it _does_ do SMT as I posted above. I notice you didn't respond (as usual) when someone posts _real data_ rather than garbage...

>>So intel was correct in their statement from my viewpoint. All stuff i tested
>>didn't get any speedup yet. I didn't test a 3.0Ghz P4 yet. they do not deliver
>>them yet here.
>>

Intel didn't say a thing about xeons not supporting SMT. They _specifically_ said that the PIV 3.06GHz would be the first _desktop_ pentium processor to support SMT. However, the xeons are another story. And no, SMT is not only supported in Xeon MP processors, it is supported in _all_ current xeon processors.

>>
>>
>>>-Matt
>
>I'm sure you'll see different results if and when you do. Intel rarely lies
>outright; they usually rely on consumer ignorance.
>
>-Matt

In the case of Vincent, that is a good bet. :)

I think it funny that someone that has not touched an SMT processor is arguing with a few that actually have them in their offices (Eugene and myself specifically). And we are talking xeons, not PIV with SMT which I have _not_ tried. The xeon is a different animal from the PIV. And all I can quote data for is the xeon.
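
The cache-line point made earlier in this post comes down to simple address arithmetic. The C sketch below is not Crafty's code. The 16 byte entry and 64 byte line come from the discussion above, while the helper names `lines_touched` and `aligned_bucket_start`, and the assumption that the table itself begins on a 64 byte boundary, are made up for illustration. It counts how many cache lines a run of consecutive probes covers when the run starts at an arbitrary entry, and shows how rounding the start index down to a line boundary keeps four probes inside one line.

```c
#include <assert.h>
#include <stddef.h>

#define ENTRY_SIZE 16u   /* assumed hash entry size in bytes */
#define LINE_SIZE  64u   /* assumed cache line size in bytes */

/* Number of distinct cache lines covered by `probes` consecutive entries
   starting at entry `index`. Assumes the table base is line-aligned. */
static size_t lines_touched(size_t index, size_t probes)
{
    assert(probes > 0);
    size_t first_byte = index * ENTRY_SIZE;
    size_t last_byte  = (index + probes) * ENTRY_SIZE - 1;
    return last_byte / LINE_SIZE - first_byte / LINE_SIZE + 1;
}

/* Round an entry index down so that a bucket of LINE_SIZE / ENTRY_SIZE
   entries starts exactly on a cache line boundary. */
static size_t aligned_bucket_start(size_t index)
{
    size_t per_line = LINE_SIZE / ENTRY_SIZE;
    return index - index % per_line;
}
```

Under these assumptions, `lines_touched(1, 4)` is 2, because the run of four entries straddles a line boundary, while `lines_touched(aligned_bucket_start(1), 4)` is 1. Four probes from an arbitrary starting entry stay within one line only when that entry happens to sit at the start of a line.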
Last modified: Thu, 15 Apr 21 08:11:13 -0700