Author: Vincent Diepeveen
Date: 17:09:04 12/05/02
On December 05, 2002 at 15:26:00, Matt Taylor wrote:

>On December 05, 2002 at 09:02:16, Vincent Diepeveen wrote:
>
>>On December 04, 2002 at 18:52:19, Matt Taylor wrote:
>>
>>>Yes, 1.3% speed increase is significant when dealing with algorithm analysis,
>>>and 13% is even more incredible. However, 13% is barely significant when you're
>>>comparing the speeds of hardware, and that's what you're doing.
>>>
>>>The following is how RDRAM works -the way I understand it-. I could have some
>>>facts grossly wrong. My interests have been in the AMD platform because I can
>>>build a cluster that operates much faster for the same cost. That said, I am
>>>-fairly- certain that I have all my facts straight about RDRAM.
>>
>>How fast is your cluster RAM latency?
>>
>>Are we talking about a default 1Gbit network
>>with milliseconds latency or so really not capable of running
>>programs that do some inter process communications, or
>>something faster than that?
>
>Yes, I'm talking about parallel problems that don't need interprocess
>communication. Chess isn't the only thing that's computable, and it's not the
>only thing I enjoy computing. :-)

Under 'cluster' I usually imagine that I/O speed == memory speed for the
cluster. Is that the case here too?

DIEP runs on cc-NUMA SGI systems easily. Those, however, have quite some
bandwidth: about 1 terabyte/second in the case of the TERAS machine. Of course
that's for 1024 processors. Quite a bit.

>>>No RDRAM part operates on a 100 MHz clock. The pc800 part operates at 400 MHz on
>>>a DDR bus with a width of 2-bytes. This yields a maximum bandwidth of 400*2*2 =
>>>1.6 GB/sec. The pc1066 part operates at 533 MHz on the same bus, and 533*2*2 =
>>>2.1 GB/sec. The P4 FSB is a 16-byte DDR bus running at 100 or 133 MHz (100*2*16
>>>= 3.2 GB/sec, 133*2*16 = 4.2 GB/sec).
>>
>>This is not true for latency. You are quoting bandwidth calculation here.
>>
>>It operates at 100Mhz internally but it is quad pumped.
>>
>>That quad pumped increases bandwidth, but not latency.
>>
>>So for latency you must face the fact that it is 2 times slower.
>>
>>For latency DDR ram is 2 times faster: it is 133/100 * (15T/10T) = 2.0
>>times faster.
>>
>>This is why DDR ram is way faster for me than RDRAM at the same
>>processor.
>>
>>Note that the bandwidth i do not believe either, but as i said before
>>we can discuss forever here. If you go do some big matrix calculation
>>(say from 2 gigabyte) where bandwidth is important, then the testresults
>>i see is that DDR ram has bigger practical bandwidth.
>>
>>theoretical it is clear that RDRAM has more bandwidth.
>>
>>But i do not want to get into that discussion, it is an endless discussion.
>>
>>For latency things is very clear.
>>
>>If you still do not understand it, then test it yourself.
>
>I do agree about the latency bit, though I am not sure why you use the pc800
>part instead of the pc1066 part. I don't think latency has ever been in dispute.
>RDRAM has been criticized for that from the very beginning.
>
>I am not sure how the RDRAM bandwidth discussion is endless; I have never seen
>anyone claim that RDRAM has lower bandwidth than DDR. I'm not talking about
>lower performance in some application -- I'm talking about the synthetic tests
>that measure the amount of bandwidth of RDRAM vs. DDR. Certainly my own tests
>parallel everything I have read.
>
>A synthetic benchmark is important here because we want to measure performance
>of the hardware to use as a predictor for performance in a real-world
>application. If we wanted to measure real-world performance of the application,
>we would take his end-game database software and run it on two systems and
>compare throughput.
>
>>>Note that the P4 has a higher FSB speed than the single RDRAM chip. This is
>>>intentional. What good chipsets do is issue parallel requests. This means that 2
>>>RDRAM modules get twice the bandwidth of a single RDRAM module.
>>>Coincidentally, you are required to add them in pairs. The same technique -can- be applied to
>>>DDR, but at present I have not heard of this. (This is why I don't have to
>>>purchase my DDR modules in pairs -- much to my relief.)
>>
>>You have pretty old knowledge then. try AMD 760MPX chipset which requires
>>also 2 modules.
>
>I own an AMD 760MPX-based board, and I am currently running off of (1) Samsung
>pc2100 1GB CL=2.5 Reg/ECC module. At home I use a Tyan Tiger MPX, and at work I
>have an Iwill MPX2. Both are based on the AMD 760MPX chipset.
>
>I borrowed an Unregistered Samsung 256 MB module and paired it with an
>Unregistered Micron 256 MB module that I own, and my bandwidth is within ~10% of
>what I had from the single DIMM. My machine at work uses 2 512 MB CL=2.5 Micron
>modules, and its bandwidth is within 20-30% of what I get at home. If the
>chipset had any sort of mux, I should see more than 20-30% gain.
>
>The bandwidth calculation is done by copying large amounts of memory, a fairly
>standard algorithm. Consistent, repeatable results (within 1%) in addition to
>confirmation from other tests reassure me that the test is accurate.
>
>>>Chip-for-chip, DDR modules may sustain higher transfer rates, but I assure you
>>>that empirical data shows RDRAM systems winning the bandwidth war.
>>
>>No.
>>
>>This looks like a RAMBUS propaganda talk you write down here.
>>
>>For DIEP DDR ram, even at the P4, it is 13% faster or something than
>>RDRAM.
>>
>>This where the cpu speed is most important for DIEP.
>
>DIEP uses hash-tables. DIEP is latency-dependent. DIEP will run a little slower
>on RDRAM because it has higher latency.
>
>>>The P4 Willamette system I used at work for a while had a practical bandwidth
>>>of 2.8 GB/sec on pc800 RDRAM, a 100 MHz bus. The faster DDR-based boards use
>>>pc2700 which, as I understand, really isn't standard. This is a maximum
>>>theoretical bandwidth lower than what I have measured.
>>>I have a dual AthlonMP 1600 with about 1.3 GB/sec bandwidth. The SMP factor is probably biasing the
>>>measurement, but in either case I'm not convinced that there is any DDR system
>>>that even matches that old P4 on RDRAM.
>>
>>Why not buy some chess software and compare a P4 RDRAM system with DDR ram.
>
>Chess software isn't necessarily an accurate benchmark of bandwidth, which is
>why synthetic benchmarks aren't necessarily accurate for chess. The original
>question wasn't, "Which type of ram runs chess faster?" The question was, "Which
>type of ram performs best in end-game computation?"
>
>For the computation of the database, the bandwidth is likely the most important
>factor, particularly since the WC memory type allows a lazy-commit style of
>memory writing. For table queries like DIEP uses, DDR will probably be faster.
>
>>>-Matt
>>>
>>>On December 04, 2002 at 18:19:43, Vincent Diepeveen wrote:
>>>
>>>>On December 04, 2002 at 17:40:13, Matt Taylor wrote:
>>>>
>>>>i hope you realize that good programmers/designers work months
>>>>to get 1.3% speedup. Both chessprogrammers for their program
>>>>and hardware designers for their chips.
>>>>
>>>>13% is really a lot then if you understand that the speed
>>>>of DIEP isn't depending only upon memory speed, but even more
>>>>upon processor speed.
>>>>
>>>>So the actual speedup of DDR ram over SDRAM in latency is
>>>>more like 100% faster, which is actually true.
>>>>
>>>>DDR ram needs 10T versus RDRAM 15T. That's already 50%.
>>>>
>>>>RDRAM initially was clocked 100Mhz and
>>>>the DDR ram is clocked 133Mhz.
>>>>
>>>>Nowadays there is also RDRAM clocked to higher speeds than 100Mhz
>>>>(quad pumped of course), but still it is of course 50% slower in
>>>>timing than DDR ram.
>>>>
>>>>So where RDRAM might win it nowadays perhaps on bandwidth (tests
>>>>which try to pump actual terabytes of data through the ram suggest
>>>>that fastest DDR ram can pump through more than fastest RDRAM,
>>>>despite theoretical specifications of the RDRAM versus theoretical
>>>>specifications of DDR ram, but i don't want to get in the middle
>>>>of a battle there which is getting fought out non-stop; and the
>>>>truth is simply that you have to choose to believe either technical
>>>>specifications or the actual tested speeds by experts so it is
>>>>a forever 'yes' 'no' fight), there is not a single doubt on
>>>>what is the better latency.
>>>>
>>>>DDR ram has over 50% faster latency than RDRAM. This is very clear.
>>>>The bus of most of the tested old P4s was 100Mhz, versus K7 soon
>>>>already 133Mhz. So also that speed difference we must take into
>>>>account.
>>>>
>>>>If that total of 1.33 * 1.5 = 2.0 times faster latency is
>>>>then giving a 13% speedup of DIEP, then that is quite a lot IMHO.
>>>>
>>>>>On December 04, 2002 at 13:32:01, Vincent Diepeveen wrote:
>>>>>
>>>>>>On December 04, 2002 at 11:42:17, Matt Taylor wrote:
>>>>>>
>>>>>>>On December 04, 2002 at 10:43:59, Vincent Diepeveen wrote:
>>>>>>>
>>>>>>>>On December 04, 2002 at 10:21:08, James T. Walker wrote:
>>>>>>>>
>>>>>>>>>On December 04, 2002 at 08:00:35, martin fierz wrote:
>>>>>>>>>
>>>>>>>>>>hi,
>>>>>>>>>>
>>>>>>>>>>i'm on the lookout for a new PC for endgame database computations. i'll probably
>>>>>>>>>>be buying a lot of ram, 2-3GB. i see that there is a big price difference
>>>>>>>>>>between DDRAM and SDRAM. IIRC the main difference is that you get a larger
>>>>>>>>>>bandwidth, but about the same latency with DDR - so i suppose i'm better off
>>>>>>>>>>buying SDRAM for my application. any opinions of the experts?
>>>>>>>>>>
>>>>>>>>>>thanks in advance
>>>>>>>>>> martin
>>>>>>>>>
>>>>>>>>>For what it's worth: I purchased one stick (256M) of DDR ram to compare to my
>>>>>>>>>cheap SDRAM. I found no noticeable difference in chess performance (just price).
>>>>>>>>> I did not do any extensive testing. I simply compared Fritz marks. I suspect
>>>>>>>>>that in the future most motherboards will not accept the SDRAM.
>>>>>>>>>Jim
>>>>>>>>
>>>>>>>>I see a big difference. 64 versus 32 bytes cache lines matters
>>>>>>>>a lot for DIEP and all software that doesn't fit within L1 cache.
>>>>>>>>
>>>>>>>>Best regards,
>>>>>>>>Vincent
>>>>>>>
>>>>>>>Cache line size is a part of the CPU, not the ram. There are a number of
>>>>>>>transitional products, both P4 and Athlon, that accept both SDRAM and DDR SDRAM.
>>>>>>>(However, I have never heard of anyone happy with these products.)
>>>>>>
>>>>>>the P4 ended up being a lot faster for DIEP when i tested a p4 with ddr ram
>>>>>>instead of RDRAM.
>>>>>>
>>>>>>P4 with ddr ram (northwood) is like 1.5 : 1 for a K7
>>>>>>used to be 1.7 : 1 to a k7 with rdram.
>>>>>>
>>>>>>So 1.7 Ghz P4 rdram == 1.0Ghz K7 for DIEP
>>>>>>   2.4 Ghz P4 ddr   == 1.6Ghz K7 for DIEP (both ddr).
>>>>>>
>>>>>>DDR is a big step forward!!
>>>>>>
>>>>>>i don't know where the processor gets 64 bytes instead of 32 bytes in
>>>>>>the design. I just know it gets 64 bytes, versus SDRAM 32.
>>>>>>
>>>>>>Best regards,
>>>>>>Vincent
>>>>>
>>>>>By your figures, DDR SDRAM speed compared to RDRAM speed on a P4 platform is
>>>>>1.7/1.5 = 113%. I wouldn't call 13% a "big step forward."
>>>>>
>>>>>This also makes the assumption that both the 1 GHz K7 and 1.6 GHz K7 run equally
>>>>>fast. The 1 GHz K7 is the Thunderbird chip. The 1.6 GHz K7 is the AthlonXP 1900.
>>>>>Thunderbirds report that they are model 4, whereas AthlonXP 1900 may report
>>>>>model 6 (palomino) or 8 (thoroughbred). Model 4 and Model 6 are not the same
>>>>>thing, and they differ in MORE than just instructions.
>>>>>One change that I have observed is that the model 6 L2 cache is slightly faster. Other timings have
>>>>>probably changed, too.
>>>>>
>>>>>I will also mention that a 2.4 GHz P4 is the P4 Northwood. The 1.7 GHz P4 may be
>>>>>a Northwood, but I suspect (based on the numbers) that it was probably the older
>>>>>Willamette. The major difference is that the P4 Willamette had a smaller L2
>>>>>cache (256KB instead of 512KB).
>>>>>
>>>>>I will have to agree with Jeremiah, here. If DDR SDRAM is faster, DIEP is
>>>>>latency-dependent. If RDRAM is faster, it would be bandwidth-dependent. I have
>>>>>measured pc800 RDRAM bandwidth on one of my systems, and it exceeds theoretical
>>>>>bandwidth on any standard part DDR SDRAM. (I am not completely sure, but I don't
>>>>>think pc2700 is part of the JEDEC specification.)
>>>>>
>>>>>I am not sure what you're saying about 64-bytes vs. 32-bytes, but I assure you
>>>>>that SDRAM-based, DDR-based, and RDRAM-based P4s all have the same cache line size.
>>>>>The information is available from the cpuid instruction. The vector is
>>>>>documented in both Intel and AMD literature, but off-hand I don't know which
>>>>>vector it is. There are many utilities, especially for Windows, that will give
>>>>>this information. I -believe- wcpuid is one such utility, but I usually end up
>>>>>writing a program every time I get curious about cpuid information.
>>>>>
>>>>>If you would like, I will write such a program and post it.
>>>>>
>>>>>-Matt
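
For readers who want to check the arithmetic traded back and forth in this thread, here is a minimal sketch in Python. It only reproduces the posters' own figures (clock in MHz * transfers per clock * bus width in bytes for peak bandwidth; cycles divided by clock for latency). The function names are made up for this illustration and come from neither poster, and the inputs are the thread's numbers, not measurements.

```python
def peak_bandwidth_mb(clock_mhz, transfers_per_clock, bus_bytes):
    """Theoretical peak bandwidth in MB/sec, as computed in the thread:
    clock (MHz) * transfers per clock * bus width (bytes)."""
    return clock_mhz * transfers_per_clock * bus_bytes


def latency_ratio(clock_a_mhz, cycles_a, clock_b_mhz, cycles_b):
    """How many times lower the latency of memory A is than that of memory B.
    Latency is modelled as cycles / clock, which is the simplification the
    thread itself uses (133/100 * 15T/10T)."""
    time_a = cycles_a / clock_a_mhz
    time_b = cycles_b / clock_b_mhz
    return time_b / time_a


if __name__ == "__main__":
    # Matt's RDRAM figures: 2-byte bus, double data rate.
    print("pc800  RDRAM:", peak_bandwidth_mb(400, 2, 2), "MB/sec")
    print("pc1066 RDRAM:", peak_bandwidth_mb(533, 2, 2), "MB/sec")

    # Vincent's latency claim: DDR at 133 MHz / 10T vs. RDRAM at 100 MHz / 15T.
    print("DDR vs RDRAM latency ratio:", latency_ratio(133, 10, 100, 15))

    # Matt's reading of Vincent's DIEP numbers: 1.7 : 1 vs. 1.5 : 1.
    print("DIEP speedup from DDR on P4:", 1.7 / 1.5)
```

The point of putting the two functions side by side is that the bandwidth formula and the latency formula use different inputs, which is why the two posters can both be right about their own numbers while disagreeing about which memory is "faster".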
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.