Author: Vincent Diepeveen
Date: 12:30:47 11/11/02
On November 11, 2002 at 14:02:34, Bob Durrett wrote:

>On November 11, 2002 at 13:39:08, Vincent Diepeveen wrote:
>
>>On November 11, 2002 at 13:18:12, Russell Reagan wrote:
>>
>>That's how the OS reports it to you. In fact i have
>>different statements here from different OS experts.
>>
>>One of them said for example: "in console this OS (without quoting
>>which OS he referred to) i measured for my application A that
>>i saved out up to 10% in speed".
>>
>>I am no expert here, but at least his words made sense.
>>
>>If many threads are running at an OS, then i can imagine that switching
>>each 0.002 seconds to another thread is going to cause damage.
>>
>>Context switching of the processor etcetera.
>>
>>At todays processors however it won't be 10% i bet.
>>
>>So taking that into account, basically the compiler matters a lot.
>>
>>Nowadays however compilers work for different OSes. I do not know how
>>good compilers are for mckinley/R14000/alpha/Sun when compared to X86
>>compilers.
>>
>>I get impression that the x86 compilers have made up a lot of terrain.
>>
>>the alpha compiler was said to be very very good, but it never impressed
>>upon DIEP in fact. Speeds at alpha were horrible. On the other hand others
>>reported the Sun to be a horrible processor/compiler combination, but
>>DIEP ran fine at SUN cpu's. McKinley is very fast for DIEP (1Ghz Mckinley
>>like a 1.33Ghz K7 even) but i have no idea how well its compiler is
>>compared to other processors.
>>
>>If i compare the specs from the K7/P3 versus the McKinley, we see
>>a major difference in specifications:
>>  K7 + P3 can do up to 3 instructions a clock
>>  McKinley is doing 6 instructions a bundle (if i understand well)
>>
>>  K7 + P3 have horrible small L2 cache
>>  McKinley has *huge* L3 cache 3 MB even
>>
>>  Yet it is only 33% faster or so. With some more fine tuning i might
>>  get it bigger. I wasn't capable yet to check out what branch prediction
>>  means for it.
>>
>>In general i have *no* idea what the OS eats from those processors. I get
>>impression however that the OS gets more important at SMP machines than it
>>is at single cpu machines.
>>
>>Basically these compilers which usually only works for 1 or 2 OSes,
>>determine what speed you get under that OS, because even if it would be
>>an incredible 10% which the OS eats (hard to believe for me it would be
>>that high for todays OSes) then that still means peanuts compared to what
>>a good compiler can save you out.
>>
>>>On November 11, 2002 at 13:02:44, Bob Durrett wrote:
>>>
>>>>Would the engine perform significantly better using that dedicated operating
>>>>system? [As compared to using a commercially available OS]
>>>
>>>You can get an idea of how much time is used by the OS. On my computer I look
>>>under Task Manager and it says:
>>>
>>>Image Name            CPU Time
>>>System Idle Process   6:19:14
>>>IEXPLORE.EXE          0:02:16
>>>msdev.exe             0:01:22
>>>Explorer.exe          0:00:53
>>>System                0:00:22
>>>
>>>And so on. So I have over 6 hours of idle time, and the next biggest chunk of
>>>CPU usage time was by Internet Explorer, of a whole 2 minutes. That means there
>>>is 99.5% of the CPU time that could have been used by a chess program. So the
>>>question is whether or not a 0.5% increase in speed is going to mean
>>>"significantly better" results. I think not.
>
>As a non-programmer, what I am getting out of this is that the choice of
>operating system pretty much dictated the choice of compiler [maybe several
>compilers would do]. Then the compiler dictated what the operating system would
>be doing while the chess engine was running.
>
>If that is the essence of it, then my original question needs to be modified.
>Perhaps the $1,000,000 someone gave to the GURU Chess Programmer had to be not
>just for development of a new operating system but also development of a new
>OPTIMAL compiler for that operating system.
>
>Then the comparison would be between a chess engine compiled using the new
>optimal compiler and run using the new operating system versus the way it is
>done today.
>
>Bob D.

First of all, you can't hire guys like Nalimov for a few years of work for
just $1000.

Secondly, I would not like writing DIEP in assembly, knowing that it runs on
different CPUs.

But basically you hit the essence: the OS pretty much dictated the compiler.
Luckily that's easier nowadays. Even gcc works for Windows a bit, but so far
the only program that's faster with gcc on the K7 is DIEP (I didn't measure
that exe on the P4 yet, as the SMP functions do not compile in the gcc cross
compile).

With all respect for the hard work of compiler creators, you can't expect
that a single person like me is going to write his engine in assembly simply
because it might be faster than writing in C.

I would then have needed assembly versions for:
  - Alpha (21164 + 21264)
  - SUN (sparcs)
  - HP (several cpu's)
  - G4 (OS/X)
  - x86 (and for each new x86 cpu update the code to new code)
  - McKinley
  - R14000

That's of course impossible to achieve for a single person. I also doubt the
program would progress.

Programmers like Frans Morsch have to start completely fresh when designing a
new Fritz version, I guess because a new CPU requires a completely new
approach and every clock matters for it. Getting rid of on average 1
misprediction a node already means the entire program is 2% faster, and way
more than that on the P4.

I wonder how he is *ever* going to write assembly for a processor like the
McKinley. The answer, I guess, is *not*.

Note that he can still manage this great effort by keeping the program small.
The DIEP engine's source code is by now 2 MB in total (excluding the
interface, and growing slowly); that would of course be impossible to rewrite
each time.

One thing is for sure. From a search depth viewpoint, Fritz IS getting a
minimum nominal depth because of being in assembly, which DIEP doesn't get at
all.
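
The misprediction remark above can be put as rough arithmetic: the share of a node's time spent on one mispredicted branch is about the misprediction penalty divided by the cycles spent per node. The sketch below only illustrates that ratio. Every number in it (500 cycles per node, penalties of 10 and 20 cycles for a shorter and a longer pipeline) is a hypothetical input chosen to reproduce the quoted 2%, not a measurement of Fritz or DIEP.

```python
def misprediction_share(penalty_cycles, cycles_per_node, mispredictions_per_node=1.0):
    """Fraction of per-node time lost to mispredicted branches."""
    return mispredictions_per_node * penalty_cycles / cycles_per_node


# Hypothetical inputs, for illustration only (not measured on any engine).
CYCLES_PER_NODE = 500        # assumed cost of searching one node
SHORT_PIPELINE_PENALTY = 10  # assumed penalty on a shorter pipeline
LONG_PIPELINE_PENALTY = 20   # assumed penalty on a longer pipeline

short_share = misprediction_share(SHORT_PIPELINE_PENALTY, CYCLES_PER_NODE)
long_share = misprediction_share(LONG_PIPELINE_PENALTY, CYCLES_PER_NODE)

print(f"shorter pipeline: {short_share:.1%} of node time per misprediction")
print(f"longer pipeline:  {long_share:.1%} of node time per misprediction")
```

With these assumed inputs, doubling the penalty doubles the share, which is the sense in which a longer pipeline makes each removed misprediction worth more.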
On the other hand, if I get 500 processors which can each do 6 instructions a
clock, that gives a potential of around:

  6 * 1 GHz * 500 = 3000 * 1,000,000,000
                  = 3,000,000,000,000 instructions a second
                  = 3 * 10^12 instructions a second, *potentially*.

No assembly makes up for portability IMHO :)

Best regards,
Vincent
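
The peak-throughput estimate in the post can be checked with a few lines of arithmetic. This is a minimal sketch using the figures from the post (500 processors, 1 GHz, 6 instructions per clock). The function and parameter names are made up for illustration, and the result is a theoretical ceiling that assumes every issue slot is filled on every clock.

```python
def peak_instructions_per_second(cpus, clock_hz, instructions_per_clock):
    """Theoretical upper bound: every CPU issues the maximum every clock."""
    return cpus * clock_hz * instructions_per_clock


# Figures taken from the post.
peak = peak_instructions_per_second(
    cpus=500,
    clock_hz=1_000_000_000,    # 1 GHz
    instructions_per_clock=6,  # peak issue width quoted in the post
)

print(f"{peak:,} instructions per second (peak)")
```

The unit of the result is instructions per second, not per clock: the per-clock figure is 6 for one processor and 3000 across all 500.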
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.