Author: Robert Hyatt
Date: 07:40:17 12/27/98
On December 26, 1998 at 06:25:26, Roberto Waldteufel wrote:

>On December 24, 1998 at 19:17:42, Dave Gomboc wrote:
>
>>On December 24, 1998 at 10:17:41, Robert Hyatt wrote:
>>
>>>On December 24, 1998 at 06:07:17, Roberto Waldteufel wrote:
>>>
>>>>On December 24, 1998 at 01:40:45, Eugene Nalimov wrote:
>>>>
>>>>>The idea is interesting, but I doubt it can be used. For
>>>>>example, I don't have a FAT drive - all my drives are NTFS;
>>>>>I also know people who use HPFS under Windows NT. Also, there
>>>>>are several FAT formats - e.g. FAT16 and FAT32. Possibly
>>>>>more to come when Windows 2000 is released... I don't
>>>>>want to compete with MS in supporting new disk formats.
>>>>
>>>>Aha - this is a pity. Still, how many different formats can there be? I suppose
>>>>it might be possible to cater for some presently common FAT types, and resort to
>>>>a default normal disc read if the program encounters a format that it did not
>>>>recognise, which would allow support for newer formats to be added later on,
>>>>but it certainly makes the whole approach rather problematic. Perhaps as a
>>>>safeguard the program could run a test at start-up time where it looks up data
>>>>using both methods to ensure that it is reading the same data?
>>>>
>>>>>I don't know what percentage of time is spent inside the
>>>>>operating system on non-disk-reading tasks, but I think
>>>>>that it's small enough. SQL Server was sped up by only a
>>>>>few percent when moving from the OS partitions to the raw
>>>>>partitions, and I'd bet that your speedup will be even
>>>>>smaller.
>>>>>
>>>>>The OS keeps parts of the FAT (or the NTFS disk allocation table)
>>>>>in its disk cache. You'll keep the information about data locations
>>>>>directly in the chess program. This way you'll save several
>>>>>hundred instructions, maybe several thousand, not more. After
>>>>>that you'll wait for the disk drive (rotating at 4,500-
>>>>>10,000 RPM) to deliver the data. That will take tens of
>>>>>thousands of instructions.
>>>>
>>>>I had a feeling there were going to be more problems to face :-(
>>>>However, I think that the disc waiting time might be made use of for a limited
>>>>amount of parallelism in the search algorithm, i.e. while waiting for the
>>>>tablebase data to become available from the disc, the program can perhaps be
>>>>doing something useful like retracting the move that led to the look-up
>>>>position and generating/making the next move from the ply above. If we are
>>>>talking time enough for tens of thousands of instructions, then relatively CPU
>>>>intensive tasks might be performed while access takes place.
>>>
>>>You can already do this under linux at least, and probably in windows. In
>>>linux, we call this "non-blocking I/O": I can do a read when I want, but
>>>my process isn't blocked at that point. Later I can either ask for a signal to
>>>be sent to my process when the I/O has been completed, or, when I reach the
>>>point where I need the data, I can block there waiting on the I/O to finish.
>>>
>>>It doesn't help much. The fastest disk I know of has a 5ms average access
>>>time. The xeon/400mhz processor executes a couple of instructions every 2.5
>>>nanoseconds, which translates into 2.5 *million* instructions every millisecond,
>>>or (in the case of my IBM 10K drives) 12.5 *million* instructions that I could
>>>execute while waiting on that seek/read to complete. I don't have that many
>>>instructions to execute, unfortunately... Most programs do maybe 2,000
>>>instructions or so per node, not 2 million... so there is really nothing to
>>>do while waiting on the I/O...
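(An aside on the non-blocking read described just above: on unix-like systems
the POSIX aio calls give that "start the read now, collect it later" behaviour.
Below is only a minimal sketch of the pattern, not code from any actual engine:
the file name "kpkp.tb", the offset and the buffer size are made-up
placeholders. It typically links with -lrt on Linux.)

  #include <aio.h>        /* aio_read(), aio_error(), aio_return() */
  #include <errno.h>
  #include <fcntl.h>
  #include <stdio.h>
  #include <string.h>
  #include <unistd.h>

  int main(void) {
    static char buf[8192];               /* one tablebase block; size is a placeholder */
    struct aiocb cb;
    int fd = open("kpkp.tb", O_RDONLY);  /* placeholder file name */
    if (fd < 0) return 1;

    memset(&cb, 0, sizeof(cb));
    cb.aio_fildes = fd;
    cb.aio_buf    = buf;
    cb.aio_nbytes = sizeof(buf);
    cb.aio_offset = 0;                   /* offset of the block holding the probe */
    /* cb.aio_sigevent could instead request a signal on completion. */

    if (aio_read(&cb) != 0) return 1;    /* queue the read; returns immediately */

    /* ... the search could do other work here while the disk seeks ... */

    while (aio_error(&cb) == EINPROGRESS)
      ;                                  /* or block in aio_suspend() until done */

    ssize_t n = aio_return(&cb);         /* bytes actually read, or -1 on error */
    printf("read %ld bytes\n", (long)n);
    close(fd);
    return 0;
  }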
>>
>>Windows NT supports OVERLAPPED_IO (non-blocking), but Windows 95 and 98 do not.
>>
>>Dave Gomboc
>
>I can't help myself from licking my chops at the thought of all those CPU
>instructions. Suppose we use threads under Windows to simulate multiple CPUs on
>a single-CPU computer. I guess that entails some non-trivial overhead, but on
>the plus side, when we hit a tablebase node, assuming we have non-blocking disk
>access capability (i.e. we either have NT or else have a VxD to provide such a
>facility under Win95 or Win98), we could use the Windows Sleep API call to
>simply give a time slice back to Windows while the physical disk access actually
>takes place. This time slice could then be used by the other active threads,
>thereby gaining a large number of otherwise wasted CPU instructions.
>
>Now the question is whether the saved time at tablebase nodes would be enough to
>pay for any inherent overhead in the simulated parallel search. Of course in
>positions that have no tablebase hits at all in their trees, we gain nothing at
>all to compensate for the overhead, but in an iterative deepening framework, we
>could count the number of tablebase hits on each iteration, and only call the
>simulated parallel algorithm if a previous iteration had more than some
>threshold of tablebase hits.
>
>I have absolutely no idea how great the overhead of a simulated parallel search
>would be. Has anyone ever experimented with using threads in this way? I have a
>dim recollection of reading something here several months ago about using
>threads to simulate a parallel search in order to develop and debug a parallel
>search algorithm on a single-CPU machine, but I think the idea was just for
>development, with the final version intended for a multiple-CPU platform. If the
>idea is not practical, please tell me, as I would not want to waste my time on a
>big experimental program that was doomed to failure from the start! If the idea
>is practical, then my next inevitable question is what is the best reference for
>parallel search algorithms, and is there any single algorithm that is considered
>the most efficient?

The idea doesn't work. Alpha/beta is inherently sequential by design. When I
use 2 threads on 2 cpus, I lose about 30% of the second cpu by doing extra work
(searching nodes the pure sequential alpha/beta wouldn't search). If you do the
above, you will lose about 15% of your machine, because one of the two threads
will be doing 30% extra work... Also, I/O is so slow compared to cpu speeds
that the parallelism of doing reads and searching together doesn't offer very
much at all...
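For reference, the OVERLAPPED_IO Dave mentions above looks roughly like the
following on NT. This is only a sketch of the pattern, not code from any real
engine: the file name "kpkp.tb", the offset and the buffer size are
placeholders, and most error handling is omitted.

  #include <stdio.h>
  #include <string.h>
  #include <windows.h>

  int main(void) {
    static char buf[8192];            /* one tablebase block; size is a placeholder */
    DWORD bytes = 0;
    OVERLAPPED ov;

    /* FILE_FLAG_OVERLAPPED asks NT for asynchronous (non-blocking) reads. */
    HANDLE h = CreateFileA("kpkp.tb", GENERIC_READ, FILE_SHARE_READ, NULL,
                           OPEN_EXISTING,
                           FILE_ATTRIBUTE_NORMAL | FILE_FLAG_OVERLAPPED, NULL);
    if (h == INVALID_HANDLE_VALUE) return 1;

    memset(&ov, 0, sizeof(ov));
    ov.Offset = 0;                    /* byte offset of the block to fetch */
    ov.hEvent = CreateEvent(NULL, TRUE, FALSE, NULL);

    /* ReadFile returns at once; ERROR_IO_PENDING means the read was queued. */
    if (!ReadFile(h, buf, sizeof(buf), NULL, &ov) &&
        GetLastError() != ERROR_IO_PENDING) return 1;

    /* ... keep searching other branches here while the disk seeks ... */

    /* Block (last argument TRUE) only when the data is finally needed. */
    if (!GetOverlappedResult(h, &ov, &bytes, TRUE)) return 1;
    printf("read %lu bytes\n", (unsigned long)bytes);

    CloseHandle(ov.hEvent);
    CloseHandle(h);
    return 0;
  }

The thread trick Roberto describes would simply park the probing thread in
GetOverlappedResult() (or in a plain blocking ReadFile) while another thread
keeps searching - which runs straight into the extra-node overhead described in
the reply above.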