Author: Roberto Waldteufel
Date: 03:25:26 12/26/98
Go up one level in this thread
On December 24, 1998 at 19:17:42, Dave Gomboc wrote: >On December 24, 1998 at 10:17:41, Robert Hyatt wrote: > >>On December 24, 1998 at 06:07:17, Roberto Waldteufel wrote: >> >>> >>>On December 24, 1998 at 01:40:45, Eugene Nalimov wrote: >>> >>>>The idea is interesting, but I doubt it can be used. For >>>>example, I don't have FAT drive - all my drives are NTFS; >>>>I also know persons who use HPFS at Windows NT. Also, there >>>>are several FAT formats - e.g. FAT16 and FAT32. Possible >>>>more to come when Windows 2000 will be released... I don't >>>>want to compete with MS in supporting new disk formats. >>> >>>Aha - this is a pity. Still, how many different formats can there be? I suppose >>>it might be possible to cater for some presently common FAT types, and resort to >>>a default normal disc read if the program encounters a format that it did not >>>recognise, which would allow for support of newer formats to be added later on, >>>but it certainly makes the whole approach rather problematic. Perhaps as a >>>safeguard the program could run a test at start-up time where it looks up data >>>using both methods to ensure that it is reading the same data? >>> >>>> >>>>I don't know what percentage of time is spend inside the >>>>operating system on non-disk-reading tasks, but I think >>>>that it's small enough. SQL Server was speed up by only >>>>several percents when moving from the OS partitions to >>>>the raw partitions, and I'd bet that yours speedup will >>>>be even smaller. >>>> >>>>OS keeps parts of FAT (or NTFS disk allocation table) >>>>in it's disk cache. You'll keep data on data location >>>>directly in the chess program. This way you'll save several >>>>hundred instructions, maybe several thousand, not more. After >>>>that you'll wait disk drive (that is rotated with 4,500- >>>>10,000RPM) for the data. That will take tents of thousand >>>>instructions. >>>> >>>I had a feeling there were going to be more problems to face :-( >>>However, I think that the disc waiting time might be made use of for a limited >>>amount of parallelism in the search algorithm, ie while waiting for the >>>tablebase data to become available from the disc, the program can perhaps be >>>doing something useful like retracting the move that lead to the look-up >>>position and generating/making the next move from the ply above. If we are >>>talking time enough for tens of thousands of instructions, then relatively CPU >>>intensive tasks might be performed while access takes place. >>> >> >> >>you can already do this under linux at least, and probably in windows. In >>linux, we call this "non-blocking I/O" where I can do a read when I want, but >>my process isn't blocked at that point. Later I can either ask for a signal to >>be sent to my process when the I/O has been completed, or when I reach the point >>where I need the data I can block there waiting on the I/O to finish. >> >>It doesn't help much. The fastest disk I know of has a 5ms average access >>time. the xeon/400mhz processor executes a couple of instructions every 2.5 >>nanoseconds, which translates into 2.5 *million* instructions every millisecond, >>or (in the case of my IBM 10K drives) 12.5 *million* instructions that I could >>execute while waiting on that seek/read to complete. I don't have that many >>instructions to execute, unfortunately... Most programs do maybe 2,000 >>instructions or so per node. not 2 million... so there is really nothing to >>do while waiting on the I/O... > >Windows NT supports OVERLAPPED_IO (non-blocking), but Windows 95 and 98 do not. > >Dave Gomboc I can't help myself from licking my chops at the thought of all those CPU instructions. Suppose we use threads under Windows to simulate multiple CPU's on a single CPU computer. I guess that entails some non-trivial overhead, but on the plus side, when we hit a tablebase node, assuming we have non-blocked disk access capability (ie we either have NT or else have a VxD to provide such a facility under Win95 or Win98), we could use the Windows Sleep API call to simply give a time slice back to Windows while the physical disk access actually takes place. This time slice could then be used by the other active threads, thereby gaining a large number of otherwise wasted CPU instructions. Now the question is whether the saved time at tablebase nodes would be enough to pay for any inherant overhead in the simulated parallel search. Of course in positions that have no tablebase hits at all in their trees, we gain nothing at all to compensate for the overhead, but in an iterative deepening framework, we could count the number of tablebase hits on each iteration, and only call the simulated parallel algorithm if a previous iteration had more than some threshold of tablebae hits. I have absolutely no idea how great the overhead of a simulated parallel search would be. Has anyone ever experimented with using threads in this way? I have a dim recollection of reading something here several months ago about using threads to simulate a parallel search in order to develop and debug a parallel search algorithm on a single CPU machine, but I think the idea was just for development, with the final version intended for a multiple CPU platform. If the idea is not practical, please tell me, as I would not want to waste my time on a big experimental program that was doomed to failure from the start! If the idea is practical, the my next inevitable question is what is the best reference for parallel search algorithms, and is there any single algorithm that is considered the most efficient?
This page took 0 seconds to execute
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.