Computer Chess Club Archives


Subject: Re: High speed file access for tablebases under 32-bit Windows

Author: Robert Hyatt

Date: 07:17:41 12/24/98



On December 24, 1998 at 06:07:17, Roberto Waldteufel wrote:

>
>On December 24, 1998 at 01:40:45, Eugene Nalimov wrote:
>
>>The idea is interesting, but I doubt it can be used. For
>>example, I don't have a FAT drive - all my drives are NTFS;
>>I also know people who use HPFS on Windows NT. Also, there
>>are several FAT formats - e.g. FAT16 and FAT32. Possibly
>>more to come when Windows 2000 is released... I don't
>>want to compete with MS in supporting new disk formats.
>
>Aha - this is a pity. Still, how many different formats can there be? I suppose
>it might be possible to cater for some presently common FAT types, and resort to
>a default normal disc read if the program encounters a format that it did not
>recognise, which would allow for support of newer formats to be added later on,
>but it certainly makes the whole approach rather problematic. Perhaps as a
>safeguard the program could run a test at start-up time where it looks up data
>using both methods to ensure that it is reading the same data?
>
>>
>>I don't know what percentage of time is spent inside the
>>operating system on non-disk-reading tasks, but I think
>>that it's small. SQL Server was sped up by only
>>a few percent when moving from the OS partitions to
>>the raw partitions, and I'd bet that your speedup will
>>be even smaller.
>>
>>The OS keeps parts of the FAT (or the NTFS disk allocation table)
>>in its disk cache. You'll keep data on data location
>>directly in the chess program. This way you'll save several
>>hundred instructions, maybe several thousand, not more. After
>>that you'll wait for the disk drive (which rotates at
>>4,500-10,000 RPM) to deliver the data. That will take tens of
>>thousands of instructions.
>>
>I had a feeling there were going to be more problems to face :-(
>However, I think that the disc waiting time might be made use of for a limited
>amount of parallelism in the search algorithm, i.e. while waiting for the
>tablebase data to become available from the disc, the program can perhaps be
>doing something useful like retracting the move that led to the look-up
>position and generating/making the next move from the ply above. If we are
>talking time enough for tens of thousands of instructions, then relatively CPU
>intensive tasks might be performed while access takes place.
>


You can already do this under Linux at least, and probably in Windows.  In
Linux, we call this "non-blocking I/O", where I can issue a read when I want, but
my process isn't blocked at that point.  Later I can either ask for a signal to
be sent to my process when the I/O has completed, or, when I reach the point
where I need the data, I can block there waiting on the I/O to finish.
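
A minimal sketch of the "start the read, keep working, block only when the
data is needed" pattern described above.  It is in Python and uses a worker
thread rather than Linux non-blocking I/O or completion signals, so it shows
the shape of the idea, not the actual mechanism.  The names read_at,
probe_with_overlap and other_work are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def read_at(path, offset, length):
    """Blocking read of `length` bytes at `offset`; runs on the worker thread."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)


def probe_with_overlap(path, offset, length, other_work):
    """Start the read, do other_work() meanwhile, then wait for the data."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(read_at, path, offset, length)  # returns at once
        side_result = other_work()   # e.g. unmake the move, generate the next one
        data = pending.result()      # block here only when the data is needed
    return data, side_result
```

Whether this buys anything depends on how much useful work other_work() can
really do in the time the disk takes to respond.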

It doesn't help much.  The fastest disk I know of has a 5ms average access
time.  The Xeon/400MHz processor executes a couple of instructions every 2.5
nanoseconds, which translates into roughly 800,000 instructions every
millisecond, or (in the case of my IBM 10K drives) about 4 *million*
instructions that I could execute while waiting on that seek/read to complete.
I don't have that many instructions to execute, unfortunately...  Most programs
do maybe 2,000 instructions or so per node, not 2 million...  so there is really
nothing to do while waiting on the I/O...
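
The arithmetic can be checked with a few lines of Python.  The inputs are the
figures quoted above (400 MHz clock, a couple of instructions per cycle, 5 ms
access time, about 2,000 instructions per node).  Treating "a couple" as
exactly two per cycle is an assumption, and the names are made up for
illustration.

```python
CLOCK_HZ = 400_000_000    # 400 MHz, i.e. one cycle every 2.5 ns
INSTR_PER_CYCLE = 2       # assumption: "a couple" taken as exactly two
ACCESS_TIME_US = 5_000    # 5 ms average access time, in microseconds
INSTR_PER_NODE = 2_000    # rough per-node cost of a typical program


def instructions_during(access_time_us, clock_hz=CLOCK_HZ, ipc=INSTR_PER_CYCLE):
    """Instructions the CPU could retire while one disk access is pending."""
    return clock_hz * ipc * access_time_us // 1_000_000


def nodes_worth(instructions, instr_per_node=INSTR_PER_NODE):
    """How many search nodes that instruction budget corresponds to."""
    return instructions // instr_per_node


if __name__ == "__main__":
    budget = instructions_during(ACCESS_TIME_US)
    print(budget, nodes_worth(budget))
```

Whatever instructions-per-cycle figure is plugged in, the wait comes out in
the millions of instructions, i.e. thousands of nodes' worth of work for a
single probe.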



>
>>Also, I'd never install *almost any* program that performs
>>direct access to my disk drive for *any* purpose, especially
>>when that disk drive holds gigabytes of valuable
>>information. I know how kernel components of the operating
>>system (and especially of the server applications - e.g. of
>>NT) are tested inside MS, and I doubt that you can test
>>your drivers with a similar level of test coverage.
>>Unfortunately, to test such programs you have to be MS, IBM,
>>Oracle, HP, etc., or have a *very large* community of volunteers,
>>as in the Linux case.
>>
>Yes, if it should ever write data in the wrong place, it could be a disaster
>indeed! However, what I am talking about is only reading, *never* writing data.
>That way, if I screw up on my disc location, I retrieve nonsense-data for the
>program, which of course is bad, but nothing gets overwritten anywhere. I am
>assuming in this that I know in advance the precise places on the physical disc
>where my data is, and that *no* disc writes are performed by the program at all,
>and that no other applications that might access the disc are running
>concurrently. The point of that is to be sure that the data does not get moved
>to different places on the disc by any unforeseen disc operations in between
>program start-up and the end of the game (after which no more tablebase lookups
>would be required). The program can then save the game to disc if required, but
>only after the game is over.
>
>Thanks for your comments. I don't know if I'll ever get as far as implementing
>this kind of disc access, but as you say it is an interesting idea, as long as
>it is never used for writing!
>
>Merry Christmas,
>Roberto
>



Interesting from a "doing it" point of view, but not as a way of making a
program faster, because the numbers just don't "add up", as you can see from
the above...



>
>
>>Eugene
>>
>>On December 23, 1998 at 20:36:30, Roberto Waldteufel wrote:
>>
>>>Hi all,
>>>
>>>I would like to share an idea I am currently looking into concerning hard disc
>>>access. For many chess programs that use tablebases hard disc access can be a
>>>serious problem when there are many hits on the tablebases deep in the search.
>>>If the disc accesses could be sped up, the program would benefit very much in
>>>these types of positions.
>>>
>>>The idea is this: bypass everything and go directly to the data via hardware. Of
>>>course this is technically much more complicated than simply opening the file
>>>and retrieving the required byte in the normal way, but I suspect the speed
>>>improvement may be substantial if it can be done, which I think it can. The
>>>fastest way to access the hard disc is via the hardware ports. We need to know
>>>the sector and the sector offset for the data in the file, but this information
>>>can be read from the FAT. By using ports, we load just the data we want from
>>>just the sector we want (not a cluster of sectors), and cut out all the
>>>operating system overhead. The point is that we know the size of the file, we
>>>know it is not going to be altered while the program runs, so all the FAT
>>>information should be readable at start-up time, and then instead of calculating
>>>a simple byte location in the file for a given position, we calculate the actual
>>>sector/offset on disc and read the data via the ports directly from disc.
>>>
>>>There is a problem in that Windows disallows direct hardware access, and goes to
>>>great lengths to conceal the necessary API call (CallVxD0) in an effort to
>>>prevent programmers from doing precisely the sort of thing I am proposing. By
>>>writing the port access code in a virtual device driver, we can obtain the
>>>required Ring0 privilege to read/write not only ports, but any area of memory
>>>(protected or not). Micro$oft removed the vxdcall function from the export list
>>>for kernel32.dll when the function's potential uses were publicised, but it is
>>>still present - it just takes a bit of hacking to get at it, but there is a
>>>"back door": it is still there and very useful indeed! Using this API call, I
>>>now have a VxD in place that can, among other things, read and write the
>>>hardware ports under 32-bit Windows. I think, if I can make this work with disc
>>>accessing, the potential could extend beyond chess programming to other disc
>>>byte-data intensive computer operations. If anyone has already tried anything
>>>along these lines or has any comments I would be interested to hear from you.
>>>
>>>Merry Christmas to all,
>>>Roberto




Last modified: Thu, 15 Apr 21 08:11:13 -0700
