Computer Chess Club Archives


Subject: Slightly OT programming question : searching a word/an expression

Author: Andrei Fortuna

Date: 14:10:11 06/11/02


First, some background: I have some ideas for rewriting my search engine
later this year. I realized that everything I need can be done with php
and mysql, and that way I can personalize it for each user: which
posts to ignore, moving posts to different folders, color and display schemes
... basically what I had in my offline browser, but much improved
and easier to code thanks to the php+mysql combination. The only
thing I need to write in C/C++ is the part that keeps the word lists: a
program that holds those lists internally, takes a phrase/search
expression as input, and returns a list of ids of the articles that match it.
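To make the word-list part concrete, here is a minimal sketch of such a structure, assuming whitespace-separated words and integer article ids. The names `WordIndex`, `addArticle` and `lookup` are made up for illustration and are not taken from the existing engine.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical in-memory word list: each word maps to the sorted ids of
// the articles that contain it.
using PostingList = std::vector<int>;
using WordIndex = std::map<std::string, PostingList>;

// Split the article text on whitespace and record the article id under
// each word. Assumes articles are added in increasing id order, so every
// posting list stays sorted.
void addArticle(WordIndex& index, int articleId, const std::string& text) {
    std::istringstream in(text);
    std::string word;
    while (in >> word) {
        PostingList& ids = index[word];
        assert(ids.empty() || ids.back() <= articleId);  // increasing id order
        // Skip the id if this article was already recorded for the word.
        if (ids.empty() || ids.back() != articleId) ids.push_back(articleId);
    }
}

// Return the ids of all articles containing the exact (case-sensitive) word.
PostingList lookup(const WordIndex& index, const std::string& word) {
    auto it = index.find(word);
    return it == index.end() ? PostingList{} : it->second;
}
```

A `std::map` is used here only because its sorted keys make prefix scans easy later; a hash table would serve equally well for exact-word lookups.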

So far so good, but I want to make the search as powerful as possible. Until
now my word search has been of the form +abc* -xyz*, i.e. I couldn't afford to have
1) phrase searches as in "best move" 2) expressions like "abc*xx" or "abc?x" 3)
expressions like "*abc?xx" (words starting with anything) 4) case insensitive
searches (I kept the word list case sensitive, so I thought a case insensitive
search might be expensive; the alternative was to store everything case insensitive,
but then case sensitive searches would not have worked as expected).
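For reference, the existing +abc* -xyz* style of query can be sketched roughly as below. This assumes the word list is a sorted `std::map` from word to article ids; `articlesWithPrefix` and `requireAndExclude` are hypothetical names for this sketch, not the current engine's functions.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <map>
#include <set>
#include <string>
#include <vector>

// word -> sorted article ids
using WordIndex = std::map<std::string, std::vector<int>>;

// Collect the ids of every article containing a word that starts with
// `prefix`. std::map keeps its keys sorted, so all matching words are
// contiguous and start at lower_bound(prefix).
std::set<int> articlesWithPrefix(const WordIndex& index, const std::string& prefix) {
    std::set<int> ids;
    for (auto it = index.lower_bound(prefix);
         it != index.end() && it->first.compare(0, prefix.size(), prefix) == 0;
         ++it) {
        ids.insert(it->second.begin(), it->second.end());
    }
    return ids;
}

// Evaluate "+required* -excluded*": articles containing a word with the
// first prefix but no word with the second prefix.
std::vector<int> requireAndExclude(const WordIndex& index,
                                   const std::string& required,
                                   const std::string& excluded) {
    // An empty prefix would match every word in the list.
    assert(!required.empty() && !excluded.empty());
    std::set<int> keep = articlesWithPrefix(index, required);
    std::set<int> drop = articlesWithPrefix(index, excluded);
    std::vector<int> result;
    std::set_difference(keep.begin(), keep.end(), drop.begin(), drop.end(),
                        std::back_inserter(result));
    return result;
}
```

The prefix scan works only because the wildcard sits at the end of the word; a * in the middle or at the front breaks the "matching words are contiguous" property.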

Now I would like to have all 4 cases covered. 1) would mean storing for each
word not only the articles it appears in but also the position(s) at which it
appears, so a phrase of several words matches when those words occur at
consecutive positions in the same article. For 4), if I store a checksum for each
word, I can quickly identify a list of candidate words and do a case insensitive
strcmp on them.
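The positional idea for point 1) can be sketched as follows. This is only an illustration under simple assumptions: articles arrive already split into words, positions are 0-based word offsets, and `PositionalIndex` and `phraseSearch` are invented names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// word -> article id -> positions (0-based word offsets) in that article.
using PositionalIndex = std::map<std::string, std::map<int, std::vector<int>>>;

// Record every word of an already tokenized article with its position.
// Positions are appended in increasing order, so each list stays sorted.
void addArticle(PositionalIndex& index, int articleId,
                const std::vector<std::string>& words) {
    for (int pos = 0; pos < static_cast<int>(words.size()); ++pos)
        index[words[pos]][articleId].push_back(pos);
}

// Return the ids of articles in which the words of `phrase` occur at
// consecutive positions, in order.
std::vector<int> phraseSearch(const PositionalIndex& index,
                              const std::vector<std::string>& phrase) {
    assert(!phrase.empty());
    std::vector<int> result;
    auto first = index.find(phrase[0]);
    if (first == index.end()) return result;
    for (const auto& entry : first->second) {
        const int articleId = entry.first;
        for (int start : entry.second) {
            bool match = true;
            // Word k of the phrase must sit at position start + k.
            for (std::size_t k = 1; k < phrase.size() && match; ++k) {
                auto w = index.find(phrase[k]);
                if (w == index.end()) { match = false; break; }
                auto a = w->second.find(articleId);
                match = a != w->second.end() &&
                        std::binary_search(a->second.begin(), a->second.end(),
                                           start + static_cast<int>(k));
            }
            if (match) { result.push_back(articleId); break; }
        }
    }
    return result;
}
```

Storing positions makes the index noticeably larger than a plain word-to-article list, which is the price of supporting phrases.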

The part I'm having trouble with is points 2) and 3): having a * inside the word
(with letters following) or, especially, as the first character of the search
word. I confess I searched the net last year for algorithms for this and came up
with not very satisfying results ... so I'm asking all the bright minds in here for
pointers, algorithms, ideas, buzz-words that I should use for an internet search
... I'm 100% certain this is a well known problem, but I have no direction to
search further, and I would really like to incorporate a more complex search in
my future engine ... please help a fellow programmer in distress :)))
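As a baseline for points 2) and 3), a pattern with * and ? anywhere can at least be tested against each word by brute force. The sketch below is a plain backtracking matcher plus a linear scan over the vocabulary; it is not an indexed solution, and `wildcardMatch` and `matchingWords` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Return true if `word` matches `pattern`, where '*' stands for any run of
// characters (including none) and '?' stands for exactly one character.
bool wildcardMatch(const std::string& pattern, const std::string& word) {
    const std::size_t none = std::string::npos;
    std::size_t p = 0, w = 0;
    std::size_t starP = none;  // position of the most recent '*' in pattern
    std::size_t starW = 0;     // word position where that '*' started matching
    while (w < word.size()) {
        if (p < pattern.size() && pattern[p] == '*') {
            starP = p++;       // remember the star, first try it as empty
            starW = w;
        } else if (p < pattern.size() &&
                   (pattern[p] == '?' || pattern[p] == word[w])) {
            ++p;
            ++w;
        } else if (starP != none) {
            p = starP + 1;     // backtrack: let the last '*' absorb one more char
            w = ++starW;
        } else {
            return false;
        }
    }
    while (p < pattern.size() && pattern[p] == '*') ++p;  // trailing stars
    return p == pattern.size();
}

// Linear scan: every vocabulary word that matches the pattern.
std::vector<std::string> matchingWords(const std::vector<std::string>& vocabulary,
                                       const std::string& pattern) {
    assert(!pattern.empty());  // an empty pattern matches only the empty word
    std::vector<std::string> hits;
    for (const std::string& word : vocabulary)
        if (wildcardMatch(pattern, word)) hits.push_back(word);
    return hits;
}
```

The scan costs time proportional to the size of the whole word list per query. Search terms that may lead to indexed alternatives are "permuterm index" and "k-gram index"; for a leading * alone, a second word list holding every word reversed turns the problem back into a prefix search.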

To elaborate: until now I used two programs, a server and a
client. The server was always loaded and listening on a port; the client
got the request from the web page, contacted the server on that port and made
the query ... now I want to eliminate the server altogether. The only reason I
had it was that, if I did everything in the client, loading the indexes into
memory would take too long ... but I plan to keep the index for words starting
with 'a' in one file, for words starting with 'b' in another file, etc., so it
could be done by a client alone.
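The per-letter split could be sketched like this. The `index_<letter>.dat` naming and the decision to put upper and lower case in the same file are assumptions made for the example only.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Pick the index file for a word from its first character.
// The "index_<letter>.dat" naming is a made-up convention for this sketch.
std::string indexFileFor(const std::string& word) {
    assert(!word.empty());
    const unsigned char first = static_cast<unsigned char>(word[0]);
    if (std::isalpha(first)) {
        // 'A' and 'a' share one file, so a case insensitive search
        // only has to open a single file per word.
        return std::string("index_") +
               static_cast<char>(std::tolower(first)) + ".dat";
    }
    return "index_other.dat";  // digits, punctuation and anything else
}
```

One consequence of this layout: a search word that begins with * or ? has no fixed first letter, so such a query would have to consult every file.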






Last modified: Thu, 15 Apr 21 08:11:13 -0700
