Computer Chess Club Archives


Subject: Re: Slightly OT programming question : searching a word/an expression

Author: pavel

Date: 15:10:52 06/11/02


On June 11, 2002 at 17:10:11, Andrei Fortuna wrote:

>First, some background info: I have some ideas on rewriting my search engine
>some time later this year. I realized that all I need to do I can do with PHP
>and MySQL, and I can personalize it for each user this way, including which
>posts to ignore, moving posts to different folders, color and display schemes
>... basically what I had in my offline browser, but much improved
>and easier to code thanks to the PHP+MySQL combination. On the other hand, the only
>thing I need to write in C/C++ is the part that keeps the word lists: just a
>program that holds those lists internally, takes a phrase/search
>expression as input, and returns a list of IDs of the articles containing it.
>
>So far so good, but my desire is to make the search as complex as possible. Until
>now my word search has been of the form +abc* -xyz*, i.e. I couldn't afford to have
>1) phrase searches as in "best move" 2) expressions like "abc*xx" or "abc?x" 3)
>expressions like "*abc?xx" (words starting with anything) 4) case insensitive
>searches (I kept the word list case sensitive, so I thought a case insensitive
>search might be expensive; the alternative was to keep everything case insensitive,
>but then case sensitive searches would not have worked as expected).
>
>Now I would like to have all 4 of those cases covered. 1) would mean storing for a
>word not only the articles it appears in but also the position(s) in which it
>appears, so that the words of a multi-word phrase would have to appear at
>consecutive indices. For 4), if I store a checksum for each word I can quickly
>identify a list of words and do a case insensitive strcmp on them.
>
>The part I'm having trouble with is points 2) and 3): having * inside the word
>(with letters following) or especially as the first character in the search
>word. I confess I searched the net last year for algorithms for it and came up with
>not very satisfying results ... so I'm asking all the bright minds in here for
>pointers, algorithms, ideas, buzz-words that I should use for an internet search
>... I'm 100% certain this is a well known problem, but I have no direction to
>search further and I would really like to incorporate a more complex search in
>my future engine ... please help a fellow programmer in distress :)))
>
>To elaborate: until now I used two programs, one a server and one a
>client. The server was the one always loaded and listening on a port; the client
>got the request from the web page, contacted the server on that port, and made
>the query ... now I want to eliminate the server altogether. The only reason I
>had it was that if I did it all in the client, loading the indexes into memory would
>take too long ... but I plan to keep indexes for words starting with 'a' in
>one file, for words starting with 'b' in another file, etc., so it could be done
>by a client alone.
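
As an illustration of point 1) in the quoted message (storing word positions so that a phrase matches only when its words sit at consecutive indices), here is a minimal C++ sketch. The names PositionalIndex, add_article and find_phrase are hypothetical and do not come from the post. It assumes words are split on whitespace and that each article is added exactly once, so each position list stays sorted.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical types for illustration only.
using ArticleId = int;
// word -> article id -> ascending positions of that word inside the article
using PositionalIndex =
    std::map<std::string, std::map<ArticleId, std::vector<int>>>;

// Split the text on whitespace and record the position of every word.
// Assumes each article id is added only once (keeps position lists sorted).
void add_article(PositionalIndex& index, ArticleId id, const std::string& text) {
    std::istringstream in(text);
    std::string word;
    int pos = 0;
    while (in >> word) {
        index[word][id].push_back(pos++);
    }
}

// True if `word` occurs in article `id` at exactly position `pos`.
bool occurs_at(const PositionalIndex& index, const std::string& word,
               ArticleId id, int pos) {
    auto w = index.find(word);
    if (w == index.end()) return false;
    auto a = w->second.find(id);
    if (a == w->second.end()) return false;
    return std::binary_search(a->second.begin(), a->second.end(), pos);
}

// Return ids of articles in which the phrase words occupy consecutive positions.
std::vector<ArticleId> find_phrase(const PositionalIndex& index,
                                   const std::vector<std::string>& phrase) {
    std::vector<ArticleId> hits;
    if (phrase.empty()) return hits;
    auto first = index.find(phrase[0]);
    if (first == index.end()) return hits;

    for (const auto& entry : first->second) {      // each article with word 0
        const ArticleId id = entry.first;
        for (int start : entry.second) {           // each position of word 0
            bool ok = true;
            for (std::size_t k = 1; k < phrase.size() && ok; ++k) {
                ok = occurs_at(index, phrase[k], id, start + static_cast<int>(k));
            }
            if (ok) {
                hits.push_back(id);
                break;                             // one match per article is enough
            }
        }
    }
    return hits;
}
```

With this layout, a query such as "best move" only looks at articles containing "best" and then probes the position list of "move" for each candidate start, so single-word searches cost the same as before and only phrase queries pay for the positions.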


And hopefully we will have one for CTF :)

I couldn't get it to work for CTF, and the server would occasionally hog the CPU.

cheers,
pavs
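
For points 2) and 3) in the quoted message (a * inside or at the start of the search word), one well-known technique is a permuterm (rotated-word) index. Neither poster mentions it, so treat the following as one possible direction and not as the method used by the search engine discussed here. The class name PermutermIndex and its methods are hypothetical. The sketch assumes '$' never occurs in an indexed word and handles patterns with exactly one '*'; a '?' or a second '*' would need an extra filtering pass over the candidates.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <string>

// Hypothetical sketch of a permuterm index. Every rotation of "word$" is
// stored as a key, so a pattern "X*Y" turns into a plain prefix lookup
// for "Y$X" in a sorted container.
class PermutermIndex {
public:
    void add_word(const std::string& word) {
        const std::string marked = word + "$";   // '$' marks the end of the word
        for (std::size_t i = 0; i < marked.size(); ++i) {
            // Rotation that starts at character i.
            rotations_.emplace(marked.substr(i) + marked.substr(0, i), word);
        }
    }

    // Match a pattern with exactly one '*', e.g. "abc*xx", "*abc" or "abc*".
    std::set<std::string> match(const std::string& pattern) const {
        std::set<std::string> result;
        const std::size_t star = pattern.find('*');
        if (star == std::string::npos) return result;  // not handled in this sketch

        // Rotate the pattern so the '*' falls at the end: "X*Y" -> prefix "Y$X".
        const std::string prefix =
            pattern.substr(star + 1) + "$" + pattern.substr(0, star);

        // Walk all keys that begin with the prefix.
        for (auto it = rotations_.lower_bound(prefix);
             it != rotations_.end() &&
             it->first.compare(0, prefix.size(), prefix) == 0;
             ++it) {
            result.insert(it->second);
        }
        return result;
    }

private:
    std::map<std::string, std::string> rotations_;  // rotation -> original word
};
```

The trade-off is space: a word of n letters produces n + 1 keys, so the index grows several times over. In exchange, a leading '*' costs no more than a trailing one, because both reduce to a sorted prefix scan.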



Last modified: Thu, 15 Apr 21 08:11:13 -0700
