Author: pavel
Date: 15:10:52 06/11/02
On June 11, 2002 at 17:10:11, Andrei Fortuna wrote:

>First some background info: I have some ideas on rewriting my search engine
>some time later this year - I realized that all I need to do I can do with PHP
>and MySQL, and I can do it personalized for each user this way - including which
>posts to ignore, moving posts to different folders, color and display schemes
>... basically what I had in my offline browser but much much better and improved
>and easier to code due to the combination PHP+MySQL. On the other hand, the only
>thing I need to write in C/C++ is the part that keeps the word lists, just a
>program that holds those lists internally, takes a phrase/search expression as
>input and returns a list of article ids for the articles containing it.
>
>So far so good, but my desire is to make the search as complex as possible. Until
>now my word search has been like +abc* -xyz*, i.e. I couldn't afford to have
>1) phrase searches as in "best move"
>2) expressions like "abc*xx" or "abc?x"
>3) expressions like "*abc?xx" (words starting with anything)
>4) case insensitive searches (I kept the word list case sensitive, so I thought
>a case insensitive search might be expensive - the alternative was to keep
>everything case insensitive, but then case sensitive searches would not have
>worked as expected)
>
>Now I would like to have all those 4 cases covered. 1) would mean storing for a
>word not only the articles it appears in but also the position(s) at which it
>appears, so a phrase with many words would have to have those words at
>consecutive indices. For 4), if I store a checksum for each word I can quickly
>identify a list of words and do a case insensitive strcmp on them.
>
>The part I'm having trouble with is points 2) and 3) -> having * inside the word
>(with letters following) or especially as the first character of the search
>word. I confess I searched the net last year for algorithms for it and came up
>with not very satisfying results ... so I'm asking all the bright minds in here
>for pointers, algorithms, ideas, buzz-words that I should use for an internet
>search ... I'm 100% certain this is a well known problem but I have no direction
>to search further in, and I would really like to incorporate a more complex
>search in my future engine ... please help a fellow programmer in distress :)))
>
>To elaborate - until now I used two programs, a server and a client. The server
>was always loaded and listening on a port; the client got the request from the
>web page, contacted the server on that port and made the query ... now I want to
>eliminate the server altogether. The only reason I had it was that if I did it
>all in the client, loading the indexes into memory would take too long ... but I
>plan to keep the index for words starting with 'a' in one file, for words
>starting with 'b' in another file, etc., so it could be done by a client alone.

And hopefully we will have one for CTF :) I couldn't get it to work for CTF, and
the server would occasionally hog the CPU.

cheers,
pavs
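
To make the four cases in the quoted post concrete, here is a minimal sketch in Python (the quoted post plans C/C++ for this part; Python is used only to keep the example short). It assumes the vocabulary of distinct words is small enough to scan, so wildcard patterns such as "abc*xx" or "*abc?xx" are expanded against the word list first and only then looked up in the postings. The class name `TinyIndex` and all method names are hypothetical and are not taken from Andrei's engine. Note that `fnmatchcase` also treats `[...]` as a character class, which may or may not be wanted.

```python
from collections import defaultdict
from fnmatch import fnmatchcase


class TinyIndex:
    """Toy positional inverted index (hypothetical names, illustration only)."""

    def __init__(self):
        # exact word -> {article_id: [positions of the word in that article]}
        self.postings = defaultdict(lambda: defaultdict(list))
        # lowercased word -> set of exact spellings seen (for case 4)
        self.folded = defaultdict(set)

    def add(self, article_id, text):
        # Naive whitespace tokenizer; a real engine would strip punctuation.
        for pos, word in enumerate(text.split()):
            self.postings[word][article_id].append(pos)
            self.folded[word.lower()].add(word)

    def expand(self, pattern, ignore_case=False):
        """Return the exact vocabulary words matching a * / ? pattern."""
        if ignore_case:
            pattern = pattern.lower()
            hits = set()
            for low, spellings in self.folded.items():
                if fnmatchcase(low, pattern):
                    hits |= spellings
            return hits
        return {w for w in self.postings if fnmatchcase(w, pattern)}

    def search_word(self, pattern, ignore_case=False):
        """Article ids containing any word that matches the pattern."""
        ids = set()
        for word in self.expand(pattern, ignore_case):
            ids |= set(self.postings[word])
        return ids

    def search_phrase(self, words):
        """Article ids where the words occur at consecutive positions."""
        if not words or any(w not in self.postings for w in words):
            return set()
        first, rest = words[0], words[1:]
        result = set()
        for art, positions in self.postings[first].items():
            for p in positions:
                if all(p + i + 1 in self.postings[w].get(art, ())
                       for i, w in enumerate(rest)):
                    result.add(art)
                    break
        return result


idx = TinyIndex()
idx.add(1, "the best move is Nf3")
idx.add(2, "Best to move the abcdxx piece")
idx.add(3, "zabcqxx wins")
print(idx.search_phrase(["best", "move"]))
print(idx.search_word("*abc?xx"))
print(idx.search_word("best", ignore_case=True))
```

The design choice here is to pay for wildcards with a linear pass over the distinct words rather than over the articles. If that pass turns out to be too slow, one commonly used alternative for a leading "*" is to keep a second word list with every word reversed, so that a leading wildcard becomes a prefix search on the reversed list.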
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.