I am trying to develop a complex text search engine. I have thousands of text pages from many books, and I need to find the pages that satisfy specified complex logical criteria. These criteria can include any combination of the following:
A: Full words.
B: Word roots (all words that share the same few key letters).
C: Word templates (some languages are full of templates, such as those for non-active or past / present verbs ...).
D: Logical connectives: AND / OR / XOR / NOT / IF / IFF, plus brackets to state precedence.
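As a rough illustration of item D, each connective can be reduced to a set operation, assuming every word / root / template has already been resolved to the set of page IDs that match it. The helper name `combine` and the `all_pages` parameter are hypothetical, introduced only for this sketch. NOT, IF and IFF need to know the full set of pages, which is why it is passed in.

```python
# Hypothetical helper: combine sets of page IDs according to a connective.
# `all_pages` is the set of every page ID; NOT, IF and IFF depend on it.
def combine(op, a, b=None, all_pages=frozenset()):
    if op == "NOT":
        return all_pages - a
    if op == "AND":
        return a & b
    if op == "OR":
        return a | b
    if op == "XOR":
        return a ^ b
    if op == "IF":   # "a implies b" is (NOT a) OR b
        return (all_pages - a) | b
    if op == "IFF":  # "a if and only if b" is NOT (a XOR b)
        return all_pages - (a ^ b)
    raise ValueError(f"unknown connective: {op}")


# Made-up data: pages 1-4, with the pages matching two search terms.
all_pages = frozenset({1, 2, 3, 4})
he = {1, 2}
was = {2, 3}

print(sorted(combine("XOR", he, was)))             # pages matching exactly one term
print(sorted(combine("IF", he, was, all_pages)))   # pages where "he" implies "was"
```

Brackets then only decide the order in which these calls are nested.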
Now, should I keep the full text of the pages in the database (not indexed) and use SQL and regular expressions?
Or would it be better to build indexes of word / root / template-page-location tuples? That way we can speed up the search for individual words / roots / templates. However, it becomes difficult once logical connectives are involved. I have thought of taking the following steps in such cases:
1: Search individually for every word / root / template in the specified query.
2: Merge the results based on precedence. For example, if we are searching for "he AND (is OR was)" then:
1: We search for "he", "is" and "was" separately and get a result list for each (from step 1).
2: Merge the lists for "is" and "was" using an "OR-MERGE" function.
3: Merge the result list from the OR-MERGE with the list for "he" using an "AND-MERGE" function.
The result of step 3 is returned as the result of the specified query.
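The steps above could be sketched roughly as follows, for a query of the form "he AND (is OR was)". This is only a minimal illustration under some assumptions: pages are held in an in-memory dict, words are split on whitespace, and the names `build_index`, `or_merge` and `and_merge` are hypothetical. Each word maps to a sorted list of page IDs, so both merges are a single linear pass over two lists.

```python
from collections import defaultdict


def build_index(pages):
    """Map each word to a sorted list of the page IDs that contain it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for word in text.lower().split():
            index[word].add(page_id)
    return {word: sorted(ids) for word, ids in index.items()}


def or_merge(a, b):
    """Union of two sorted page-ID lists."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            out.append(a[i])
            i += 1
        else:
            out.append(b[j])
            j += 1
    # At most one of the two tails is non-empty here.
    return out + a[i:] + b[j:]


def and_merge(a, b):
    """Intersection of two sorted page-ID lists."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


# Made-up sample pages, for illustration only.
pages = {
    1: "he is here",
    2: "she was there",
    3: "he was late",
    4: "he will come",
}
index = build_index(pages)

# Step 2.2: OR-MERGE the lists for "is" and "was".
is_or_was = or_merge(index.get("is", []), index.get("was", []))
# Step 2.3: AND-MERGE that result with the list for "he".
result = and_merge(index.get("he", []), is_or_was)
print(result)  # [1, 3]
```

Roots and templates would need their own index keys (for example, a root extracted from each word at indexing time), which this sketch does not attempt.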
What do you think, gurus? Which is faster?
There are a lot of off-the-shelf solutions for this kind of problem. I strongly recommend that you use one of them rather than developing your own.
You do not say which database solution you are using. If it is Microsoft SQL Server, then you can use its built-in text search features; if it is MySQL, take a look at what it provides. Oracle, DB2 and any other major DBMS will have the same functionality.
Alternatively, take a look at Apache, which will allow you to index the documents without using a DBMS.