Storing n-grams in a database in fewer than n tables

If I were writing a piece of software that tries to guess the next word a user will type, based on the two previous words the user has typed, I would create two tables.

Like this:

  == 1-gram table ==

  Token | NextWord | Frequency
  ------+----------+----------
  "I"   | "like"   | 15
  "I"   | "hate"   | 20

  == 2-gram table ==

  Token    | NextWord   | Frequency
  ---------+------------+----------
  "I like" | "apples"   | 8
  "I like" | "tomatoes" | 12
  "I hate" | "tomatoes" | 20
  "I hate" | "apples"   | 2

Using the above database, if the user types "I", software implementing this example would predict that the next word the user is going to type is "hate". If the user then types "hate", the software would predict that the next word is "tomatoes".
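A minimal Python sketch of the two-table lookup described above, using in-memory dictionaries in place of database tables (the names `ONE_GRAM`, `TWO_GRAM` and `predict_next` are hypothetical). It returns the highest-frequency next word for the longest known context; falling back from two words to one is an assumption added here, not something the question specifies.

```python
from collections import Counter

# Hypothetical in-memory stand-ins for the two tables above: each maps a
# context (the preceding word or words) to a Counter of next-word frequencies.
ONE_GRAM = {
    "I": Counter({"like": 15, "hate": 20}),
}
TWO_GRAM = {
    "I like": Counter({"apples": 8, "tomatoes": 12}),
    "I hate": Counter({"tomatoes": 20, "apples": 2}),
}


def predict_next(words):
    """Return the most frequent next word after the typed words, or None."""
    # Try the longest context first (the last two words) ...
    if len(words) >= 2:
        context = " ".join(words[-2:])
        if context in TWO_GRAM:
            return TWO_GRAM[context].most_common(1)[0][0]
    # ... then back off to the last word alone (an added assumption).
    if words:
        context = words[-1]
        if context in ONE_GRAM:
            return ONE_GRAM[context].most_common(1)[0][0]
    return None


print(predict_next(["I"]))          # hate
print(predict_next(["I", "hate"]))  # tomatoes
```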

However, this implementation requires one table for each additional n-gram order I choose to take into account. If I decided to predict the next word from the 5 or 6 preceding words, I would need 5 or 6 tables, and the space required would grow sharply with each additional n-gram order.

What would be the best way to represent this in only one or two tables, with no upper limit on the length of n-grams supported?

One option is a single two-column table:

  phrase, frequency
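A sketch of the single-table idea, assuming SQLite through Python's standard `sqlite3` module and storing each full n-gram (context plus next word) as the `phrase`. The table name `ngrams` and the helper functions are hypothetical. Because the phrase length is unbounded, the same table can hold 2-grams, 3-grams or longer, and prediction becomes a prefix query.

```python
import sqlite3


def open_db():
    """Create one table that holds n-grams of any length."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE ngrams (phrase TEXT PRIMARY KEY, frequency INTEGER NOT NULL)"
    )
    return conn


def add_count(conn, phrase, count=1):
    """Insert the phrase if it is new, then add `count` to its frequency."""
    conn.execute(
        "INSERT OR IGNORE INTO ngrams (phrase, frequency) VALUES (?, 0)", (phrase,)
    )
    conn.execute(
        "UPDATE ngrams SET frequency = frequency + ? WHERE phrase = ?",
        (count, phrase),
    )


def predict_next(conn, context):
    """Return the most frequent single word that follows `context`, or None."""
    prefix = context + " "
    start = len(prefix) + 1  # SQLite substr() positions are 1-based
    row = conn.execute(
        "SELECT substr(phrase, ?) FROM ngrams "
        "WHERE substr(phrase, 1, ?) = ? "         # phrase starts with the context
        "AND instr(substr(phrase, ?), ' ') = 0 "  # exactly one word follows it
        "ORDER BY frequency DESC LIMIT 1",
        (start, len(prefix), prefix, start),
    ).fetchone()
    return row[0] if row else None


conn = open_db()
for phrase, freq in [
    ("I like", 15), ("I hate", 20),
    ("I like apples", 8), ("I like tomatoes", 12),
    ("I hate tomatoes", 20), ("I hate apples", 2),
]:
    add_count(conn, phrase, freq)

print(predict_next(conn, "I"))       # hate
print(predict_next(conn, "I hate"))  # tomatoes
```

The `substr` prefix test cannot use the primary-key index, so on a large table this query scans every row; a range condition on `phrase` or a separate indexed context column would be worth considering.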

One optimization would be to normalize contractions in a phrase, for example turning "isn't" into the two words "is not".

A second optimization would be to store a hash of the phrase, such as MD5 or CRC32, instead of the phrase itself.
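One way the hashing idea might look, sketched in Python with the standard `zlib.crc32` and `hashlib.md5` functions. The layout is an assumption: hashing the whole phrase would make prefix lookups impossible, so this sketch hashes only the context and keeps the next word in the clear. All names are hypothetical, and a dictionary stands in for the database table.

```python
import hashlib
import zlib
from collections import Counter, defaultdict


def crc32_key(context):
    """4-byte integer key; compact, but collisions grow likely in big tables."""
    return zlib.crc32(context.encode("utf-8"))


def md5_key(context):
    """16-byte key; collisions are far less likely than with CRC32."""
    return hashlib.md5(context.encode("utf-8")).digest()


# Hypothetical table: hash of the context -> Counter of next words.
# Swap crc32_key for md5_key here to use the wider key.
KEY = crc32_key
table = defaultdict(Counter)


def add(context, next_word, count=1):
    table[KEY(context)][next_word] += count


def predict_next(context):
    """Return the most frequent word seen after `context`, or None."""
    counts = table.get(KEY(context))
    return counts.most_common(1)[0][0] if counts else None


for context, word, freq in [
    ("I", "like", 15), ("I", "hate", 20),
    ("I like", "apples", 8), ("I like", "tomatoes", 12),
    ("I hate", "tomatoes", 20), ("I hate", "apples", 2),
]:
    add(context, word, freq)

print(predict_next("I hate"))  # tomatoes
```

CRC32 yields a compact integer key, but once the table holds tens of thousands of distinct contexts, collisions become likely and would silently merge counts; MD5's 16-byte digest makes that far less likely at the cost of a wider key.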
