This is my attempt at the text search described here.
Redis is already running in my app as the back end for the Rails cache. The new requirement was the ability for users to search posts.
I wanted to make better use of the Redis server that is already running (and is underutilized by the Rails cache).
We first build an index of all the content we want to search. It is like the index at the back of a book, which lists all the page numbers where a term can be found. In my case, the page numbers are the ids of the posts in the database.
My index would be:
… and so on
Redis sets can be used for this purpose. We need the following from Redis:
- SADD: to add an item to a set
- SINTER: to retrieve the items that are in the intersection of set1 and set2
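As a minimal sketch of those two operations, the snippet below uses Ruby's standard `Set` as a local stand-in, so it runs without a server. The terms and post ids are made up for illustration. With the redis-rb gem, the corresponding calls would be `redis.sadd(key, member)` and `redis.sinter(key1, key2)`.

```ruby
require "set"

# Hypothetical index: each term maps to the set of post ids that contain it.
index = Hash.new { |hash, term| hash[term] = Set.new }

# Equivalent of SADD: add a post id to the set stored under a term.
index["redis"] << 1
index["redis"] << 2
index["search"] << 2
index["search"] << 3

# Equivalent of SINTER: the ids present in both sets.
both = index["redis"] & index["search"]
puts both.to_a.inspect # prints [2]
```

Because a set ignores duplicates, adding the same post id twice under one term is harmless, which is convenient when the index is rebuilt repeatedly.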
Instead of using the exact words as the keys of the index, we use their metaphones. To quote from the original post that inspired me to do this:
…Enter phonetic algorithms, a blessing to the linguistically challenged.
Phonetic algorithms are actually pretty simple. They are just a way of taking a word, dropping out some letters (according to a set of rules), and providing you with a rough representation of how that word sounds. Common phonetic algorithms are Soundex and Metaphone. The former is pretty dated now, but we understand the latter to be more up-to-scratch (any to play nicely with other Western languages)…
Looking for a Metaphone implementation in Ruby, I came across metaphone.rb.
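To give a feel for what such a key looks like, here is a deliberately crude stand-in. It is not Metaphone and it is not what metaphone.rb does. The function name `rough_phonetic_key` and its rules (keep the first letter, drop later vowels, collapse repeated letters) are assumptions made purely for illustration.

```ruby
# Crude stand-in for a phonetic algorithm. This is NOT real Metaphone;
# it only shows the idea of dropping letters to get a rough "sound" key.
def rough_phonetic_key(word)
  letters = word.upcase.gsub(/[^A-Z]/, "") # keep ASCII letters only
  return "" if letters.empty?

  first = letters[0]
  rest = letters[1..-1].gsub(/[AEIOUY]/, "") # drop vowels after the first letter
  (first + rest).squeeze                     # collapse runs of the same letter
end

puts rough_phonetic_key("search") # prints SRCH
puts rough_phonetic_key("serch")  # prints SRCH
```

A misspelled query and the correctly spelled term end up under the same key, which is the whole point of indexing by sound instead of by exact word.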
I planned to rebuild the index every thirty minutes for now. Depending on how often search is used and how often new posts are created, we can fine-tune that. If searches were frequent and new posts less frequent, we could update the search index every time a new post is made.
But for now, we will have a rake task that runs once every thirty minutes:
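The indexing step that such a task performs could look roughly like the sketch below. Everything in it is an assumption: `MemoryStore` is an in-memory stand-in for the Redis client, `index_key` and `build_index` are hypothetical names, the `idx:` key prefix is made up, and a lowercased word is used where the metaphone would go.

```ruby
require "set"

# In-memory stand-in for the Redis client (hypothetical); only `sadd` is needed here.
class MemoryStore
  attr_reader :sets

  def initialize
    @sets = Hash.new { |hash, key| hash[key] = Set.new }
  end

  def sadd(key, member)
    @sets[key] << member
  end
end

# Placeholder for the metaphone step: the key is just the lowercased word.
def index_key(word)
  "idx:#{word.downcase}"
end

# Adds each post id to the set stored under the key of every word in the post.
def build_index(posts, store)
  posts.each do |post|
    post[:body].scan(/[[:alpha:]]+/).each do |word|
      store.sadd(index_key(word), post[:id])
    end
  end
end

# Made-up posts for illustration.
posts = [
  { id: 1, body: "Caching with Redis" },
  { id: 2, body: "Text search with Redis" }
]
store = MemoryStore.new
build_index(posts, store)
puts store.sets["idx:redis"].to_a.inspect # prints [1, 2]
```

In the app, the body of the rake task would call something like `build_index` with the real Redis client and the post records. Scheduling the task every thirty minutes is left to whatever scheduler is already in use.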
By now we have an index built, so it is time to query it:
Now we can perform the search as
which would give the result ‘2’.
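For readers without the original snippets, here is a hedged sketch of what a query against such an index might look like. `MemoryStore` is again an in-memory stand-in for the Redis client, `search` is a hypothetical helper, the `idx:` prefix and the sample ids are made up, and lowercasing stands in for the metaphone step.

```ruby
require "set"

# In-memory stand-in for the Redis client (hypothetical), preloaded with an index.
class MemoryStore
  def initialize(sets)
    @sets = sets
  end

  # Mirrors SINTER: members common to every named set; a missing key is an empty set.
  def sinter(*keys)
    keys.map { |key| @sets.fetch(key, Set.new) }.reduce(:&).to_a
  end
end

# Turns each query word into an index key and intersects the matching sets.
def search(query, store)
  keys = query.scan(/[[:alpha:]]+/).map { |word| "idx:#{word.downcase}" }
  return [] if keys.empty?

  store.sinter(*keys)
end

# Made-up index contents for illustration.
store = MemoryStore.new({
  "idx:redis" => Set[1, 2],
  "idx:search" => Set[2, 3]
})
puts search("redis search", store).inspect # prints [2]
```

Intersecting the sets means a post is returned only if it contains every word in the query. A union would be the alternative if matching any word were enough.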