Friday, April 30, 2010

Search Techniques – Part 2: Technology Overview

It has been a long time since I first thought of writing down my understanding of the search technologies available today. When I started my career as a software professional, I heard about a search engine that was very popular at the time, AltaVista (http://www.altavista.com/). We were developing a content management server, and the application used this search engine to perform content-based search. As I was just starting out, I wasn't aware of such technologies or their usefulness. Over time, I learned and used several search engines in various applications. Some of these tools (one from each class), each with a brief summary, are listed below.

1. Google Search Appliance: One thing I like about Google is that it provides simple tools and interfaces for performing complex tasks, as I mentioned in my previous blog post, Simple Application Design. The Google Search Appliance is one such tool. It is easy to configure and use, and its searches are about as effective as those on the Google search website itself (http://www.google.com). On the other hand, it offers limited scope for customizing the search algorithms and the look and feel of the search results. If we want a simple, Google-style search over enterprise content, the Google Search Appliance could be the best option.

2. Verity/Autonomy Search: Verity, now owned by Autonomy (http://www.autonomy.com/), is one of the best search engines I have seen. The Autonomy website claims that it is the leading enterprise search engine. To use Verity, you need a dedicated server, and you need to create indexes on the enterprise content in advance. Searches are then performed with simple queries written in the Verity query language. Unlike the Google Search Appliance, Verity offers wide scope for customizing the search algorithm, controlling the scope of a search, and presenting the search results in whatever GUI you want.

3. Lucene/Solr: Apache Lucene (http://lucene.apache.org/) is an open-source library from Apache for performing searches similar to those of Google or Verity. Apache Solr (http://lucene.apache.org/solr/) is Apache's search product; it uses the Lucene library internally and can be deployed on a simple Java web server such as Tomcat (http://tomcat.apache.org/) or Jetty (http://jetty.codehaus.org/jetty/). The entire stack for hosting the search engine can therefore be built from Apache open-source software. Although Solr's out-of-the-box capabilities and scope for customization are limited, the possibilities are endless because it is a Java-based open-source product. If your search needs are not extremely complex and you care about cost versus value, Solr is the best option, and you should definitely evaluate it first.

4. Endeca Search: Endeca (http://www.endeca.com/) is a search engine designed with an altogether different intention. It uses business intelligence and a guided search algorithm that leads users to the result they want by narrowing down the search results in several stages. With the three technologies above, the user has to keep refining the search term to reach the desired result; with Endeca, the search results are presented together with possible classifiers that the user can pick to narrow them down. Endeca rarely returns empty results, and it guides users toward the result they want through these narrowing options. This search technology is heavily used on retail websites, because it helps customers find the merchandise they want by offering different ways to narrow down the results.
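The guided, multi-stage narrowing described for Endeca is commonly called faceted navigation. Below is a minimal Python sketch of that general idea only, not Endeca's actual algorithm or API, and it does not model the business-intelligence side. The product catalogue, the facet names, and the `guided_search` helper are hypothetical, and the keyword match is a plain substring test standing in for real text search. After each step it reports, for every facet the user has not fixed yet, how many results each value would leave.

```python
from collections import Counter

# Hypothetical product catalogue; names and fields are illustrative only.
PRODUCTS = [
    {"name": "Trail Runner", "category": "Shoes", "brand": "Acme", "color": "Red"},
    {"name": "Road Racer", "category": "Shoes", "brand": "Zoom", "color": "Blue"},
    {"name": "Rain Jacket", "category": "Jackets", "brand": "Acme", "color": "Blue"},
    {"name": "Fleece", "category": "Jackets", "brand": "Zoom", "color": "Red"},
    {"name": "Hiking Boot", "category": "Shoes", "brand": "Acme", "color": "Brown"},
]

# Attributes offered to the user as narrowing options (assumed, not from any product).
FACETS = ("category", "brand", "color")


def keyword_match(item, query):
    """Case-insensitive substring match on the name (stand-in for real text search)."""
    return query.lower() in item["name"].lower()


def guided_search(items, query="", selections=None):
    """Return the matching items plus facet counts for the next narrowing step.

    `selections` maps a facet name to the value the user picked,
    e.g. {"brand": "Acme"}.
    """
    selections = selections or {}
    results = [
        item for item in items
        if keyword_match(item, query)
        and all(item[facet] == value for facet, value in selections.items())
    ]
    # Count values only for facets the user has not fixed yet. Every value
    # listed occurs in at least one current result, so picking it never
    # leads to an empty result set.
    facet_counts = {
        facet: Counter(item[facet] for item in results)
        for facet in FACETS
        if facet not in selections
    }
    return results, facet_counts


if __name__ == "__main__":
    # Stage 1: broad search, show the category options.
    results, facets = guided_search(PRODUCTS)
    print(len(results), dict(facets["category"]))
    # Stage 2: the user picks "Shoes" and is offered brand options.
    results, facets = guided_search(PRODUCTS, selections={"category": "Shoes"})
    print([r["name"] for r in results], dict(facets["brand"]))
    # Stage 3: the user also picks "Acme".
    results, _ = guided_search(PRODUCTS, selections={"category": "Shoes", "brand": "Acme"})
    print([r["name"] for r in results])
```

The design choice worth noting is that the counts are computed from the current result set rather than from the whole catalogue, which is what lets a guided interface show only options that still lead somewhere, in contrast to free-text refinement where the user can easily end up with zero hits.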

With knowledge bases, product catalogues, and inventories growing every day, search technologies are becoming indispensable. There are many search engines on the market, covering almost every domain one could expect. These search engines use different search algorithms, some of which I described in my previous blog post, Search Techniques-Part1: Overview. Finding the search engine best suited to a particular need is itself becoming more challenging. I wouldn't be surprised if there were a search engine for finding the right search engine :)
