For a project at work we need some search-engine-like functionality. We’ve tried to build it ourselves, but our implementations were usually too slow, too memory-intensive, or too much work to maintain. One of my coworkers recently recommended that I have a look at the Apache Lucene project, so I’ve spent the last couple of nights studying it and doing a test implementation. Coincidentally, Lucene released version 2.0.0 in late May. 🙂
My first impressions are very positive: Lucene is very easy to use and it’s quite fast as well. It is possible to implement a fully working search engine with only a few lines of code. To do this you need two parts: an indexer and a searcher.
The indexer can be implemented with the following lines of Java code.
IndexWriter indexWriter = new IndexWriter("/tmp/index", new StandardAnalyzer(), true);
Document document = new Document();
document.add(new Field("article", "This is a sample article to test the Apache Lucene search engine", Field.Store.YES, Field.Index.TOKENIZED));
indexWriter.addDocument(document);
indexWriter.close();
These lines of code create the index that Lucene’s searcher will query. Of course, the index is rather small with just one document, but it’s easy to add more fields to a document and more documents to the index.
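To make the "more fields, more documents" point concrete, here is a minimal sketch against the Lucene 2.0.0 API. The class name, the "id" and "title" field names, and the row layout are my own assumptions for illustration; only the "article" field comes from the snippet above.

```java
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;

// Hypothetical helper class; assumes lucene-core-2.0.0.jar is on the classpath.
public class MultiFieldIndexer {

    // Builds one document with three fields. "id" and "title" are made-up field names.
    static Document toDocument(String id, String title, String article) {
        Document document = new Document();
        // Stored and indexed as a single term, so it can be matched exactly.
        document.add(new Field("id", id, Field.Store.YES, Field.Index.UN_TOKENIZED));
        // Stored and run through the analyzer, like "article" in the snippet above.
        document.add(new Field("title", title, Field.Store.YES, Field.Index.TOKENIZED));
        // Searchable but not stored, which keeps the index smaller for long texts.
        document.add(new Field("article", article, Field.Store.NO, Field.Index.TOKENIZED));
        return document;
    }

    // Creates a fresh index at indexPath from rows of {id, title, article}
    // and returns the number of documents in the index.
    static int index(String indexPath, String[][] rows) throws IOException {
        IndexWriter indexWriter = new IndexWriter(indexPath, new StandardAnalyzer(), true);
        try {
            for (int i = 0; i < rows.length; i++) {
                indexWriter.addDocument(toDocument(rows[i][0], rows[i][1], rows[i][2]));
            }
            return indexWriter.docCount();
        } finally {
            // Always close the writer so the index files are flushed and the lock is released.
            indexWriter.close();
        }
    }
}
```

Whether a field should be stored, tokenized, or both depends on how you plan to search and display it, so treat the choices above as one possible setup rather than a recommendation.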
The searcher can also be implemented with a few lines of code.
IndexReader indexReader = IndexReader.open("/tmp/index");
Searcher searcher = new IndexSearcher(indexReader);
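// The first argument is the default field to search; it must match the field name used by the indexer.
QueryParser parser = new QueryParser("article", new StandardAnalyzer());
Hits hits = searcher.search(parser.parse("sample"));
// Hits are zero-indexed, so the first (best-scoring) hit is at position 0.
System.out.println("First hit: score " + hits.score(0) + " - " + hits.doc(0).get("article"));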
indexReader.close();
This code creates a Searcher that searches the index for the word “sample” and prints the first hit to System.out. This is really all that is required to build a basic search engine. You will of course still need glue code to make it user-friendly and to customize it to your needs.
All classes are from the Lucene project, so all that is required is the lucene-core-2.0.0.jar file somewhere on your classpath. Oh, and yes, you need to catch a few exceptions as well (IOException from the index classes and ParseException from the query parser), so the above code won’t run as-is. 😉
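For completeness, here is one way the two snippets could be wrapped into a class that compiles, with the exception handling added. This is only a sketch: the class and method names are made up, it passes "article" as the query parser's default field, it reads the first hit at position 0, and it assumes Lucene 2.0.0 on the classpath.

```java
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;

// Hypothetical class name; assumes lucene-core-2.0.0.jar is on the classpath.
public class LuceneDemo {

    // Creates a fresh index at indexPath containing a single "article" document.
    static void indexArticle(String indexPath, String text) throws IOException {
        IndexWriter indexWriter = new IndexWriter(indexPath, new StandardAnalyzer(), true);
        try {
            Document document = new Document();
            document.add(new Field("article", text, Field.Store.YES, Field.Index.TOKENIZED));
            indexWriter.addDocument(document);
        } finally {
            indexWriter.close();
        }
    }

    // Returns the stored "article" text of the best hit, or null if nothing matched.
    static String firstHit(String indexPath, String queryText) throws IOException, ParseException {
        IndexSearcher searcher = new IndexSearcher(indexPath);
        try {
            QueryParser parser = new QueryParser("article", new StandardAnalyzer());
            Hits hits = searcher.search(parser.parse(queryText));
            return hits.length() > 0 ? hits.doc(0).get("article") : null;
        } finally {
            searcher.close();
        }
    }

    public static void main(String[] args) {
        try {
            indexArticle("/tmp/index", "This is a sample article to test the Apache Lucene search engine");
            System.out.println("First hit: " + firstHit("/tmp/index", "sample"));
        } catch (IOException e) {
            // Thrown when the index directory cannot be read or written.
            System.err.println("Index problem: " + e.getMessage());
        } catch (ParseException e) {
            // Thrown when the query string is not valid query syntax.
            System.err.println("Bad query: " + e.getMessage());
        }
    }
}
```

Checking hits.length() before touching the first hit avoids an exception when the query matches nothing, which is easy to run into while experimenting.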