The best options for application search engines run on Java. Solr, from the Apache Software Foundation, has emerged as a favorite in the Rails community.
Managing a second application running in a different language and virtual machine can be a headache. WebSolr (http://websolr.com/) emerged as an easy way to outsource running your search engine. Although it runs as an external service, it is completely transparent to your users and mostly transparent to the developer.
Let’s look at making use of it from a Rails application.
The Sunspot gem, http://outoftime.github.com/sunspot/, is the most comprehensive Ruby interface to Solr-powered search engines.
To install Sunspot and its dependencies, add `sunspot_rails` to your `Gemfile` and run `bundle`.
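A minimal Gemfile addition might look like this. No version constraints are assumed, so pin versions to suit your app; the commented line is only needed for the optional local Solr server mentioned next.

```ruby
# Gemfile
gem 'sunspot_rails'

# Optional: a packaged Solr server for local development.
# gem 'sunspot_solr', :group => :development
```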
You can optionally set up a local Solr instance by installing the `sunspot_solr` gem, which packages a Solr server for development use. To get the latest beta version, install it with `gem install sunspot_solr --pre`. Then start the server with `rake sunspot:solr:start`.
At the time of this writing, I had trouble getting `sunspot_solr` to run correctly.
Local Solr Install
Solr is available as a package in Homebrew for OS X (`brew install solr`) or Ubuntu's apt (`apt-get install solr`).
To set up your Heroku application to use WebSolr, run `heroku addons:add websolr` from your project directory.
The entry-level WebSolr plan is a $20/month add-on.
There are two options for telling your application and the Sunspot library how to find the Solr server.
Sunspot will look for and use a `WEBSOLR_URL` environment variable if one is available.
When you use the WebSolr add-on, this is automatically managed for you.
If you want to keep the configuration in the application instead, generate a config file by running `rails generate sunspot_rails:install`. That will create `config/sunspot.yml`, where you can set the host and port.
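To make the two options concrete, here is a plain-Ruby sketch of a lookup order in which the environment variable wins over the YAML file. `resolve_solr_url` is a hypothetical helper written for illustration, not Sunspot's own code, and the `hostname`, `port`, and `path` keys are assumed from a typical generated `sunspot.yml`; check your generated file for the exact keys.

```ruby
require 'yaml'

# Hypothetical helper mimicking the lookup described above.
# Not Sunspot's implementation; YAML key names are assumptions.
def resolve_solr_url(env, yaml_text, rails_env)
  # 1. Prefer the add-on's environment variable when it is set.
  url = env['WEBSOLR_URL']
  return url if url && !url.empty?

  # 2. Otherwise fall back to the per-environment settings in the YAML.
  config = YAML.safe_load(yaml_text).fetch(rails_env).fetch('solr')
  host = config.fetch('hostname', 'localhost')
  port = config.fetch('port', 8983)
  path = config.fetch('path', '/solr')
  "http://#{host}:#{port}#{path}"
end

SAMPLE_YAML = <<~YAML
  development:
    solr:
      hostname: localhost
      port: 8982
      path: /solr/development
YAML

puts resolve_solr_url({}, SAMPLE_YAML, 'development')
# prints http://localhost:8982/solr/development
```

In a real app you would pass `ENV` as the first argument; a plain Hash is used here so the sketch is easy to try.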
Once you have Solr running and Sunspot set up, you need to tell it how to index your model data.
In the model, call the `searchable` method and pass it a block. Inside the block, call methods that specify the type and name of each attribute to index.
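A model declaring three indexed fields might look like the sketch below. The `Article` model and its `title`, `body`, and `published_at` attributes are hypothetical. So that the snippet runs without Rails or Solr, it defines `SearchableStub`, a small recording stand-in for `searchable` that is not part of Sunspot; in a real app you would drop the stub, define `Article` as an ActiveRecord model with `sunspot_rails` loaded, and keep the `searchable` block as is.

```ruby
# Hypothetical stand-in for Sunspot's `searchable` so this sketch runs alone.
# It only records declarations; with sunspot_rails loaded, the real method
# configures Solr indexing instead.
module SearchableStub
  Field = Struct.new(:type, :name, :options)

  class Builder
    attr_reader :fields

    def initialize
      @fields = []
    end

    # Record each field type this section mentions.
    [:text, :string, :time, :integer].each do |type|
      define_method(type) do |name, options = {}|
        @fields << Field.new(type, name, options)
      end
    end
  end

  def searchable(&block)
    builder = Builder.new
    builder.instance_eval(&block)
    @search_fields = builder.fields
  end

  attr_reader :search_fields
end

# Hypothetical model. In Rails: `class Article < ActiveRecord::Base`
# without the `extend SearchableStub` line.
class Article
  extend SearchableStub

  searchable do
    text :title
    text :body
    time :published_at
  end
end

Article.search_fields.each { |f| puts "#{f.type} #{f.name}" }
```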
Sunspot will then index each field declared in the block.
Available Methods and Settings
Commonly used field types include:
- `text`: breaks the data into individual keywords
- `string`: indexes the data as a single string
- `time`: date and time fields
- `integer`: numeric fields, especially foreign keys
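The difference between `text` and `string` is easiest to see with a toy analyzer. The sketch below is a plain-Ruby approximation, not Solr's analysis chain (real Solr tokenizers and filters are configurable and can do much more, such as stemming): a `text` value is split into lowercase keywords so a single word can match, while a `string` value is kept whole so only the exact value matches. `index_terms` and `matches?` are hypothetical names.

```ruby
# Toy approximation of how each field type is indexed.
def index_terms(type, value)
  case type
  when :text
    # Break the value into lowercase keywords.
    value.downcase.scan(/\w+/)
  when :string
    # Keep the whole value as one exact term.
    [value]
  else
    raise ArgumentError, "unsupported type: #{type}"
  end
end

# In this toy, a query matches when every query term is in the index terms.
def matches?(type, value, query)
  terms = index_terms(type, value)
  # Text queries are split the same way; string queries must match exactly.
  query_terms = type == :text ? query.downcase.scan(/\w+/) : [query]
  (query_terms - terms).empty?
end

title = 'Deploying Rails on Heroku'
puts matches?(:text, title, 'heroku')   # true: 'heroku' is one of the keywords
puts matches?(:string, title, 'heroku') # false: only the full title matches
puts matches?(:string, title, title)    # true
```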
If you index multiple fields, like a title and a body, some of them are likely more important than others. For instance, you might want to rank matches in the title higher than matches in the body. Sunspot's `default_boost` option does this, as in `text :title, :default_boost => 2`.
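To see why a boost changes the ranking, here is a toy scorer that counts keyword matches per field and multiplies each field's count by a weight. It is a deliberate simplification, not Solr's relevance formula (which also weighs term frequency, term rarity, and field length); the documents and the boost of 2 for the title are made up for illustration.

```ruby
# Toy relevance: matches per field, multiplied by that field's boost.
BOOSTS = { :title => 2.0, :body => 1.0 }

def score(doc, query)
  words = query.downcase.scan(/\w+/)
  BOOSTS.sum do |field, boost|
    tokens = doc.fetch(field).downcase.scan(/\w+/)
    boost * words.count { |w| tokens.include?(w) }
  end
end

DOCS = [
  { :id => 1, :title => 'Caching tips', :body => 'Use Solr for search' },
  { :id => 2, :title => 'Solr search', :body => 'Getting started' }
]

# Both documents contain both words, but document 2 has them in the title.
ranked = DOCS.sort_by { |doc| -score(doc, 'solr search') }
puts ranked.map { |doc| doc[:id] }.inspect # prints [2, 1]
```

Without the boost, both documents would score 2.0 and tie; weighting the title breaks the tie in favor of the title match.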
Updating the Index
By default, Sunspot will update the index whenever an object is created, saved, or destroyed.
This is easy, but in production it can slow your application down because it happens during the request/response cycle. Instead, it’d be better to push the index updating to an asynchronous worker process.
Sunspot has built-in support for handing index updates to a background worker.
The only catch is that this relies on delayed_job, the default background job queue on Heroku. If you're using Resque instead, try this code written by the author of Sunspot: https://gist.github.com/659188
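The idea behind background indexing can be sketched with Ruby's built-in `Queue` and a worker thread: saving a record only enqueues an index job, and the worker does the slow part later. This is an in-process illustration of the pattern, not how delayed_job or Resque work (they persist jobs and run workers as separate processes), and `IndexQueue` and its methods are hypothetical names.

```ruby
# Hypothetical in-process job queue illustrating deferred indexing.
class IndexQueue
  attr_reader :indexed

  def initialize
    @jobs = Queue.new
    @indexed = []
    @worker = Thread.new do
      # Pop jobs until the :stop sentinel arrives.
      while (record_id = @jobs.pop) != :stop
        sleep 0.01 # stand-in for the slow call to Solr
        @indexed << record_id
      end
    end
  end

  # Called from the save path: returns immediately without waiting on Solr.
  def enqueue(record_id)
    @jobs << record_id
  end

  # Let the worker finish outstanding jobs, then stop it.
  def shutdown
    @jobs << :stop
    @worker.join
  end
end

queue = IndexQueue.new
[1, 2, 3].each { |id| queue.enqueue(id) } # "saves" return right away
queue.shutdown
puts queue.indexed.inspect # prints [1, 2, 3]
```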
With the server set up and the data indexed, you can run queries. Call the `search` class method on your model and pass it a block.
A basic search passes keywords to `fulltext` inside the block. The block can be more specific, too, adding filters with `with`, sorting with `order_by`, and pagination with `paginate`.
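Here is a sketch of what a more specific search block might contain. The `Article` model, its fields, and the values are hypothetical. So the snippet runs without Solr, it defines `QueryRecorder`, a small stand-in (not part of Sunspot) that only records what the block asks for; with `sunspot_rails` loaded, `Article.search` sends the query to Solr instead, and the block itself would stay the same.

```ruby
# Hypothetical stand-in for Sunspot's search DSL: records the query only.
class QueryRecorder
  attr_reader :query

  def initialize
    @query = { :filters => [] }
  end

  def fulltext(keywords)
    @query[:fulltext] = keywords
  end

  # Mirrors the two forms `with(:field, value)` and `with(:field).less_than(value)`.
  def with(field, value = nil)
    if value.nil?
      Restriction.new(field, @query[:filters])
    else
      @query[:filters] << [field, :equal_to, value]
    end
  end

  def order_by(field, direction = :asc)
    @query[:order] = [field, direction]
  end

  def paginate(options)
    @query[:page] = options
  end

  Restriction = Struct.new(:field, :filters) do
    def less_than(value)
      filters << [field, :less_than, value]
    end
  end
end

class Article # hypothetical model; in Rails, an ActiveRecord model
  def self.search(&block)
    recorder = QueryRecorder.new
    recorder.instance_eval(&block)
    recorder.query
  end
end

cutoff = Time.utc(2011, 1, 1)
query = Article.search do
  fulltext 'heroku deployment'
  with(:author_id, 7)
  with(:published_at).less_than(cutoff)
  order_by :published_at, :desc
  paginate :page => 2, :per_page => 10
end
p query
```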
Many more options can refine the search results; for details, see the Sunspot API documentation.
Once you execute a search you have access to both the matched objects and metadata about the search itself.
Call the `.results` method to get back the ordered set of matching objects.
These are just your normal domain objects with no metadata.
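Conceptually, Solr returns ranked document IDs, and `.results` turns them back into model objects in that same order. The plain-Ruby sketch below shows that step with an in-memory Hash standing in for the database; `load_results`, `Record`, and the sample data are hypothetical, and Sunspot's actual loading goes through your ORM.

```ruby
# Hypothetical stand-in for the database: id => record.
Record = Struct.new(:id, :title)
DB = {
  1 => Record.new(1, 'Caching tips'),
  2 => Record.new(2, 'Solr search'),
  3 => Record.new(3, 'Heroku add-ons')
}.freeze

# Hits arrive in relevance order, but a lookup by ids may return rows in
# any order, so re-sort the rows to match the hits. In this sketch, ids
# missing from the database are skipped.
def load_results(hit_ids)
  rows = DB.values_at(*hit_ids).compact.shuffle # simulate an unordered fetch
  by_id = rows.map { |row| [row.id, row] }.to_h
  hit_ids.map { |id| by_id[id] }.compact
end

results = load_results([2, 3, 1])
puts results.map(&:title).inspect
# prints ["Solr search", "Heroku add-ons", "Caching tips"]
```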
If you're interested in the metadata, use the `.hits` method. The Sunspot wiki has two good examples of using the metadata alongside the matched objects, described below.
We can use the `each_hit_with_result` method to iterate through the match data and the matched objects together. Each hit's `.score` method returns a numeric quality-of-match indicator, which you might display next to each result.
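The idea behind `each_hit_with_result` is pairing each hit's metadata with its loaded object. The plain-Ruby sketch below mimics that pairing with hypothetical `Hit` and `Article` structs and made-up scores, then formats each score next to its title; in a real app you would iterate `search.each_hit_with_result { |hit, result| ... }` and read `hit.score` instead.

```ruby
# Hypothetical stand-ins: a hit carries search metadata, a result is the
# domain object. Scores are made up for the example.
Hit = Struct.new(:primary_key, :score)
Article = Struct.new(:id, :title)

hits = [Hit.new(2, 1.75), Hit.new(1, 0.5)]
results = [Article.new(2, 'Solr search'), Article.new(1, 'Caching tips')]

# Mimics each_hit_with_result: yields hits and results pairwise, in rank order.
def each_hit_with_result(hits, results)
  hits.zip(results).each { |hit, result| yield hit, result }
end

lines = []
each_hit_with_result(hits, results) do |hit, result|
  lines << "#{result.title} (score: #{format('%.2f', hit.score)})"
end
puts lines
```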
Alternatively, you could highlight the fragment of each object that matched the search.
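Sunspot can ask Solr for highlighted fragments of the matched text. To show the idea without a Solr server, here is a toy highlighter that finds the first matching keyword, cuts a short fragment around it, and wraps matches in `<strong>` tags. `highlight_fragment` and its `window` parameter are hypothetical, and Solr's highlighter picks fragments using its own, configurable rules.

```ruby
# Toy highlighter: not Solr's algorithm, just the idea of a matched fragment.
def highlight_fragment(text, keyword, window: 20)
  index = text.downcase.index(keyword.downcase)
  return nil if index.nil?

  # Take up to `window` characters on each side of the first match.
  from = [index - window, 0].max
  to = [index + keyword.length + window, text.length].min
  fragment = text[from...to].strip

  # Wrap every case-insensitive occurrence of the keyword in the fragment.
  marked = fragment.gsub(/#{Regexp.escape(keyword)}/i) { |m| "<strong>#{m}</strong>" }
  (from > 0 ? '...' : '') + marked + (to < text.length ? '...' : '')
end

body = 'Sunspot talks to Solr over HTTP, so any Solr host works.'
puts highlight_fragment(body, 'solr', window: 10)
# prints ...talks to <strong>Solr</strong> over HTTP...
```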
- Heroku Dev Center on WebSolr: http://devcenter.heroku.com/articles/websolr
- Sunspot quickstart: https://github.com/sunspot/sunspot/wiki/Adding-Sunspot-search-to-Rails-in-5-minutes-or-less
- Working with Sunspot results: https://github.com/sunspot/sunspot/wiki/Working-with-search
- WebSolr add-on service levels: http://addons.heroku.com/websolr