Small Topics
Implementing Search with WebSolr
The best options for application search engines run on Java. Solr, from the Apache Software Foundation, has emerged as a favorite in the Rails community.
Managing a second application running in a different language and virtual machine can be a headache. WebSolr, http://websolr.com/, emerged as an easy way to outsource the running of your search engine. Though it runs as an external service, it is completely transparent to your user and, generally, transparent to the developer.
Let’s look at making use of it from a Rails application.
Sunspot
The Sunspot gem, http://outoftime.github.com/sunspot/, is the most comprehensive Ruby interface to Solr-powered search engines.
Install
To install Sunspot and its dependencies, add `sunspot_rails` to your Gemfile and run `bundle`.
sunspot_solr
You can optionally set up a local Solr instance with a bundled Jetty server by installing the `sunspot_solr` gem. To get the latest beta version, install it with `gem install sunspot_solr --pre`.
Then start the server with `sunspot-solr start`.
At the time of this writing, I had trouble getting `sunspot_solr` to run correctly.
Local Solr Install
Solr is available as a package in Homebrew for OS X (`brew install solr`) or Ubuntu’s apt (`apt-get install solr`).
Heroku Setup
To set up your Heroku application to use WebSolr, add the add-on from your project directory with `heroku addons:add websolr` (newer versions of the Heroku CLI spell the subcommand `addons:create`).
The lowest-tier WebSolr package is a $20/month add-on.
Configuration
There are two options for telling your application and the Sunspot library how to find the Solr server.
Environment Variable
Sunspot will look for a `WEBSOLR_URL` environment variable and use it if available.
When you use the WebSolr add-on, this is automatically managed for you.
Configuration File
If you want to set up the configuration information in the application, generate a config file by running `rails generate sunspot_rails:install`.
That will create a `config/sunspot.yml` where you can set the host and port.
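To make the two options concrete, here is a small plain-Ruby sketch of the lookup described above. It is not Sunspot’s own code: the `solr_location` helper, the sample hostnames and ports, and the rule that the environment variable wins over the file are assumptions for illustration, and the YAML layout (a `solr` section per Rails environment) is what the generated file is assumed to look like.

```ruby
require 'uri'
require 'yaml'

# Assumed layout of config/sunspot.yml: one section per Rails environment.
SAMPLE_SUNSPOT_YML = <<~YAML
  development:
    solr:
      hostname: localhost
      port: 8982
  production:
    solr:
      hostname: localhost
      port: 8983
YAML

# Hypothetical helper: returns [host, port] for the given environment,
# preferring WEBSOLR_URL when it is set and falling back to the YAML file.
def solr_location(env_name, env = ENV, yaml = SAMPLE_SUNSPOT_YML)
  if (url = env['WEBSOLR_URL'])
    uri = URI.parse(url)
    return [uri.host, uri.port]
  end

  solr = YAML.safe_load(yaml).fetch(env_name).fetch('solr')
  [solr.fetch('hostname'), solr.fetch('port')]
end
```

With `WEBSOLR_URL` unset, `solr_location('development')` reads the host and port from the YAML; on Heroku, the URL managed by the add-on would be used instead.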
Indexing
Once you have Solr running and Sunspot set up, you need to tell it how to index your model data.
In the model, call the `searchable` method and pass a block. In the block, call methods that specify the type and name of each attribute to index. For instance, an article model might declare `text :title` and `text :body`, plus a `time` field for its publication date.
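A minimal sketch of such a declaration, assuming a hypothetical Article model with `title`, `body`, and `published_at` attributes. The `ArticleIndexing` module name is also an assumption; wrapping the block in a plain module just lets the snippet load on its own, and in a Rails app the same `searchable` block can sit directly in the model class.

```ruby
# ArticleIndexing is a hypothetical module name. Including it in a model
# that has sunspot_rails loaded runs the searchable block on that model.
module ArticleIndexing
  def self.included(model)
    model.searchable do
      text :title          # full-text field, split into keywords
      text :body           # full-text field
      time :published_at   # datetime field, usable for filtering and sorting
    end
  end
end

# In app/models/article.rb (assumed):
#   class Article < ActiveRecord::Base
#     include ArticleIndexing
#   end
```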
Then Sunspot will index each of these three fields in Solr.
Available Methods and Settings
The following indexing methods are available:
- `text`: breaks the data into individual keywords
- `string`: indexes the data as a single string
- `time`: datetime fields
- `integer`: numeric fields, especially foreign keys
Setting Priorities
If you index multiple fields, like the title and the body here, then it’s likely some components are more important than others.
For instance, you might want to promote matches in the title more highly than matches in the body. You can add the `default_boost` parameter to a field’s declaration, for example `text :title, :default_boost => 2`.
Updating the Index
By default, Sunspot will update the index whenever an object is created, saved, or destroyed.
This is easy, but in production it can slow your application down because it happens during the request/response cycle. Instead, it’d be better to push the index updating to an asynchronous worker process.
Sunspot can push indexing to background workers: call `handle_asynchronously :solr_index` in the model, after the `searchable` block.
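As a sketch of where that call goes, assuming the delayed_job gem is installed (it provides `handle_asynchronously`) and a hypothetical Article model with `title` and `body` attributes. The `ArticleAsyncIndexing` module name is an assumption used only so the snippet loads on its own; both calls can equally be written directly in the model class body.

```ruby
# ArticleAsyncIndexing is a hypothetical module name.
module ArticleAsyncIndexing
  def self.included(model)
    model.searchable do
      text :title
      text :body
    end

    # delayed_job: queue solr_index as a background job instead of
    # running it inside the request/response cycle.
    model.handle_asynchronously :solr_index
  end
end
```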
The only catch is that this relies on the Heroku default background job queue, `delayed_job`. If you’re using Resque instead, try the following code written by the author of Sunspot: https://gist.github.com/659188
Searching
You’ve set up the server and indexed the data; now you can actually run queries. Use the `search` class method and pass in a block.
A basic search passes the query string to `fulltext` (or its alias, `keywords`) inside that block.
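A minimal sketch of a basic search, assuming a hypothetical Article model that already has a `searchable` block. The `search_articles` helper name is an assumption for illustration.

```ruby
# Hypothetical helper: runs a keyword search across Article's text fields
# and returns the Sunspot search object.
def search_articles(query)
  Article.search do
    fulltext query
  end
end
```

In a controller this might be called as `@search = search_articles(params[:q])`, where the `q` parameter name is likewise an assumption.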
The block passed to `search` can be more specific, too, for example by filtering, ordering, or paginating the results.
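A sketch of a more specific search, assuming a hypothetical Article model whose `searchable` block indexes `published_at` as a `time` field. The helper name, the page size of 10, and the choice of filter are assumptions for illustration.

```ruby
# Hypothetical helper: keyword search limited to already-published
# articles, newest first, ten per page.
def recent_articles_matching(query, page = 1)
  Article.search do
    fulltext query
    with(:published_at).less_than(Time.now)
    order_by :published_at, :desc
    paginate :page => page, :per_page => 10
  end
end
```

Filtering and ordering work on non-text fields such as `time` or `integer`, which is one reason to index those alongside the `text` fields.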
There are many more options and techniques for refining the search results; for information on them, check out the Sunspot gem API.
Search Results
Once you execute a search you have access to both the matched objects and metadata about the search itself.
.results
Call the `.results` method on the search object to get back the ordered set of search results.
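For example, a small sketch that collects the titles of the matched records. It assumes `search` is the object returned by a model’s `search` call and that the matched records respond to `title`; the helper name is hypothetical.

```ruby
# Hypothetical helper: titles of the matched records, in result order.
def result_titles(search)
  search.results.map { |article| article.title }
end
```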
These are just your normal domain objects with no metadata.
.hits
If you’re interested in the metadata, use the `.hits` method. The Sunspot wiki has two great examples of ways you could use the metadata along with the matched objects.
We can use the `each_hit_with_result` method to iterate through the match data and the matched objects together. Call the `.score` method on a hit for the numeric quality-of-match indicator, which can then be output alongside each result.
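A sketch of that iteration in plain Ruby rather than a view template. It assumes `search` is a Sunspot search object from a keyword query and that the matched records respond to `title`; the helper name and the output format are assumptions.

```ruby
# Hypothetical helper: one "score  title" line per hit, in result order.
def scored_titles(search)
  lines = []
  search.each_hit_with_result do |hit, result|
    lines << format('%.2f  %s', hit.score, result.title)
  end
  lines
end
```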
Or, you could highlight the fragment of the object that matched the search.
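A sketch of collecting highlighted fragments, loosely following the pattern in the Sunspot wiki. It assumes the search asked for highlighting on the `body` field and that the field is stored in Solr (to my understanding, both are needed for highlights to come back); the helper name and the asterisk markers are assumptions.

```ruby
# Hypothetical helper: matched fragments of `field`, with each matching
# word wrapped in asterisks.
def highlighted_fragments(search, field = :body)
  fragments = []
  search.each_hit_with_result do |hit, _result|
    hit.highlights(field).each do |highlight|
      fragments << highlight.format { |word| "*#{word}*" }
    end
  end
  fragments
end
```

In a view, the block passed to `format` could wrap each word in an HTML tag instead of asterisks.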
References
- Heroku DevCenter on WebSolr: http://devcenter.heroku.com/articles/websolr
- Sunspot quickstart: https://github.com/sunspot/sunspot/wiki/Adding-Sunspot-search-to-Rails-in-5-minutes-or-less
- Working with Sunspot Results: https://github.com/sunspot/sunspot/wiki/Working-with-search
- WebSolr Add-On Service Levels: http://addons.heroku.com/websolr