
Managing Sphinx Real-time indexing in Ruby

The Sphinx Search Server has supported real-time indexing for quite a while now. Real-time indexing means that you do not have to trigger an index rebuild whenever you update searchable data in your application: each record is written to the index as it changes. This makes Sphinx a viable option for apps with big searchable datasets.

The RtIndex class presented in this post implements a basic RT index interface. It uses Sequel to talk to Sphinx, but you can easily modify it to use ActiveRecord or whatever else you prefer.

You can hook this class up to your models’ after callbacks, ensuring that your search index is always up to date. No cronjobs, no nothing.
Here is an example using an ActiveRecord model:

class Artist < ActiveRecord::Base
  after_create :add_to_index
  after_update :update_index
  after_destroy :remove_from_index
  
  private
  # make sure you include the record pk and any searchable fields
  def add_to_index
    rt_index.insert(:id => id, :name => name)
  end
  
  def update_index
    rt_index.replace(:id => id, :name => name)
  end
  
  def remove_from_index
    rt_index.delete(id)
  end
  
  def rt_index
    RtIndex.new(:artists)
  end
end

And here is the complete class:

require 'sequel'
class RtIndex
  attr_reader :index_name
  
  def initialize index_name
    @index_name = index_name.to_sym
  end
  
  def insert hash
    DB["INSERT INTO #{index_name} (?) VALUES (?)",
      Sequel.lit(hash.keys.join(',')),
      Sequel.lit(hash.values.map {|v| DB.literal(v)}.join(','))
    ].insert
  end
  
  def replace hash
    DB["REPLACE INTO #{index_name} (?) VALUES (?)",
      Sequel.lit(hash.keys.join(',')),
      Sequel.lit(hash.values.map {|v| DB.literal(v)}.join(','))
    ].replace
  end
  
  def delete id
    DB["DELETE FROM #{index_name} WHERE id = ?",id].delete
  end
  
  class << self
    # you will want to amend this to load the sphinx configuration from wherever you keep it.
    def config
      { "sphinx" => { "host" => "localhost", "port" => 3309 } }
    end
  end
  
  # create a shared mysql connection to sphinx
  # the port key must match the one returned by config above
  DB = Sequel.mysql(:host => config["sphinx"]["host"], :port => config["sphinx"]["port"])
end
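
To see roughly what insert and replace send to Sphinx, here is a small sketch that builds the same statement text without a database connection. The helper names sphinxql_literal and rt_statement are made up for this illustration, and the quoting is a simplified stand-in for DB.literal that assumes backslash escaping; in real code, keep using the adapter's own quoting.

```ruby
# Illustrative only: a simplified stand-in for DB.literal, so the generated
# statements can be inspected without a database connection.
def sphinxql_literal(value)
  case value
  when Numeric then value.to_s
  else
    # Backslash-escape backslashes and single quotes (assumed escaping rules).
    "'" + value.to_s.gsub(/[\\']/) { |c| "\\#{c}" } + "'"
  end
end

# Builds the statement text that RtIndex#insert or #replace would issue.
def rt_statement(verb, index_name, hash)
  columns = hash.keys.join(',')
  values  = hash.values.map { |v| sphinxql_literal(v) }.join(',')
  "#{verb} INTO #{index_name} (#{columns}) VALUES (#{values})"
end

puts rt_statement("INSERT", :artists, { :id => 1, :name => "Nina Simone" })
# prints: INSERT INTO artists (id,name) VALUES (1,'Nina Simone')
puts rt_statement("REPLACE", :artists, { :id => 1, :name => "Guns N' Roses" })
# prints: REPLACE INTO artists (id,name) VALUES (1,'Guns N\' Roses')
```

Note that the column list always includes id: the index needs the record's primary key so that later replace and delete calls can find the document.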

Creating reusable ORM-specific extensions is fairly trivial. A class method call such as has_rt_index :name, :description is easy to implement and left as an exercise.
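
If you want a head start on that exercise, here is one possible sketch. Everything in it is hypothetical: the HasRtIndex module, the has_rt_index signature, the index_name and index_class options, and the naive "class name plus s" default for the index name are my assumptions, not an existing API. The index_class option exists only so the mixin can be tried without a running Sphinx server; by default it falls back to the RtIndex class from this post.

```ruby
# Hypothetical mixin: HasRtIndex and has_rt_index are illustrative names,
# not part of Sphinx, Sequel or ActiveRecord.
module HasRtIndex
  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    attr_reader :rt_index_fields, :rt_index_name

    # Declare which attributes are sent to the RT index.
    def has_rt_index(*fields, index_name: nil, index_class: nil)
      @rt_index_fields = fields.map(&:to_sym)
      # Naive default: Artist => :artists (assumption, adjust as needed).
      @rt_index_name   = (index_name || "#{name.downcase}s").to_sym
      @rt_index_class  = index_class

      # Register callbacks only when the host class provides them
      # (ActiveRecord does; a plain Ruby class does not).
      if respond_to?(:after_create)
        after_create  :add_to_rt_index
        after_update  :update_rt_index
        after_destroy :remove_from_rt_index
      end
    end

    # Falls back to the RtIndex class from this post when none is injected.
    def rt_index
      (@rt_index_class || RtIndex).new(rt_index_name)
    end
  end

  # Primary key plus every declared searchable field.
  def rt_index_document
    self.class.rt_index_fields.each_with_object({ :id => id }) do |field, doc|
      doc[field] = public_send(field)
    end
  end

  def add_to_rt_index
    self.class.rt_index.insert(rt_index_document)
  end

  def update_rt_index
    self.class.rt_index.replace(rt_index_document)
  end

  def remove_from_rt_index
    self.class.rt_index.delete(id)
  end
end
```

With this in place, a model would only need include HasRtIndex followed by has_rt_index :name, :description. A Sequel::Model version would have to wire up its hooks differently, since Sequel hooks are instance methods rather than class-level declarations.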
