Sergey Nikolaev
September 22, 2010 ・ Sphinx
Should you switch to Sphinx real time indexes?
The problem with regular indexes
The main inconvenience of regular indexes is update speed: to update a regular index, you have to rebuild it entirely.
For large amounts of data we usually use a main+delta scheme. The main index contains most of the data, and the delta index contains only recent changes. To keep the whole index up to date, we rebuild the delta index every 3-5 minutes. But the larger the delta grows, the longer it takes to rebuild. That's why we recommend flushing the delta into the main index every day or every week, depending on your data growth rate.
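The main+delta scheme described above could be configured roughly as follows. This is only a sketch: the `documents` table, the `sph_counter` helper table (assumed to have `counter_id` and `max_doc_id` columns), the connection settings and the paths are all hypothetical placeholders, so adjust them to your own schema.

```
# sphinx.conf fragment (illustrative; table names, credentials and paths are assumptions)

source main
{
    type            = mysql
    sql_host        = localhost
    sql_user        = sphinx_user
    sql_pass        = secret
    sql_db          = example_db

    # remember the highest document ID that goes into the main index
    sql_query_pre   = REPLACE INTO sph_counter SELECT 1, MAX(id) FROM documents
    sql_query       = SELECT id, title, body FROM documents \
        WHERE id <= ( SELECT max_doc_id FROM sph_counter WHERE counter_id = 1 )
}

source delta : main
{
    # override the inherited pre-query so the counter is NOT bumped by delta rebuilds
    sql_query_pre   = SET NAMES utf8
    # only documents added after the last main rebuild
    sql_query       = SELECT id, title, body FROM documents \
        WHERE id > ( SELECT max_doc_id FROM sph_counter WHERE counter_id = 1 )
}

index main
{
    source  = main
    path    = /var/lib/sphinx/main
}

index delta : main
{
    source  = delta
    path    = /var/lib/sphinx/delta
}
```

With a setup like this, a cron job would typically run `indexer delta --rotate` every few minutes, and a less frequent job would either rebuild `main` or fold the delta into it with `indexer --merge main delta --rotate`.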
But this approach has a couple of drawbacks:
- high average system load due to frequent index rebuilds;
- in the worst case, fresh data becomes searchable only after several minutes.
Real-time indexes
Sphinx 1.10 introduces real-time (RT) index support. The main idea is the ability to insert and update index records on the fly. RT indexes are accessible over the MySQL protocol, which lets us use existing MySQL client applications to work with them through SELECT, DELETE, INSERT and REPLACE statements.
Currently there are some performance issues with real-time indexes on large data sets. But on smaller ones (say, up to 500,000 Wikipedia documents) their speed is comparable to that of regular indexes.
So the performance and simplicity of real-time indexes make them a preferable choice for relatively small but frequently changing data sets.
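As a rough illustration of what an RT setup might look like, here is a minimal config sketch. The index name `rt_example`, its fields and attribute, the path and the port are assumptions made for this example, not values from any real deployment; check the `workers` requirement against the documentation for your Sphinx version.

```
# sphinx.conf fragment (illustrative; names, path and port are assumptions)

index rt_example
{
    type            = rt
    path            = /var/lib/sphinx/rt_example

    # full-text fields
    rt_field        = title
    rt_field        = content

    # an integer attribute for filtering and sorting
    rt_attr_uint    = group_id

    # RAM chunk size limit; data is flushed to a disk chunk when it is exceeded
    rt_mem_limit    = 32M
}

searchd
{
    # accept MySQL protocol connections on this port
    listen          = 9306:mysql41

    # the sample config shipped with 1.10 sets this for RT indexes to work
    workers         = threads
}
```

With searchd running, a stock MySQL client could connect with `mysql -h 127.0.0.1 -P 9306` and issue statements such as `INSERT INTO rt_example (id, title, content, group_id) VALUES (1, 'hello', 'first document', 10)`, `SELECT * FROM rt_example WHERE MATCH('hello')`, or `DELETE FROM rt_example WHERE id = 1`. No source section is needed, because the index is populated by these statements rather than by indexer.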
Conclusion
Real-time indexes can be used as a replacement for the main+delta bundle of regular indexes. They can reduce server workload and simplify the index update routine.
We can also use a mixed (distributed) index to plug real-time indexes into an existing app.
Here's an example of a mixed index:
index distributed
{
    type = distributed
    local = plain_main_index
    local = real_time_increment_index
}
In this example we query both the regular and the real-time index through one distributed index. This way, migration to real-time indexes can be performed seamlessly, without significant modifications to the production system.
Good luck and have fun with real-time indexes.