The Miracle of Elasticsearch
Elasticsearch is probably the premier full-text search engine available today (this side of Google). It is open source software, free to download and install. It is very actively maintained by Elastic, the company behind it, which earns revenue from support and consulting programs, utilities, and hosting. Elasticsearch is the centerpiece of a “stack” of products, the ELK stack, which consists of Elasticsearch, Logstash, and Kibana. However, Elasticsearch has a great deal to offer all on its own.
The reader is invited to learn more about Elasticsearch at the Elastic website.
In this article, we want to highlight a few of Elasticsearch’s remarkable qualities, based on our experience at Pabo Beach.
Amazingly Flexible and Powerful Full-Text Search
Elasticsearch offers highly configurable search capabilities, supporting many languages and character sets. Synonyms, hypernyms, fuzzy matches, stop words, and word stems/roots can all be configured, and so can search ranking boosts. These features make it especially valuable in e-commerce, where particular matches may be critical to the business. (We have all experienced the frustration of a search that doesn’t bring up the product we are seeking because the match logic isn’t up to snuff.) Elasticsearch is in fact mind-bendingly configurable. Having deployed it for an international e-commerce platform, we can attest to the value of being able to provide custom search configuration for each e-commerce client.
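As a rough illustration of the kind of configuration described above, the following Python sketch builds the JSON bodies that would be sent to Elasticsearch when creating an index and running a search. The function names, the analyzer name product_text, the filter names, and the title and description fields are all hypothetical choices made for this example; the setting keys themselves (synonym, stop, and stemmer token filters, a custom analyzer, and a multi_match query with a field boost and fuzziness) are standard Elasticsearch options. It is a minimal sketch, not the configuration we deployed.

```python
import json


def build_index_body(synonyms):
    """Return index settings and mappings using a custom English analyzer.

    `synonyms` is a list of rules such as "sofa, couch"; passing a different
    list per client is one way to get per-client search behavior.
    """
    return {
        "settings": {
            "analysis": {
                "filter": {
                    # Hypothetical filter names; the "type" values are built in.
                    "shop_synonyms": {"type": "synonym", "synonyms": synonyms},
                    "english_stop": {"type": "stop", "stopwords": "_english_"},
                    "english_stemmer": {"type": "stemmer", "language": "english"},
                },
                "analyzer": {
                    "product_text": {
                        "type": "custom",
                        "tokenizer": "standard",
                        "filter": [
                            "lowercase",
                            "shop_synonyms",
                            "english_stop",
                            "english_stemmer",
                        ],
                    }
                },
            }
        },
        "mappings": {
            "properties": {
                "title": {"type": "text", "analyzer": "product_text"},
                "description": {"type": "text", "analyzer": "product_text"},
            }
        },
    }


def build_search_body(user_text):
    """Return a search body with fuzzy matching and a ranking boost on title."""
    return {
        "query": {
            "multi_match": {
                "query": user_text,
                # "title^3" makes title matches count three times as much.
                "fields": ["title^3", "description"],
                # Tolerate small typos, scaled to the length of each term.
                "fuzziness": "AUTO",
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_index_body(["sofa, couch", "tv, television"]), indent=2))
    print(json.dumps(build_search_body("lether sofa"), indent=2))
```

The first body would be sent when the index is created and the second with each search request. Because the synonym list is just a parameter, each client can be given its own index with its own rules.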
An Amazingly Powerful Query Engine
Elasticsearch is not only a search vehicle; it also includes extremely powerful query tools. These tools are comparable to, though not identical to, SQL, and they include Boolean combinations as well as numeric and date ranges. What’s more, the data aggregation features provide numerous ways to analyze, bucket, group, and summarize data. All of these query capabilities can be combined with the sophisticated, configurable full-text search, a combination that plain SQL cannot match.
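To make the combination of search, filtering, and aggregation concrete, here is a minimal Python sketch that assembles a single request body. The function name and the field names (description, price, created_at, category) are assumptions invented for the example. The structure uses standard query DSL pieces: a bool query whose must clause does full-text matching, filter clauses with range conditions, and a terms aggregation with a nested avg.

```python
import json


def build_filtered_search(text, min_price, max_price, since):
    """Return one request body mixing full-text search, ranges, and aggregations.

    All field names here are hypothetical.
    """
    return {
        "size": 10,
        "query": {
            "bool": {
                # Scored full-text match on an analyzed text field.
                "must": [{"match": {"description": text}}],
                # Filters narrow the result set without affecting the score.
                "filter": [
                    {"range": {"price": {"gte": min_price, "lte": max_price}}},
                    {"range": {"created_at": {"gte": since}}},
                ],
            }
        },
        "aggs": {
            # Bucket the matching documents by category ...
            "by_category": {
                "terms": {"field": "category"},
                # ... and compute the average price inside each bucket.
                "aggs": {"avg_price": {"avg": {"field": "price"}}},
            }
        },
    }


if __name__ == "__main__":
    body = build_filtered_search("waterproof jacket", 50, 200, "2020-01-01")
    print(json.dumps(body, indent=2))
```

A rough SQL analogue of the filter and aggregation parts would be a WHERE clause with BETWEEN conditions plus a GROUP BY with AVG. The must clause is the part that brings in the configurable full-text matching.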
At Pabo Beach we have set up Elasticsearch in parallel with SQL Server in two different applications to handle the most voluminous, volatile, and heavily queried data. It is especially powerful in situations where the query parameters may vary considerably – cases where it would be very difficult to set up and maintain all the indices necessary in a relational database to achieve any kind of performance at all.
While it may seem that maintaining two systems in parallel like this causes a lot of overhead, the performance gain is well worth it. When we compare similarly provisioned SQL Server and Elasticsearch systems, with critical data involving many millions of records, a query in SQL Server might take 10 minutes or longer to return, whereas the equivalent Elasticsearch query returns in one second!
Elasticsearch is not relational, and thus requires careful design, based on its own unique features, to get the most out of it. It can replace a relational database, but that is often not a good idea. In a complex application, a relational database may contain dozens or even hundreds of tables that essentially drive the application. In such an application, Elasticsearch would be used only to mirror the handful of tables where query performance is essential.
Rich Data Combined with Conventional Data Fields
In Elasticsearch, an Index is comparable to a Table in a relational database, and a Document is comparable to a Record. The Fields that make up a Document are fairly similar to those that go into a relational Record. They can be conventional datatypes, like int, double, date/time, or text/string. However, Elasticsearch fields are structurally more flexible: a field can, for example, hold an array of values or a nested object. And because text/string fields can be set up for full-text search, they support “rich text” in ways that are not easily duplicated in a relational context.
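The Table/Record analogy can be sketched with a small mapping, which plays roughly the role of a table definition. In the Python sketch below, the function name, the field names, and the sample document are all made up for illustration. The type names (integer, double, date, keyword, text) are standard Elasticsearch field types.

```python
import json


def build_product_mapping():
    """Return a mapping that mixes conventional fields with a full-text field."""
    return {
        "mappings": {
            "properties": {
                "stock": {"type": "integer"},
                "price": {"type": "double"},
                "created_at": {"type": "date"},
                # keyword: stored as an exact value, good for filters and grouping.
                "category": {"type": "keyword"},
                # text: analyzed into terms so it can be searched as full text.
                "description": {"type": "text"},
            }
        }
    }


# A hypothetical document that fits the mapping above; it corresponds to one
# Record in the relational analogy.
SAMPLE_DOCUMENT = {
    "stock": 12,
    "price": 149.5,
    "created_at": "2020-03-15",
    "category": "outerwear",
    "description": "Lightweight waterproof jacket with a detachable hood.",
}


if __name__ == "__main__":
    print(json.dumps(build_product_mapping(), indent=2))
    print(json.dumps(SAMPLE_DOCUMENT, indent=2))
```

The distinction between keyword and text is the main thing with no direct relational counterpart: both hold strings, but only the text field is analyzed for full-text search.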
Amazingly Fast Performance
As we noted earlier, Elasticsearch is built for speed. We are not talking two or three times faster, but typically two or three orders of magnitude faster than a relational database in handling comparable queries. We at Pabo Beach have validated these performance numbers repeatedly.
Amazing Scalability
Elasticsearch was designed from the beginning for scalability. It can run perfectly well on a laptop (running Windows or Linux), and it can be deployed on multiple servers or in the cloud. A single Elasticsearch “index” can be deployed across hundreds of nodes (servers or instances) to handle huge quantities of data. One company, Synthesio, has reported loading over 136 terabytes of data. They report running complex queries that fetch up to 50 million rich documents out of tens of billions in the blink of an eye.
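The article does not go into the mechanism, but an index is spread across nodes by dividing it into shards, each of which can also have replica copies. The Python sketch below shows the two standard index settings involved, number_of_shards and number_of_replicas. The function names and the example numbers are assumptions chosen for illustration, not sizing advice.

```python
import json


def build_scaling_settings(primary_shards, replicas_per_shard):
    """Return index settings that control how an index is split and copied."""
    return {
        "settings": {
            "index": {
                # Number of pieces the index is divided into.
                "number_of_shards": primary_shards,
                # Number of extra copies kept of each piece.
                "number_of_replicas": replicas_per_shard,
            }
        }
    }


def total_shard_copies(primary_shards, replicas_per_shard):
    """Count every shard copy the cluster has to place on some node."""
    return primary_shards * (1 + replicas_per_shard)


if __name__ == "__main__":
    # Illustrative numbers only.
    print(json.dumps(build_scaling_settings(6, 1), indent=2))
    print(total_shard_copies(6, 1))
```

With 6 primary shards and 1 replica of each, there are 12 shard copies to distribute, so that index could make use of up to 12 nodes. The same settings with a single shard and no replicas would fit on a laptop.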