The Miracle of Elasticsearch

Elasticsearch is probably the premier full-text search engine available today (this side of Google). It is open source software, free to download and install. It is very actively maintained by the Elastic company, which earns revenue from support and consulting programs, utilities, and hosting. Elasticsearch is the centerpiece of a “stack” of products: the ELK stack consists of Elasticsearch, Logstash, and Kibana. However, Elasticsearch has a great deal to offer all on its own.
 
The reader is invited to learn more about Elasticsearch at the Elastic website.
 
However, in this article we want to highlight a few of Elasticsearch’s remarkable qualities based on our experience at Pabo Beach.
 
Amazingly Flexible and Powerful Full-Text Search
 
Elasticsearch offers highly configurable search capabilities, supporting many languages and character sets. Search is highly customizable, with configurable synonyms, hypernyms, fuzzy matches, stop words, and word stems/roots. What’s more, it offers customizable search ranking boosts. These features make it especially valuable in e-commerce, where particular matches may be critical to the business. (We have all experienced the frustration of a search that doesn’t bring up the product we are seeking because the match logic isn’t up to snuff.) Elasticsearch is in fact mind-bendingly configurable. Having deployed it for an international e-commerce platform, we can attest to the value of being able to provide custom search configuration for each e-commerce client.
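
As a rough illustration of that configurability, here is a minimal sketch in Python that builds the JSON bodies for an index with a custom analyzer (synonyms, stop words, stemming) and for a fuzzy, boosted search. The analyzer name shop_text, the filter names, the field names title and description, and the sample synonyms are all hypothetical, chosen only for this example. The synonym, stop, and stemmer token filters and the multi_match query are standard Elasticsearch features, but the exact settings a real shop needs will differ.

```python
import json


def build_index_settings(synonyms):
    """Build a create-index body with a custom analyzer.

    All names (shop_text, shop_synonyms, title, description) are
    hypothetical and exist only for this illustration.
    """
    return {
        "settings": {
            "analysis": {
                "filter": {
                    # Treat listed words as equivalent, e.g. "sofa, couch".
                    "shop_synonyms": {"type": "synonym", "synonyms": synonyms},
                    # Drop common English stop words such as "the" and "and".
                    "english_stop": {"type": "stop", "stopwords": "_english_"},
                    # Reduce words to their stems ("chairs" -> "chair").
                    "english_stemmer": {"type": "stemmer", "language": "english"},
                },
                "analyzer": {
                    "shop_text": {
                        "type": "custom",
                        "tokenizer": "standard",
                        "filter": [
                            "lowercase",
                            "english_stop",
                            "shop_synonyms",
                            "english_stemmer",
                        ],
                    }
                },
            }
        },
        "mappings": {
            "properties": {
                "title": {"type": "text", "analyzer": "shop_text"},
                "description": {"type": "text", "analyzer": "shop_text"},
            }
        },
    }


def build_product_query(text):
    """Search title and description; title matches count three times as much."""
    return {
        "query": {
            "multi_match": {
                "query": text,
                "fields": ["title^3", "description"],  # ^3 is a ranking boost
                "fuzziness": "AUTO",  # tolerate small typos
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_index_settings(["sofa, couch"]), indent=2))
    print(json.dumps(build_product_query("lether sofa"), indent=2))
```

The two dictionaries would be sent as the bodies of a create-index request and a search request respectively. Because the settings are per index, each e-commerce client could be given its own synonym list.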
 
An Amazingly Powerful Query Engine
 
Elasticsearch is not only a search vehicle; it also includes extremely powerful query tools. These tools are comparable to, though not identical to, SQL. They include Boolean combinations as well as numeric and date ranges. What’s more, the data aggregation features provide numerous ways to analyze, bucket, group, and summarize data. All the query capabilities can be combined with the sophisticated, configurable full-text search, something that SQL alone does not readily offer.
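
To make this concrete, the sketch below builds a single search body that combines a full-text match, numeric and date range filters, and a bucketed aggregation. The field names (description, price, created_at, category) are hypothetical, and the sketch assumes category is mapped as a keyword field. The bool, match, range, terms, and avg constructs are part of the standard Elasticsearch query language.

```python
import json


def build_search(text, min_price, max_price, since):
    """Combine full-text search, range filters, and an aggregation.

    Field names are hypothetical; "category" is assumed to be a keyword field.
    """
    return {
        "query": {
            "bool": {
                # Scored full-text clause.
                "must": [{"match": {"description": text}}],
                # Unscored yes/no filters, comparable to a SQL WHERE clause.
                "filter": [
                    {"range": {"price": {"gte": min_price, "lte": max_price}}},
                    {"range": {"created_at": {"gte": since}}},
                ],
            }
        },
        "aggs": {
            # Comparable to GROUP BY category with AVG(price).
            "by_category": {
                "terms": {"field": "category"},
                "aggs": {"avg_price": {"avg": {"field": "price"}}},
            }
        },
        "size": 10,  # return only the ten best-matching documents
    }


if __name__ == "__main__":
    body = build_search("waterproof hiking boots", 50, 200, "2018-01-01")
    print(json.dumps(body, indent=2))
```

One request like this returns both the ranked hits and the per-category summary, which in a relational setup would usually take separate queries.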

At Pabo Beach we have set up Elasticsearch in parallel with SQL Server in two different applications to handle the most voluminous, volatile, and heavily queried data. It is especially powerful in situations where the query parameters may vary considerably – cases where it would be very difficult to set up and maintain all the indices necessary in a relational database to achieve any kind of performance at all.
 
While it may seem that maintaining two systems in parallel like this causes a lot of overhead, the performance gain is well worth it. On similarly provisioned SQL Server and Elasticsearch systems, with critical data involving many millions of records, a query in SQL Server might take 10 minutes or longer to return, whereas the same query in Elasticsearch returns in one second!
 
Elasticsearch is not relational, and thus requires careful design, based on its own unique features, to get the most out of it. It can replace a relational database, but that is often not a good idea. In a complex application, a relational database may contain dozens or even hundreds of tables that essentially drive the application. Elasticsearch would only be used to parallel a handful of tables where query performance is essential.
 
Rich Data Combined with Conventional Data Fields
 
In Elasticsearch, an Index is comparable to a Table in a relational database, and a Document is comparable to a Record. The Fields that comprise a document are fairly similar to those that go into a relational Record. They can be conventional datatypes, like int, double, date/time, or text/string. However, the Elasticsearch fields are structurally more flexible. Because text/string fields can be set up for full-text search, they support “rich text” in ways that are not easily duplicated in a relational context.
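
The correspondence can be sketched as an index mapping, which plays roughly the role of a table definition. The example below assumes the typeless mapping format used by Elasticsearch 7 and later, and the index fields (sku, quantity, price, created_at, description) are hypothetical.

```python
import json


def build_product_mapping():
    """Mapping for a hypothetical product index (Elasticsearch 7+ format)."""
    return {
        "mappings": {
            "properties": {
                # Conventional fields, much like columns in a table.
                "sku": {"type": "keyword"},      # exact-match string
                "quantity": {"type": "integer"},
                "price": {"type": "double"},
                "created_at": {"type": "date"},
                # Full-text field, with a keyword sub-field so the same
                # value can also be sorted and aggregated on exactly.
                "description": {
                    "type": "text",
                    "fields": {"raw": {"type": "keyword", "ignore_above": 256}},
                },
            }
        }
    }


# A document is a JSON object whose keys line up with the mapped fields.
sample_document = {
    "sku": "BT-1001",
    "quantity": 12,
    "price": 129.95,
    "created_at": "2018-03-15T10:30:00Z",
    "description": "Waterproof leather hiking boots with a cushioned sole.",
}

if __name__ == "__main__":
    print(json.dumps(build_product_mapping(), indent=2))
    print(json.dumps(sample_document, indent=2))
```

The description field is where the structural flexibility shows: one value is indexed both as analyzed text for search and as an exact keyword.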

 

Amazingly Fast Performance
 
As we noted above, Elasticsearch is built for speed. We are not talking two or three times faster, but typically two or three orders of magnitude faster than a relational database in handling comparable queries. We at Pabo Beach have validated these performance numbers repeatedly.
 
Amazing Scalability
 
Elasticsearch was designed from the beginning for scalability; it is built into the product. It can run perfectly well on a laptop (running Windows or Linux). It can be deployed on multiple servers, or in the cloud. A single Elasticsearch “index” can be deployed across hundreds of nodes (servers or instances) to handle huge quantities of data. One company, Synthesio, has reported loading over 136 terabytes of data. They currently run complex queries fetching up to 50 million rich documents out of tens of billions in the blink of an eye.
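
Elasticsearch spreads an index across nodes by splitting it into primary shards, each of which can have replica copies. A minimal sketch of the relevant index settings follows. The shard and replica counts are illustrative assumptions, not recommendations, and the helper names are made up for this example.

```python
def build_scaling_settings(primary_shards, replicas):
    """Create-index body that sets how the index is split and copied."""
    return {
        "settings": {
            "number_of_shards": primary_shards,  # fixed when the index is created
            "number_of_replicas": replicas,      # can be changed later
        }
    }


def total_shard_copies(primary_shards, replicas):
    """Each primary shard exists once, plus once per replica."""
    return primary_shards * (1 + replicas)


if __name__ == "__main__":
    # Illustrative numbers only: 12 primaries, each with 1 replica.
    print(build_scaling_settings(12, 1))
    print(total_shard_copies(12, 1))  # 24 shard copies to spread over the nodes
```

On a laptop, one shard and no replicas is enough. On a large cluster, more shards give Elasticsearch more pieces to distribute across the nodes.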

 

Amortizing the Costs of Software and Licensing Fees

One of the questions I am sometimes asked as a Tech Advisor is “How do we amortize software development costs?”  Or “How are we going to recoup the costs of software development?”  The answer is, if the business is doing well, software development costs are typically paid off quickly.  As a prime example, you only have to look at the stock valuations of the big software companies in relation to the size and cost of their workforces.

 

The big software companies are experts at developing software.  Companies that don’t specialize in software can do very well as long as they hire the right consultants and employees, or put together excellent teams of developers.

 

In simple terms, software developed “in-house” or by hired consultants is owned by the company.  This means that the total cost of development is a one-time expense.  Obviously, there are issues of hardware costs, hosting or cloud costs, operations, maintenance, upgrades, new features, and so on, which must be included in the budget.  Nevertheless, once developed, the software can simply be a part of operations.

 

It is often shocking to learn about the advanced age of major software systems in huge organizations.  The core code may be untouched for many years.  The hardware may be old as well, like old IBM systems that run reliably for years.  A major reason for the Y2K crisis was that the original developers couldn’t conceive that the software would last so long.  As the saying goes, “If it ain’t broke, don’t fix it.”  Another reason for software longevity is actually the lack of longevity of the original developers who really understand the software’s functionality, whether through retirement, changing jobs, or passing away.

 

Our contention, therefore, is that, as long as the business thrives, the software costs will be easily absorbed.  In fact, software can offer one of the most profound returns on investment that can be had.

 

There is a cautionary note we want to sound here.  Often software is sold based on a recurring licensing fee.  Depending on the size of the business, these fees may be relatively modest, easily absorbed in the costs of doing business – or they can become onerous, a drag on profitability, even viability.

 

Recently I learned about a mid-sized sales and distribution company whose operations were tied to a legacy ERP system.  For normal business to take place, they needed about 150 seat licenses, at a cost of $2,000 per seat annually.  Since they began with the system, technology has evolved.  Today there would be no need for 150 “seats.”  The operations performed could potentially be combined into a more modern architecture, such as a service bus, which could handle many workers, perhaps all 150, at a single connection point.  However, the ERP company would not willingly give up that $300,000 annually to provide such an interface.
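
The arithmetic behind that figure, and how it accumulates, can be sketched in a few lines of Python. The seat count and per-seat fee come from the example above. The one-time development cost of $750,000 is purely hypothetical, and the comparison deliberately ignores hosting, maintenance, and upgrade costs on both sides.

```python
import math


def annual_license_cost(seats, fee_per_seat):
    """Recurring cost per year for a per-seat license."""
    return seats * fee_per_seat


def cumulative_license_cost(seats, fee_per_seat, years):
    """Total license fees paid over a number of years."""
    return annual_license_cost(seats, fee_per_seat) * years


def years_to_break_even(one_time_cost, seats, fee_per_seat):
    """Whole years of license fees needed to equal a one-time cost."""
    return math.ceil(one_time_cost / annual_license_cost(seats, fee_per_seat))


if __name__ == "__main__":
    print(annual_license_cost(150, 2000))          # 300000 per year
    print(cumulative_license_cost(150, 2000, 5))   # 1500000 over five years
    # Hypothetical one-time development cost, for illustration only.
    print(years_to_break_even(750_000, 150, 2000))  # 3
```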

 

In my experience, sometimes companies opt to purchase a platform with annual licensing fees, in order not to pay for custom development up front.  There are some advantages to this approach:

 

  • The platform will have “built-in” many of the capabilities that are planned for the system.
  • The initial cost will be lower with the platform.
  • Customization of the platform will be quick, so the system will come online more quickly.

The downside is of course that the licensing fees are ongoing.  There are other issues as well, however.  I recently spoke with the owner of a small business who, years before, had opted for a platform solution, against my advice at the time.  Now they were up against a wall.  The platform was software-as-a-service.  Their version was going to be discontinued shortly, forcing them to upgrade.  Unfortunately, all their customizations, built on the old platform, would be gone with the upgrade.  They were in a quandary.  My only advice was that they should try to engage the original developer to replicate the functionality of the earlier customizations on the latest platform version.  They were very concerned about the time, disruption, and costs involved.

 

Another issue for this company concerned their plans for the system.  Although this was a small business, the owners had excellent industry contacts.  Their initial vision was to build a system that they would own, and which could be marketed to other companies, putting them into the software business.  Since nothing in their vision had been done before, and it offered vast efficiency improvements over the way things were done in the industry, this plan did not seem totally farfetched.  What’s more, the cost of a custom build could have been recouped in a few months, based on a modest assessment of the productivity increase.  However, once they decided to go with a platform with its own licensing fees, versions, and upgrades, they didn’t end up with a system that had any value outside their own office.

 

There is no easy answer when evaluating a system that involves licensing fees.  Consider, for example, that applications running in the cloud on AWS or Azure do involve fees.  However, the savings in infrastructure and the staff to maintain software applications often mean that these fees are a bargain.  Also, such fees are usage-scaled, so that the client pays for actual processing that goes on, rather than paying per seat – which, in these times, seems fairer.  Also, you can deploy in the cloud, but still own your application.  Should a company wish, it could move its cloud application to its own server farm.

Licensing fees are sometimes enforced with measures that may be viewed as draconian.  Oracle has such a reputation, whether deserved or not.  At the very least, following the licensing requirements exactly may be quite burdensome for companies.

 

Companies may need to buy software systems or platforms with rather heavy licensing fees.  In some cases there may be better alternatives.  Licensing fees are not always “bad,” but there may be some drawbacks that aren’t obvious to prospective licensees.

 

Copyright © 2018 Patrick D. Russell