Recently, I worked with a client on enhancing search throughout their site. Simple queries in the system could take a couple of seconds or more, even though they searched through fewer than 4000 items! This obviously was not going to work. Our solution was to bring their search categories together into Elasticsearch, with the goal of both speeding up search results and improving their relevance. With the move to Elasticsearch, we are able to search through more than a million records in a fraction of a second.
The search queries found throughout the site were complex, and we were planning to make them even more complex to achieve higher quality results. Because of this, we decided that it would be important to focus on building automated tests to prevent regressions. That way, after building one feature into our search, we wouldn't break it later on when we started tweaking a new feature. Or at least, if we did, our automated tests would alert us before we pushed the change live for the site’s 100,000+ monthly visitors.
The first trouble we found when querying Elasticsearch for this project was the complexity of building the queries. Queries are built as JSON, with results pulled directly from the Elasticsearch instance through Ajax. There was no middle layer to hide the complex Elasticsearch queries that we were going to build. This meant the JavaScript code base would have to build each query, either through something like Elasticsearch.js or by hand-building the JSON. This solution was feasible, but it led to overly complex JavaScript and coupled the search query implementation too tightly to the JavaScript site, which ultimately conflicts with the separation of concerns.
Suppose, for example, we were searching for the most recent documents in an index, filtered by states and countries selected by the user. The JavaScript to build this sort of query would look something like this:
var data = {
  "query": {
    "bool": {
      "must": [],
      "should": [],
      "filter": []
    }
  }
};
// Only add a filter when the user has actually selected something.
if (states.length > 0) {
  data.query.bool.filter.push({
    "terms": {"state": states}
  });
}
if (countries.length > 0) {
  data.query.bool.filter.push({
    "terms": {"country": countries}
  });
}
// Now we make the request.
$http({
  method: 'POST',
  url: 'http://localhost:9200/your_index/your_type/_search',
  data: data
}).then(function successCallback(response) {
  console.log(response);
});
Notice the deeply nested attribute access data.query.bool.filter.push({}). Because of the nature of Elasticsearch's JSON query structure, these nested levels can get deep fairly quickly.
Luckily, Elasticsearch gives us the option to build search queries as Mustache templates. This allows Elasticsearch developers to focus on producing the queries, which are then stored as Mustache templates directly in the Elasticsearch instance, while JavaScript developers can remain focused on building out the UI.
The above example could be simplified to building out just the query's parameters.
var data = {
  "id": "your_search_template_name",
  "params": {
    "states": states,
    "countries": countries
  }
};
$http({
  method: 'POST',
  url: 'http://localhost:9200/your_index/your_type/_search/template',
  data: data
}).then(function successCallback(response) {
  console.log(response);
});
The search template below will already have been uploaded to Elasticsearch with the id your_search_template_name, and it handles the above logic much more elegantly. Notice that Elasticsearch's version of Mustache adds toJson, which takes the passed value and converts it to valid JSON. Plain Mustache cannot do this on its own.
{
  "query": {
    "bool": {
      "must": [],
      "should": [],
      "filter": [
        {
          "terms": {
            "state": {{#toJson}}states{{/toJson}}
          }
        },
        {
          "terms": {
            "country": {{#toJson}}countries{{/toJson}}
          }
        }
      ]
    }
  }
}
Perfect, now we only have to write a few lines of JavaScript to filter by state and country.
You might notice, however, that there is a problem with the template above. Because of the way terms filters work inside a bool query, whenever we filter on an empty array (for example, if states = []), we never receive any results. The filter keeps only documents whose field matches one of the strings in the passed array. If the array is empty, there is nothing to match against, so every document is filtered out.
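The effect is easy to reproduce outside Elasticsearch. The sketch below mimics a terms filter in plain JavaScript. The function termsFilter and the sample documents are made up for illustration, and the code only approximates what Elasticsearch does internally.

```javascript
// Hypothetical stand-in for a "terms" filter: keep only the documents
// whose field value equals one of the given values.
function termsFilter(docs, field, values) {
  return docs.filter(function (doc) {
    return values.indexOf(doc[field]) !== -1;
  });
}

// Sample documents, invented for this example.
var docs = [
  { name: 'Example 1', state: 'Virginia' },
  { name: 'Example 2', state: 'Virginia' },
  { name: 'Example 3', state: 'New York' }
];

// One selected state keeps only the matching documents.
var virginiaOnly = termsFilter(docs, 'state', ['Virginia']);

// No selected states: nothing can match, so every document is dropped.
var noneSelected = termsFilter(docs, 'state', []);
```

Here virginiaOnly holds the two Virginia documents, while noneSelected is empty, which is the opposite of what a user who selected no states would expect.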
To work around this, we add a bit more logic to our JavaScript before posting to Elasticsearch. For each array, we simply add a count parameter alongside it:
var data = {
  "id": "your_search_template_name",
  "params": {
    "states": states,
    "countries": countries,
    // The counts let the template skip a filter when its array is empty.
    "states_count": states.length,
    "countries_count": countries.length
  }
};
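As more array parameters are added, the counts could be derived in one place so that they never drift out of sync with the arrays. The following is only a sketch: buildTemplateParams is a hypothetical helper, not part of Elasticsearch or any library, and it assumes every parameter passed to it is an array.

```javascript
// Hypothetical helper: derive each *_count parameter from the array
// itself so the two can never disagree.
function buildTemplateParams(filters) {
  var params = {};
  Object.keys(filters).forEach(function (name) {
    var values = filters[name] || []; // treat a missing value as "nothing selected"
    params[name] = values;
    params[name + '_count'] = values.length;
  });
  return params;
}

var data = {
  "id": "your_search_template_name",
  "params": buildTemplateParams({
    "states": ['Virginia', 'New York'],
    "countries": []
  })
};
```

With this input, data.params contains states_count set to 2 and countries_count set to 0, next to the original arrays.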
Because Mustache treats a count of 0 as falsy, the objects between {{#states_count}} and {{/states_count}} are only rendered into our query if states_count > 0 in our new Mustache template definition:
{
  "query": {
    "bool": {
      "must": [],
      "should": [],
      "filter": [
        {{#states_count}}
        {
          "terms": {
            "state": {{#toJson}}states{{/toJson}}
          }
        },
        {{/states_count}}
        {{#countries_count}}
        {
          "terms": {
            "country": {{#toJson}}countries{{/toJson}}
          }
        },
        {{/countries_count}}
        ...
      ]
    }
  }
}
Great, we've made the work of the JavaScript developer slightly simpler. With small query structures, this won't be the most time-saving route, but as queries become more complex, we end up saving quite a bit of logic in the JavaScript code. However, simplifying the JavaScript is not the only goal of this method. As queries become more complex, we also risk breaking functionality that we built in the past. This is where regression testing comes into play.
Because we have built the search template as a single mustache template, we can simply copy the template definition into a local file for testing queries locally.
Below is an example structure for our Elasticsearch definitions/testing project.
/your-directory
├── ansible/                 # See https://github.com/elastic/ansible-Elasticsearch
├── definitions/             # JSON/Mustache definition files.
|   ├── index/
|   |   ├── mapping/         # See https://www.elastic.co/guide/en/elasticsearch/refe...
|   |   |   ├── your_index.json
|   |   |   └── another_index.json
|   |   └── settings/        # See https://www.elastic.co/guide/en/elasticsearch/refe...
|   |       └── your_index.json
|   └── search-templates/    # Contains our search templates as explained above.
|       ├── your_index.json.mustache
|       ├── another_index.json.mustache
|       └── another_way_to_search_another_index.json.mustache
└── tests/                   # Automated tests.
Ansible allows us to quickly set up a local instance of Elasticsearch.
The definitions directory contains the mapping/settings for your indices as well as the search template mustache files we've created above.
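One way a search template file could be loaded into a local instance is through Elasticsearch's stored scripts endpoint. The sketch below only builds a description of that request and does not send it. It assumes a version of Elasticsearch that accepts templates at _scripts/<id> with a "source" field; older releases used different endpoints and field names, so check the documentation for your version. The function buildStoreTemplateRequest and the shortened template string are made up for illustration.

```javascript
// Sketch: describe the HTTP request that would store a search template.
// Assumes the stored scripts endpoint (_scripts/<id>); this is an
// assumption about the Elasticsearch version in use.
function buildStoreTemplateRequest(baseUrl, templateId, templateSource) {
  return {
    method: 'POST',
    url: baseUrl + '/_scripts/' + encodeURIComponent(templateId),
    data: {
      script: {
        lang: 'mustache',
        // The template is sent as a string because sections such as
        // {{#states_count}} make the file invalid JSON until it is rendered.
        source: templateSource
      }
    }
  };
}

// In practice the source would be read from a file such as
// definitions/search-templates/your_index.json.mustache.
var templateSource = '{"query": {"bool": {"filter": [' +
  '{{#states_count}}{"terms": {"state": {{#toJson}}states{{/toJson}}}}{{/states_count}}' +
  ']}}}';

var request = buildStoreTemplateRequest(
  'http://localhost:9200',
  'your_search_template_name',
  templateSource
);
```

The resulting object has the same method, url and data shape as the $http calls earlier in this post, so it could be handed to whichever HTTP client the project already uses.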
Finally, we have our tests directory, which contains our automated tests. Since we aren't building our queries with JavaScript, the search templates are simply files on our system, so we can write our tests in any language we desire. We used PHP with PHPUnit for our project's tests, as well as for the example test below.
Note: Methods addDocument() and search() below are just example methods that could be used in your test. Implementation of these methods is not discussed in this post.
public function testStatesFilter() {
    $this->addDocument([
        'name' => 'Example 1',
        'state' => 'Virginia',
    ]);
    $this->addDocument([
        'name' => 'Example 2',
        'state' => 'Virginia',
    ]);
    $this->addDocument([
        'name' => 'Example 3',
        'state' => 'New York',
    ]);
    // No states selected: the filter is skipped, so all documents match.
    $results = $this->search('your_search_template_name', [
        'states' => [],
        'states_count' => 0,
    ]);
    $this->assertEquals(3, count($results['hits']['hits']));
    // One state selected: only the two Virginia documents match.
    $results = $this->search('your_search_template_name', [
        'states' => [
            'Virginia',
        ],
        'states_count' => 1,
    ]);
    $this->assertEquals(2, count($results['hits']['hits']));
    // Both states selected: all three documents match.
    $results = $this->search('your_search_template_name', [
        'states' => [
            'Virginia',
            'New York',
        ],
        'states_count' => 2,
    ]);
    $this->assertEquals(3, count($results['hits']['hits']));
}
Once we have this test passing, we can be sure that our search template is filtering our results by states correctly. This not only helps us check our query is functioning as expected now, but also ensures it continues to function into the future. Suppose in the future we add a feature where documents from the same state as the current user are given a higher ranking in their search results. Depending on the implementation, we might accidentally break our state filtering capabilities. Luckily, now that we've put the work into building the test, our testing framework will be able to warn us automatically of any regressions. Thus we are able to confidently push forward to unlock all of the potential of Elasticsearch as a search engine.