Stampy

Dev Blog

Setting Up Elasticsearch Synonyms

Here at Paperless Post, we’re in the process of upgrading our search engine from Thinking Sphinx to Elasticsearch to provide better and faster search results to our users (more on this in a future blog post!). As part of that work, we took some time to explore implementing synonyms in Elasticsearch. Synonyms are a cheap and powerful way to make search more flexible. With minimal configuration you can associate “The Big Apple” and “NYC” with “New York City” without spelling out new search terms for each phrase, or you can make “programmer” and “developer” synonymous.

To set up synonyms we have to do two things:

  1. Add a synonyms file.
  2. Create the index with settings and mappings to support synonyms.

Creating a synonyms file

# synonyms.txt
sea cow => manatee
cat, feline, lolcat

By default, this plain text file lives in the same directory as your Elasticsearch config. The synonyms_path setting shown later lets you point to a different location if need be. Here we are specifying two rules. The first maps “sea cow” to “manatee”. The second makes “cat”, “feline”, and “lolcat” synonymous. The full syntax rules are described in the Elasticsearch documentation for the synonym token filter.
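
As a minimal sketch of the step above, the file can be created from the shell. ES_CONFIG_DIR is a hypothetical variable, not something Elasticsearch defines; point it at wherever your config directory actually lives. It falls back to ./config here only so the snippet is safe to run anywhere.

```shell
# Hypothetical variable: set it to your real Elasticsearch config directory.
ES_CONFIG_DIR="${ES_CONFIG_DIR:-./config}"

mkdir -p "$ES_CONFIG_DIR"

# Write the two rules from this post into synonyms.txt.
cat > "$ES_CONFIG_DIR/synonyms.txt" <<'EOF'
# synonyms.txt
sea cow => manatee
cat, feline, lolcat
EOF

# Print how many rule lines the file contains (comments and blank lines excluded).
grep -cv -e '^#' -e '^$' "$ES_CONFIG_DIR/synonyms.txt"
```

The quoted heredoc delimiter keeps the shell from expanding anything inside the rules, so the file ends up exactly as written.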

Setting up index settings and mappings

PUT http://localhost:9200/my_index/

{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "synonym": {
            "tokenizer": "whitespace",
            "filter": ["synonym"]
          }
        },
        "filter": {
          "synonym": {
            "type": "synonym",
            "synonyms_path": "synonyms.txt",
            "ignore_case": true
          }
        }
      }
    }
  },
  "mappings": {
    "animal": {
      "properties": {
        "name": {
          "type": "string",
          "analyzer": "synonym"
        },
        "type": {
          "type": "string",
          "analyzer": "synonym"
        }
      }
    }
  }
}

We are doing two things here:

In our settings we are adding an analyzer called synonym that uses the whitespace tokenizer and a filter that is also named synonym. We then define that filter by giving it the synonym type and the path to our synonyms file, and we set ignore_case to true to make our lives easier.

In our mappings we are telling Elasticsearch what type each field is and which analyzer to use when indexing and searching it. This is what hooks synonyms to search.

Now we are good to go.
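
One way to check that the rules were picked up is to run some text through the analyzer with the _analyze API. The sketch below assumes the host and index name used in this post; the helper names analyze_url and analyze are made up for illustration. It uses the query-string form of _analyze that matches older Elasticsearch releases like the one these mappings target. Newer releases expect a JSON body with analyzer and text fields instead.

```shell
# Build the _analyze URL for the "synonym" analyzer defined in the index settings.
# $1 = host:port, $2 = index name
analyze_url() {
  printf '%s/%s/_analyze?analyzer=synonym' "$1" "$2"
}

# Send text through the analyzer and print the JSON token list Elasticsearch returns.
# $1 = text to analyze; ES_HOST and INDEX are optional overrides (assumed names).
analyze() {
  curl -s -XGET "$(analyze_url "${ES_HOST:-localhost:9200}" "${INDEX:-my_index}")" -d "$1"
}
```

With the index created as above, running analyze 'sea cow' should come back with a manatee token, and analyze 'lolcat' should list cat, feline, and lolcat. If the original words come back unchanged, the synonyms file was probably not found or not loaded.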

Bonus: Refreshing synonyms file

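What happens when you want to change your synonyms on the fly but you don’t want to recreate your index to do so? Luckily, there is an easy way to reload them with minimal downtime: close the index, re-apply the synonyms_path setting, and reopen it. The index is unavailable while it is closed, and documents indexed before the change keep the tokens produced by the old synonyms until they are reindexed.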

curl -XPOST 'localhost:9200/my_index/_close'
curl -XPUT 'localhost:9200/my_index/_settings' -d '
{
    "index" : {
        "analysis.filter.synonym.synonyms_path" : "synonyms.txt"
    }
}'
curl -XPOST 'localhost:9200/my_index/_open'
