Setting Up Elasticsearch Synonyms

Jun 30th, 2015

Here at Paperless Post, we’re in the process of upgrading our search engine from Thinking Sphinx to Elasticsearch to provide better and faster search results to our users (more on this in a future blog post!). As a result, we wanted to take some time to explore the possibility of implementing synonyms in Elasticsearch. Using synonyms is a powerful way to cheaply increase the flexibility of your search capabilities. With minimal configuration you can associate “The Big Apple” and “NYC” with “New York City” without specifically spelling out new search terms for each word, or you can make “programmer” and “developer” synonymous in Elasticsearch.

To set up synonyms we have to do two things:

Add a synonyms file.
Create the index with settings and mappings to support synonyms.

Creating a synonyms file

# synonyms.txt
sea cow => manatee
cat, feline, lolcat

This is a plain text file that, by default, lives in the same directory as your Elasticsearch config. You will see later how you can specify a path to the file if need be. Here we are specifying two synonym rules. The first maps “sea cow” to “manatee”. The second makes cat, feline, and lolcat synonymous with one another. For more information, the rules for the syntax are located here.
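To make the two rule styles concrete, here is a minimal Ruby sketch that models how they behave. It is a simplified illustration, not how Elasticsearch implements the filter: parse_synonyms and expand are hypothetical helpers, and the sketch assumes that a comma-separated group expands every term to the whole group and that lookups are case-insensitive.

```ruby
# Simplified model of the two rule styles in synonyms.txt.
# parse_synonyms and expand are hypothetical helpers for illustration only.

SYNONYMS = <<~RULES
  # synonyms.txt
  sea cow => manatee
  cat, feline, lolcat
RULES

# Returns a Hash mapping each input term to the list of terms it becomes.
def parse_synonyms(text)
  text.each_line.with_object({}) do |line, rules|
    line = line.strip
    next if line.empty? || line.start_with?("#")

    if line.include?("=>")
      # Explicit mapping: terms on the left are replaced by terms on the right.
      left, right = line.split("=>", 2).map { |side| side.split(",").map(&:strip) }
      left.each { |term| rules[term.downcase] = right }
    else
      # Equivalent synonyms: every term expands to the whole group.
      group = line.split(",").map(&:strip)
      group.each { |term| rules[term.downcase] = group }
    end
  end
end

# Looks a term up case-insensitively; unknown terms come back unchanged.
def expand(term, rules)
  rules.fetch(term.downcase, [term])
end

rules = parse_synonyms(SYNONYMS)
expand("Sea Cow", rules) #=> ["manatee"]
expand("lolcat", rules)  #=> ["cat", "feline", "lolcat"]
expand("dog", rules)     #=> ["dog"]
```

Note the asymmetry: searching for “sea cow” finds manatees, but the explicit mapping does not make “manatee” expand back to “sea cow”.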

Setting up index settings and mappings

POST http://localhost:9200/my_index/

{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "synonym": {
            "tokenizer": "whitespace",
            "filter": ["synonym"]
          }
        },
        "filter": {
          "synonym": {
            "type": "synonym",
            "synonyms_path": "synonyms.txt",
            "ignore_case": true
          }
        }
      }
    }
  },
  "mappings": {
    "animal": {
      "properties": {
        "name": {
          "type": "string",
          "analyzer": "synonym"
        },
        "type": {
          "type": "string",
          "analyzer": "synonym"
        }
      }
    }
  }
}

We are doing two things here:

In our settings we are adding an analyzer called synonym that uses the whitespace tokenizer and the synonym filter. Then we set up the filter with its type and the path to the synonyms file, and we set ignore_case to true to make our lives easier.

In our mappings we are giving Elasticsearch some clues about what the fields are and which analyzer we want to use when we search them. This is what hooks synonyms to search.
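If you would rather create the index from Ruby than from a REST client, the request can be sketched roughly as follows. This is a minimal sketch under assumptions: synonym_index_body and create_index are hypothetical helper names, the node is assumed to be listening on localhost:9200, and the field type is written in lowercase as string.

```ruby
require "json"
require "net/http"
require "uri"

# Builds the settings and mappings from the example as a Ruby Hash.
# The "animal" type and the field names mirror the example; the helper
# names are hypothetical.
def synonym_index_body(fields: %w[name type], synonyms_path: "synonyms.txt")
  properties = fields.each_with_object({}) do |field, props|
    props[field] = { "type" => "string", "analyzer" => "synonym" }
  end

  {
    "settings" => {
      "index" => {
        "analysis" => {
          "analyzer" => {
            "synonym" => { "tokenizer" => "whitespace", "filter" => ["synonym"] }
          },
          "filter" => {
            "synonym" => {
              "type" => "synonym",
              "synonyms_path" => synonyms_path,
              "ignore_case" => true
            }
          }
        }
      }
    },
    "mappings" => { "animal" => { "properties" => properties } }
  }
end

# Sends the body to a node assumed to be running at the given host.
def create_index(name, body, host: "http://localhost:9200")
  uri = URI("#{host}/#{name}/")
  Net::HTTP.post(uri, JSON.generate(body), "Content-Type" => "application/json")
end

puts JSON.pretty_generate(synonym_index_body)
# create_index("my_index", synonym_index_body) # uncomment with a node running
```

Keeping the body in one method makes it easy to add another field to the mapping without retyping the analyzer name.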

Now we are good to go.

Bonus: Refreshing synonyms file

What happens when you want to change your synonyms on the fly but you don’t want to recreate your index to do so? Luckily there is an easy way to refresh your settings with minimal downtime.

curl -XPOST 'localhost:9200/my_index/_close'
curl -XPUT 'localhost:9200/my_index/_settings' -d '
{
    "index" : {
        "analysis.filter.synonym.synonyms_path" : "synonyms.txt"
    }
}'
curl -XPOST 'localhost:9200/my_index/_open'
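The same close, update, open sequence can be scripted from Ruby. The following is a sketch rather than a definitive implementation: refresh_steps and run_steps are hypothetical helper names, and the node is again assumed to be at localhost:9200.

```ruby
require "json"
require "net/http"

# Describes the three calls from the curl example as data, in order.
# refresh_steps and run_steps are hypothetical helper names.
def refresh_steps(index, synonyms_path: "synonyms.txt")
  settings = { "index" => { "analysis.filter.synonym.synonyms_path" => synonyms_path } }
  [
    [:post, "/#{index}/_close", nil],
    [:put, "/#{index}/_settings", JSON.generate(settings)],
    [:post, "/#{index}/_open", nil]
  ]
end

# Runs each step against a node assumed to be at localhost:9200 and raises
# on the first response that is not a success.
def run_steps(steps, host: "localhost", port: 9200)
  Net::HTTP.start(host, port) do |http|
    steps.each do |verb, path, body|
      response = http.send_request(verb.to_s.upcase, path, body, "Content-Type" => "application/json")
      raise "#{verb.to_s.upcase} #{path} failed: #{response.code}" unless response.is_a?(Net::HTTPSuccess)
    end
  end
end

refresh_steps("my_index").each { |verb, path, _| puts "#{verb.to_s.upcase} #{path}" }
# run_steps(refresh_steps("my_index")) # uncomment with a node running
```

Be aware that if the settings update fails, this sketch raises before reopening the index, so a more careful version would make sure the index is opened again.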

RailsConf 2015 Roundup

Jun 3rd, 2015

This year, seven of us attended RailsConf 2015, which was held in Atlanta. It was a great experience, especially since we went as a big group. We listened to interesting talks, discussed what we’d learned with each other, and met a lot of people from the Ruby community. Here are some of our favorite talks:

Ivan Tse

Speed Science - Richard Schneeman

Video

This is a great talk about how to measure performance in Ruby applications and how to address the problems you find. Richard Schneeman recounts some of his personal experiences investigating slow applications and how he was able to measure them and test his hypotheses. One of the biggest takeaways for me was that you can increase your memory footprint with retained objects as well as non-retained objects!

He then gave examples of using derailed_benchmarks, which helps you get metrics about performance. I ran the memory-usage-at-require-time benchmark against our application and found that one of our gems was unnecessarily loading classes that weren’t used. This is the pull request to fix it.

Finally, he talks about “Funday Fridays”, the idea that instead of deploying on Fridays, we could work on miscellaneous tasks such as investigating performance issues. We have a similar concept here at Paperless Post called Project Days. Project Days are allotted time for you to work on things that you don’t normally have time for but that you think add value to the team.

There is also a blog post about this topic that you can check out.

Yanik Jayaram

Nothing is Something - Sandi Metz

Video

This talk looks at how, when writing our applications, we often find ourselves checking whether an object we are dealing with is nil. We tend to handle these nil checks in the same way throughout our application, and, given that we do so, it is worth considering moving our handling of nils into a centralized location in our code. Enter the ‘Null Object Pattern’. This pattern is a way of handling nil values that is DRY, object-oriented, and in keeping with a “tell, don’t ask” style of programming.

I summarize the idea of the Null Object Pattern here:
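As a rough illustration of the pattern (a minimal sketch of my own for this page, not the code from the talk or from the linked summary), the nil check can live in one place and hand back an object that answers the same messages. User, GuestUser, and UserDirectory are hypothetical names.

```ruby
# Minimal Null Object sketch. User, GuestUser, and UserDirectory are
# hypothetical names chosen for illustration.
class User
  attr_reader :name

  def initialize(name)
    @name = name
  end
end

# Stands in for "no user found" and answers the same messages as User.
class GuestUser
  def name
    "Guest"
  end
end

class UserDirectory
  def initialize(users)
    @users = users
  end

  # The nil check lives here, in one place, instead of at every call site.
  def find(name)
    @users.find { |user| user.name == name } || GuestUser.new
  end
end

directory = UserDirectory.new([User.new("ivan"), User.new("mary")])

# Callers just send the message ("tell, don't ask") without checking for nil.
puts directory.find("ivan").name   # prints "ivan"
puts directory.find("nobody").name # prints "Guest"
```

Without the null object, every caller of find would need its own nil check before calling name.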

Chris Belsole

I went to a lot of great talks this year. Here are some of my favorites:

DevOps for The Lazy - Aja Hammerly

Video

Here Aja Hammerly shows us how to be lazy as she types fewer than 30 lines of code to spin up Docker and Kubernetes to allow for rapid deployment. It is a good intro to both of these technologies, as it is a very high-level overview.

Microservices, a bittersweet symphony - Sebastian Sogamoso

Video

The theme this year was Monoliths. Here Sebastian Sogamoso discusses why you would want to break one up into microservices and, more importantly, why you would not. This is worth checking out if you are knocking this idea around.

Mary Cutrali

It was my first time at RailsConf this year and, in the spirit of something different, I’d like to talk about Rust.

Bending the Curve - Yehuda Katz & Tom Dale

Video

I will admit that I walked into this talk only knowing two things: Yehuda & Tom were funny and Rust was wacky. Full disclosure: I left the talk affirmed in my beliefs. In a truly entertaining fashion, Tom and Yehuda walk us through why Rubyists might be interested in choosing Rust to embed some data-heavy services in their Rails applications. Functioning as a high-level introduction to some of the more exciting performance-boosting features of Rust, this talk succeeds at illustrating how the low-level control of C, high-level safety, and the human-friendly interface of Ruby come together to deliver a powerful and easy-to-use language. This talk would be a great watch for anyone interested in learning more about Rust, garbage collection, or all things wacky.

Different Methods for Merging Ruby Hashes

Feb 19th, 2015

Today, a co-worker was reviewing some code of mine similar to this:

foo({a: 1}.merge(b: 2))

He suggested that using merge! would be faster, as it would save instantiating a new hash. I was skeptical but decided to put it to the test using benchmark-ips. If you are unfamiliar with benchmark-ips, it is a really awesome gem that measures how many times something can be run in a given timeframe, as opposed to how long it takes to run something. This is a particularly useful measurement when looking at things that take a variable amount of time to execute or, in this case, things that are very quick.

I set up the script to compare these methods as follows:

require 'benchmark/ips'

def foo(hash = {}); end

Benchmark.ips do |x|
  x.report("merge") { foo({a: 1}.merge(b: 2)) }
  x.report("merge!") { foo({a: 1}.merge!(b: 2)) }
  x.compare!
end

This simply replaces merge with merge! and runs each repeatedly for 5 seconds (the default from benchmark-ips). I made foo do nothing just so that all the same objects would be instantiated, without adding any overhead to each run. The results were surprising!

Calculating -------------------------------------
               merge    29.046k i/100ms
              merge!    48.407k i/100ms
-------------------------------------------------
               merge    416.087k (± 3.8%) i/s -      2.091M
              merge!    819.903k (± 4.0%) i/s -      4.115M

Comparison:
              merge!:   819903.3 i/s
               merge:   416087.1 i/s - 1.97x slower

Using merge! is almost 2 times as fast! That’s really great. Out of curiosity, I wanted to check the number of objects that each makes as well. I know that the difference in the way merge and merge! work should mean that with merge! we have half as many objects created, but I wanted to measure it to be sure. For that, we can use ObjectSpace. If you are unfamiliar with ObjectSpace, or need a refresher, our very own Aaron Quint has covered it a few times. To count the number of hash objects we make in a given time period, I run a script like this:

original = ObjectSpace.count_objects[:T_HASH]
1000.times { foo({a: 1}.merge(b: 2)) }
new = ObjectSpace.count_objects[:T_HASH]
puts "Made #{new - original} hash objects"

original = ObjectSpace.count_objects[:T_HASH]
1000.times { foo({a: 1}.merge!(b: 2)) }
new = ObjectSpace.count_objects[:T_HASH]
puts "Made #{new - original} hash objects"

Using merge, we created 4039 hash objects. With merge!, we made only 2039, just as I expected.

It is important to note, however, that using merge! can have some side effects in certain instances. Because it modifies the original hash, you won’t have a copy of that original object. This is especially relevant when using a method argument. For example, take the following code:

def bar(hash_arg)
  baz(hash_arg.merge!({ a: "blah" }))
end

hash = {a: 'hi'}
hash[:a] #=> 'hi'
bar(hash)
hash[:a] #=> 'blah'

This overwrites the :a key in the original object. In this instance, using merge would be preferable if you want to retain the original state of hash. You could also call dup on hash_arg. This is particularly useful when doing a number of merges:

def qux!(hash_arg)
  hash_arg = hash_arg.dup
  10.times { |i| hash_arg.merge!({ "num_#{i}" => i }) }
end
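To check that the dup really protects the caller’s hash, here is a small variant that returns the merged hash so it can be inspected. The name with_numbers is hypothetical; it exists because qux! as written returns the value of 10.times rather than the hash.

```ruby
# Variant of qux! that returns the merged hash so the result can be inspected.
# with_numbers is a hypothetical name used only for this illustration.
def with_numbers(hash_arg)
  copy = hash_arg.dup # shallow copy: nested objects are still shared
  10.times { |i| copy.merge!({ "num_#{i}" => i }) }
  copy
end

original = { a: 'hi' }
result = with_numbers(original)

original.size #=> 1  (the caller's hash is untouched)
result.size   #=> 11 (:a plus num_0 through num_9)
```

Because dup makes a shallow copy, values nested inside the hash are still shared with the original, so mutating one of them in place would still be visible to the caller.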

In case you’re curious, using merge! here is still faster than the equivalent with merge (we have to reassign the hash to actually modify it):

def qux!(hash_arg)
  hash_arg = hash_arg.dup
  10.times { |i| hash_arg.merge!({ "num_#{i}" => i }) }
end

def qux(hash_arg)
  hash_arg = hash_arg.dup
  10.times { |i| hash_arg = hash_arg.merge({ "num_#{i}" => i }) }
end

Benchmark.ips do |x|
  x.report("merge") { qux({}) }
  x.report("merge!") { qux!({}) }
  x.compare!
end

Calculating -------------------------------------
               merge     2.386k i/100ms
              merge!     5.962k i/100ms
-------------------------------------------------
               merge     24.337k (± 3.4%) i/s -    121.686k
              merge!     63.059k (± 4.3%) i/s -    315.986k

Comparison:
              merge!:    63058.8 i/s
               merge:    24337.1 i/s - 2.59x slower

All in all, this was a pretty fun dive into some minor performance stuff. While it might not make a huge difference at a small scale, as you start to run a method more and more the time and object space saved can add up! It’s often worth it to grab a few tools and take a look.

UPDATE: Tieg posed the question below of whether Hash[] would be faster than using dup. I took a swing at it and it appears that he is correct, although only barely: the difference is within the benchmark’s margin of error. Here are my findings:

def quux(hash_arg)
  hash_arg = hash_arg.dup
  10.times { |i| hash_arg.merge!({ "num_#{i}" => i }) }
end

def corge(hash_arg)
  hash_arg = Hash[hash_arg]
  10.times { |i| hash_arg.merge!({ "num_#{i}" => i }) }
end

Benchmark.ips do |x|
  x.report("merge! with dup") { quux({}) }
  x.report("merge! with Hash[]") { corge({}) }
  x.compare!
end

Calculating -------------------------------------
     merge! with dup     4.759k i/100ms
  merge! with Hash[]     4.863k i/100ms
-------------------------------------------------
     merge! with dup     52.455k (± 3.7%) i/s -    266.504k
  merge! with Hash[]     53.576k (± 3.7%) i/s -    267.465k

Comparison:
  merge! with Hash[]:    53575.8 i/s
     merge! with dup:    52454.7 i/s - 1.02x slower

Thanks to Chris Belsole, Mary Cutrali, Dan Condomitti, Aaron Quint, Ari Russo, and Ivan Tse for their help on this post.

Welcome Tatum Lade as the CTO of Paperless Post

Aug 19th, 2014

I’m very excited to announce that this week Tatum Lade has joined Paperless Post as our new Chief Technology Officer. Tatum is coming from a tenure at Boxee and Samsung, where he helped grow the product and the team while also leading R&D. We’re looking forward to having him bring his tech and management expertise to the CTO role and empower us to continue to expand our product and development team.

I (Aaron Quint) am staying full time at Paperless as Chief Scientist, a new role that will be focused on solving our team’s biggest technical challenges.

When I came on as CTO of PP, our entire company was fewer than 10 people. Now, more than 4 years later, the dev team alone is over 35 and the company is almost 90 people. I, along with the rest of the management team, wanted to find someone who had the experience to help the team grow even further so we could tackle ever larger problems and build better software, faster. I’m so pleased that we were able to find and bring on someone as smart and talented as Tatum. The future is bright.

Intern Recap Summer 2014

Aug 12th, 2014

Each summer, Paperless Post welcomes a few interns onto our engineering team. Our interns have varied backgrounds, but all are interested in being developers and working at a tech company or startup. They go through an intensive education program consisting of talks, lunches, and meetings with teams and team leads, and then spend the majority of the summer working on a single team developing code. This was our 5th year of the program and we invited our interns to share some thoughts on their experiences!

Richard

My name is Richard. I am a student currently attending University of Waterloo studying Computer Science. I have a passion for mobile development and see myself working in that field in the future. I joined the Platform team in the role of iOS developer. I helped the team develop a lot of new features for the iPhone and iPad applications.

When I began my internship, my first task was to familiarize myself with the team’s codebase. Prior to working there, I had experience with iOS development from my previous internship so I was already familiar with a lot of the codebase and was immediately ready to start tackling some issues. To start off, I was given some bugs and really small features to fix and build. This was to help me become more familiar with the codebase. My first major task was to build a “Welcome To Paperless Post” screen, shown when a user signs up through the application. This welcome screen appears immediately after the user’s account is created, and along with displaying a welcoming message, it gives the user the option to be sent emails about updates. One really cool thing about the Welcome screen is that on the iPhone, it blurs out the background and adds the message on top of it.

My next major task was to implement an automatic reminder scheduler that reminds guests about an upcoming event. Upon opening the automatic reminder, the user can type in a message and set a time and date for this message to be sent to all their guests. The user can also send themselves a sample email to see how the message would look.

Throughout my internship, I learned to use a lot of design patterns and coding techniques, and overall I expanded my knowledge of Objective-C. While I was building the Welcome screen, I learned how to interact with the API backend, as this was needed for the email updates option. I was really fascinated by the interactions, and this was the first time I had used Objective-C to interact with something outside of Objective-C. While building the Welcome screen, I also learned a really cool pattern called categories. Categories are used to extend the functionality of a class. This made my changes to the codebase a lot cleaner and a lot easier to understand and follow. When I wanted to add extra functionality to an already huge class, I made a category on that class, which made the functionality I added easier to find and more readable.

While I was building the automatic reminder, I learned about another really cool feature: attributed strings. More specifically, I learned about TTTAttributedLabels, which are a customized version of attributed strings. Attributed strings are strings that can be customized so that certain substrings have a different size, font, color, etc. than other substrings. On top of that, TTTAttributedLabels add clickable substrings and links. This makes things easier when you have a string and only want one word of it to be clickable.

For a lot of the features I built, I was asked to write unit tests to make sure that everything was working properly. This was all new to me, as I had never written unit tests for iOS. However, I quickly learned that writing unit tests in iOS is really similar to writing unit tests in Python (for web) or in Java (for Android): there is a set up that gets called before each test and a tear down that gets called after it.

I really enjoyed the iOS weekly meetings. In these meetings, the iOS developers join together to listen to one of us talk about a design pattern or coding technique. In these talks, I learned even more about design patterns and coding techniques, and after each one of them, I couldn’t wait to try them out.

Working at Paperless Post was a completely new experience for me, with a hip and really relaxing environment. This was also the first time I worked in New York. My team has taught me a lot and given me a lot of really useful feedback that has definitely made me a much better developer, and the experience I gained here is something that I will carry with me in the future.

Hyoju

Hi, my name is Hyoju and I’m a student at Brown University. This summer, I worked with the Host/Account Management (HAM) team. I decided to join Paperless Post because I was impressed by how their product integrates technology and design, beautifully transforming a traditional experience into a digital one. This was my first developer internship and my background was not in Ruby or Rails, but it didn’t matter. Paperless Post was very willing to help me familiarize myself not only with their product and processes, but also with the general languages and tools.

During the first few weeks, we were greeted with a series of talks and lunches, which quickly made us comfortable with the company. My first tasks were small bug fixes (e.g. optimizing queries, changing copy and constraints), which were completed through pair programming with Gordon, Dev Lead for the HAM team. Pair programming allowed me to see how to navigate through the massive web app and familiarize myself with the coding styles. Then, my tasks evolved into adding small and large features to the cart, account and admin pages.

My most interesting tasks were helping with the pricing and discounting method changes. For pricing, I was able to change the method with which we apply price adjustments to our card products. For discounting, I helped change the way promo codes are applied. Currently, promo codes can only be applied on an entire order with various restrictions, but with the change, they will be able to be more freely applied on specific SKUs. Both tasks were difficult to approach at first, as there were a lot of layers of code to go through, making it difficult to figure out all the places the change would affect. However, it was very interesting to dig into that part of the app and I really appreciated that I was given the responsibility to work on changes that affect an integral part of the business model.

I also worked on building various administrator tools and fixes to allow the User Support team to help users more effectively. Such tasks included adding a quick view to keep track of card delivery statuses, building a dropdown to update review statuses, and allowing guest names to be updated in batches.

Being an intern at Paperless Post was an amazing experience. I was not only able to explore and contribute to a live web app, but was able to learn something new every day. From front-end additions to testing, I was able to work on tasks covering various aspects of the app, allowing me to fully explore the product. Though I’ve only been with the company for 3 months, I really enjoyed being part of such a fast growing company and truly felt myself growing with it.

Rick

My name is Rick and I am a fourth year student studying the wonders of Computer Science at Waterloo. I came on board the internship program at Paperless Post as part of my school’s co-op program. I was assigned to the Create team, which is mainly responsible for everything having to do with the homepage, paper browser, and the create tool. After doing a quick tour of the office, all of the new hires went through a training program spanning two weeks where we were introduced to the internal dev tools and the architecture of our product. This was awesome, and it took me only a day and a half to start pushing branches and merging into production. It just goes to show how refined the process is for new people to get started. Much better than some previous companies I have worked for.

I started working on some small bugs and fixes, like adding one letter to the end of a word or making a new link open in a new tab. While these tasks seemed easy, I was not familiar with the codebase and had no previous experience with Ruby or Haml. The codebase has a fairly straightforward structure, but without understanding how everything was hooked up, these simple tasks ended up taking up a lot more time. My coworkers were always more than happy to help me and I remember them saying, “we might have headphones on, but that doesn’t mean you can’t approach us.”

My work was very standalone and while most of the team was working on larger branches, I was getting faster at taking care of the smaller things. It got to the point where I no longer needed to ask people “where can I find #{some_code}?” or “how do you make it appear on a new line like %this?” or “what does the ? mean?”

In the middle of the summer I was told that I’d be working on a huge feature. It was exciting to know that I would add a whole new level of code with another colleague of mine to offer something to our customers that would make them extremely happy. It was great to know that I was making a difference.

Aside from work, we had plenty of team outings and events, with short talks every other Friday from individuals about something development related, to company lunches and internship parties. I even took the canteen floor to demonstrate to the entire company some of my gloving skills!

As a co-op, I wanted to learn things, but not just the kind of stuff they teach you in school. I wanted to know what companies are doing to deal with scaling issues, what’s hot in the software world, and where everything is going. Ever since the first day I came here, I have learned mountains about things I had no clue about before I arrived. I’m really grateful for the internship experience that Paperless Post has given me. The office is very nice, the people are even better, and it’s located in New York City.

If you think you’d be a good fit at Paperless Post, check out our current openings and apply online!
