Different Methods for Merging Ruby Hashes

Today, a co-worker was reviewing some code of mine similar to this:

foo({a: 1}.merge(b: 2))

He suggested that using merge! would be faster, as it would save instantiating a new hash. I was skeptical but decided to put it to the test using benchmark-ips. If you are unfamiliar with benchmark-ips, it is a really awesome gem that measures how many times something can be run in a given timeframe, as opposed to how long it takes to run something. This is a particularly useful measurement when looking at things that take a variable amount of time to execute or, in this case, things that are very quick.
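
To make the iterations-per-second idea concrete, here is a minimal sketch in plain Ruby. The rough_ips helper is hypothetical and is not how benchmark-ips is implemented; the gem also warms up the code and reports an error margin, so treat this only as an illustration of what is being measured.

```ruby
# rough_ips is a hypothetical helper for illustration only; it is not part of
# benchmark-ips. It calls the block until `seconds` have elapsed and returns
# the observed iterations per second.
def rough_ips(seconds = 0.2)
  clock = -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) }
  iterations = 0
  start = clock.call
  while clock.call - start < seconds
    yield
    iterations += 1
  end
  iterations / (clock.call - start)
end

rate = rough_ips { ({ a: 1 }).merge(b: 2) }
puts "about #{rate.round} merges per second"
```

Counting completed runs in a fixed window avoids having to time a single call that finishes in well under a microsecond.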

I set up the script to compare these methods as follows:

require 'benchmark/ips'

def foo(hash = {}); end

Benchmark.ips do |x|
  x.report("merge") { foo({a: 1}.merge(b: 2)) }
  x.report("merge!") { foo({a: 1}.merge!(b: 2)) }
  x.compare!
end

This simply replaces merge with merge! and runs each repeatedly for 5 seconds (the default from benchmark-ips). I made foo do nothing just so that all the same objects would be instantiated, without adding any overhead to each run. The results were surprising!

Calculating -------------------------------------
               merge    29.046k i/100ms
              merge!    48.407k i/100ms
-------------------------------------------------
               merge    416.087k (± 3.8%) i/s -      2.091M
              merge!    819.903k (± 4.0%) i/s -      4.115M

Comparison:
              merge!:   819903.3 i/s
               merge:   416087.1 i/s - 1.97x slower

Using merge! is almost 2 times as fast! That’s really great. Out of curiosity, I wanted to check the number of objects that each makes as well. I know that the difference in the way merge and merge! work should mean that with merge! we have half as many objects created, but I wanted to measure it to be sure. For that, we can use ObjectSpace. If you are unfamiliar with ObjectSpace, or need a refresher, our very own Aaron Quint has covered it a few times. To count the number of hash objects we make in a given time period, I run a script like this:

original = ObjectSpace.count_objects[:T_HASH]
1000.times { foo({a: 1}.merge(b: 2)) }
new = ObjectSpace.count_objects[:T_HASH]
puts "Made #{new - original} hash objects"

original = ObjectSpace.count_objects[:T_HASH]
1000.times { foo({a: 1}.merge!(b: 2)) }
new = ObjectSpace.count_objects[:T_HASH]
puts "Made #{new - original} hash objects"

Using merge, we created 4039 hash objects. With merge!, we made only 2039, just as I expected.
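
If you want the counts to be less sensitive to a garbage collection run landing in the middle of the loop, one option is to wrap the measurement in a small helper. This is only a sketch: hashes_allocated is a hypothetical name, and the exact numbers it prints will vary between Ruby versions.

```ruby
def foo(hash = {}); end

# hashes_allocated is a hypothetical helper, not from the original script.
# It returns how many T_HASH slots were added while the block ran. GC is
# switched off during the measurement so nothing is collected mid-count.
def hashes_allocated
  GC.start
  GC.disable
  before = ObjectSpace.count_objects[:T_HASH]
  yield
  ObjectSpace.count_objects[:T_HASH] - before
ensure
  GC.enable
end

with_merge = hashes_allocated { 1000.times { foo({a: 1}.merge(b: 2)) } }
with_bang  = hashes_allocated { 1000.times { foo({a: 1}.merge!(b: 2)) } }

puts "merge:  #{with_merge} hash objects"
puts "merge!: #{with_bang} hash objects"
```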

It is important to note, however, that using merge! can have some side effects in certain instances. Because it modifies the original hash, you won’t have a copy of that original object. This is especially relevant when using a method argument. For example, take the following code:

# baz stands in for whatever consumes the merged hash
def baz(hash); end

def bar(hash_arg)
  baz(hash_arg.merge!({ a: "blah" }))
end

hash = {a: 'hi'}
hash[:a] #=> 'hi'
bar(hash)
hash[:a] #=> 'blah'

This overwrites the value stored under :a in the original object. In this instance, using merge would be preferable if you want to retain the original state of hash. You could also call dup on hash_arg and use merge! on the copy. This is particularly useful when doing a number of merges:

def qux!(hash_arg)
  # work on a copy so the caller's hash is left untouched
  hash_arg = hash_arg.dup
  10.times { |i| hash_arg.merge!({ "num_#{i}" => i }) }
end
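
As a small sketch of the trade-off described above, the following contrasts a copying merge with an in-place one, and shows that freezing a hash turns an accidental in-place merge into an error. The names bar_copy and bar_in_place are hypothetical and exist only for this illustration.

```ruby
# bar_copy and bar_in_place are hypothetical names used only for this sketch.
def bar_copy(hash_arg)
  hash_arg.merge(a: "blah")   # returns a new hash, leaves hash_arg alone
end

def bar_in_place(hash_arg)
  hash_arg.merge!(a: "blah")  # modifies the caller's hash
end

original = { a: "hi" }
copy = bar_copy(original)
original[:a] #=> "hi"
copy[:a]     #=> "blah"

# Freezing a hash turns an accidental in-place merge into an error.
# FrozenError exists in Ruby 2.5 and later.
frozen = { a: "hi" }.freeze
begin
  bar_in_place(frozen)
rescue FrozenError => e
  puts "refused to modify: #{e.message}"
end
frozen[:a] #=> "hi"
```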

In case you’re curious, using merge! in qux! is still faster than the equivalent with merge (with merge we have to reassign the variable on each pass, because merge returns a new hash rather than modifying the receiver):

def qux!(hash_arg)
  hash_arg = hash_arg.dup
  10.times { |i| hash_arg.merge!({ "num_#{i}" => i }) }
end

def qux(hash_arg)
  hash_arg = hash_arg.dup
  10.times { |i| hash_arg = hash_arg.merge({ "num_#{i}" => i }) }
end

Benchmark.ips do |x|
  x.report("merge") { qux({}) }
  x.report("merge!") { qux!({}) }
  x.compare!
end

Calculating -------------------------------------
               merge     2.386k i/100ms
              merge!     5.962k i/100ms
-------------------------------------------------
               merge     24.337k (± 3.4%) i/s -    121.686k
              merge!     63.059k (± 4.3%) i/s -    315.986k

Comparison:
              merge!:    63058.8 i/s
               merge:    24337.1 i/s - 2.59x slower

All in all, this was a pretty fun dive into some minor performance stuff. While it might not make a huge difference at a small scale, as you start to run a method more and more, the time and object space saved can add up! It’s often worth it to grab a few tools and take a look.

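UPDATE: Tieg posed the question below of whether Hash[] would be faster than using dup. I took a swing at it, and Hash[] does come out slightly ahead, although the 1.02x difference is within the ±3.7% error reported for both runs, so the two are effectively tied. Here are my findings: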

def quux(hash_arg)
  hash_arg = hash_arg.dup
  10.times { |i| hash_arg.merge!({ "num_#{i}" => i }) }
end

def corge(hash_arg)
  hash_arg = Hash[hash_arg]
  10.times { |i| hash_arg.merge!({ "num_#{i}" => i }) }
end

Benchmark.ips do |x|
  x.report("merge! with dup") { quux({}) }
  x.report("merge! with Hash[]") { corge({}) }
  x.compare!
end

Calculating -------------------------------------
     merge! with dup     4.759k i/100ms
  merge! with Hash[]     4.863k i/100ms
-------------------------------------------------
     merge! with dup     52.455k (± 3.7%) i/s -    266.504k
  merge! with Hash[]     53.576k (± 3.7%) i/s -    267.465k

Comparison:
  merge! with Hash[]:    53575.8 i/s
     merge! with dup:    52454.7 i/s - 1.02x slower
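
Speed aside, dup and Hash[] are not fully interchangeable. The sketch below, which uses a made-up counts hash, shows that both give you an independent copy to merge! into, but only dup carries over the hash's default value.

```ruby
counts = Hash.new(0)          # hash with a default value of 0
counts[:a] = 1

via_dup  = counts.dup
via_hash = Hash[counts]

# Both are separate objects, so merge! on a copy leaves `counts` alone.
via_dup.merge!(b: 2)
via_hash.merge!(c: 3)
counts.keys #=> [:a]

# dup keeps the default value; Hash[] builds a plain hash without it.
via_dup[:missing]  #=> 0
via_hash[:missing] #=> nil
```

Both copies are shallow, so nested objects stored as values are still shared with the original.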

Thanks to Chris Belsole, Mary Cutrali, Dan Condomitti, Aaron Quint, Ari Russo, and Ivan Tse for their help on this post.
