Stampy

Dev Blog

Different Methods for Merging Ruby Hashes

Today, a co-worker was reviewing some code of mine similar to this:

foo({a: 1}.merge(b: 2))

He suggested that using merge! would be faster, as it would save instantiating a new hash. I was skeptical but decided to put it to the test using benchmark-ips. If you are unfamiliar with benchmark-ips, it is a really awesome gem that measures how many times something can be run in a given timeframe, as opposed to how long it takes to run something. This is a particularly useful measurement when looking at things that take a variable amount of time to execute or, in this case, things that are very quick.
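
To make the "iterations per second" idea concrete, here is a rough sketch using only the standard library. The helper name iterations_per_second and the 0.2 second budget are made up for illustration; this is not how benchmark-ips is implemented (it also warms up and batches iterations, which this sketch skips), so treat the numbers it prints as ballpark only.

```ruby
# Hypothetical helper, not part of benchmark-ips: runs the block over and
# over until the time budget is used up, then reports runs per second.
def iterations_per_second(seconds = 0.2)
  count = 0
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  deadline = start + seconds
  while Process.clock_gettime(Process::CLOCK_MONOTONIC) < deadline
    yield
    count += 1
  end
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
  count / elapsed
end

rate = iterations_per_second(0.2) { {a: 1}.merge(b: 2) }
puts "about #{rate.round} iterations per second"
```

Checking the clock on every pass adds overhead of its own, which is one reason to reach for the gem for real measurements.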

I set up the script to compare these methods as follows:

require 'benchmark/ips'

def foo(hash = {}); end

Benchmark.ips do |x|
  x.report("merge") { foo({a: 1}.merge(b: 2)) }
  x.report("merge!") { foo({a: 1}.merge!(b: 2)) }
  x.compare!
end

This simply replaces merge with merge! and runs each repeatedly for 5 seconds (the default from benchmark-ips). I made foo do nothing just so that all the same objects would be instantiated, without adding any overhead to each run. The results were surprising!

Calculating -------------------------------------
               merge    29.046k i/100ms
              merge!    48.407k i/100ms
-------------------------------------------------
               merge    416.087k (± 3.8%) i/s -      2.091M
              merge!    819.903k (± 4.0%) i/s -      4.115M

Comparison:
              merge!:   819903.3 i/s
               merge:   416087.1 i/s - 1.97x slower

Using merge! is almost 2 times as fast! That’s really great. Out of curiosity, I wanted to check the number of objects that each makes as well. I know that the difference in the way merge and merge! work should mean that with merge! we have half as many objects created, but I wanted to measure it to be sure. For that, we can use ObjectSpace. If you are unfamiliar with ObjectSpace, or need a refresher, our very own Aaron Quint has covered it a few times. To count the number of hash objects we make in a given time period, I run a script like this:

original = ObjectSpace.count_objects[:T_HASH]
1000.times { foo({a: 1}.merge(b: 2)) }
new = ObjectSpace.count_objects[:T_HASH]
puts "Made #{new - original} hash objects"

original = ObjectSpace.count_objects[:T_HASH]
1000.times { foo({a: 1}.merge!(b: 2)) }
new = ObjectSpace.count_objects[:T_HASH]
puts "Made #{new - original} hash objects"

Using merge, we created 4039 hash objects. With merge!, we made only 2039, roughly half as many, just as I expected.
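
If the garbage collector runs between the two count_objects calls, the difference can come out lower than the number of hashes actually created. One way to guard against that, sketched below, is to pause GC while counting. The helper name hashes_allocated is an invention for this example, and the exact counts will vary with Ruby version, so no expected numbers are shown.

```ruby
def foo(hash = {}); end

# Hypothetical helper: returns how many T_HASH slots were added while the
# block ran. GC is paused so that collected hashes do not lower the count.
def hashes_allocated
  GC.start    # start from a freshly swept heap
  GC.disable
  before = ObjectSpace.count_objects[:T_HASH]
  yield
  ObjectSpace.count_objects[:T_HASH] - before
ensure
  GC.enable   # always turn GC back on, even if the block raises
end

merge_count = hashes_allocated { 1000.times { foo({a: 1}.merge(b: 2)) } }
bang_count  = hashes_allocated { 1000.times { foo({a: 1}.merge!(b: 2)) } }

puts "merge:  #{merge_count} hash objects"
puts "merge!: #{bang_count} hash objects"
```

The counts include a small constant overhead from the measurement itself, since count_objects returns a hash too.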

It is important to note, however, that using merge! can have some side effects in certain instances. Because it modifies the original hash, you won’t have a copy of that original object. This is especially relevant when using a method argument. For example, take the following code:

def baz(hash); end  # stand-in so the example runs; any method taking a hash works

def bar(hash_arg)
  baz(hash_arg.merge!({ a: "blah" }))
end

hash = {a: 'hi'}
hash[:a] #=> 'hi'
bar(hash)
hash[:a] #=> 'blah'

This overwrites the value stored under :a in the original object. In this instance, using merge would be preferable if you want to retain the original state of hash. You could also call dup on hash_arg and mutate the copy. This is particularly useful when doing a number of merges:

def qux!(hash_arg)
  hash_arg = hash_arg.dup
  10.times { |i| hash_arg.merge!({ "num_#{i}" => i }) }
end
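
As a quick check that the dup approach really protects the caller, here is a small sketch. The method name with_numbers is made up for this example, and freezing the input is just one way to make an accidental mutation fail loudly (FrozenError assumes Ruby 2.5 or newer).

```ruby
# Copy first, then mutate the copy; the caller's hash is left alone.
def with_numbers(hash_arg)
  copy = hash_arg.dup            # dup returns an unfrozen copy
  3.times { |i| copy.merge!("num_#{i}" => i) }
  copy
end

original = { a: "hi" }.freeze    # freezing makes accidental mutation raise
result = with_numbers(original)

p original  # still only has :a
p result    # has :a plus the three num_ keys

begin
  original.merge!(a: "blah")     # mutating the frozen original is rejected
rescue FrozenError => e
  puts "mutation blocked: #{e.class}"
end
```

Unlike qux!, this version returns the copy, since a merged copy that never leaves the method is not much use outside a benchmark.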

In case you’re curious, using merge! here is still faster than the equivalent with merge (we have to reassign the hash to actually modify it):

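def qux!(hash_arg)
  hash_arg = hash_arg.dup
  10.times { |i| hash_arg.merge!({ "num_#{i}" => i }) }
end

def qux(hash_arg)
  # merge never mutates its receiver, so this dup is not strictly needed;
  # it is kept so both methods do the same setup work.
  hash_arg = hash_arg.dup
  10.times { |i| hash_arg = hash_arg.merge({ "num_#{i}" => i }) }
end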

Benchmark.ips do |x|
  x.report("merge") { qux({}) }
  x.report("merge!") { qux!({}) }
  x.compare!
end

Calculating -------------------------------------
               merge     2.386k i/100ms
              merge!     5.962k i/100ms
-------------------------------------------------
               merge     24.337k (± 3.4%) i/s -    121.686k
              merge!     63.059k (± 4.3%) i/s -    315.986k

Comparison:
              merge!:    63058.8 i/s
               merge:    24337.1 i/s - 2.59x slower
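
A likely contributor to the wider gap is that every merge call in the loop builds a brand new hash containing everything accumulated so far, while merge! keeps adding to the same one. I have not profiled this, so take it as a plausible explanation rather than a measured one. The identity difference itself is easy to see:

```ruby
base = { a: 1 }

same  = base.merge!(b: 2)  # mutates base and returns it
fresh = base.merge(c: 3)   # builds a new hash; base is not touched

puts same.equal?(base)     # true: merge! hands back the receiver
puts fresh.equal?(base)    # false: merge allocated a separate hash
puts base.key?(:c)         # false: base never saw :c
```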

All in all, this was a pretty fun dive into some minor performance stuff. While it might not make a huge difference at a small scale, as you start to run a method more and more, the time and object space saved can add up! It’s often worth it to grab a few tools and take a look.

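UPDATE: Tieg posed the question below of whether Hash[] (the class method Hash.[]) would be faster than using dup. I took a swing at it and it came out slightly ahead, although the 1.02x gap is smaller than the ± 3.7% error reported for each run, so the two are roughly equivalent here. Here are my findings: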

def quux(hash_arg)
  hash_arg = hash_arg.dup
  10.times { |i| hash_arg.merge!({ "num_#{i}" => i }) }
end

def corge(hash_arg)
  hash_arg = Hash[hash_arg]
  10.times { |i| hash_arg.merge!({ "num_#{i}" => i }) }
end

Benchmark.ips do |x|
  x.report("merge! with dup") { quux({}) }
  x.report("merge! with Hash[]") { corge({}) }
  x.compare!
end

Calculating -------------------------------------
     merge! with dup     4.759k i/100ms
  merge! with Hash[]     4.863k i/100ms
-------------------------------------------------
     merge! with dup     52.455k (± 3.7%) i/s -    266.504k
  merge! with Hash[]     53.576k (± 3.7%) i/s -    267.465k

Comparison:
  merge! with Hash[]:    53575.8 i/s
     merge! with dup:    52454.7 i/s - 1.02x slower
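
With speed this close, a behavioral difference may matter more than the benchmark: as far as I can tell from the Ruby docs, Hash[] copies only the entries, while dup also carries over the default value or proc. A small sketch (the variable names are just for illustration):

```ruby
counts = Hash.new(0)   # missing keys default to 0
counts[:a] += 1

via_dup     = counts.dup
via_hash_sq = Hash[counts]

p via_dup[:missing]      # 0, because dup keeps the default value
p via_hash_sq[:missing]  # nil, because Hash[] copies only the entries
p via_hash_sq[:a]        # 1, the entries themselves are copied either way
```

So if the hash you are copying relies on a default, dup is the safer choice.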

Thanks to Chris Belsole, Mary Cutrali, Dan Condomitti, Aaron Quint, Ari Russo, and Ivan Tse for their help on this post.
