Photo credit: Tumisu @pixabay
After some feedback on Reddit (thx @f9ae8221b) pointing out a JSON gem I wasn’t aware of, I updated the benchmarks to also cover FastJsonparser
and symbolize_keys, which is important for my company’s use cases (as a co-worker pointed out) and can cause significant performance issues if you have to do it as a separate pass after JSON parsing.
I was recently looking at the performance of some endpoints that process large amounts of JSON, and I wondered if we could do even better. Across our company we have recently switched most of our apps from the Ruby stdlib JSON to OJ, but I had read about SimdJSON and was curious if we should look further into it as well. In this article I will tell you a bit about each of the Ruby JSON options and why you might want to consider them.
OJ is a Ruby library for both parsing and generating JSON, with a ton of options. If you don’t want to think too much but care about JSON performance, just set up the OJ gem; it should be the best option for most folks. It is well known, tested, and trusted in the community, with a ton of support.
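To give a sense of how small the drop-in setup is, here is a minimal sketch (not part of the original benchmark code) using Oj’s mimic_JSON helper, with optimize_rails as the optional Rails variant:
require 'oj'
# Have the stdlib JSON module delegate to Oj, so existing JSON.parse / JSON.generate
# calls get the faster implementation with no other code changes.
Oj.mimic_JSON
# In a Rails app you can additionally let Oj optimize ActiveSupport JSON handling:
# Oj.optimize_rails
JSON.parse('{"a":1}')   # now handled by Oj
Oj.dump('a' => 1)       # => '{"a":1}'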
The SimdJSON Ruby library doesn’t have a lot of talks, documentation, or attention… but it binds to simdjson, the fastest JSON parser out there. It offers parsing speeds that nothing else can touch, and if you are trying to parse extremely large and dynamic JSON, it might just be the best option for you.
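Usage is about as small as it gets; a quick sketch (not from the benchmark script) of the one call used throughout this article:
require 'simdjson'
# Parse a JSON string; the resulting Hash uses string keys (there is no
# symbolize option, which is why symbolize_keys comes up below).
data = Simdjson.parse('{"status":"ok","count":3}')
data["count"] # => 3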
While SimdJSON is a fast gem, it doesn’t have much support, and the way it handles rescuing errors could leak memory. While I didn’t see such issues in my limited production rollout, that is worth noting. The user @f9ae8221b pointed out the memory issue and that the gem FastJsonparser also wraps simdjson and has wider community support. I had never heard of the gem and was already trying to patch SimdJSON to support symbolize_keys. Luckily, FastJsonparser already supports that option. It is still faster than OJ and requires a bit more work to integrate, but it looks like a better option than SimdJSON when you are looking for improved parsing speed. The user did mention it could have some production issues, so I will have to report back as I roll it out to various systems.
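For reference, the FastJsonparser call shapes used in the benchmarks below look roughly like this; note that it symbolizes keys by default, so pass symbolize_keys: false if you want string keys to match the other parsers:
require 'fast_jsonparser'
# Symbol keys by default...
FastJsonparser.parse('{"a":1}')                        # => { a: 1 }
# ...or keep string keys, matching the other parsers' defaults.
FastJsonparser.parse('{"a":1}', symbolize_keys: false) # => { "a" => 1 }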
What about symbolize_keys? With OJ it is built in (symbol_keys: true). Seriously, if you do much with JSON in a production system, just use OJ, unless you want to dig in deeper or find some specific reason it won’t work for you. The Ruby standard library is fine and will work for any quick check, but if you have any reason to care about performance, OJ is an easy to use drop-in replacement… A note on when you shouldn’t use it: if you are authoring a gem, reduce your hard dependencies as much as possible. If you call JSON.parse and the hosting app is using OJ, your gem will use OJ and be faster… You shouldn’t force users of your gem to require OJ.
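A sketch of the gem-author side of that advice; MyClient is a made-up module name, not a real gem:
require 'json'

module MyClient
  # Stick to the stdlib interface inside the gem. If the host application has
  # called Oj.mimic_JSON, this JSON.parse is served by Oj; otherwise it falls
  # back to the standard library. Either way, no hard dependency on OJ.
  def self.parse(body)
    JSON.parse(body, symbolize_names: true)
  end
end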
Let’s see the difference with my favorite Ruby benchmarking gem, benchmark-ips, which gives more readable reports than the standard benchmark lib. These are just quick micro-benchmarks, with all the issues that come with them, but the performance impact has been further validated by deploying to production systems, with measurable improvements in response time. The production use case involved far larger JSON payloads with much more variable data, which makes me think the results would apply to most web-service-like systems.
We will load up the various libraries and some contrived fake Hash/JSON data, then benchmark parsing it for a number of seconds…
require 'benchmark/ips'
require 'json'
require 'oj'
require 'simdjson'
require 'fast_jsonparser'
require 'memory_profiler'
require 'rails'

json = {
  "one": 1,
  "two": 2,
  "three": "3",
  "nested": {
    "I": "go",
    "deep": "when",
    "i": "need",
    a: 2
  },
  "array": [
    true,
    false,
    "mixed",
    "types",
    2,
    4,
    6
  ]
}.as_json.to_json.freeze

puts "ensure these match"
puts Oj.load(json, symbol_keys: false) == Simdjson.parse(json) &&
  Simdjson.parse(json) == JSON.parse(json, symbolize_names: false) &&
  FastJsonparser.parse(json, symbolize_keys: false) == Simdjson.parse(json)

Benchmark.ips do |x|
  x.config(:time => 15, :warmup => 3)

  x.report("oj parse") { Oj.load(json, symbol_keys: false) }
  x.report("simdjson parse") { Simdjson.parse(json) }
  x.report("FastJsonparser parse") { FastJsonparser.parse(json, symbolize_keys: false) }
  x.report("stdlib JSON parse") { JSON.parse(json, symbolize_names: false) }

  x.compare!
end
# Let's check memory as well...
report = MemoryProfiler.report do
  100.times { Simdjson.parse(json.dup) }
end
puts "simdjson memory"
report.pretty_print

report = MemoryProfiler.report do
  100.times { Oj.load(json.dup) }
end
puts "OJ memory"
report.pretty_print
This shows, as claimed, that SimdJSON and FastJsonparser outperform OJ even on pretty small and contrived JSON examples. The performance gap holds up, and sometimes looks even more significant, on the more realistic production payloads seen in some of the product systems I work with. Note: if you need symbolize_keys or want a bit more community support, I would go with FastJsonparser.
ensure these match
true
Warming up --------------------------------------
oj parse 12.697k i/100ms
simdjson parse 17.276k i/100ms
FastJsonparser parse 17.834k i/100ms
stdlib JSON parse 8.662k i/100ms
Calculating -------------------------------------
oj parse 121.709k (± 3.5%) i/s - 1.828M in 15.040973s
simdjson parse 171.253k (± 4.3%) i/s - 2.574M in 15.060276s
FastJsonparser parse 190.436k (± 3.2%) i/s - 2.853M in 15.000218s
stdlib JSON parse 93.032k (± 3.4%) i/s - 1.403M in 15.102830s
Comparison:
FastJsonparser parse: 190436.3 i/s
simdjson parse: 171252.9 i/s - 1.11x (± 0.00) slower
oj parse: 121709.5 i/s - 1.56x (± 0.00) slower
stdlib JSON parse: 93032.1 i/s - 2.05x (± 0.00) slower
require 'benchmark/ips'
require 'json'
require 'oj'
require 'simdjson'
require 'fast_jsonparser'
require 'memory_profiler'
require 'rails'

json = {
  "one": 1,
  "two": 2,
  "three": "3",
  "nested": {
    "I": "go",
    "deep": "when",
    "i": "need",
    a: 2
  },
  "array": [
    true,
    false,
    "mixed",
    "types",
    2,
    4,
    6
  ]
}.as_json.to_json.freeze

puts "ensure these match"
puts Oj.load(json, symbol_keys: true) == Simdjson.parse(json).deep_symbolize_keys! &&
  Simdjson.parse(json).deep_symbolize_keys! == JSON.parse(json, symbolize_names: true) &&
  FastJsonparser.parse(json) == Simdjson.parse(json).deep_symbolize_keys!

Benchmark.ips do |x|
  x.config(:time => 15, :warmup => 3)

  x.report("oj parse") { Oj.load(json, symbol_keys: true) }
  x.report("simdjson parse") { Simdjson.parse(json).deep_symbolize_keys! }
  x.report("FastJsonparser parse") { FastJsonparser.parse(json) }
  x.report("stdlib JSON parse") { JSON.parse(json, symbolize_names: true) }

  x.compare!
end
This is the other main reason to use FastJsonparser: depending on the integrations in your apps, you might rely on symbolized keys… We had added that at a very low level in our shared ApiClient, and the performance implications of having to symbolize keys as a second pass make a big difference. This shows how the simdjson performance win doesn’t hold up when you need symbolize_keys.
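If it helps, here is the shape of that change inside a shared client; ApiClient and parse_response are made-up names for illustration, and the benchmark output for this run follows below:
require 'fast_jsonparser'

class ApiClient
  # Before: parse with string keys, then deep_symbolize_keys! as a second pass.
  # After: symbolize during parsing (FastJsonparser's default), skipping that pass.
  def parse_response(body)
    FastJsonparser.parse(body) # symbolize_keys: true by default
  end
end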
ensure these match
true
Warming up --------------------------------------
oj parse 13.455k i/100ms
simdjson parse 7.752k i/100ms
FastJsonparser parse 19.458k i/100ms
stdlib JSON parse 8.546k i/100ms
Calculating -------------------------------------
oj parse 134.285k (± 4.5%) i/s - 2.018M in 15.060313s
simdjson parse 75.825k (± 7.2%) i/s - 1.132M in 15.022033s
FastJsonparser parse 208.199k (± 3.1%) i/s - 3.133M in 15.061737s
stdlib JSON parse 86.504k (± 3.5%) i/s - 1.299M in 15.035736s
Comparison:
FastJsonparser parse: 208199.1 i/s
oj parse: 134285.4 i/s - 1.55x (± 0.00) slower
stdlib JSON parse: 86503.7 i/s - 2.41x (± 0.00) slower
simdjson parse: 75825.4 i/s - 2.75x (± 0.00) slower
The results are very similar for a much larger 120K JSON payload pulled from a live production system (NOTE: these benchmarks were run on a different machine)… In this case we are seeing nearly a 2X performance boost over the standard library.
without symbolize_keys:
Warming up --------------------------------------
oj parse 62.000 i/100ms
simdjson parse 79.000 i/100ms
stdlib JSON parse 42.000 i/100ms
Calculating -------------------------------------
oj parse 622.377 (± 3.9%) i/s - 9.362k in 15.066907s
simdjson parse 815.699 (± 4.5%) i/s - 12.245k in 15.045902s
stdlib JSON parse 426.656 (± 3.5%) i/s - 6.426k in 15.083428s
Comparison:
simdjson parse: 815.7 i/s
oj parse: 622.4 i/s - 1.31x (± 0.00) slower
stdlib JSON parse: 426.7 i/s - 1.91x (± 0.00) slower
with symbolize_keys:
ensure these match
true
Warming up --------------------------------------
oj parse 71.000 i/100ms
simdjson parse 29.000 i/100ms
FastJsonparser parse 82.000 i/100ms
stdlib JSON parse 41.000 i/100ms
Calculating -------------------------------------
oj parse 726.191 (± 1.5%) i/s - 10.934k in 15.059977s
simdjson parse 294.947 (± 2.4%) i/s - 4.437k in 15.052250s
FastJsonparser parse 909.828 (±10.2%) i/s - 13.530k in 15.026051s
stdlib JSON parse 497.749 (± 3.6%) i/s - 7.462k in 15.011659s
Comparison:
FastJsonparser parse: 909.8 i/s
oj parse: 726.2 i/s - 1.25x (± 0.00) slower
stdlib JSON parse: 497.7 i/s - 1.83x (± 0.00) slower
simdjson parse: 294.9 i/s - 3.08x (± 0.00) slower
Neither MemoryProfiler nor production server metrics showed any substantial memory difference between the libraries, on either small or large JSON objects, so I wouldn’t be too concerned with memory when picking between them.
If you have a Ruby service that parses large quantities of JSON, it might be worth taking a look at the newer and lesser-known FastJsonparser, even though the gem is less documented and takes a bit more work to integrate into your app than OJ. If you are looking for a drop-in replacement, OJ is still the way to go, but for some use cases SimdJSON or FastJsonparser will be worth the extra effort. If you are running Rails in production, I can’t really see any reason not to use OJ, given the significant performance benefits that come with it. The OJ library has made it as easy as possible to use as a drop-in replacement, and if you rely on nearly any particular JSON quirk of the past, it has options to help you stay fully compatible. I know as we look towards Ruby 3 we are also hoping to move away from some of the native-extension C libraries, but when it comes to very low-level repetitive tasks (vs application logic), sometimes they are hard to beat and worth the integration and dependency cost.