Description
I’m writing for my PhD a software which does not a lot of things, but needs to do it fast. I added a lot of logging, for debugging and demonstration purposes, and – of course – I’m using Marshal to dump the various logging events.
To my very surprise, it introduced a very big performance hit. Moreover, the log files were very big. I was, at that time, logging /each/ log event separately, like this:Marshal.dump(ev_name, event_log) Marshal.dump(ev_args, event_log)
Recently, I asked myself what was the size and performance hit of Marshal.dump itself: would not it be better to dump a few events at a time instead of separately. Fortunately for me, my software is running through an event loop, so it is easy to log events one cycle at a time. Here is the surprise: it is not only much faster, but it results in much smaller logs …
First, a small benchmark
You can download here a small benchmark which measures the various times and sizes for different sets of ruby objects, each being written either separately or as a big array. The results are below. The result is IMO without ambiguity: avoid marshalling small objects separately.
| count | size | size/obj | write | read | |
|---|---|---|---|---|---|
| separated | 1000 | 94608 | 23 | 0.037 | 0.028 |
| array | 1000 | 69631 | 17 | 0.012 | 0.012 |
| separated | 15000 | 1424311 | 23 | 0.573 | 0.443 |
| array | 15000 | 1049334 | 17 | 0.256 | 0.307 |
| separated | 25000 | 2374141 | 23 | 0.923 | 0.734 |
| array | 25000 | 1749165 | 17 | 0.475 | 0.399 |
| separated | 35000 | 3324068 | 23 | 1.302 | 1.016 |
| array | 35000 | 2449092 | 17 | 0.717 | 0.578 |
| separated | 45000 | 4273752 | 23 | 1.665 | 1.334 |
| array | 45000 | 3148776 | 17 | 0.901 | 0.912 |
plotted, this gives



Real-world results
The results for my own application point in the same direction: avoid marshaling objects one by one. If you can, do it in sets. The gain in marshalling/unmarshalling times is around 6, same for size. The results are below.
| cycles | events | read_time | time/cycle | time/ev | log_size | bytes/cycle | bytes/evt | |
|---|---|---|---|---|---|---|---|---|
| old | 178 | 4611 | 1.879002 | 10.6 | 0.4ms | 5414392 | 30417 | 1174 |
| new | 142 | 4055 | 0.386898 | 2.7ms | 0.1ms | 786554 | 5539 | 193 |