Uncovering Ruby Bytecode Patterns
Since Ruby 1.9, Ruby runs your code on a bytecode VM (YARV). That means the Ruby compiler converts your code into a series of bytecode instructions. For example:
ruby --dump=insns -e '5 * 10'
== disasm: #<ISeq:<main>@-e:1 (1,0)-(1,6)> (catch: false)
0000 putobject 5
0002 putobject 10
0004 opt_mult <calldata!mid:*, argc:1, ARGS_SIMPLE>[CcCr]
0006 leave
The instruction putobject runs twice, opt_mult runs next, and leave runs last. These are the bytecode instructions Ruby executes for 5 * 10. Ruby uses a stack-based VM, so after putobject 5 runs, 5 sits on the stack for later instructions to use: opt_mult pops 5 and 10 and pushes their product, and leave returns the value on top of the stack.
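To make the stack behavior concrete, here is a toy stack machine in Ruby that runs the same four instructions. It is a sketch for illustration only, not how YARV is implemented; the run method and the [name, operand] array format are assumptions of this example.

```ruby
# A toy stack machine mimicking the instructions from the `5 * 10`
# disassembly. It is NOT YARV; it only shows how each instruction
# pushes values onto, or pops values off, a single value stack.
def run(instructions)
  stack = []
  instructions.each do |name, operand|
    case name
    when :putobject
      stack.push(operand)          # push a literal onto the stack
    when :opt_mult
      right = stack.pop            # the argument (10) is on top
      left  = stack.pop            # the receiver (5) is below it
      stack.push(left * right)     # push the product back
    when :leave
      return stack.pop             # return the value on top of the stack
    else
      raise ArgumentError, "unknown instruction: #{name}"
    end
  end
  raise "instruction sequence ended without leave"
end

program = [
  [:putobject, 5],
  [:putobject, 10],
  [:opt_mult],
  [:leave]
]

puts run(program) # prints 50
```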
Which instructions is the Ruby VM actually running? Finding common patterns could lead to interesting optimizations and a better understanding of the Ruby VM. Ruby provides DTrace/SystemTap hooks that expose this information: a probe can notify us every time the VM runs an instruction.
Setup
Build Ruby with DTrace:
git clone https://github.com/ruby/ruby.git
cd ruby
./autogen.sh
mkdir -p ~/.rubies
./configure --prefix="${HOME}/.rubies/ruby-master" --enable-dtrace
In vm_opts.h, set VM_COLLECT_USAGE_DETAILS to 1, then run:
make install
Create this SystemTap script somewhere handy, replacing /path/to/ruby with the --prefix you passed to configure:
// ruby-instructions.stp
probe process("/path/to/ruby/bin/ruby").mark("insn")
{
printf("%s\n", user_string($arg1))
}
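If you are unsure what to use for /path/to/ruby/bin/ruby, the installed Ruby can report its own executable path. This is a small sketch, assuming you run it with the Ruby you just built; looking for --enable-dtrace in the recorded configure arguments is only a heuristic and says nothing about whether VM_COLLECT_USAGE_DETAILS was set.

```ruby
require "rbconfig"

# Absolute path of the ruby executable running this script. Run the
# script with the Ruby you just installed so this is the path the probe needs.
ruby_path = RbConfig.ruby

# Heuristic only: this checks the recorded configure arguments, not the
# binary itself, and cannot tell whether VM_COLLECT_USAGE_DETAILS was set.
dtrace_flag = RbConfig::CONFIG.fetch("configure_args", "").include?("--enable-dtrace")

puts "probe process(\"#{ruby_path}\").mark(\"insn\")"
puts "configured with --enable-dtrace: #{dtrace_flag}"
```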
Set up the railsbench benchmark from https://github.com/k0kubun/railsbench.
Make sure to run it single-threaded, and copy the PID of the web worker for the next step.
Capture with:
bundle exec puma -e production --threads=1
sudo stap ruby-instructions.stp -o rails-bench.txt -x <web worker pid>
ab -c 1 -n 10000 localhost:3000/posts
You should now have a text file containing a long list of the instructions Ruby ran during your test.
What instructions are most common?
- getlocal_WC_0 = 1,903,656
- opt_send_without_block = 1,638,356
- leave = 1,057,370
- putself = 729,499
- branchunless = 626,085
- putobject = 584,513
- setlocal_WC_0 = 502,096
- getinstancevariable = 434,992
- pop = 407,503
- dup = 375,151
It seems getlocal_WC_0, opt_send_without_block, leave and putself are the most common instructions run in the Rails benchmark.
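Counts like these can be produced from rails-bench.txt with a few lines of Ruby. A minimal sketch, assuming the trace has one instruction name per line as written by the SystemTap script; count_instructions, top_instructions and with_commas are hypothetical helper names, and Enumerable#tally needs Ruby 2.7 or newer.

```ruby
# Tally instruction names from a trace with one name per line.
def count_instructions(lines)
  lines.map(&:strip).reject(&:empty?).tally
end

# The n most frequent instructions, ties broken alphabetically.
def top_instructions(counts, n = 10)
  counts.sort_by { |name, count| [-count, name] }.first(n)
end

# Format an integer with thousands separators, e.g. 1903656 -> "1,903,656".
def with_commas(number)
  number.to_s.reverse.scan(/\d{1,3}/).join(",").reverse
end

if File.exist?("rails-bench.txt")
  counts = count_instructions(File.foreach("rails-bench.txt"))
  top_instructions(counts).each do |name, count|
    puts "- #{name} = #{with_commas(count)}"
  end
end
```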
What instruction pairs are common?
After an instruction runs, which instructions are most likely to follow it? Because we are measuring a running program, these pairs are not necessarily next to each other in the generated bytecode; they are instructions that were executed one after another.
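One way to answer this from the trace is to walk consecutive entries with each_cons(2) and tally, for each instruction, what ran immediately after it. A sketch, assuming the same one-name-per-line trace file; followers and follower_shares are hypothetical helpers.

```ruby
# For every instruction in the trace, count which instruction was
# executed immediately after it: table[current][following] => count.
def followers(lines)
  table = Hash.new { |hash, insn| hash[insn] = Hash.new(0) }
  lines.map(&:strip).reject(&:empty?).each_cons(2) do |current, following|
    table[current][following] += 1
  end
  table
end

# Share of each follower of `insn`, most frequent first.
def follower_shares(table, insn)
  counts = table.fetch(insn, {})
  total = counts.values.sum.to_f
  counts.sort_by { |name, count| [-count, name] }
        .map { |name, count| [name, (count / total).round(3)] }
end

if File.exist?("rails-bench.txt")
  table = followers(File.foreach("rails-bench.txt"))
  follower_shares(table, "getlocal_WC_0").first(5).each do |name, share|
    puts "getlocal_WC_0 -> #{name}: #{(share * 100).round(1)}%"
  end
end
```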
getlocal_WC_0
getlocal_WC_0 is very often followed by opt_send_without_block, or by another getlocal_WC_0. Running this instruction twice in a row is common, which suggests that making access to multiple local variables as fast as possible could show good results in railsbench.
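A getlocal_WC_0 -> getlocal_WC_0 -> opt_send_without_block run is what an ordinary call with a local receiver and a local argument compiles to. The sketch below only compiles the snippet (combine is a hypothetical method that is never called) and lists instruction names by scanning the disassembly, which assumes each instruction line starts with a four-digit offset as in the output at the top of this post.

```ruby
# List instruction names, in order, from a snippet's disassembly.
# Assumes each instruction line starts with a four-digit offset ("0007 ...").
def instruction_names(source)
  RubyVM::InstructionSequence.compile(source).disasm.scan(/^\d{4} (\w+)/).flatten
end

# Only compiled, never executed: `combine` is a hypothetical method.
names = instruction_names("a = 1; b = 2; a.combine(b)")
puts names.join(" -> ")

# Both pairs discussed above appear back to back in the compiled code.
puts names.each_cons(2).include?(["getlocal_WC_0", "getlocal_WC_0"])
puts names.each_cons(2).include?(["getlocal_WC_0", "opt_send_without_block"])
```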
opt_send_without_block
leave
The pattern leave -> leave is interesting. One way it arises is a method whose last expression is a method call: the callee's leave returns into the caller, whose very next instruction is its own leave. Unfortunately, it is probably rare for Ruby to know ahead of time that two leave instructions will always run one after another. I am curious whether Ruby could identify these situations.
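To see where back-to-back leave instructions can come from, disassemble a method whose last expression is a call. A sketch using RubyVM::InstructionSequence.of; inner and outer are hypothetical methods, and the name extraction assumes the four-digit offsets shown in the disassembly at the top of this post.

```ruby
# Hypothetical methods: outer's last expression is a call to inner.
def inner
  42
end

def outer
  inner # the result of inner is returned unchanged
end

# Instruction names of a method, scanned from its disassembly.
def method_instruction_names(name)
  RubyVM::InstructionSequence.of(method(name)).disasm.scan(/^\d{4} (\w+)/).flatten
end

outer_names = method_instruction_names(:outer)
puts outer_names.join(" -> ")
# At runtime, inner's own leave returns into outer, and the next
# instruction outer executes is its final leave: leave -> leave.
```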
putself
This looks like a “get ready for opt_send_without_block” pattern that putself and getlocal are both part of.
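A receiverless call with one argument shows putself and getlocal setting up the stack for the send. A sketch; show and format_price are hypothetical methods, and only show's bytecode is inspected, so format_price is never actually called.

```ruby
# Hypothetical methods; only show's bytecode is inspected.
def format_price(amount)
  amount.to_s
end

def show(amount)
  format_price(amount) # receiverless call with one local argument
end

names = RubyVM::InstructionSequence.of(method(:show)).disasm.scan(/^\d{4} (\w+)/).flatten
puts names.join(" -> ")
# putself pushes the receiver (self), getlocal_WC_0 pushes the argument,
# and opt_send_without_block pops both to make the call.
```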
branchunless
putobject
setlocal_WC_0
I am really curious how often this setlocal -> getlocal pattern references the same variable. When it does, the pair could be replaced with dup -> setlocal, which leaves the same value both on the stack and in the variable. Alternatively, a “stack preserving” version of setlocal might be another interesting option.
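The dup -> setlocal idea can be sketched as a peephole pass over a simplified instruction list. This is a hypothetical illustration, not code from the Ruby compiler: instructions are [name, operand] arrays with the variable name as the operand, and the pass assumes there are no branch targets between the two instructions (a jump landing on the getlocal would make the rewrite unsafe).

```ruby
# Hypothetical peephole pass:
#   setlocal_WC_0 X; getlocal_WC_0 X   =>   dup; setlocal_WC_0 X
# Both forms store the value in X and leave it on the stack.
# Assumes no labels/jump targets exist in the list.
def fuse_setlocal_getlocal(insns)
  result = []
  i = 0
  while i < insns.length
    current, following = insns[i], insns[i + 1]
    if current[0] == :setlocal_WC_0 && following &&
       following[0] == :getlocal_WC_0 && following[1] == current[1]
      result << [:dup] << current   # same variable: rewrite the pair
      i += 2
    else
      result << current             # anything else is copied unchanged
      i += 1
    end
  end
  result
end

before = [[:putobject, 1], [:setlocal_WC_0, :x], [:getlocal_WC_0, :x], [:leave]]
p fuse_setlocal_getlocal(before)
# => [[:putobject, 1], [:dup], [:setlocal_WC_0, :x], [:leave]]
```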
getinstancevariable
pop
dup
A common pattern starting with dup is ["dup", "branchif", "pop"]. This pattern ran 912,485 times in our test. This instruction sequence is generated from x ||= 1 (disassembly via rubyexplorer.xyz):
== disasm: #<ISeq:<compiled>@<compiled>:1 (1,0)-(1,7)> (catch: FALSE)
local table (size: 1, argc: 0 [opts: 0, rest: -1, post: 0, block: -1, kw: -1@-1, kwrest: -1])
[ 1] x@0
0000 getlocal_WC_0 x@0 ( 1)[Li]
0002 dup
0003 branchif 10
0005 pop
0006 putobject_INT2FIX_1_
0007 dup
0008 setlocal_WC_0 x@0
0010 leave
This instruction sequence duplicates the top element on the stack, branchif consumes the copy to decide whether to jump, and if the value was falsy, pop throws the original away. If we are going to remove the original element anyway, why duplicate it in the first place? It turns out the Ruby VM knows this sequence is suboptimal and has the information to generate a better one. This is the change that generates the better sequence: ruby/ruby#6414. The optimization only applies when Ruby knows that the result of the conditional assignment is unused. After this change, conditional assignment whose result is ignored is 1.72x faster.
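To see what the dup/pop pair costs, the toy stack machine below runs two hand-written versions of x ||= 1 used as a statement and counts the instructions executed. The first mirrors the disassembly above plus a final pop that discards the result; the second is an assumed shape for the unused-result case and is not copied from ruby/ruby#6414. Instruction counts are not timings, so this does not reproduce the 1.72x figure.

```ruby
# Toy interpreter for a tiny YARV-like subset. Returns
# [final value of x, number of instructions executed].
def execute(insns, locals)
  stack = []
  pc = 0
  executed = 0
  loop do
    name, operand = insns.fetch(pc)
    executed += 1
    pc += 1
    case name
    when :getlocal  then stack.push(locals.fetch(operand))
    when :setlocal  then locals[operand] = stack.pop
    when :putobject then stack.push(operand)
    when :putnil    then stack.push(nil)
    when :dup       then stack.push(stack.last)
    when :pop       then stack.pop
    when :branchif  then pc = operand if stack.pop # jump when truthy
    when :leave     then return [locals[:x], executed]
    else raise ArgumentError, "unknown instruction: #{name}"
    end
  end
end

# x ||= 1 as in the disassembly, followed by a pop because the
# statement's value is discarded; putnil/leave stand in for the rest.
# Jump targets are indexes into the array.
WITH_DUP = [
  [:getlocal, :x],   # 0
  [:dup],            # 1
  [:branchif, 7],    # 2
  [:pop],            # 3
  [:putobject, 1],   # 4
  [:dup],            # 5
  [:setlocal, :x],   # 6
  [:pop],            # 7 discard the unused result
  [:putnil],         # 8
  [:leave]           # 9
]

# Assumed shape when the compiler knows the result is unused:
# no dup/pop bookkeeping around the branch.
WITHOUT_DUP = [
  [:getlocal, :x],   # 0
  [:branchif, 4],    # 1
  [:putobject, 1],   # 2
  [:setlocal, :x],   # 3
  [:putnil],         # 4
  [:leave]           # 5
]

[nil, 5].each do |initial|
  with_dup = execute(WITH_DUP, { x: initial })
  without_dup = execute(WITHOUT_DUP, { x: initial })
  puts "x = #{initial.inspect}: with dup/pop #{with_dup.inspect}, without #{without_dup.inspect}"
end
```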
objtostring
objtostring is almost always followed by anytostring. These instructions are used nearly exclusively for string interpolation (rubyexplorer.xyz). There have been two interesting Ruby changes related to string interpolation: ruby/ruby#6334 and ruby/ruby#6335.
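You can check the objtostring -> anytostring pairing yourself by compiling an interpolated string. A sketch, assuming Ruby 3.1 or newer (where these instructions exist) and the same offset-based scan of the disassembly; the snippet is only compiled, never run.

```ruby
# Assumes Ruby 3.1+. Single quotes keep #{name} literal in the source string,
# so the interpolation happens in the compiled snippet, not here.
source = 'name = "world"; "hello #{name}"'
names = RubyVM::InstructionSequence.compile(source).disasm.scan(/^\d{4} (\w+)/).flatten
puts names.join(" -> ")
# Expect objtostring immediately followed by anytostring.
```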
P.S.
While writing this I rediscovered Tenderlove’s old post about introducing DTrace. A good read if you find the time.
Also, if you are interested in learning more about YARV and the Ruby VM, I recommend this YARV reference and Ruby Under a Microscope.