Uncovering Ruby Bytecode Patterns

cute_stack_machine.JPG

Since Ruby 1.9, Ruby runs your code in a bytecode VM. That means that the ruby compiler converts your code to a series of bytecode instructions. For example,

ruby --dump=insns -e '5 * 10'

== disasm: #<ISeq:<main>@-e:1 (1,0)-(1,6)> (catch: false)

0000 putobject                              5

0002 putobject                              10

0004 opt_mult                               <calldata!mid:*, argc:1, ARGS_SIMPLE>[CcCr]

0006 leave

The bytecode instructions putobject is called twice, opt_mult is called next, then lastly leave. These are the bytecode instructions that Ruby runs when executing 5 * 10. Ruby uses a stack based VM, so after putobject 5 is called, 5 is on the stack to be used by other instructions.

What instructions the Ruby VM is actually running? Finding common patterns could lead to interesting optimizations and a better understanding of the Ruby VM. Ruby provides hooks for dtrace/systemtap to give this information. a DTrace probe can send us information every time an instruction is run in the VM.

Setup #

build Ruby with DTrace,

git clone https://github.com/ruby/ruby.git
./autogen.sh
mkdir -p ~/.rubies
./configure --prefix="${HOME}/.rubies/ruby-master" --enable-dtrace

In vm_opts.h set VM_COLLECT_USAGE_DETAILS to 1

then,

make install

Create this stp script somewhere handy,

// ruby-instructions.stp
probe process("/path/to/ruby/bin/ruby").mark("insn")
{
    printf("%s\n", user_string($arg1))
}

Setup rails bench benchmark from https://github.com/k0kubun/railsbench

Make sure to run single threaded and copy the pid from the web worker for the next step.

Capture with,

bundle exec puma -e production --threads=1
sudo stap ruby-instructions.stp -o rails-bench.txt -x <web worker pid>
ab -c 1 -n 10000 localhost:3000/posts

You should now have a text file that is a long list of instructions Ruby ran during your test.

Full Parsed Results

What instructions are most common? #

chart.svg

It seems getlocal_WC_0, opt_send_without_block, leave and putself are the most common instructions run in rails benchmark.

What instruction pairs are common? #

After an instruction ran, what instructions are most likely to follow it? Because we are measuring a running program, these instructions are not necessarily generated next to each other, but instead they must be executed one after another.

getlocal_WC_0 #

getlocal_WC_0.svg

getlocal_WC_0 is very often followed by opt_send_without_block or getlocal_WC_0 calling itself again. Often running this instruction twice could mean that local access for multiple variables being as fast as possible could show good results in Rails bench.

opt_send_without_block #

opt_send_without_block.svg

leave #

leave.svg

The pattern leave -> leave is interesting. It is possible Ruby generates bytecode where leave is always called twice in a row. Unfortunately, it might be very uncommon that Ruby can know that two leave instructions are always called one after another. I am curious if Ruby could identify these situations.

putself #

putself.svg

This pattern seems like a, “get ready for opt_send_without_block” pattern that putself and getlocal are both a part of.

branchunless #

branchunless.svg

putobject #

putobject.svg

setlocal_WC_0 #

setlocal_WC_0.svg

I am really curious how commonly these setlocal -> getlocal pattern is referencing the same variable. This could be replaced with dup -> setlocal, though that is only useful if these references are to the same variable. Alternatively, a “stack preserving” version of setlocal might be a different interesting alternative.

getinstancevariable #

getinstancevariable.svg

pop #

pop.svg

dup #

dup.svg

A common pattern starting with dup is, ["dup", "branchif", "pop"]. This pattern ran 912,485 in our test. This instruction sequence gets generated from x ||= 1. rubyexplorer.xyz

== disasm: #<ISeq:<compiled>@<compiled>:1 (1,0)-(1,7)> (catch: FALSE)
local table (size: 1, argc: 0 [opts: 0, rest: -1, post: 0, block: -1, kw: -1@-1, kwrest: -1])
[ 1] x@0
0000 getlocal_WC_0 x@0 ( 1)[Li]
0002 dup
0003 branchif 10
0005 pop
0006 putobject_INT2FIX_1_
0007 dup
0008 setlocal_WC_0 x@0
0010 leave

This instruction sequence duplicates the top element on the stack, branches if that element is truthy and then removes the element it duplicated from the stack. If we are going to remove the original element on the stack anyways, why duplicate it in the first place? Turns out the Ruby VM knows that this sequence is suboptimal and has the information to generate a better sequence. This is the change to generate that better sequence: ruby/ruby#6414. This optimization only has any effect if Ruby knows that the result of the conditional assignment is unused. After this change, conditional assignment where the result is ignored is 1.72x faster.

objtostring #

objtostring.svg

objtostring is almost always followed by anytostring. These instructions are used nearly exclusively for string interpolation rubyexplorer.xyz. There have been two interesting Ruby changes related to string interpolation ruby/ruby#6334 and ruby/ruby#6335

P.S. #

While writing this I rediscovered Tenderlove’s old post about introducing Dtrace. A good read if you find the time.

Also, if you are interested in learning more about YARV and the Ruby VM. I recommend this YARV reference and Ruby Under a Microscope.

 
2
Kudos
 
2
Kudos

Now read this

Speaking about Performance

I get confused reading about performance differences. When someone says some software is “twice as fast” I understand it can now do two tasks in the time it took to do one. However, There are very similar sounding phrases with vastly... Continue →