Regular Expressions and Onigmo, the Ruby regular expression engine

Regular expressions (regex) are powerful tools for finding and manipulating patterns in text. They are widely used in programming languages and text editors, though they are often treated as a black box. I always considered them one part programming and one part magic. The internet is full of articles about how regex are used, but very few dive deeply into their implementations. Today we will take a brief tour of the basic theory behind regular expressions, then delve into the implementation of Onigmo, the regular expression engine used by the Ruby programming language.

Brief Theory

I learned some of the theory behind regular expressions from reading “Engineering a Compiler” (Cooper & Torczon).

Regular expressions are a type of recognizer. A recognizer is a type of finite automaton (FA) that focuses on either accepting or rejecting a...

Nov 28, 2022

Uncovering Ruby Bytecode Patterns

Since Ruby 1.9, Ruby has run your code in a bytecode VM. That means that the Ruby compiler converts your code to a series of bytecode instructions. For example,

ruby --dump=insns -e '5 * 10'

== disasm: <ISeq:<main>@-e:1 (1,0)-(1,6)> (catch: false)
0000 putobject                              5
0002 putobject                              10
0004 opt_mult                               <calldata!mid:*, argc:1, ARGS_SIMPLE>[CcCr]
0006 leave

The putobject instruction runs twice, then opt_mult, and finally leave. These are the bytecode instructions that Ruby runs when executing 5 * 10. Ruby uses a stack-based VM, so after putobject 5 runs, 5 is on the stack to be used by other instructions.
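To make the stack idea concrete, here is a minimal sketch of a toy stack machine in Ruby. It is not how the real Ruby VM is implemented; run_toy_vm and the array-based program format are made up for illustration, and only the three instruction names are borrowed from the dump above.

```ruby
# Toy stack machine: an illustration of the idea, NOT the real Ruby VM.
# run_toy_vm and the [name, operand] program format are hypothetical.
def run_toy_vm(instructions)
  stack = []
  instructions.each do |name, operand|
    case name
    when :putobject
      # Push a literal value onto the stack.
      stack.push(operand)
    when :opt_mult
      # Pop two operands, multiply them, push the result back.
      right = stack.pop
      left = stack.pop
      stack.push(left * right)
    when :leave
      # Finish and hand back whatever is on top of the stack.
      return stack.pop
    else
      raise ArgumentError, "unknown instruction: #{name}"
    end
  end
  nil
end

program = [
  [:putobject, 5],
  [:putobject, 10],
  [:opt_mult],
  [:leave]
]

puts run_toy_vm(program) # prints 50
```

Each instruction only ever touches the top of the stack, which is why the two putobject calls have to come before opt_mult.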

What instructions is the Ruby VM actually running? Finding common patterns could lead to interesting optimizations and a better understanding of the Ruby VM. Ruby...


Handmade Seattle - notes

Handmade Seattle had an exceptionally high number of exceptional talks, especially given that it is a small conference with only one track. I collected a few notes on some of my favorites, though I invite you to watch all of the talks.

Weathering Software Winter - Devine always has inspiring and visually stunning talks. Seeing more of the thinking behind the Uxn virtual machine makes me want to pull inspiration from history too. Devine’s ideas around e-waste, preservation and doing more with less are underappreciated areas of focus in an industry that wants to reinvent everything all the time.

Complexity: Why Can’t We Make Simple Software - Peter Van Hardenberg did an excellent job explaining software complexity and the ways that it manifests without blame! It is a very easy trap to fall into as an engineer to think that complexity is a problem created by other engineers – or your past...

Speaking about Performance

I get confused reading about performance differences. When someone says some software is “twice as fast” I understand it can now do two tasks in the time it took to do one. However, there are very similar-sounding phrases with vastly different meanings.

Consider,

This system is 40% faster than the previous system

compared with,

This system is 40% the speed of the previous system

The first phrase is talking about a system that is slightly faster, the second is talking about a system that is significantly slower.
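A quick sketch of the arithmetic, assuming a made-up baseline where the previous system takes 10 seconds per task. The helper time_per_task is hypothetical, and it reads “40% faster” as 1.4x the old speed and “40% the speed” as 0.4x the old speed.

```ruby
# Seconds per task on the new system, given the old system's time and the
# new system's speed as a multiple of the old one (1.0 means same speed).
# time_per_task is a hypothetical helper, not from any library.
def time_per_task(old_seconds, speed_ratio)
  old_seconds / speed_ratio
end

old_seconds = 10.0 # assumed baseline, chosen only for illustration

# "40% faster" read as 1.4x the old speed
puts format("40%% faster:    %.2f s", time_per_task(old_seconds, 1.4))
# "40% the speed" read as 0.4x the old speed
puts format("40%% the speed: %.2f s", time_per_task(old_seconds, 0.4))
```

Under those readings the first system finishes a task in about 7.14 seconds instead of 10, while the second needs 25 seconds.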

Interestingly,

The system is 1.3x the speed

and,

The system is 1.3x faster

The first phrase indicates a slightly faster machine, but the second is more than twice as fast! Or does it? The really insidious problem with the words we use about performance is that often the person saying/writing them doesn’t have a clear idea how the numbers might be misinterpreted. “3 times...

Embedding Lua in C by Example

Here is a series of short C snippets to learn how to embed Lua in your C program. The complete source for these examples can be found at: https://github.com/HParker/embedding-lua-in-c-by-example

Run a string of Lua code

#include <stdio.h>
#include <lauxlib.h>
#include <lua.h>
#include <lualib.h>

int main() {
  // setup lua
  lua_State *L = luaL_newstate();
  luaL_openlibs(L);

  // Run a lua string
  luaL_dostring(L, "print(\"hello from lua\")");
}

Run a file of Lua code

#include <stdio.h>
#include <lauxlib.h>
#include <lua.h>
#include <lualib.h>

int main() {
  // setup lua
  lua_State *L = luaL_newstate();
  luaL_openlibs(L);

  // run a lua file
  luaL_dofile(L, "hello.lua");
}

Get a number from Lua

  luaL_dostring(L, "x = 10");
  lua_getglobal(L, "x");
  int x = (int)lua_tonumber(L, -1);
  printf("x = %i\n", x);

Get a string from Lua

  luaL_dostring(L, "string = 'hi there'");

...

Invoca Hackathon

I participated in a hackathon at Invoca where we were given 2 days to build and present a project of our choosing. I won most technical project for my adaptation of a very simple RISC game written in Elm.

Invoca Post: https://blog.invoca.com/developers-working-weekend-recap-invoca-hackathon/ Update: This post seems to be gone and I can’t find it on the Wayback Machine.
Project: https://github.com/HParker/TIS

The project is a small assembly-like programming language that only knows a few primitives. From this you are able to program a small fictional “chip” to do simple tasks like increment or sum numbers together. The programming concept is that inputs come from the left and outputs go to the right. You also have two registers, one of which you can write to directly and one that you can only swap your register with. Given more time, I would consider adding additional puzzles and adding...

Orbtoberfest by CircleCI


CircleCI put on an event in Seattle to promote their open Orb registry. Orbs are their shared configuration format that allows you to write a CircleCI configuration for a job that anyone can use. This was possible before Orbs by sharing a bit of YAML that you could copy into your existing CircleCI config, but now you can reference the Orb by name and leave the rest up to the Orb. They call it Orbtoberfest which is kinda cute.

The simplest Orb YAML file looks like:

version: 2.1
description: Hello World Orb

commands:
  hello-world:
    description: say hello world
    steps:
    - run:
        command: echo Hello, World
        name: hello-world

examples:
  hello-world:
    description: Say hello
    usage:
      orbs:
        hello-world: hparker/hello-world@x.y.z
      version: 2.1
      workflows:
        test:
          jobs:
          - hello-world/hello-world
        version: 2

...

Your web server filesystem is a liability

There seems to be a disconnect about how we should manage our infrastructure. “Infrastructure as code” and “immutable infrastructure” are common approaches now. The “pet servers” anti-pattern seems well explained, but I have not seen a critique of filesystem use on a web server.

The filesystem on your web servers is a liability and you should avoid touching it at run time if at all possible. There are a number of ways that it can go wrong. Maybe you deploy to hosts with hard disks or you are referencing your Kubernetes cluster’s persistent volume; either way, persistent storage access should be treated with respect.

Local static assets

Let’s assume that your infrastructure is a typical web stack: n web servers talking to a primary db with a replica. A common approach I see to managing assets such as images, CSS and JavaScript is to serve them from the web server filesystem. Write some nginx or haproxy...

Boulder Pebble Rocks hackathon

In 2015 I participated in the Pebble Rocks Boulder hackathon. You can see a summary of my team’s project here:
https://www.hackster.io/teamturing/turnakit-706245

We used the Pebble watch to provide turn-by-turn directions on the handlebars of your bike.

Event Photos: https://www.flickr.com/photos/23rdstudiosboulder/albums/72157657803685160
Event info: https://www.viget.com/articles/pebble-rocks-boulder-hardware-innovation-packed-into-a-weekend-hackathon/

One of my favorite projects there was largely a huge 3D printing exercise, a HUGE print to attempt in one weekend: a very complicated musical charging station for the Pebble watch:
https://www.hackster.io/team-engineerable/timedock-speakeasy-67fcfa

Adam Hess

Words about software by Adam
