Archive for January 22, 2010
Domain Specific Language in Ruby used as a Virtual Machine
Introduction
In this post I will describe my attempts to build virtual machine, that executes source byte-code as files written in DSL (Domain Specific Language). This DSL is built on the top of other popular language.
Couple of months ago I created a small programming language, that was compiled to assembler. Compiled files could been executed by virtual machine. There were two versions of VM, in Python and in C (created by czarodziej). I take this assembler and (tiny) virtual machine and tried to build another version, as described below.
I was confused was what language should I use as a base for DSL. It should be flexible, expressive and with good support for metaprogramming. It couldn’t be Java or PHP for sure! ;-) I considered Lisp, Ruby, Erlang and Python. My choice was Ruby.
Some people might say that it’s not DSL at all, that it’s in place where DSL shouldn’t be used or that all is wrong. I didn’t worry about those questions and treat it as an interesting experiment, but for sure it could be made better.
It isn’t useless theory, I’ve seen those languages in my previous jobs (used to creating specialized tests scenarios or gathering in formations). The way of accessing data through Django’s ORM is also sort of DSL.
Details of implementation
Part of assembler’s syntax is invalid in Ruby, so I made small preprocessor, that runs before code is loaded and makes it valid. It’s second purpose is to modify this code to the form that is more expressive in Ruby. It’s coded in DSLPreprocesor class.
This is how assembly file looks before preprocesing:bar:
int 0
ret
foo:
call bar
ret
main:
push 13
call foo
ret
and this is how it looks after, I’ve added indents to make code more readable:
@bar = Proc.new{
int 0
}
@foo = Proc.new{
call @bar
}
@main = Proc.new{
push 13
call @foo
}
We have to implement in Ruby all mnemonics, like call, add in form, that makes state and behavior of running virtual machine exactly like in original version. Then we have to evaluate processed file (there are only variables, so nothing will happened), and then run @main object.
You might be wondering why I didn’t use normal functions instead of Ruby’s mods, well, I needed to tread functions as objects (see implementation of call in DSLVirtualMachine).
You can obtain more information by analyzing source code (see bellow) or by confrontation with original version, you can also check project’s website.
1 #!/usr/bin/env ruby 2 3 class DSLVirtualMachine 4 def initialize(file_name = String.new) 5 @preprocesor = DSLPreprocesor.new file_name 6 @stack = Array.new 7 @memmory = Array.new 8 @stack_ptr = [0] 9 @memmory_ptr = [0] 10 end 11 12 def run() 13 eval @preprocesor.parse() # dynamic code loading 14 @main.call() # code executing 15 end 16 17 private 18 19 def alu(operation) 20 parser = { 'add' => lambda { @stack.pop + @stack.pop }, 21 'sub' => lambda { @stack.pop - @stack.pop }, 22 'div' => lambda { @stack.pop / @stack.pop }, 23 'mul' => lambda { @stack.pop * @stack.pop }, 24 'gt' => lambda { @stack.pop > @stack.pop }, } 25 push parser[operation].call() 26 end 27 28 def call(function) 29 @memmory_ptr.push @memmory.length 30 function.call() 31 @memmory_ptr.pop 32 end 33 34 def push(element) 35 @stack.push element 36 end 37 38 def load(offset) 39 memmory_ptr = @memmory_ptr.last 40 @stack.push @memmory[offset+memmory_ptr] 41 end 42 43 def store(offset) 44 memmory_ptr = @memmory_ptr.last 45 @memmory[offset+memmory_ptr] = @stack.pop 46 end 47 48 def get_if_condition() 49 return @stack.pop 50 end 51 52 def int(interrupt) 53 parser = [ lambda { p @stack.pop }, 54 lambda { push gets.to_i } ] 55 parser[interrupt].call() 56 end 57 end 58 59 60 class DSLPreprocesor 61 attr_accessor :fname 62 63 def initialize(fname = String.new) 64 @fname = fname 65 # input token => output token, order does matter 66 @parser = { 67 # remove comments 68 /^([^;]*);.*$/ => 69 lambda { |u| "#{u}" }, 70 # convert jump to label into 'if' statement 71 # it's very LAME! 72 /^[\s]*jz ([a-z0-9]*)$/ => 73 lambda { |u| "if (get_if_condition)" }, 74 /^a[0-9]:*$/ => 75 lambda { |u| "end" }, 76 77 # convert function label to declaration 78 # of proc in ruby ruby 79 /^([a-z]*[0-9]*):$/ => 80 lambda { |u| "@#{u} = Proc.new{" }, 81 # convert end of function in assembler 82 # to end of proc in in ruby 83 /^[\s]*ret$/ => 84 lambda { |u| "}" }, 85 # 86 /^[\s]*call ([a-z0-9]*)$/ => 87 lambda { |u| "call @#{u}" }, 88 89 # treat ALU operation in consistent way 90 /^[\s]*(add|sub|mul|div|gt|lt|eq)$/ => 91 lambda { |u| "alu '#{u}'" }, 92 } 93 end 94 95 def parse() 96 file = File.readlines(self.fname) 97 return file.map { |l| parse_mnemonic(l) }.join('') 98 end 99 100 private 101 def parse_mnemonic(mnemonic) 102 @parser.map { |k,v| mnemonic.gsub!(k){ v.call($1) } } 103 return mnemonic 104 end 105 end 106 107 # TODO use optparse instead of raw ARGV[0] 108 vm = DSLVirtualMachine.new ARGV[0] 109 vm.run() 110
Results
My implementation is incomplete and in some places incompatible with original, but I reached the point and build it. It becomes overcomplicated due to fact that syntax wasn’t created to be compatible with Ruby’s syntax. Second problem was my lack of experience with Ruby — to be honest I don’t know this language at all.
It would be interesting to count SLOC, but I didn’t have tool to do that (I don’t have space on HDD to install it) and implementation in Ruby is incomplete. I count all lines in source files, for C implementation it was ~600, for Python ~250, for DSL+Ruby ~110.
Case studies and materials about DSL
- Ruby
- Lisp
- Erlang
- JavaScript
- Python
- Here is link, how to do that in Python, IMHO it’s not as sexy as in Ruby, Erlang or others.