Date: Monday, 13 Aug 2007 22:28

Like many Pragmatic Programmer fans, I've been having a look at Erlang recently by working my way through Programming Erlang. In the book, the author includes a challenge: build a message ring of processes of size M and send a message around the ring N times, timing how long this takes. The author also suggests doing this in other languages and comparing the results. Having now done this, I can tell you that it is an interesting exercise.

First, the Erlang results. Here's a sample run that creates 30,000 processes and sends a message around that ring 1,000 times:

$ erl -noshell -s solution start 30000 1000
Creating 30000 processes (32768 allowed)...
Done.
Timer started.
Sending a message around the ring 1000 times...
Done:  success
Time in seconds:  29

So we see about 30,000,000 message passes there in roughly 30 seconds. I should also note that Erlang creates those processes very, very fast. It's possible to raise the process limit shown there, but I'm more interested in comparing what these languages can do out of the box.

Now Ruby doesn't have an equivalent to Erlang processes, so we need to decide what the proper replacement is. The first thing I tried was fork()ing some Unix processes:

$ ruby forked_mring.rb 100 10000
Creating 100 processes...
Timer started.
Sending a message around the ring 10000 times...
Done.
Done:  success.
Time in seconds:  32

You should notice here the small number of processes I could create using the default limits imposed by my operating system. Again, it's possible to raise this limit, but I don't think I'm going to get it up to 30,000 very easily. These processes were also created very quickly, though.

So here we are passing 1,000,000 messages in about the same amount of time.
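To make the shape of the fork()ed approach concrete, here is a minimal sketch of a process ring joined by pipes. It is not the forked_mring.rb script used for the run above; the method name run_forked_ring and the "stop" shutdown message are assumptions made for illustration, and it only runs on platforms that support fork().

```ruby
# Hypothetical sketch, not the author's forked_mring.rb.
# Each child reads a counter from its inbox pipe, adds one,
# and writes it to the next pipe in the ring.
def run_forked_ring(size, cycles)
  pipes = Array.new(size + 1) { IO.pipe } # pipes[i] => [reader, writer]

  pids = Array.new(size) do |i|
    fork do
      inbox  = pipes[i][0]
      outbox = pipes[i + 1][1]
      while (line = inbox.gets)
        if line.chomp == "stop"
          outbox.puts "stop" # pass the shutdown request along
          break
        end
        outbox.puts(line.to_i + 1)
      end
      exit!(0) # leave without running the parent's at_exit handlers
    end
  end

  to_ring   = pipes[0][1]
  from_ring = pipes[size][0]

  # One trip around the ring per cycle; each trip should come back as `size`.
  counts = Array.new(cycles) do
    to_ring.puts 0
    from_ring.gets.to_i
  end

  to_ring.puts "stop"
  from_ring.gets # wait for the shutdown request to complete the ring
  pids.each { |pid| Process.wait(pid) }
  pipes.flatten.each(&:close)
  counts
end
```

Calling run_forked_ring(4, 3) sends three messages around a four-process ring and returns the pass count seen on each trip.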

In an attempt to bypass the low process limit, I wrote another implementation with Ruby's threads. The results of that aren't too impressive though:

$ ruby threaded_mring.rb 100 1000
Using the standard Ruby thread library.
Creating 100 processes...
Timer started.
Sending a message around the ring 1000 times...
Done:  success.
Time in seconds:  32
$ ruby threaded_mring.rb 1000 4
Using the standard Ruby thread library.
Creating 1000 processes...
Timer started.
Sending a message around the ring 4 times...
Done:  success.
Time in seconds:  30

You should see from the second run that it is possible to create quite a few more threads, but I need to mention that creating that many took around 15 seconds. Sadly, both of these runs paint an ugly picture: introducing synchronization just kills performance. Using the fastthread library doesn't help as much as we would like:

$ ruby -rubygems threaded_mring.rb 100 1000
Using the fastthread library.
Creating 100 processes...
Timer started.
Sending a message around the ring 1000 times...
Done:  success.
Time in seconds:  28
$ ruby -rubygems threaded_mring.rb 1000 5
Using the fastthread library.
Creating 1000 processes...
Timer started.
Sending a message around the ring 5 times...
Done:  success.
Time in seconds:  29

So at best, we're passing 100,000 and 5,000 messages in our roughly 30 second timeframe, depending on how many processes we need.
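For readers who want to see where the synchronization cost comes from, here is a minimal sketch of a threaded ring built on Queue. It is not the threaded_mring.rb script timed above; the method name run_threaded_ring and the :stop sentinel are assumptions for illustration.

```ruby
require "thread"

# Hypothetical sketch, not the author's threaded_mring.rb.
# Every hop is a synchronized Queue#pop followed by a Queue#push.
def run_threaded_ring(size, cycles)
  # queues[i] is the inbox of thread i; queues[size] is the main thread's inbox.
  queues = Array.new(size + 1) { Queue.new }

  threads = Array.new(size) do |i|
    Thread.new do
      loop do
        message = queues[i].pop
        break if message == :stop
        text, pass_count = message
        queues[i + 1] << [text, pass_count + 1]
      end
    end
  end

  results = Array.new(cycles) do
    queues[0] << ["Checking the ring...", 0]
    queues[size].pop
  end

  queues.first(size).each { |queue| queue << :stop }
  threads.each(&:join)
  results
end
```

Each returned entry pairs the message text with the number of threads it passed through, which mirrors the check the Erlang version performs.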

Am I suggesting we all switch to Erlang? No. I've enjoyed seeing how the other side lives and I've learned a lot from getting into the functional mindset. Parts of Erlang are very impressive and concurrency is definitely one of them. It hasn't been enough to win me over from Ruby yet though. I couldn't ever see myself doing my day to day work without The Red Lady.

What I would love to see is some way to manage Erlang-like concurrency in Ruby. We could have some great fun building servers with that, I think.

I'll share the Erlang code here so the people who know the language better than I do can provide corrections. First, here's the code that spawns processes and passes messages:

-module(mring).
-export([build/1, send_and_receive/2, round_and_round/3]).

build(RingSize) ->
  ParentPid = self(),
  spawn(fun() -> build(RingSize - 1, ParentPid) end).

build(0,        StartPid) -> forward(StartPid);
build(RingSize, StartPid) ->
  ChildPid = spawn(fun() -> build(RingSize - 1, StartPid) end),
  forward(ChildPid).

forward(Pid) ->
  receive
    {message, Text, PassCount} ->
      Pid ! {message, Text, PassCount + 1},
      forward(Pid)
  end.

send_and_receive(Ring, Text) ->
  Ring ! {message, Text, 0},
  receive Returned -> Returned end.

round_and_round(_, _, 0)                    -> success;
round_and_round(Ring, ProcessCount, Repeat) ->
  Check = "Checking the ring...",
  case send_and_receive(Ring, Check) of
    {message, Check, ProcessCount} ->
      round_and_round(Ring, ProcessCount, Repeat - 1);
    Unexpected                     -> {failure, Unexpected}
  end.

Next, we have a little helper I wrote to time things. I'm pretty confident there must be a better way to do this, but all my attempts to find it failed. Help me Erlang Jedi:

-module(stopwatch).
-export([time_this/1, time_and_print/1]).

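time_this(Fun) ->
  {StartMega, StartSec, StartMicro} = now(),
  Fun(),
  {EndMega, EndSec, EndMicro} = now(),
  % Convert both timestamps to microseconds before subtracting, so the
  % microsecond fields actually count toward the elapsed whole seconds.
  EndTotal   = (EndMega   * 1000000 + EndSec)   * 1000000 + EndMicro,
  StartTotal = (StartMega * 1000000 + StartSec) * 1000000 + StartMicro,
  (EndTotal - StartTotal) div 1000000.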

time_and_print(Fun) ->
  io:format("Timer started.~n"),
  Time = time_this(Fun),
  io:format("Time in seconds:  ~p~n", [Time]).
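For comparison, the Ruby side of such a stopwatch can be quite short. This is only a sketch of the same idea, not the timing code from the Ruby solutions; the method names simply mirror the Erlang helper, and it assumes a Ruby that provides Process.clock_gettime.

```ruby
# Sketch of a Ruby counterpart to the stopwatch module above.
# A monotonic clock is used so wall-clock adjustments cannot skew the result.
def time_this
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
end

def time_and_print(&block)
  puts "Timer started."
  seconds = time_this(&block)
  puts "Time in seconds:  #{seconds.round}"
  seconds
end
```

time_this returns fractional seconds, so rounding is left to the caller that prints the value.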

Finally we have the application code that glues these modules together:

-module(solution).
-export([start/1]).

start([ProcessesArg, CyclesArg]) ->
  Processes = list_to_integer(atom_to_list(ProcessesArg)),
  Cycles    = list_to_integer(atom_to_list(CyclesArg)),

  io:format( "Creating ~p processes (~p allowed)...~n",
             [Processes, erlang:system_info(process_limit)]),
  Ring = mring:build(Processes),
  io:format("Done.~n"),

  stopwatch:time_and_print(
    fun() ->
      io:format("Sending a message around the ring ~p times...~n", [Cycles]),
      Result = mring:round_and_round(Ring, Processes, Cycles),
      io:format("Done:  ~p~n", [Result])
    end
  ),

  init:stop().

You will see the Ruby solutions in this week's Ruby Quiz.

Author: "james@grayproductions.net (James Edward Gray II)" Tags: "Language Comparisons"
Date: Friday, 03 Aug 2007 20:32

For details about this ongoing interview, please see my introductory post.

You have told us before that one of the big reasons to move to a new Ruby VM was to provide new options for optimization. Can you talk a little about the optimizations you have added to the new Ruby VM thus far and what operations will likely be faster because of them?

ko1:

OK. First, I will write about the basics of YARV instructions. YARV has two types of instructions. The first is primitive instructions. As the name says, they are primitive. Ruby code can be represented entirely in these primitive instructions. The second is instructions for optimization. They are not needed to represent Ruby scripts, but they are added for optimization. Primitive instructions don't include _ in their name (like putobject), and optimization instructions do (like opt_plus). This policy helps you if you want to read VM instructions. Initially, you only need to read the primitive instructions.

The most easy and effective optimization is Specialized Instructions. This optimization replace method call with another VM instruction, such as Fixnum#+ to opt_plus. Current Ruby's numeric calculation is slow because all operations are method call. For example, 1 + 2 means 1.+(2). But numeric operations are more lightweight than Ruby's method invocation. So method call is only overhead for numeric operation. Specialized Instructions allow the VM to skip method call overhead.

But we can't know which expression is numeric operation or not at compile time. See this expression: a = c ? 1 : [:elem], a will be Fixnum or Array at runtime.

So we can't simply replace a + expression with a numeric operation instruction. A Specialized Instruction, for example opt_plus, which replaces the + method invocation, does the following:

def opt_plus(recv, val) # simple version
   if recv.class == Fixnum && val.class == Fixnum
     if Fixnum#+ is not redefined
       return calculate "recv + val" without method call
     end
   end
   # normal method invocation
   recv.+(val)
end

It checks whether the receiver and value are both Fixnum, and checks that Fixnum#+ is not redefined. After these checks, it calculates the result without method invocation. In fact, Float#+ is also checked. There are other specialized instructions.
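One way to see a specialized instruction for yourself is to disassemble a small expression. The sketch below assumes an MRI build where the instruction sequence class is exposed as RubyVM::InstructionSequence (the name used in released versions of the VM); the helper name uses_opt_plus? is made up for illustration.

```ruby
# Sketch, assuming MRI with RubyVM::InstructionSequence available.
# Compiles a snippet and reports whether the compiler emitted opt_plus for it.
def uses_opt_plus?(source)
  RubyVM::InstructionSequence.compile(source).disasm.include?("opt_plus")
end

plus_is_specialized = uses_opt_plus?("a = 1; a + 2")
```

Printing the full disasm output for the same snippet also shows the primitive instructions that surround opt_plus.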

YARV makes it easy to implement such instructions with its VM generator. You don't need to write bothersome code such as stack manipulation. If you write a VM instruction such as opt_plus in the simple VM DSL, the VM generator will translate it to C code.

Specialized Instruction is very simple, but effective for simple benchmark such as fib() or tak() and some calculate bound program.

One question I thought of while reading your previous answer was: will Ruby scripts be able to access these VM instructions, if desired?

ko1:

Simple answer is "yes".

On YARV, bytecode and other information are represented as the VM::InstructionSequence class. I often use the name "ISeq" to point that class. ISeq object contains a bytecode sequence, a catch table (to retrieve exception and other global escape such as break), a local variable name table and others.

ISeq object can be dumped in Ruby's primitive objects such as Array, Hash, Fixnum and so on. In the same way, ISeq can be built with such data with primitive objects. This means that you can built YARV bytecode without YARV compiler. Of course, this feature can be used for other purpose such as ruby script obfuscation (this is like Java class file).
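The dumping half of this can be tried directly. This is a small sketch that assumes an MRI build exposing RubyVM::InstructionSequence (the released name of the class) with its to_a and eval methods; the helper name dump_and_run is hypothetical.

```ruby
# Sketch, assuming MRI with RubyVM::InstructionSequence available.
# Compiles a snippet, dumps it to plain Ruby data, and also runs it.
def dump_and_run(source)
  iseq = RubyVM::InstructionSequence.compile(source)
  {
    dumped: iseq.to_a, # nested arrays, hashes, symbols, and integers
    result: iseq.eval  # value produced by running the compiled code
  }
end

outcome = dump_and_run("1 + 2")
```

Inspecting outcome[:dumped] shows the primitive-object form of the compiled code.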

(BTW, I use this feature on Ruby2C compiler. It is hard to translate Ruby program to C program directly. But from YARV instruction, translation is easy. If I finished it, I want to bundle this with Ruby.)

However, it is hard to write ISeq dumped data by hand. So I have prepared "lib/yasm.rb" as a YARV Assembler (this is not committed on current trunk). With YASM, you can write a YARV bytecode sequence in a Ruby program. Note that the YARV/ISeq loader doesn't have a bytecode verifier. So if an illegal bytecode sequence is loaded, YARV/Ruby will dump core.

If I commit lib/yasm.rb, I'll write tutorial to use that.

Does the new Ruby VM optimize tail recursive methods? If no, are there any plans to add this optimization?

ko1:

YARV doesn't support "tail recursion optimization", but supports "tail call optimization".

See this program:

class C
   def foo
     foo # (A) tail recursive call
   end
end

class D < C
   def foo
     super
   end
end

D.new.foo

Can you replace (A) with goto? (A) should call D#foo, so we can't simply eliminate the tail method call. Still, yes, we can implement this optimization with the following trick.

class C
   def foo
     if search_method(:foo) == C#foo
       goto first_of_foo
     else
       foo
     end
   end
end

But we must think of inter block tail recursion or so (inter block goto is not permitted) if implement tail recursion optimization.

BTW, YARV support tail call optimization, eliminate stack frame of caller. You can call method which at tail position without consuming VM stack like scheme language. So you can use method call to loop something. You can make state transition with method call.

Note that tail call optimization has some caution. First is backtrace elimination. You can't see caller method of tail method with backtrace. Second, this optimization does not speedup method call. Tail call process is almost same as process of normal method call. At end of normal method call process, check if tail call or not. If that method call is tail call, use current method frame to setup method frame instead of pushing new stack frame.

This optimization is not enabled in current Ruby 1.9 (trunk). If you want to try it, please rewrite the option in "vm_opts.h" (OPT_TAILCALL_OPTIMIZATION) and recompile. I think the release version of Ruby 1.9 will have this optimization enabled. I need more comments on it. Please tell me if you find a critical problem.
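In released MRI versions, tail call optimization can reportedly be switched on at runtime rather than by recompiling. The sketch below assumes an MRI build where RubyVM::InstructionSequence.compile_option accepts a tailcall_optimization key; the method count_down is made up for illustration.

```ruby
# Sketch, assuming MRI with RubyVM::InstructionSequence.compile_option available.
# Code compiled while the option is on gets its tail calls optimized.
RubyVM::InstructionSequence.compile_option = { tailcall_optimization: true }

RubyVM::InstructionSequence.compile(<<~SOURCE).eval
  def count_down(n)
    return :done if n.zero?
    count_down(n - 1) # call in tail position: the caller's frame is reused
  end
SOURCE

# Switch the option back off so later compilation is unaffected.
RubyVM::InstructionSequence.compile_option = { tailcall_optimization: false }

deep_result = count_down(200_000)
```

A recursion this deep would normally exhaust the VM stack, which makes it a simple check that frames are being reused. As noted above, the intermediate frames also disappear from backtraces.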

Can you talk a little about some optimizations you would like to add to the new Ruby VM in the future?

ko1:

In the near future, I'll release AOT, a Ruby to C compiler. This translator will support the whole Ruby specification, so it shouldn't be seen as a silver bullet for performance.

Keeping all Ruby spec means "can't achieve high performance". If I ignore some spec, I'll be able to do more drastic optimization. So C code translated from Ruby script will be slow (of course, faster than normal interpretation).

The Ruby specification is the enemy of the compiler/VM developer. So I want to add a "pragma" syntax to pass along the programmer's knowledge. For example, "eval does not appear in this file" or "Fixnum methods are not re-defined". This information will help the compiler do more effective optimization.

And I'm planning to implement block inlining. I think it is very effective for Ruby. An experimental, incomplete version has been made. I need more research to realize it.

BTW, I will not touch JIT compilation. I think it is not reasonable (not worth the cost of implementation). Everyone love "JIT" words, but I think it's not effective on Ruby spec.

Author: "james@grayproductions.net (James Edward Gray II)" Tags: "The Ruby VM Interview"
Date: Friday, 13 Jul 2007 15:42

For details about this ongoing interview, please see my introductory post.

We've talked about threads, so let's talk a little about character encodings. This is another big change planned for Ruby's future. Matz, you have stated that you plan to add m17n (multilingualization) support to Ruby. Can you talk a little about what that change actually means for Ruby users?

Matz:

Nothing much, except for some incompatibility in string manipulation, for example, "abc"[0] will give "a" instead of 97, and string indexing will be based on character instead of byte. I guess the biggest difference is that we can officially declare we support Unicode. ;-)

Unlike Perl or Python, Ruby's M17N is not Unicode based (Universal Character Set or UCS). It's character set independent (CSI). It will handle Unicode, along with other encoding schemes such as ISO8859 or EUC-JP etc. without converting them into Unicode.

Some misunderstand our motivation. We are no Unicode haters. Rather, I'd love to use Unicode if situation allows. We hate conversion between character sets. For historical reasons, there are many variety of character sets. For example, Shift_JIS character set has at least 5 variations, which differ each other in a few characters mapping. Unfortunately, we have no way to distinguish them. Thus conversion may cause information loss. If a language provide Unicode centric text manipulation, there's no way to avoid the problem, as long as we use that language.

ko1:

On my policy, I escape from this topic :)
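The character-based indexing and the character set independent design described above can be seen in a short sketch. It assumes a Ruby with the m17n String API (1.9 or later); the helper name describe_string is hypothetical.

```ruby
# Sketch, assuming Ruby 1.9 or later.
# Reports how a string looks by character, by byte, and by encoding.
def describe_string(string)
  {
    first_char: string[0],         # indexing is by character
    first_byte: string.getbyte(0), # the old integer is still available
    chars:      string.length,
    bytes:      string.bytesize,
    encoding:   string.encoding.name
  }
end

utf8 = "\u65E5\u672C\u8A9E"     # three kanji, stored as UTF-8
euc  = utf8.encode("EUC-JP")    # the same text, stored as EUC-JP

ascii_info = describe_string("abc")
utf8_info  = describe_string(utf8)
euc_info   = describe_string(euc)
```

Both Japanese strings report three characters, but nine bytes in UTF-8 and six in EUC-JP, and each keeps its own encoding instead of being normalized to Unicode.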

With String being enhanced to be encoding aware, some worry that we will need to specify an encoding for every String we make. Can you talk a little about how this will work in practice? Is there a default encoding? Can we set an encoding for the entire program?

Matz:

You can specify the encoding for Ruby scripts by the coding pragma at the head of the script. For example, if your script is in UTF-8, try specifying

# coding: utf-8

that makes all string and regex literals in the script UTF-8. You can also specify the encoding for IO reading strings via open, e.g.

open(path, "r:utf-8") do |f|
  line = f.gets
end

or by binmode (ala Perl), e.g.

f = open(path, "r")
f.binmode(":utf-8")

The default encoding is binary for ordinary IO, and the locale-specified encoding for STDIN. Encoding conversion at the time of IO reading should be allowed, but the API is not fixed yet. Maybe

open(path, "r:utf-8<euc-jp")

that should read EUC-JP data then convert it into UTF-8 and return the converted string.
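The API was not fixed when this was written. In released Ruby versions, the conversion is spelled as an external and internal encoding pair in the mode string, such as "r:euc-jp:utf-8", and IO#set_encoding fills the role sketched for binmode. The following sketch assumes Ruby 2.1 or later (for Tempfile.create); the helper names are made up for illustration.

```ruby
require "tempfile"

# Sketch, assuming a released Ruby where the mode string takes
# "r:external:internal" to convert while reading.
def read_converted(path, external, internal = "utf-8")
  File.open(path, "r:#{external}:#{internal}") { |io| io.read }
end

# Writes EUC-JP bytes to a temporary file, then reads them back as UTF-8.
def euc_jp_round_trip(text)
  Tempfile.create("m17n-demo") do |file|
    file.binmode
    file.write(text.encode("EUC-JP"))
    file.flush
    read_converted(file.path, "euc-jp")
  end
end

converted = euc_jp_round_trip("\u65E5\u672C\u8A9E")
```

The string that comes back is tagged as UTF-8 and compares equal to the original text.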

Can you tell us how far along the m17n code is and how much still needs to be done? Is this change expected to be in the 1.9.1 release?

Matz:

You will see M17N in 1.9.1 coming out next Christmas, unless something bad happens. I have done almost everything for character treating, but things related to code conversion (String#encode method and code conversion for IO) are still left undone.

Author: "james@grayproductions.net (James Edward Gray II)" Tags: "The Ruby VM Interview"
Date: Friday, 27 Apr 2007 01:52

For details about this ongoing interview, please see my introductory post.

Let's talk a little about threading, since that's a significant change in the new VM. First, can you please explain the old threading model used in Ruby 1.8 and also the new threading model now used in Ruby 1.9?

Matz:

The old threading model is green threads, to provide universal threading on every platform that Ruby runs on. I think it was a reasonable decision 14 years ago, when I started developing Ruby. Time goes by and the situation has changed. pthread or similar threading libraries are now available on almost every platform. Even on old platforms, the pth library (a thread library which implements the pthread API using setjmp etc.) can provide a green thread implementation.

Koichi decided to use native thread for YARV. I honor his decision. Only regret I have is we couldn't have continuation support that used our green thread internal structure. Koichi once told me it's not impossible to implement continuation on YARV (with some restriction), so I expect to have it again in the future. Although it certainly has lower priority in 1.9 implementation.

ko1:

Matz explained old one, so I show you YARV's thread model.

As you know, YARV support native thread. It means that you can run each Ruby thread on each native thread concurrently.

It doesn't mean that every Ruby thread runs in parallel. YARV has global VM lock (global interpreter lock) which only one running Ruby thread has. This decision maybe makes us happy because we can run most of the extensions written in C without any modifications.
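A small experiment shows the difference between concurrent and parallel here. The sketch assumes an MRI-style VM where a thread gives up the global lock while it is blocked (sleep is used as the blocking operation); the helper name timed_sleepers is hypothetical.

```ruby
# Sketch, assuming a VM that releases its global lock around blocking calls.
# Starts `count` threads that each block for `seconds`, and times the batch.
def timed_sleepers(count, seconds)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  Array.new(count) { Thread.new { sleep seconds } }.each(&:join)
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
end

elapsed = timed_sleepers(4, 0.2)
```

Four blocked threads finish in roughly the time of one, because waiting does not need the lock. CPU-bound Ruby code in those threads would still take turns holding it.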

Why was this change made? What's wrong with green threads?

Matz:

Because green threads does not work well with libraries using native threads. For example, Ruby/Tk has made huge effort to live along with pthread.

ko1:

Ruby's green (userlevel) thread implementation was too naive to run fast. All machine stacks are copied when thread context switches. And more important point is it's not easy to re-implement green thread on YARV :)

What are the downsides to the native threads approach?

Matz:

It is pretty difficult to implement continuation. Besides that, even with the native thread approach, no real concurrency can be achieved due to the global interpreter lock. Koichi is going to address this issue by a Multi-VM approach in the (near) future.

ko1:

Yes, it has several problems. First is the performance problem (as you know, I love to discuss performance). Creating a native thread is too pricey. So you may want to use a thread pool or so. And current trunk (YARV) is not tuned for native threads, so I believe there are some unknown problems around threads.

Second problem is portability. Even if your environment has a pthread library, there are some differences from other pthread systems in the details.

Third problem is absence of callcc (which is implemented with green thread scheme) ... for some people :)

Programming on native thread has own difficulty. For example, on MacOS X, exec() doesn't work (cause exception) if other threads are running (one of portability problem). If we find critical problems on native thread, I will make green thread version on trunk (YARV).

Are there plans to support other threading models in the future?

Matz:

Other threading model, no. Win32 threads and pthreads are enough burden for us to support. There might be other features to support parallelism in the future, for example light-weight process a la Erlang.

Koichi may have other idea(s) about supporting concurrency, such as Multi-VM since he is the expert on it.

ko1:

Parallel computing with Ruby is one of my main concerns. There are some ways to do it, but running Ruby threads in parallel (without the Giant VM Lock) in a process is too difficult while supporting current C extension libraries, because of their synchronization problems.

As Matz says, if we have multiple VM instances in a process, these VMs can be run in parallel. I'll work on that theme in the near future (as my research topic).

BTW, as I wrote in the last question, if there are many, many problems with native threads, I'll implement green threads. As you know, they have some benefits over native threads (lightweight thread creation, etc). It will be a lovely hack (FYI, my graduation thesis was to implement a userlevel thread library on our specific SMT CPU).

... Does anyone have interest to implement it?

Author: "james@grayproductions.net (James Edward Gray II)" Tags: "The Ruby VM Interview"
Date: Monday, 16 Apr 2007 18:35

If your number one concern when working with CSV data in Ruby is raw speed, you might want to know that FasterCSV is no longer the fastest option.

There are a couple of new contenders for Ruby CSV processing including a C extension called SimpleCSV and a pure Ruby library called LightCsv. I haven't been able to test SimpleCSV locally, because I can't get it to build on my box, but users do tell me it's faster. I have run some trivial benchmarks for LightCsv though and it too is pretty quick:

$ rake benchmark
(in /Users/james/Documents/faster_csv)
time ruby -r csv -e '6.times { CSV.foreach("test/test_data.csv") { |row| } }'

real    0m5.481s
user    0m5.468s
sys     0m0.010s
time ruby -r lightcsv -e \
'6.times { LightCsv.foreach("test/test_data.csv") { |row| } }'

real    0m0.358s
user    0m0.349s
sys     0m0.008s
time ruby -r lib/faster_csv -e \
'6.times { FasterCSV.foreach("test/test_data.csv") { |row| } }'

real    0m0.742s
user    0m0.732s
sys     0m0.009s

It's important to note that LightCsv is indeed very "light." FasterCSV has grown up into a feature rich library that provides many different ways to look at your data. In contrast, LightCsv doesn't yet allow you to set column or row separators. Given that, it's only an option for vanilla CSV you just need to iterate over. If that's what you have though, and speed counts, it might just be the right choice.

For the curious, LightCsv achieves its speed advantage in two ways. First, it uses StringScanner to manage the parsing. StringScanner is a C extension, though it is a standard library installed with Ruby.

More importantly, I suspect, LightCsv uses an input buffer for reading while FasterCSV works line by line. I suspect this second difference accounts for the majority of the speed increase since the buffered code will hit the hard drive quite a bit less for the average CSV file. This does require more memory though, of course.

Aside from these differences, FasterCSV and LightCsv have very similar parsers.
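To give a feel for what StringScanner-driven parsing looks like, here is a minimal sketch for one line of vanilla CSV. It is not LightCsv's or FasterCSV's actual parser; the method name parse_csv_line is made up, and it only handles comma separators and double-quoted fields.

```ruby
require "strscan"

# Hypothetical sketch, not LightCsv's or FasterCSV's parser.
# Walks one line with StringScanner, one field per pass through the loop.
def parse_csv_line(line)
  scanner = StringScanner.new(line.chomp)
  fields  = []
  loop do
    if scanner.scan(/"((?:[^"]|"")*)"/)
      # Quoted field: drop the outer quotes and unescape doubled quotes.
      fields << scanner[1].gsub('""', '"')
    else
      # Unquoted field: everything up to the next comma (possibly empty).
      fields << scanner.scan(/[^,]*/)
    end
    break unless scanner.scan(/,/)
  end
  fields
end

row = parse_csv_line(%q{a,"b,c","say ""hi"""})
```

The scanner keeps its own position in the string, so the parser never has to slice or copy the remaining input between fields.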

Author: "james@grayproductions.net (James Edward Gray II)" Tags: "FasterCSV"
Date: Friday, 16 Mar 2007 13:38

For details about this ongoing interview, please see my introductory post.

We started these talks because of the excitement around the alternate implementations, like JRuby and Rubinius. How do you feel about all of these new interpreters and how do you see them affecting the official development of Ruby?

Matz:

Alternate implementations mean maturity of Ruby language. I'm glad for the fact. But we have never had enough number of developers for core, so I think we need more cooperation between implementations. I had a good talk about future Ruby spec. with Charles Nutter recently. I expect occasion like this more often.

ko1:

I think having alternatives is very important. I want to know how to implement Ruby and apply these techniques to YARV.

In fact, implementing from scratch is very fun. YARV (official Ruby implementation) has many problems resulted from historical reasons (a biggest problem is compatibility to extension libraries).

Have you downloaded and installed any of the other interpreters?

Matz:

No, I just skimmed a few files from Rubinius, but not others. Mostly because I am familiar with neither Java nor Parrot.

ko1:

I wanted to try these alternatives, but no time to do it (and no time to hack YARV ...).

Answer is: No. I'll try.

Is there a good exchange of ideas between the various implementation teams? Do you talk to the other teams, read their code, and/or discuss implementation details with them?

Matz:

Besides Koichi, who works on YARV with me, last month I met with Charles Nutter and exchanged very interesting ideas about 2.0 behavior. Evan Phoenix also gave me inspiration. I am very glad to see more programmers with interest and knowledge in language implementation.

ko1:

Sometimes I talk with the JRuby team on IRC. I want to have discussions with the developers of every Ruby implementation, especially about performance.

BTW we need 3 things on this context:

  1. Documents of specification
  2. Good tests
  3. Good benchmarks

Tests: Ruby trunk and 1.8 have test suites. But it's too difficult to test with them in the early stage of an implementation, because test/unit uses many of Ruby's functions (RSpec has the same problem). Now, trunk has "bootstraptest" to solve it. I think it is a good solution for this problem. And it shows a minimum Ruby specification.

Benchmark tests: Some people are using YARV's benchmarks, which I wrote. But I didn't write that code to be "Ruby's general benchmark test", but to measure the speed-up ratio on YARV. It means that I wrote code that YARV optimizes. We must prepare more suitable benchmarks for "Ruby implementations".

Author: "james@grayproductions.net (James Edward Gray II)" Tags: "The Ruby VM Interview"
Date: Saturday, 17 Feb 2007 22:59

For the contest and all you Bull Durham fans...

Well, I believe in blocks, iterators, closures, that everything should be an object, the power of reflection, garbage collection, exception handling, that multiple inheritance causes more problems than it solves. I believe interpreters should be totally free. I believe there ought to be a constitutional amendment outlawing pointers and verbose syntax. I believe in a strong standard library, green threads, that a language should trust the programmer rather than restrict his efforts and I believe in sheer fun of coding that truly is possible to achieve.

Author: "james@grayproductions.net (James Edward Gray II)" Tags: "Non-Code"
Date: Friday, 16 Feb 2007 15:49

For details about this ongoing interview, please see my introductory post.

Hello and thank you both for agreeing to answer my questions. To begin, would you please introduce yourselves and tell us about your role in Ruby's development?

Matz:

I am the designer and the first implementer of the Ruby language. My real name is Yukihiro Matsumoto, that sounds something like You-Key-Hero Matz-Motor in English. But it's too long to remember and pronounce, so just call me Matz.

I have been developing Ruby since 1993. It is now quite complicated and has performance problem. I have had vague plan of rewriting the interpreter for long time, but I have never been motivated enough to throw out the current interpreter and start developing new one.

Then Koichi came in with YARV that seemed to have much brighter future than my vaporware - it runs - so I asked him to take a role of the official implementer of the core. Although I enjoy both designing and implementation of the language, I don't think I am gifted for language implementation. So I thought that it might be the time to focus on designing when I saw YARV.

ko1:

Thank you for your interest in YARV and me. BTW, I'm thinking what "YARV" stand for. Because it is not Yet Another. Someone proposed that "YARV ain't RubyVM". If YARV means "YARV ain't RubyVM", what is YARV?

I'm Koichi Sasada. Koichi is given name, and "ichi" means "one" in Japanese. So I use "ko1" as my nick. I'm an assistant at Department (...snip...) of Tokyo. My research interest is systems software, especially operating system, programming language, parallel systems, and so on. And I'm a member of Nihon Ruby no Kai (Ruby Association in Japan). I plan(ed) some Ruby events like RubyKaigi and am an editor of Rubyist Magazine. I also develop(ed) Nadoka, Rava, Rucheme, and some projects. Say, I'm a developer of YARV: Yet Another RubyVM.

My role in Ruby's development? To steal VM hacking pleasure from Matz?

The point of this interview is to talk about the future of Ruby's interpreter. To start that, can you please explain what YARV/Rite is? How is it different in design from the old Ruby interpreter?

Matz:

I have always been more interested in designing the language than implementing it. So the Ruby interpreter has always been slower than it should be. I think I picked all the low-hanging fruit, so it seemed to be required to re-implement the whole core to achieve a performance boost. I planned a new interpreter code-named 'Rite' in 2001 or so, but I was never motivated enough to start the project. Maybe I had been too busy, or perhaps too lazy.

Then, Koichi came in, and showed us his YARV. Many had tried implementing Ruby interpreter in the past, but no one but Koichi reached that level of implemented feature set (at the time; now we have JRuby and RubyCLR both compatible with Ruby 1.8). So I asked him to take part in the development of the new core, and he agreed.

On January 1st 2007, he checked YARV in to the trunk of our repository, so it is now the official core of Ruby 1.9. I am still working on the old implementation in the matzruby branch, since it is easier for me to experiment with new language features on the old interpreter, but I will eventually switch to the new engine.

For YARV implementation detail, Koichi will explain.

Does this mean we are leaving the name Rite behind and keeping YARV? Or will YARV be renamed at some point?

Matz:

The name Rite will not be used for this generation of the language, unless Koichi asks me. I am not sure whether Koichi is going to keep YARV or not, since it is already 'the VM' for Ruby.

ko1:

YARV is vanished :)

In fact, I'm removing "yarv" words from structure names, function names, and file names. YARV is only a code name, and one that was not made by *Matz*. Now, YARV is not "Yet Another". In this article, I use the word "YARV" to mean the current Ruby trunk in the official repository.

At first, YARV is simple stack machine which run pseudo sequential instructions. Old interpreter (matzruby) *traverses* abstract syntax tree (AST) naively. Obviously it's slow. YARV compile that AST to YARV bytecode and run it.

Secondly, YARV uses native threads (those supported by the OS or so) to implement Ruby threads. It means that you can run *blocking* tasks in extension libraries. (In Ruby's spec, a blocking task should be interruptible by Thread#raise. To know details, see [ruby-core:10252].) Because thread creation is slower than in matzruby (green threads), you shouldn't make many threads at a time. Supporting native threads *does not* mean that you can run Ruby scripts in *parallel* on a parallel machine such as a Multi-Core CPU. The current implementation uses a Giant VM Lock to avoid synchronization problems. (Many extension libraries don't care about thread safety. See array.c, string.c, etc.)

Thirdly, I made many optimization like specialized instructions, etc. These features are my purpose of developing YARV. Toy benchmarks run fast because of these optimization techniques.

YARV doesn't change parser/syntax/specs (matz' hobbytask), GC (memory/object management), and extension libraries like String/Array/Hash/Regexp/etc. Therefore your script doesn't run fast on YARV if bottleneck is string processing, or so.

Congratulations to you both for completing Ruby/YARV merger recently. That must have been a lot of work, but I know it has the whole Ruby world very excited. Now that the merger has taken place, how do you see this changing the way Ruby is developed?

Matz:

Congrats should go to Koichi who has done a lot of work. I am moving my developing from matzruby (a branch for my old interpreter) to trunk (the yarv). Recently I have implemented some new features on the trunk, for example, class local instance variables and new local variable scope. The transition will complete pretty soon.

Since the trunk is originally Koichi's work, I need more help from others especially from Koichi than before. I know everything about the previous interpreter (well, most of them), but there are still mysteries in the new one. I am well satisfied with new one. It's clearer, well-formed, and faster.

ko1:

Thank you. I'm a newbie of Ruby developer (in fact, I didn't have CVS account to commit any ruby codes). So I can't say how change on ruby development :)

When will the first production release of Ruby running on YARV be available for all Rubyists to play with?

Matz:

Short answer: now.

Longer answer: the YARV is already publicly available via our Subversion repository. You can fetch and play with it now. But the first public "release" from us will be Christmas 2007, if we are as diligent as we should be. Knowing how lazy I am, I will try not to be a stumbling block for the release. ;-)

Author: "james@grayproductions.net (James Edward Gray II)" Tags: "The Ruby VM Interview"
Date: Friday, 16 Feb 2007 15:13

I have really enjoyed reading Pat Eyler's Rubinius Serial Interview and Nick Sieger's spun-off JRuby Serial Interview. It's very educational to read what the developers have to say about their projects and ideas.

The more I read though, the more I wanted the equivalent content for the official Ruby VM. I asked Matz and Koichi if they would be willing to answer questions from me and they agreed to do so. We are now ready to share their responses with the community.

This will be a serial interview as Pat Eyler calls them. We will deliver regular episodes until I run out of good questions or Matz and Koichi get sick of me bothering them, whichever comes first. I will ask the questions in the interview, but feel free to make suggestions in the comments to this article.

One last note: we are not promising any kind of schedule for the episodes. Matz and Koichi are heroically providing their answers in English. We want to respect how much work that is and give them all the time they need to do that. Personally, I cannot thank them enough.

With that, I give you the episode index:

  1. In this first episode I ask Matz and Koichi to introduce themselves and their roles as well as to give us an update on where we are with the Ruby VM.
  2. In the second episode I ask Matz and Koichi for their thoughts on the alternate Ruby implementations and how they see them changing Ruby's development.
  3. In the third episode Matz and Koichi discuss the past, present and future of Ruby threading.
  4. In the fourth episode Matz tells us a little about how m17n is shaping up.
  5. In the fifth episode Koichi gives us the inside story on optimization in the new VM.
Author: "james@grayproductions.net (James Edward Gray II)" Tags: "The Ruby VM Interview"
Date: Tuesday, 12 Dec 2006 03:30

Required Reading: You need to know what the Gateway is and the rules for suggesting changes before reading this article.

I have written up details on the second half of the Gateway, as promised. This completes my explanation of the Gateway source.

Any comments can be left under this post.

Author: "james@grayproductions.net (James Edward Gray II)" Tags: "The Gateway"
Date: Tuesday, 05 Dec 2006 02:22

Required Reading: You need to know what the Gateway is and the rules for suggesting changes before reading this article.

I have written up details on the first half of the Gateway, as promised. I will add the other half when I have time, but I started with the side people seem most interested in hacking on.

Any comments can be left under this post.

Author: "james@grayproductions.net (James Edward Gray II)" Tags: "The Gateway"
Date: Monday, 04 Dec 2006 22:48

Though I rewrote the current Gateway and I handle the maintenance, it really belongs to the Ruby community. Because of that, I'm going to release the two primary source files on this blog for all to view and critique. This may have value to those who want to know how the Gateway works, those who would like to implement similar technologies, and those who would like to propose changes to the Gateway code.

I do welcome proposed changes to the Gateway, but let's set some ground rules for the right way to make suggestions:

  • I will show the important elements of the Gateway code and do my best to explain it as I go. In return, please take the time to read what I write about the code and try to understand how it works. Poorly developed change requests increase my maintenance time with the Gateway, which all comes out of my free time, so please be considerate.
  • You propose changes to the Gateway by commenting on the code articles. This is intended to be a public discussion with all of us working together. Don't email ideas to me or Ruby Talk; I'm monitoring the comments here.
  • Show code in your requests. I don't want to throw the Gateway in a publicly accessible Subversion repository and start taking patches for several reasons. If you want a change, convince me to implement it. The best way to do that is to throw around some code showing me how we would build your request and how it would make the Gateway better.
  • I am thinking about some elements of the Gateway you are not, like the fact that I run this code on a server provided by my work where security is a consideration and the level of maintenance a change will inflict on me. I ask only that you keep this in mind as we debate changes. In return, I will be as open minded to improvements as possible.
  • Gateway changes will not happen overnight. (See note about free time above.) Please be patient.
Author: "james@grayproductions.net (James Edward Gray II)" Tags: "The Gateway"
Date: Friday, 01 Dec 2006 14:45

The Ruby community makes use of both email and Usenet communication, in addition to other resources. The primary mailing list is Ruby Talk and the primary Usenet group is comp.lang.ruby. These two services are joined by the Ruby Gateway.

In 2001 The Pragmatic Programmers wrote the initial version of the Ruby Gateway to ferry messages back and forth between these two resources. Emails sent to Ruby Talk are posted as Usenet messages and Usenet posts are forwarded to Ruby Talk by the Gateway. The Gateway has had a few guardians and code changes since then, but the functionality remains the same.

I am the current caretaker of the Ruby Gateway. My company generously provides hosting for it and I monitor the system for problems. I also wrote the current version of the Gateway.

You are free to report Gateway problems for me to look into. Before you do though, please read the following notes:

  • I rewrote the entire Gateway after I assumed control of it. My code was deployed on December 4th, 2006, so anything before that is history. If you raise issues, please make sure they involve posts after that date.
  • I now have very detailed logs on everything the Gateway does, so please be specific. For example, please send me links to exact messages that appeared on one side of the Gateway, but not the other.
  • Our Usenet host does not allow us to post HTML emails (multipart/alternative). This is not changing. These messages are not supported. (Yes, that means you should not be sending HTML email to Ruby Talk.)
  • Our Gateway is an NNTP <-> email Gateway. There has been at least one instance of a Usenet post using ancient header formatting predating NNTP. These messages are not supported.
Author: "james@grayproductions.net (James Edward Gray II)" Tags: "The Gateway"
Date: Tuesday, 07 Nov 2006 03:57

I'm not really in the habit of putting non-code content on this blog, but more than one person asked me the same question at RubyConf. If people really want to know, I'll try to answer. Paraphrasing, the question was:

How do you keep up with so many Ruby projects?

First, this question surprised me. Do I really do that much? If you just said yes to that, I would like to introduce you to Ryan Davis. He easily doubles my output and his projects are wicked complex compared to mine.

That doesn't answer the question though.

In short, I do as much as I possibly can with the time I have. The truth is that I would like to do a lot more. I turn down at least as many damn cool Ruby projects as I accept because I'm a wimp and not willing to give up my sleep. There are so many crazy cool Ruby projects out there that I would love to be a part of. There just aren't enough hours in the day.

I guess I still didn't answer the question.

The question was "How…" and the answer to that is actually trivial. Masayoshi Takahashi summed it up with a single slide in his presentation at RubyConf:

Passion Matters

Definitely.

If we were talking about working in a coal mine for sixteen hours a day, I would probably be a lot less capable. It just so happens that I love what I do. If anything gives me extra energy to devote to Ruby projects it's that. Love what you do. I can't stress that enough.

Beyond that, don't hesitate to get involved! I can't tell you how often I see people say things like, "I'm not really qualified to do that," or similar excuses. Oh hell, neither am I, but I wouldn't let a little thing like that stop me! You learn as you go, you drag in the help you need, or whatever. Passion will conquer so care enough to have some. Be the driving force and the rest will take care of itself.

That's all the advice I have to give, I fear. My secret formula probably seems bland, but if I accomplish the work of ten men (someone said that to me a while back), the reason is that I'm passionate and fearless.

Oh, and I have the most understanding wife on this planet. Get one of those too.

Author: "james@grayproductions.net (James Edward Gray II)" Tags: "Non-Code"
Date: Sunday, 01 Oct 2006 21:57

Every so often a person asks the question on Ruby Talk, "How can I get just one character from the keyboard (without needing the user to hit return)?" Everyone is always quick to post solutions, but sadly there are some issues with almost every one of them.

The general consensus is that this is a tough problem to solve correctly. I say that's the exact reason to let HighLine handle this for you:

#!/usr/bin/env ruby -w

require "highline/system_extensions"
include HighLine::SystemExtensions

print "Enter one character:  "
char = get_character
puts char.chr

That doesn't look too tough, does it?

What's terrific about this solution is that under the hood HighLine will check your platform and libraries and then try to use the solution that makes the most sense for your environment. The code is really pretty robust too, because people a lot smarter than me have been sending in patches for over a year, slowly eliminating all of those tricky edge cases.

As you can see, I've split this functionality of HighLine into a separate module so you don't even need to load the full HighLine system. This was done just because this is such a real and common problem. This section of HighLine is one pure Ruby file, so feel free to vendor it if the external dependency is an issue.

Trust me, reading individual characters from the keyboard doesn't have to be that tough. You just need the right tool for the job.

Author: "james@grayproductions.net (James Edward Gray II)" Tags: "Ruby Tutorials"
Date: Wednesday, 02 Aug 2006 21:34

Just recently I have been working with two different people to improve their regular expression skills. To help me in this endeavor, I built a trivial little script we have been using in IRb. To get started, you construct a new challenge object and add a couple of challenges:

>> reg_chal = RegexpChallenge.new
No challenges.
=> 
>> reg_chal.challenge("Gray, James", "James", "The names can vary.")
=> nil
>> reg_chal.challenge("abbbbbbbc bc", 10)
=> nil
>> reg_chal.challenge("    \n\t  ", nil, "We want to test for non-space data.")
=> nil
>> reg_chal.challenge("cogs 9, widgets 12, ...", "12", "The numbers can vary.")
=> nil
>> reg_chal.challenge( "I'm a simple sentence, with words.",
?>                     %w[I'm a simple sentence with words] )
=> nil

You can ask for challenges to see what you would like to solve:

>> reg_chal.challenges
Challenge #0:
   Input:  "Gray, James"
  Output:  "James"
    Note:  "The names can vary."
Challenge #1:
   Input:  "abbbbbbbc bc"
  Output:  10
Challenge #2:
   Input:  "    \n\t  "
  Output:  nil
    Note:  "We want to test for non-space data."
Challenge #3:
   Input:  "cogs 9, widgets 12, ..."
  Output:  "12"
    Note:  "The numbers can vary."
Challenge #4:
   Input:  "I'm a simple sentence, with words."
  Output:  ["I'm", "a", "simple", "sentence", "with", "words"]
=> nil

Finally, you attempt solutions by giving the system the index of the challenge, a method name to call on the input String, a Regexp to pass, and any other needed parameters:

>> reg_chal.solve(0, :=~, /, \w+/)
That is not a valid solution.
Expected output:  "James"
    Your output:  4
=> nil
>> reg_chal.solve(0, :[], /\w+$/)
Correct.  Nice job.
=> nil
>> reg_chal.solutions
Solution #0:
     Input:  "Gray, James"
    Output:  "James"
      Note:  "The names can vary."
  Solution:  [](/\w+$/)
=> nil
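
If you are curious how a check like this can work, here is a minimal sketch. It is not the library's actual code: the solves?() helper and its parameter names are made up for illustration. The idea is just to send the named method to the input String with the Regexp (and any extra arguments) and compare the result to the expected output:

```ruby
# Hypothetical helper, not part of RegexpChallenge itself: call the named
# method on the input String, passing the Regexp and any extra arguments,
# then compare the result to the expected output.
def solves?(input, expected, method_name, regexp, *args)
  input.send(method_name, regexp, *args) == expected
end

# The two attempts from the session above:
p solves?("Gray, James", "James", :=~, /, \w+/)  # => false (=~ returns the index 4)
p solves?("Gray, James", "James", :[], /\w+$/)   # => true
```

Passing the method name as a Symbol is what lets one interface cover =~, [], scan(), and friends.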

After you have played with it for a while you will probably build up some solutions and challenges. You can use the save() method to dump those to a YAML file and even load() them back later, if needed.
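
The save() and load() code isn't shown here, so take this as a rough sketch of the general idea rather than what the library does. I'm assuming the challenges can be represented as plain Hashes, and the file name is made up:

```ruby
require "yaml"
require "tmpdir"

# Assumed representation: one Hash per challenge (the real library may differ).
challenges = [
  { "input" => "Gray, James", "output" => "James", "note" => "The names can vary." },
  { "input" => "abbbbbbbc bc", "output" => 10 }
]

# Hypothetical file location, used only for this demonstration.
path = File.join(Dir.mktmpdir, "challenges.yaml")

File.write(path, YAML.dump(challenges))  # save
restored = YAML.load_file(path)          # load

puts restored == challenges  # => true
```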

Finally, for some real fun, the challenger supports sharing the challenges over a network with another user. The host should just change the initial construction call to:

reg_chal = RegexpChallenge.host

And then another user can get a copy of the challenge object with:

reg_chal = RegexpChallenge.join("HOST IP ADDRESS HERE")

If that sounds like something you would like to play with, here's the actual code for the library you load into IRb.

Feel free to post challenges for people to try in the comments of this post.

Author: "james@grayproductions.net (James Edward Gray II)" Tags: "Early Steps, Ruby Tutorials"
Date: Sunday, 30 Jul 2006 04:12

I love the PStore standard library. It's a very graceful interface to get some fairly robust serialized mini-database handling in just a few lines. With it you get:

  1. Transactions with commit and rollbacks (automatic on exception).
  2. File locking, shared and exclusive.
  3. Multiprocessing safety.

PStore does even more, including some file mode checking and MD5 hashing to avoid unneeded writes, but the above are the major selling points for me.
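
Here's a small sketch of the transaction behavior, in case you haven't seen it in action. The file name and the :visits key are just placeholders I picked for the demonstration:

```ruby
require "pstore"
require "tmpdir"

# Hypothetical storage file in a temporary directory.
path  = File.join(Dir.mktmpdir, "demo.pstore")
store = PStore.new(path)

# A normal transaction commits when the block finishes.
store.transaction do
  store[:visits] = 1
end

# An exception inside the block rolls the changes back automatically
# (and is then re-raised, so we rescue it here).
begin
  store.transaction do
    store[:visits] = 100
    raise "something went wrong"
  end
rescue RuntimeError
  # the write of 100 was discarded
end

# Pass true for a read-only transaction.
visits = store.transaction(true) { store[:visits] }
puts visits  # => 1
```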

Now, if I had to level any one complaint at PStore, it would be that because it uses Marshal under the hood it doesn't create files you can easily browse or tweak by hand. (Marshal is a feature, don't get me wrong. It's fast, which is very helpful.) Sometimes though I want PStore protection with the YAML file format.

I'm embarrassed to admit that I used to use a hack for this:

require "pstore"
require "yaml"
class PStore; Marshal = YAML; end

That just redefines the Marshal constant in a scope that should only alter PStore. The library only uses dump() and load() and those methods work the same with Marshal and YAML.

Ready for the punch-line?

I learned today that my fragile hack has been in vain, no matter how clever it may be. YAML ships with a file that will load and modify PStore for you. Usage is as simple as:

require "yaml/store"

From there just replace your PStore.new() calls with YAML::Store.new() and you're in business. YAML::Store is a subclass of PStore, so you won't need to change one bit of the API to get PStore robustness with YAML output.
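
As a quick sketch of what that looks like, assuming a made-up file name and String keys (which keep the YAML output simple):

```ruby
require "yaml/store"
require "tmpdir"

# Hypothetical storage file in a temporary directory.
path  = File.join(Dir.mktmpdir, "demo.yaml")
store = YAML::Store.new(path)

# Same transaction API as PStore.
store.transaction do
  store["language"] = "Ruby"
end

# The file on disk is plain YAML you can browse or tweak by hand.
contents = File.read(path)
puts contents

language = store.transaction(true) { store["language"] }
puts language  # => Ruby
```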

Author: "james@grayproductions.net (James Edward Gray II)" Tags: "The Standard Library"
Date: Saturday, 29 Jul 2006 16:57

I participated in the ICFP programming contest last weekend with a group of friends. We had a great time with the event and learned a ton. I thought I should share two interesting insights with others that might appreciate them.

First, YARV looks very promising for some general speed increases in Ruby. If you are not familiar with YARV, that's the virtual machine that will run Ruby 2.0. During the contest, we ran into some performance issues with our Ruby solution and after we had optimized all we could think of, we decided to try running our entry on the experimental YARV VM to see if it was faster there. Good news: it was a lot faster.

Please do not take these numbers as anything more than very non-scientific observations, but we did notice a huge speed increase on YARV. We were reliably waiting around 15 minutes for one section of our program to run on Ruby 1.8.4, but when we introduced YARV the same section generally ran in just under seven minutes. You heard me right there, it was over twice as fast. I think that's very promising news for the future of Ruby.

The not so good news is that it still just wasn't fast enough.

Of course we want to be able to use Ruby for as much as possible, but it is important to admit that it's just not fit for every job. The programming contest involved the creation of a small VM that ran many, many instructions from contest provided data files. In order to get that to a reasonable level of performance, you really needed some C.

The good news is that Ruby will easily allow you to drop down to C and integrate that code with your script. The bad news is that James's C is so rusty that doing so was a nightmare. Thank goodness one of my partners was more capable. He certainly carried us through.

I don't need C very often any more. I think it has literally been about a year since I last felt the need. However, there are jobs Ruby is a bit too slow to handle and when they come up, C is your best friend. I'm definitely brushing up on my C skills before next year's contest.

Author: "james@grayproductions.net (James Edward Gray II)" Tags: "Language Comparisons"
Date: Sunday, 16 Jul 2006 21:13

Ask anyone who knows me and they will tell you I'm a huge fan of regular expressions. I use them all the time and my FasterCSV library is a regular expression powered parser. However, even I know they are not for everything, and lately I keep running into almost comical examples of misuse. Here are some of my favorites.

First, we have:

str =~ /=/

That snippet is like calling for a military escort (the regular expression engine) to see you safely to the grocery store down the block. That's fun, but probably overkill. In this case, a call to include?() will do the trick:

str.include?("=")

That may be more like riding your bike to the grocery store, but it gets the job done and is a bit faster to boot.

Funny example number two. I've seen this before:

str =~ /\Aquit\Z/

Again, the regular expression engine appreciates the love, but you really just want ==:

str == "quit"

Even for some of the fancier stuff, you don't need a full blown regular expression. For example, this:

str.gsub(/\D+/, "")

could be:

str.delete("^0-9")  # no, that's not a regex

You get the idea. I'm not trying to ban regular expressions. Instead, I just think people should remember there are a lot of other methods in String. When you have a few moments, look up these guys in the documentation:

  • String#[]= # can take a regex, but has other fun options
  • String#count
  • String#index # can take a regex
  • String#squeeze
  • String#rindex # can take a regex
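
If you want a quick taste before hitting the documentation, here is a small sketch of the methods from that list. The sample string is made up for illustration:

```ruby
str = "Mississippi  River"  # note the two spaces

puts str.count("s")       # => 4
puts str.squeeze(" ")     # => Mississippi River
puts str.squeeze          # => Misisipi River
puts str.index("ss")      # => 2
puts str.rindex("ss")     # => 5
puts str.index(/p+/)      # => 8 (a regex works here too)

# String#[]= with a plain String argument: replace the first match in place.
copy = str.dup
copy["River"] = "Delta"
puts copy                 # => Mississippi  Delta
```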
Author: "james@grayproductions.net (James Edward Gray II)" Tags: "Ruby Tutorials, Early Steps"
Date: Saturday, 08 Jul 2006 17:01

I've read several books that introduced the standard Logger library and they all agree on one thing: you can't customize the output. That's so last version in thinking! Behold...

Here's a trivial Logger script, showing basic functionality:

#!/usr/bin/env ruby -w

require "logger"

def expensive_error_report
  sleep 3  # Heavy Computation Simulation (patent pending)
  "YOU BROKE IT!"
end

log       = Logger.new(STDOUT)
log.level = Logger::INFO  # set our output level above the DEBUG default

log.debug("We're not in the verbose debug mode.")
log.info("We do see informative logs though.")
if log.error?  # check that this will be printed before wasting time
  log.error(expensive_error_report)
end

If you run that you will see:

I, [2006-07-08T11:17:19.531943 #340]  INFO -- : We do see informative logs though.
E, [2006-07-08T11:17:22.532424 #340] ERROR -- : YOU BROKE IT!
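
As an aside, the logging methods also accept a block, and the block only runs when the message would actually be logged. That gives you the same savings as the error?() check. Here's a minimal sketch; I log to a StringIO instead of STDOUT and count calls just to make the behavior visible:

```ruby
require "logger"
require "stringio"

calls = 0

log       = Logger.new(StringIO.new)  # StringIO used only to keep the demo quiet
log.level = Logger::INFO

# Below the current level: the block is never evaluated.
log.debug { calls += 1; "expensive debug report" }

# At or above the current level: the block runs and its result is logged.
log.error { calls += 1; "expensive error report" }

puts calls  # => 1
```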

Now everyone has always known you can format the date and time display using a strftime() compatible pattern:

#!/usr/bin/env ruby -w

require "logger"

def expensive_error_report
  sleep 3
  "YOU BROKE IT!"
end

log                 = Logger.new(STDOUT)
log.level           = Logger::INFO
log.datetime_format = "%Y-%m-%d %H:%M "  # simplify time output

log.debug("We're not in the verbose debug mode.")
log.info("We do see informative logs though.")
if log.error?
  log.error(expensive_error_report)
end

Which gives us the slightly easier to read:

I, [2006-07-08 11:23 #384]  INFO -- : We do see informative logs though.
E, [2006-07-08 11:23 #384] ERROR -- : YOU BROKE IT!

All books I've read to date though tell you that's the end of the customization line. But, if you are using Ruby 1.8.4 or higher, it's no longer true:

#!/usr/bin/env ruby -w

require "logger"

# Build a Logger::Formatter subclass.
class PrettyErrors < Logger::Formatter
  # Provide a call() method that returns the formatted message.
  def call(severity, time, program_name, message)
    if severity == "ERROR"
      datetime      = time.strftime("%Y-%m-%d %H:%M")
      print_message = "!!! #{String(message)} (#{datetime}) !!!"
      border        = "!" * print_message.length
      [border, print_message, border].join("\n") + "\n"
    else
      super
    end
  end
end

def expensive_error_report
  sleep 3
  "YOU BROKE IT!"
end

log           = Logger.new(STDOUT)
log.level     = Logger::INFO
log.formatter = PrettyErrors.new  # Install custom formatter!

log.debug("We're not in the verbose debug mode.")
log.info("We do see informative logs though.")
if log.error?
  log.error(expensive_error_report)
end

Here's what that code produces:

I, [2006-07-08T12:20:31.325112 #521]  INFO -- : We do see informative logs though.
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!! YOU BROKE IT! (2006-07-08 12:20) !!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
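
A subclass isn't strictly required, by the way. The formatter can be any object that responds to call() with those four arguments, so a Proc works too. Here's a minimal sketch; the output format is one I made up, and I log to a StringIO so the result is easy to inspect:

```ruby
require "logger"
require "stringio"

out = StringIO.new  # stands in for STDOUT in this demo
log = Logger.new(out)

# Any callable taking (severity, time, program_name, message) will do.
log.formatter = proc do |severity, time, _program_name, message|
  "#{time.strftime('%H:%M')} [#{severity}] #{message}\n"
end

log.info("We do see informative logs though.")
puts out.string
```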

Don't believe everything you read.

Author: "james@grayproductions.net (James Edward Gray II)" Tags: "The Standard Library"