Why Ruby?

…continuing my posts on Ruby and SQLite as a micro ETL environment.

Like many people, my first introduction to Ruby was via Ruby on Rails and, as it did for others, RoR introduced me to a best-of-breed approach to developing scalable MVC-based web sites. But as building scalable web sites is not what I do, the real take-away from Rails was the Ruby language itself.

My major skill is as a ‘hacker and hauler’ of data, what I call data smithing. In the past I would have used a combination of readily available ‘corporate tools’: a SQL database (usually Oracle or Access), procedural languages (usually PL/SQL, VBA and various Unix scripting tools such as bash, awk and Perl) and, last but not least, Excel. I was usually ‘hacking’ away at the database level, occasionally using APIs, though even those APIs tended to match the underlying data schema. But now, with the explosion in web-service-based APIs, a new tool is required. This is where, at least for me, Ruby comes in.

Ruby is an open source, multi-paradigm, interpreted and increasingly popular language, and all four attributes combine to make Ruby a natural choice for inclusion in my data smithing tool-bag.

Open Source:

Being open source, the language has been ported to all major OSs including Windows, has a vast array of 3rd party libraries (many ported from Perl) and has very powerful supporting tool-sets such as Rake and Gem. There is also a tool, RubyScript2Exe, which can transform a Ruby application into a standalone, compressed Windows, Linux or Mac OS X executable, i.e. there is no need to have Ruby installed on the host machine. Ruby runs not just under its own VM, but has also been ported to the .NET (Ruby.NET) and Java (JRuby) VMs, and with the support for embedded scripting languages added in Java 6 (starting with JavaScript) this trend will continue.

Multi-paradigm:

Ruby can be programmed in three ways. It can be treated as if it were a shell scripting language, one procedural command followed by another (ideal for quick scripts). It can be used as most of us use VBA, as a scripting language to manipulate existing sophisticated and professional object libraries. Or, using its powerful OOP capabilities, the language can be (and is) used to develop robust and easy-to-use object libraries.
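To make the contrast concrete, here is a minimal sketch of the first and last of those styles doing the same small job: totalling amounts by region. The sample data, the RAW constant and the RegionTotals class are all made up for illustration; they are not from any real library.

```ruby
# Hypothetical sample data standing in for a raw extract.
RAW = <<~DATA
  region,amount
  North,100
  South,250
  North,50
DATA

# Style 1: script-like, one procedural step after another.
totals = Hash.new(0)
_header, *records = RAW.lines.map(&:chomp)
records.each do |line|
  region, amount = line.split(",")
  totals[region] += amount.to_i
end

# Style 2: the same logic wrapped in a small, reusable class
# (RegionTotals is an illustrative name, not an existing library).
class RegionTotals
  def initialize(text)
    _header, *@records = text.lines.map(&:chomp)
  end

  # Returns a hash of region name => summed amount.
  def totals
    @records.each_with_object(Hash.new(0)) do |line, sums|
      region, amount = line.split(",")
      sums[region] += amount.to_i
    end
  end
end

oo_totals = RegionTotals.new(RAW).totals

# Both styles should agree on the result.
puts "North: #{totals['North']}, South: #{totals['South']}"
puts "Same answer from the class: #{totals == oo_totals}"
```

The script version is what I would bash out during discovery; the class version is what it might grow into once the same logic is needed in more than one place.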

Interpreted:

Data analysis and the associated ETL process is, at least in its discovery phase, a ‘hacking environment’, i.e. a lot of suck-it-and-see. Both the inputs (raw data) and the outputs (reports and dashboards) tend to be fixed outside the control of the data analyst. In such situations a forgiving and flexible development language is essential. VBA provides that flexibility and so does interpreted Ruby.

Popular:

Ruby’s popularity has increased over the last two years, attracting enthusiastic and bright programmers to play with it and to produce some really useful code. If a new service appears, such as Amazon’s S3, you can be almost certain that within a short period several high-quality, helpful Ruby libraries will also appear.

In my next post I’ll explain why I like SQLite….

3 responses to “Why Ruby?”

  1. I’m not sure if you saw this:
    http://activewarehouse.rubyforge.org/

    …otherwise, some nice writings here… 🙂

  2. Nikola,

    I’d come across it before but the GEM install wasn’t working the last time I tried it, must give it a try-out again. Have you used it?

    I’ve also recently rediscovered http://raa.ruby-lang.org/project/saprfc/
    a Ruby interface to SAP data; that’s the great thing about Ruby, it attracts bright and enthusiastic programmers. I find all I have to do is think “wouldn’t it be nice if such and such was supported in Ruby”, then have a look again in 3-6 months and lo and behold, it is!

  3. Pingback: Ruby plus Amazon S3 - Document Centric Database « Gobán Saor