…continuing my posts on Ruby and SQLite as a micro ETL environment.
Like many people, I was first introduced to Ruby via Ruby on Rails and, as it did for others, RoR showed me a best-of-breed approach to developing scalable MVC-based web sites. But as building scalable web sites is not what I do, the real take-away from Rails was the language itself, Ruby.
My major skill is as a ‘hacker and hauler’ of data, what I call data smithing. In the past I would have used a combination of readily available ‘corporate tools’: a SQL database (usually Oracle or Access), procedural languages (usually PL/SQL, VBA and various Unix scripting tools such as bash, awk and Perl) and, last but not least, Excel. I was usually ‘hacking’ away at the database level, occasionally using APIs, but even those APIs tended to match the underlying data schema. But now, with the explosion in web-service-based APIs, a new tool is required. This is where, at least for me, Ruby comes in.
Ruby is an open source, multi-paradigm, interpreted and increasingly popular language, and all four attributes combine to make Ruby a natural choice for inclusion in my data smithing tool-bag.
Ruby can be programmed as if it were a shell scripting language, one procedural command followed by another (ideal for quick scripts). It can be used as most of us use VBA, as a scripting language to manipulate existing sophisticated and professional object libraries. Or, using its powerful OOP capabilities, the language can be (and is) used to develop robust and easy-to-use object libraries.
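As a rough illustration of that range, here is a minimal sketch that computes the same figure twice, first in the one-command-after-another script style and then as a small reusable class. The sample rows, the field names and the `OrderTotals` class are all made up for this example; they are not taken from any real library or data set.

```ruby
# Hypothetical sample data: names, quantities and prices are invented for illustration.
ROWS = [
  { name: "widget", qty: 3, price: 2.5 },
  { name: "gadget", qty: 1, price: 10.0 }
].freeze

# Style 1: shell-script style, one procedural statement after another.
script_total = 0.0
ROWS.each do |row|
  script_total += row[:qty] * row[:price]
end
puts "script style: #{script_total}"

# Style 2: the same logic wrapped in a small class (hypothetical name)
# so it can be reused by other scripts.
class OrderTotals
  def initialize(rows)
    @rows = rows
  end

  # Sum of quantity * price across all rows.
  def total
    @rows.sum { |row| row[:qty] * row[:price] }
  end
end

puts "object style: #{OrderTotals.new(ROWS).total}"
```

The script style is quicker to write when exploring; the class version only starts to pay off once the same calculation is needed in more than one place.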
Data analysis and the associated ETL process is, at least in its discovery phase, a ‘hacking environment’, i.e. a lot of suck-it-and-see. Both the inputs (raw data) and the outputs (reports and dashboards) tend to be fixed outside the control of the data analyst; in such situations a forgiving and flexible development language is essential. VBA provides that flexibility and so does interpreted Ruby.
Ruby’s popularity has increased over the last two years, attracting enthusiastic and bright programmers to play with it and to produce some really useful code. If a new service appears, such as Amazon’s S3, you can be guaranteed that within a short period several high-quality, helpful Ruby libraries will also appear.