RoR Data Warehouse on EC2

If you’ve been putting off evaluating Ruby on Rails and you’re fortunate enough to have an Amazon EC2 beta account, then it’s your lucky day. Paul Dowman has just made a public AMI (Amazon Machine Image; think of it as a virtual machine template from which you can launch a running EC2 instance) available, with various Ruby on Rails goodies preloaded.

Features:

  • Automatic backup of the MySQL database to S3 every 10 minutes
  • Mongrel_cluster behind Apache 2.2, configured according to Coda Hale’s excellent guide, with an /etc/init.d startup script
  • Ruby on Rails 1.2.3
  • Ruby 1.8.5
  • MySQL 5
  • Ubuntu 7.04 Feisty with Xen versions of the standard libs (libc6-xen package)
  • All EC2 command-line tools installed
  • MySQL and Apache configured to write logs to /mnt/log so you don’t fill up EC2’s small root filesystem
  • Hostname set correctly to the public hostname
  • NTP
  • A script to re-bundle, save and register your own copy of this image in one step (if you want to)

I’ve been meaning to try out Anthony Eden’s RoR-based data warehousing tool for some time; no more excuses, as I can now fire up an EC2 instance based on Paul’s AMI, install the ActiveWarehouse plugin and away I go. As ActiveWarehouse primarily uses techniques described in The Data Warehouse Toolkit, it’s also a good learning tool for those new to data warehousing. All I need now is a sizeable, publicly accessible dataset to populate the warehouse and get a true feel for its capabilities. There’s only so much you can do with the venerable Northwind database. Does anybody know of a ‘beefier’ alternative?


3 responses to “RoR Data Warehouse on EC2”

  1. If you need a big dataset, you should get the weekly splog dumps at http://splogspot.com/

  2. Thanks for the suggestion, Kai; however, it looks like SplogSpot is no longer publishing the dumps: the http://splogspot.com/dump/ folder holds only a test.txt file.

    Tom

  3. Pingback: Zimki - the spirt lives on … « Gobán Saor