…continuing my posts on Ruby and SQLite as a micro ETL environment.
I’ve written before that my most important take-away from Ruby On Rails was the language Ruby, but RoR also introduced me to SQLite. Although its typical use in Rails is as a development database I quickly realised this open source ,very fast and zero-configuration SQL database has a role to play in the real world of data analytics; a cross-platform and free alternative to the MS Access Jet database engine. Like MS Access, SQLite is not client-server based, so no separate services or jobs required on the host machine; a full database is held in a single file, making it easy to transfer from one machine to another.
Will SQLite replace MySQL, Oracle, SQLServer or indeed the venerable MS Access? No, but as an embedded database or as a light-weight local data-store it has many advantages:
- SQLite supports most SQL92 syntax, including database triggers, makes skipping back and forth between MySQL, Oracle and SQLite much easier.
- SQLite runs on all major OS and programming language platforms, e.g. a database populated on Windows by VBA can be read or modified under Linux by a Ruby program.
- It’s free, in future the Jet engine will be no longer be distributed with the latest Microsoft Data Access Components, you’ll have to purchase Access 2007.
- SQLite supports in memory tables, making it possible to create a local “ROLAP star” with the fact table file-based and the dimension tables loaded within memory.
- Several free GUI administration tools.
- SQLite does not enforce data type constraints. Any data can be inserted into any column.
Now, the last point may seem more like a bug than a bonus, but in the world of ETL, dirty data is a fact of life; those working at this end of the pipe must accept what’s thrown at them. A forgiving weakly-typed data store (like a late-binding and weakly-typed scripting language) can be just what the doctor ordered in such an environment. In my early career as an OLTP developer I would have baulked at the lack of strong-typing but now I side with Bruce Eckel’s essay on strong-typing versus strong-testing , productivity is what matters and the ability to test is the most important skill that a developer should possess.
See also SQLite as the MP3 of data