Although my data-smithing toolbox is full to the brim with powerful tools such as Talend, Kettle PDI, Picalo and Excel, all backed by the cloud infrastructure of Amazon’s S3, SimpleDB and EC2, there’s one simple yet powerful tool I always seem to gravitate back to: SQLite.
Now obviously, being a hewer of data, I need a SQL-compliant database for data manipulation, and SQLite performs that task with speed and ease. But it’s not just in the hewing; it’s in the hauling of data where SQLite also shines.
I use SQLite as the container for passing tabular datasets between (and within) my various tools. The data doesn’t even need to be clean (thanks to SQLite’s liberal manifest typing rules), just so long as it can be expressed as a table.
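To see what I mean by liberal typing, here’s a small sketch using Python’s built-in sqlite3 module (the table and column names are just for illustration). SQLite column types are only affinities, so a column declared INTEGER will happily hold text it can’t convert to a number:

```python
import sqlite3

# In-memory database for illustration; a file path works the same way.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staging (id INTEGER, value TEXT)")

# Manifest typing: SQLite stores whatever you hand it, even when the
# values don't match the declared column types.
rows = [(1, "42"), ("two", 3.14), (3, None)]
conn.executemany("INSERT INTO staging VALUES (?, ?)", rows)

for row in conn.execute(
        "SELECT id, value, typeof(id), typeof(value) FROM staging"):
    print(row)
# (1, '42', 'integer', 'text')
# ('two', '3.14', 'text', 'text')
# (3, None, 'integer', 'null')
```

Note that `'two'` survives in the INTEGER column as text, which is exactly why dirty intermediate data is no obstacle.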
For example, a Talend job could store an extracted dataset in a SQLite file and pass that file on to a Python script for some special processing (say, extracting further data from a source not directly supported by Talend, such as SAP or SimpleDB). The resulting SQLite database could then be passed on to Excel or a similar tool so that a business user can view and perhaps modify the data, with Talend finally picking up the file again to load it into a corporate data warehouse.
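The Python step in that pipeline might look something like this sketch. The file name (`pipeline.db`) and table names are hypothetical stand-ins for whatever the upstream job actually writes:

```python
import sqlite3

# Hypothetical hand-off: a previous job (e.g. Talend) has written
# extracted rows into pipeline.db; this script derives an enriched
# table in the same file for the next tool in the chain to pick up.
conn = sqlite3.connect("pipeline.db")
conn.execute("""CREATE TABLE IF NOT EXISTS extracted (
                    customer TEXT, amount REAL)""")

# Rebuild the derived table each run so the hand-off is repeatable.
conn.execute("DROP TABLE IF EXISTS enriched")
conn.execute("""CREATE TABLE enriched AS
                SELECT customer,
                       amount,
                       amount * 1.2 AS amount_with_tax
                FROM extracted""")
conn.commit()
conn.close()
```

Because the whole intermediate state lives in one ordinary file, each tool in the chain only needs to know the file’s path and the table names it cares about.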
Now, you could use flat files to transport the data, or store the intermediate results in a corporate database, but SQLite is as easy to work with as flat files, if not easier, and offers the SQL processing capabilities of big-iron databases, without the hassle of getting write access to an existing server or setting one up from scratch.
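That’s the key difference from a CSV: a SQLite file answers real SQL. A quick sketch with made-up tables, using joins and aggregation, with no server anywhere in sight:

```python
import sqlite3

# An in-memory database for brevity; pointing connect() at a .db file
# gives the same capabilities on a shareable, single-file container.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders  (customer TEXT, amount REAL);
    CREATE TABLE regions (customer TEXT, region TEXT);
    INSERT INTO orders  VALUES ('acme', 100.0), ('acme', 50.0), ('zenith', 75.0);
    INSERT INTO regions VALUES ('acme', 'EMEA'), ('zenith', 'APAC');
""")

# A join plus aggregate -- try doing that to a pair of flat files.
totals = conn.execute("""
    SELECT r.region, SUM(o.amount)
    FROM orders o JOIN regions r ON o.customer = r.customer
    GROUP BY r.region
    ORDER BY r.region
""").fetchall()
print(totals)  # [('APAC', 75.0), ('EMEA', 150.0)]
```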
And I know there are other similar file-based database offerings, such as MS Access and the Java-only HSQLDB, but neither matches SQLite’s ubiquity, sheer simplicity and powerful data processing ability.