I use the term “micro ETL” a lot when writing about tools such as HAMMER, but what do I mean by the term?
The ETL bit is easy to explain:
ETL, as all you data-warehousing and business intelligence folks will know, is the Extracting, Transformation and Loading of data from source systems into a reporting/data warehousing system. The techniques of ETL are not unique to the DW/BI worlds but are used anywhere transfers of data are needed between one computer system and another, for example, master data take-on for new systems or transactional interfaces between front and back-office systems – this is often referred to as DI (data integration) but is essentially the same problem domain.
So what’s the “micro” bit about?
You might assume that the micro adjective implies small or indeed tiny datasets, and in many cases you would be correct. Most final-mile data analysis, like politics, is local. Most business decisions along with their implementation and monitoring require ‘localised’ data. That data will be pre-filtered and summarised to some degree, but a fair degree of data shaping will still happen close to the decision makers. Excel is often the tool of choice when data gets to this stage.
HAMMER is optimised for this world, it sees the world how Excel sees it, but also adds the power of SQL and scripting languages to pick up where Excel stops. But enabling better Excel based data shaping is not HAMMER’s only function. It can operate outside of Excel (HAMMER.exe) and it can be used to craft task-specific ETL tools (HAMMER Inside). In both cases I continue to use Excel as my IDE, teasing out a problem before fixing it in code or in an external HAMMER call; and I can also use Excel as the UI for the end products.
In such scenarios, micro applies not so much to the datasets (which can be anything from tiny to very large) but to the concept of deploying simple micro “fractional horsepower” data engines to solve complex ETL, DI or RSS (Really SImple Systems) requirements.
HAMMER is built to take advantage of the distributed grid of powerful data crunchers (be that PCs, laptops, in-house cheap servers or just-in-time pay-as-you-go cloud-based CPUs) that every business, big or small, can now call on.
This revolution in distributed power is similar to what happened with the deployment of fractional horsepower AC-powered electrical motors in the last century. No longer was manufacturing restricted to “dark satanic mills” which had to be built close to natural power sources (water and later coal seams); and had to conform to the multi-story classic mill design to harness that captured power through belts, pulleys and shafts. With the expansion of the AC power grids (and the parallel expansion of internal combustion engine carrying roadways) the factory began to take on its modern single-story (or single story with mezzanine) distributed profile than can be seen everywhere from China to Cork. A similar landscape change is happening in IT.
HAMMER can take advantage of the “distributed engines” easily enough but the workflow, the actual control and distribution of tasks, data and decisions requires the ad hoc implementation of either steam-powered or classic centralised server processes. I badly needed a more pre-built modular approach, micro Workflow to complement micro ETL (and micro BI via PowerPivot ?), if you like. Last week I had started to think seriously about how/what to do about this (JSDB powered grids were featuring high on the list) when this appeared.
Perfect timing, Amazon’s SWF (Simple Workflow service) is exactly what I need!
SWF allows for the control and distributed deployment of stateless data processors. HAMMER was designed primarily as a stateless data processor (with state being persisted either in Excel or on disk as simple CSV/JSON flat files). Its default use of in-memory, rather than disk-based, SQLite assumes both abundant CPU and RAM (like is the case with your average 64bit laptop) and the existence of an external state-machine (which Excel and now SWF provide).
I’ve spent any spare time I had this week doing a deep dive into SWF and figuring out how HAMMER can take full advantage of this technology, not just for classic ETL, but for distributed decision control processes and RSS solutions. The result, in Dublin slang, is that I’m both “delira and excira” (delighted and excited). This is, to use that term again, yet another AWS game- changer.