So, why am I developing this new datasmithing tool when I already have microETL to do the same task?
To be honest, when I started, my aim was to simply port a lot of my useful framework code to .NET as I’m doing more and more work in pure .NET these days.
It’s a lot easier to do so with a project driving the work rather than the drudgery of line by line conversions simply for the sake of conversion.
But over the last few months of investigating the process of moving some of more useful code over to .NET; discovering C#-SQLite as a near drop-in replacement for SQLite; realising that IronPython 2.7 marks the coming-of-age of Python under .NET; discovering the ease of multi-threading programming under Net 4.o; I began to see the potential of building a tool more suited to this new emerging world of technological plenty; of 64bit, multi-core high-RAM machines playing host to ground-breaking tools such as PowerPivot, than the 32bit, single-threaded world that microETL was born into.
I wanted to bring microETL back to its roots as an in-cell function. To make it sit more easily with the spreadsheet functional style of programming, a style of programming that has been more successful at attracting “civilians” to programming that any other method. To make the tool more in tune with the Excel-way of doing things.
At the same time, I also wanted the tool to be “detachable” from Excel, so that it could perform ETL functions server-side without the need for Excel. Ideally capable of being run from the command-line from a single no-install-required executable.
And so, the Datasmith’s HAMMER was born.
So where does this fit in with PowerPivot? When I design something like HAMMER I normally craft it around a serious of “use cases”. Below would be a one of my PowerPivot & HAMMER use cases:
The Data Quality Team
The sales & marketing department of a wholesaling company have discovered PowerPivot and are excited about using it to help manage an upcoming merger with a recently purchased competitor, also in the wholesaling business. There’s a large overlap between customers and products supplied, and the sales structures, pricing and KPIs used by both companies are very different. The transition phase from two separate companies to a single operating unit will need detailed planning and constant monitoring. Oh, and there’s also the problem of the brand new ERP system that’ll replace the existing systems currently in use.
The group can see how PowerPivot models will help with this task but are worried about the sourcing and managing the data. They decide to appoint a small data quality team, with a mixture of IT and business experience; which will be responsible for mapping old to old,old to new, managing the data deltas (business will still goes on, new customers, new products etc.).
Most of this work revolves around “master data”. Transactional data may be much larger in volume, but get the master data right then transactional data will be easy to handle as long as the processing capacity to handle the volumes is available (which thanks to PowerPivot and really fast high-RAM PCs it is).
In golf, there’s a saying “You drive for show, but you putt for dough”. Likewise in data: a lot of emphasis is put on the big-data transactional datastores, but real analytical benefit comes from the less sexier master-data, aka the dimensions. As a result the most important output from the data quality team will be a set of conformed dimensions.
Each week (and at month-end) the team will get extracts of the master and transactional data in various formats from both legacy systems and will also get the WIP master datasets from the team working on the new ERP system. From these they’ll construct point-in-time conformed-dimensions combined old with old and old with new; and will rebuild transactional feeds with new conformed keys alongside existing keys. These datatsets are then sent to the various analysis teams to enable them build Excel and PowerPivot models that will hopefully all “sing from the same hymn sheet”.
And how will this be accomplished? With the datasmith’s HAMMER of course (well it is my
day-dream use-case!). Both in-Excel and command-line HAMMER “scripts” will be used to wrangle the datasets and the complete sets packaged as SQLite database files.
When the analysis groups receive these files, they too will use HAMMER to extract the datasets they require (either straight to CSV for loading into PowerPivot) or into Excel for some further transformation work prior to modelling. To handle slowing-changing dimension scenarios, many of the teams will squirrel away each week’s data in other to be able to model yet-unknown-but-very-likely changes of organisational structures.
Although this is a use-case, it’s based on reality, a reality that would have been so much easier to manage if I, and the teams I worked with, had access to technology such as HAMMER and PowerPivot.
To download the latest version of the code, go to this page on my website.
Follow the HAMMER tag on this blog for information on commands and examples (best start with the oldest and work forward …)