The Datasmith’s Hammer

Although my microETL add-in is very powerful, it can be a bit intimidating for those without a programming background. It was after all, designed for my needs primarily and being a professional programmer I tend to see the world from that perspective. Hence microETL’s ability to forge vast and complex datasets in parallel to those of Excel; to share those datasets not just with Excel and VBA but also with the powerful tool that is CPython. But microETLs genesis was not as an all-powerful ETL tool but as means to quickly and accurately handle tabular data in Excel. The original xLite (which begat microETL) started out with just two functions, Join two tables or Left Outer Join two tables. That was it, but it was still useful.

I’ve been intending for some time to build an offshoot of microETL that would be less powerful but perhaps more approachable and have fewer moving parts. This week I finished it, still needs some more testing, but the basic product is in place. It’s a single file add-in (a .xll) , called DATASMITH. At its heart is a single function called HAMMER, there will be other helper functions that in the main will wrap the HAMMER function, but in essence it is the datasmith’s HAMMER.

If microETL is a datasmith’s forge or indeed a mirco-foundry,HAMMER is a datasmith’s everyday portable tool (with perhaps Excel as the anvil, and your CPUs as the fire?). Talking of CPUs; multi-core CPUs are now the norm and since version 2007, Excel can utilise such multi-cores. MicroETL being VBA-based cannot however take advantage of this, to do so requires an .xll add-in; another reason to build the HAMMER.

So what will this new functionality look like:

Examples:

=HAMMER(Invoices[#All],InvoiceLine[#All],”JOIN”)

…will take the two ranges (you need the [#All] to pick-up the header and data sections of a 2007/2010 Excel Table) and join them using the columns with the same name as the join fields. The JOIN command expects the last two preceding arguments to be tables (aka arrays with a header line).

=HAMMER(“SALES”,DeptSales!A1:F2101,”SELECT * from table2 where dept=’:1′”,”SQL”)

… the 1st argument is loaded as Arg(1) (:1 in SQL), the 2nd argument is loaded (being an array) into a table named table2; if they were the other way round, it would be table1 and Arg(2). The 3rd argument is loaded as Arg(3) and the 4th is a command: SQL. SQL looks back at the preceding argument (Arg(3) in this case) and executes its contents as SQLite SQL. If the preceding argument was a table, it would expect to find a list of SQL statements for execution in the first column. The output of the last issued SELECT statement is then returned to Excel as an array.

=HAMMER(DeptTargets!A1:D20,DeptSales!A1:F2101,SalesTagetScript,”PYTHON”)

…this is similar to the previous SQL example but this time the command is PYTHON which will execute the Python Script passed in via the command’s preceding argument. The script will most likely return a table to Excel, but it could also, if not the last argument, create a table associated with its position, in this case table4, which could then be accessed by subsequent PYTHON or SQL scripts.

Up to 25 arguments can be passed, the last table produced is returned to Excel (either via a load, which wouldn’t be terribly useful, or more likely as a result of a command such as JOIN, SQL or PYTHON). HAMMER functions can of course be nested and can also issue “internal” HAMMER requests. The “flow” of commands is from left to right, with the preceding args usually setting the stage for subsequent commands. Alongside the all-powerful SQL and PYTHON commands, I’ll most likely add a set of “noSQL” offerings such as JOIN, LOJOIN (left-outer), DISTINCT, REDUCE (a SELECT .. GROUP BY… with PYTHON as the MAP?), UNION, INTERSECT. These will likely also be available through helper functions such as =JOIN(ThisTable,ThatTable).

Unlike microETL, there’s no persistence across function calls i.e. HAMMER will play by Excel’s Functions no-side-effects rule. Each call to a function will build up and tear down its in-memory SQLite environment (including calls to HAMMER from within PYTHON).(UPDATE: Specify the 1st COMMAND as “APPDB” to simulate microETL’s persistence across function calls)  Likewise each call to PYTHON will be a separate engine instance. Likewise each call to a HAMMER function will create its own PYTHON engine. Tables  and Python artefacts created by prior “steps” in a single HAMMER call are available to subsequent steps.

HAMMER embeds Python under the guise of IronPython, so a lot of the power and speed of CPython will not be available, but on the other hand, the full power of the .NET CLR will be, not a bad swap.

And a huge advantage, HAMMER will be an asynchronous function (i.e. will run in its own thread). This will allow multiple long running transforms to be handled within the same Excel instance, a major shortcoming of microETL. This requires Excel 2007 or 2010, but will still work synchronously for Excel 97-2003. Having said that, there’s a requirement for Net4.o for IronPython so this add-in is more suited for modern versions of Excel and for OS >= XP SP3.

HAMMER is an array function, yeah I know, normal folk tend to steer clear of Excel arrays. Which is to be expected, they’re not the most intuitive end-user-facing construct that the industry has ever come up with. But, they are the “Excel way” for passing tables in and out of formulas and, once mastered, open up a new world of power to Excel users. I will be adding helper functions to make using arrays a bit easier (an autoSize wrapper function, that’ll resize the selection area if it’s too small or too big, at the cost of a 2nd pass at the enclosed functions, might also port microETL’s SQL “paste” functionality).

I had intended to have an example to download with this post, but have discovered a last-minute bug that needs fixing. So it’ll be most likely next week before I’ve a version to show.

UPDATE:

I’ve managed to sort out the bug, so here’s an example…

http://bit.ly/datasmith

Here’s a list of the HAMMER commands implemented so far …

Use setup.xls to install (or simply open in Excel), see IrelandFOIexample_hammer (2007/2010) for examples (there’s also a copy of the same functionality via microETL in IrelandFOIexample_microETL.xls – note the speed difference).  There’s Excel 2000/2010 32bit and  Excel 2010 64bit versions included plus a 64/32Bit NET4.0 version of HAMMER.exe (command line version) plus a 64/32 Bit NET2.0 HAMMER-NoPython.exe.

UPDATE:

Many come to this page looking for a HAMMER tutorial, there isn’t one yet, best option is to read these posts http://blog.gobansaor.com/category/hammer/

Read oldest first (this one) followed by the next oldest and so on, most commands as they were introduced appear in a blog post. Also the command LISTCOMMANDS will list all current commands.

Advertisements