Those with a datasmith’s hammer, see every problem as a table.

I picked the name HAMMER for my new micro ETL tool to capture the spirit of how it’s intended to be used. It’s an everyday tool, sufficient for many of the tasks a datasmith faces and when not, will help (or at least not get in the way of) with the marshalling of data that needs to be processed by more powerful tools.

The saying “If you only have a hammer, you tend to see every problem as a nail” could be rephrased as “If you only have a datasmith’s HAMMER, you tend to see every problem as a table“!. But that’s OK, as nearly every commercial IT problem can be expressed as a set of tables, and in any case, HAMMER is not intended to be a datasmith’s only tool. Its close integration with Excel, recognises the prime importance of spreadsheets to most data wranglers and its use of CSV & the SQLite database format as its persistence & transport mechanism, (à la “SQLite as the MP3 of data“) recognises that datasets will often need to be shared with other tools.

Finding the perfect tool for the job it an IT obsession that most of our customer’s care little for; it’s only when our choice of tool affects their bottom-line (i.e excessive cost or wasted time) that end users take much notice of the backroom technology. The most important skill for a datasmith, is the ability to understand the structures and forms that data can take and to be able to derive and build business value from that data. The technology is an aid, nothing more, technology (and indeed  applications) will come and go, data lives on.

HAMMER encapsulates the  minimum set of technical skills an aspiring datasmith should learn:

  • Spreadsheets, in particular Excel. No need to know every nook and cranny, nor expect it to handle every task. But do learn about pivot tables and array formulas; if you have Excel >= 2007, learn about Excel  tables (aka Lists). If you have Excel 2010, make sure to download PowerPivot. Become comfortable with “formula programming”, don’t expect  a ribbon command to solve every problem.
  • SQL – learn the basics of selecting, joining and group-by; again no need to become a master; SQLite is not only an excellent way to learn SQL, it’s also a powerful tool to have once you’ve picked up the basics.
  • Learn a scripting language –  Python is one of the best and is relatively easy to learn. Again, mastery is nice, but not essential, learn the basics of IF-THEN-ELSE logic, loops and iterations, array and list handling and string manipulation.  Your efforts at coding do not have to be pretty or elegant, just functional. Python skills will also transfer across platforms, CPython (the original and the best), IronPython (.NET and HAMMER) and Jython (JVM, here’s an cool example of Python as a scripting language to automate GUIs).

All this talk of picking the right tools brings to mind the old golf pro-am story where the amateur was constantly pointing out to the pro (Jack Nicklaus I think) what club to play. At a particularly tricky shot to the green, the pro had enough when his choice of club was again “criticised”. So, he proceeded to take out every club including woods & putters, placed a ball for each one and hit every ball on to the green.

We’re not all as talented as Jack Nicklaus, so having at least an good enough tool for the job at hand  is important. But it does show, that focusing on the end-game is what matters, not becoming fixated with a  particular way of doing things.

Enough of the moralising, and talking of being fixated on a particular tool  😉 here’s this week’s list of new features to be found in HAMMER:

New commands:

TXT – will load a text file such that each line is a new row in a single columned (column named “text”) table.

LOADDB – like OPENDB, opens the named database but only if the database file exists (OPENDB will create a new empty database if file not found). Intended primarily as end-point to request-process-response loop, see DELEGATE/SUBMIT  below.

SAVEDB – saves the current “main” database using the previous argument as it’s name. (shorthand for … “main”,”filename.db”,”SAVENAMEDDB”).

SUBMIT – same as SAVEDB, but the value of the prior argument has a “.request” appended to make a file name in the format “databasename.request” for saving to. Also, if the previous argument = “” or “{GUID}” will generated a globally unique name using  a GUID. The main use-case for this command is to send the saved database for processing by an external program, maybe a Talend or Keetle ETL job , or a HAMMER.EXE job.

DELEGATE – same a SUBMIT, but doesn’t do any processing (i.e. all commands other than DELEGATE, which is expected to be the last command, are ignored), instead it’ll package the request in the saved database with the expectation that an external HAMMER.EXE, or another Excel-based HAMMER call, with do the processing.

Changes and helper functions to support DELEGATE/SUBMIT processing:

The DELEGATE and SUBMIT commands are how HAMMER implements its version of steam-powered servers.

The intention is that databases named with extensions of “.request” are picked up and processed by “servers” waiting for such files. (The transport of such files between server and the served is a separate issue, might be simply a matter of placing files on a shared drives, or DropBox!) Theses servers may then populate that database with new data ( or not, e.g. might generate a CSV). When finished, a text file of the same name but with the “.request” replaced by a “.response” is written to indicate that processing is complete.

Both the Excel udf HAMMER and HAMMER.EXE (non-excel command line version of UDF function), have been changed such that when a single argument is passed, that argument is expected to be a DELEGATE generated “.request” database. The database’s “pickled” arguments and data will then unpacked and the “delegated request” processed.

HAMMER.exe, if started with no arguments, implements a simple DELEGATE “server”, i.e. will wait for “.requests” files in its default folder and process each file as above.

Three RTD helper functions have been added to HAMMER udf:

waitOnHammerResponse(database (without the .response),[folder name]) – will return “Waiting” until it finds a database.response file, will then return the original .request filename (i.e. not the response file as it’s simply a marker, the database.request will  contain original data and any new data generated).

waitOnHammerRequest(database (without the .request),[folder name]) – will return “Waiting” until it finds a database.request file, will then then return the request filename.

waitOnFile(filenaname,[folder name]) – like the functions above but without the response/request expectations.

Here’s a list of the HAMMER commands implemented so far …

Get the latest version of HAMMER here …

Advertisements

2 responses to “Those with a datasmith’s hammer, see every problem as a table.

  1. Pingback: VBA Multithreading, .NET integration via HAMMER | Gobán Saor

  2. Pingback: HAMMER and Threads Redux – The adventure continues… | Gobán Saor