Hammer and Threads

I’ve released an updated version of HAMMER to which I’ve added a few more noSQL “set” functions such as UNION, UNIONALL, INTERSECT and EXCEPT (aka MINUS). Like JOIN and LOJOIN (left outer JOIN) before them, they act on the previous pair of loaded (or generated) tables. I’ve also added an “internal” HAMMER command which like PYTHON or SQL uses the last table as its source. Source tables of internal HAMMER commands consist of one or more columns with each column being a sequence (top to bottom) of arguments for an individual HAMMER call. External HAMMERs load table data by referencing Excel ranges, the internal version loads table data from the list of already loaded tables.

Each internal HAMMER call runs in its own in-memory SQL instance and its own IronPython instance. You might ask what’s the point of the internal HAMMERs? The two main use-cases are to enable code-generated HAMMER calls and, the big one, allow ‘internal’ multi-threading.

Each column in a HAMMER source table is assigned its own thread, the resulting tables are, when all the threads have finished, UNION ALL’d and returned as a single table. In effect, your own personal MAP REDUCE tool, with the multithreaded HAMMER steps providing the MAP and subsequent SQL or PYTHON steps providing the REDUCE.

Although the ‘external’ HAMMER udf is multi-threaded, the degree of threading is controlled by Excel and is limited (but can be over-written) to the number of processors available. Excel may not always split the calculation graph across threads as it depends on how the individual formulas inter-act. This makes sense for CPU-intensive tasks, no point in having more threads than the number of cores capable of handling the CPU load. But other tasks that are non-CPU bound such as fetching data from a Web Service or hard disk can benefit from multiple threads in excess  of available CPUs; this is where internal HAMMER threads come in.

I’ve included an example (hammerThreads.xlsx) using the same web-scraping exercise I used in my PiCloud post  This time I’ve split the task over five threads (each handling 20 requests). Note this can be very slow, the website in question can take forever to respond. Try experimenting with more or fewer threads to see where the sweet spot is (remember to put your Excel instance on “manual calculation” when changing the configuration!).

The Python script used to fetch the data is also worth examining as it probably represents better my ideas on the use of IronPython than the previous FOI example. The FOI script uses SQL a lot and this is fine, but I look at Python as more part of the noSQL side (along with JOIN, UNION etc) than the SQL side (represented by, of course, the SQL command) of HAMMER. In the new example, the building of the output table happens using Python’s “native” list-of-lists (sometimes the “list of lists” are  actually .NET ragged arrays, but behave the same). The lists are returned to HAMMER via a helper function tableReturn(myList); likewise the db.pySQLQuery(“Select .. from tableArgNo”) call could be replaced with a call to fetchTable(argNo) where argNo is the table’s argument position.  The script also uses a .NET provided WebClient class to do the fetching rather than normal Python classes; when in Rome!

Both Python (IronPython 2.7) and Multi-threading internal HAMMERs (built using the new Parallel System.Threading class) require the .NET4 runtime and although I intend to release a .NET2 version of the tool, both of these features will be missing. (To see what version of .NET you’re running and for links to the latest versions visit  http://www.hanselman.com/smallestdotnet/ using Internet Explorer).

Here’s a list of the HAMMER commands implemented so far …

You can download the latest version here …