HAMMER on the Range

The HAMMER function is at heart an array formula. For those of you familiar with Excel array processing (likely to be a minority) this makes perfect sense, as HAMMER’s main purpose in life is to process and return tabular data, and prior to 2007’s Excel Tables (actually 2003’s somewhat similar Lists), arrays were Excel’s only nod to the existence of tables as units of data.

The functionality introduced by Excel Tables is a huge improvement and if you haven’t checked it out I recommend you do; it and 2010’s PowerPivot are two of the most important enhancements to Excel since the PivotTable.

Nevertheless, if you are to write UDFs to handle tabular data you’re still talking arrays (although you can reference a table within a UDF, returning tabular data still requires an array). Array formulas also continue to a very powerful skill for those who wish to master Excel and again if you’ve not done so, do check them out. But for those of you who have an aversion to  CTRL+SHIFT+ENTER, HAMMER (and HAMMER Inside UDFs) offers a number of array helper functions:

  • HAMMERtoFit – will resize the selected destination range to fit the returned array (result remains an array). Has the disadvantage that the underlying function call will be called twice if the array area requires resizing.
  • HAMMERtoSheet – will output the array to a new sheet as a non-array table. This (and a properly sized array call to HAMMER) is the fastest method of returning variable sized datasets to Excel.
  • HAMMERtoRange – will paste the array to the range address specified (as a non-array table), will also run an optional VBA macro before/after the paste.

The first two functions have been available for some time. The toRange helper is new to this release (V1.1.1). Before I describe the new function in detail, it’s useful to understand how HAMMER (and  any C based XLL ) sees and serves-back array data. An array range passed to such a function will be converted into a two dimensional multi-type array. The two types of data that will be passed over are: Doubles  (all numerics and dates), and Strings (everything else). For those of you familiar with the Excel Object Model, that’s the equivalent of the VALUE2 property of the Range object. The apparent date conversion to a Double (e.g. 01-JAN-2001 passed as 36892) is, in fact, not a conversion, as dates in Excel are always simply numbers with date formatting applied.

Likewise,when data is returned to Excel, the data comes back in the same “raw” format, requiring date or currency formatting to be applied to the relevant cells.

The HAMMER function and the two helper functions toSheet and toFit operate to these conventions. The third helper function HAMMERtoRange (along with the XLRANGE set of commands) can work around this restriction (at a certain performance cost).

HAMMERtoRange:

The function is called in the exact same way as the parent HAMMER function but with an additional argument. This argument is a comma separated list of 1,2 or 3 elements.

If only one element supplied it’s assumed to be the address to paste the table to; if two elements, first is the address, 2nd is the name of a VBA macro to run after the data is pasted; if 3 elements, the 1st is a macro to run prior to the paste , 2nd is the address, and 3rd is  the post-paste macro.  Examples:

  • =HAMMERtoRange(…,”Sheet2!a4″)
  • =HAMMERtoRange(…,”MyNamedRange”)
  • =HAMMERtoRange(…,”Sheet2!a4,myCleanupMacro”)
  • =HAMMERtoRange(…,”,Sheet2!a4,”)
  • =HAMMERtoRange(…,”mySetupMacro,Sheet2!a4,myCleanupMacro”)
  • =HAMMERtoRange(…,”onlyASetupMacro,Sheet2!a4,”)

The address is specified as a “string” i.e. doesn’t reference a range object (otherwise the result would be an infinite loop). If you do mange to initiate a loop, exit it using the Esc key, if you don’t, the loop will eventually end in one of two ways, either with a nice exception or a, not so nice, failure of Excel. The toFit helper will never loop as it is using Excel native array functionality and Excel will protect against this; the toSheet has the potential to loop but less likely than the toRange.

You may find the HAMMERtoRange function much slower that the other helpers (due to its dependence on COM Automation rather than the C API) but only for large datasets. Making use of the pre/post macros doesn’t require much VBA skills as the outputs of “record macro” sessions are ideal for this type of processing.

The address examples above,such as “Sheet1!A1”, are the “top left-hand corner” of the resulting table. The function will clear any data within the “.CurrentRegion” of that cell, but will first check if an Excel Table already occupies that region, and if so, reconstitute a table of the same name after the new dataset is pasted.

If a cell is sourced from a SQLite date column (actually a date “cell” as SQLite, like Excel, uses manifest typing) the data will be formatted as a date (this is unlike the behaviour of toFit and toSheet). This will not normally be obvious if the source data is loaded into HAMMER via a range (see discussion above) as the data will have been delivered to SQLite as a Double. To get around this either:

  • Update the affected column within HAMMER using the SQLiteDate function via SQL e.g. “Update table1 set datejoined = SQLiteDate(datejoined)”
  • or, use the pre/post macros to format the date columns as dates
  • or, load the dataset using these newly added COMMANDS: XLRANGE or EXCELRANGEASTEXT.

XLRANGE: This command expects an address in “string” format (i.e. not a referenced range) from which it will load a table using Excel Range’s Value property. Using Value rather than Value2 means the internal function logic will know if a cell is a date or not and store that data in SQLite date format. The address can either be a “top left-hand corner” type single-cell address or a full range; if a single-cell address, then the cell’s “.CurrentRegion”  is taken to be the extent of the table to be loaded.

XLRANGEASTEXT: will load data using the Cell object’s Text property, so what you see is what you get. FOr example, if a cell has a percentage format, a value of 1 will be loaded as the text “100%” not as the numeric 1.

XLRANGEASVALUE2: is like XLRANGE but uses the .VALUE2 property i.e. simulates a “normal” range load.

All of these are much slower than a straight range load (ASTEXT is particularly slow). Given the potential performance hit you might well ask why offer a ASVALUE2 option (the others at least provide the potential useful service of preserving type and structure information)? The answer lies in a feature that ‘non-referencing string addresses’ offer, i.e. changes to datasets ‘pointed at’ by such addresses will not trigger a recalculate (as Excel is not aware of any relationship). This can be useful when building up long lists or tables for processing by HAMMER without any (potentially, long running) processing being triggered by each cell change (saves having to set the workbook to manual calculation and allows for the build out of several source tables prior to activating a re-calc).

Download latest version of HAMMER from here:

Advertisements

5 responses to “HAMMER on the Range

  1. Hi Tom,
    I follow your articles regularly. My question is not on the Hammer but on PALO since you refer to it as well quite often.

    I need to load data for some simple reporting and simple ad-hoc (like choose certain columns and filters etc.). The data set has only 2 dimensions (customer and time). Customer has about 20 attributes (county, format, segment etc.). And then there are large number of measures (about 1000). The data size is limited by relational standards (10K customers * 40 time segments). Is this something the PALO community edition server can handle? I think PowerPivot can handle it but I need this solution to be accessible from the web.

    Please give me your opinion if possible.

    Thanks
    Sean

  2. Sean,

    As you said, PowerPivot would eat it up.

    But Palo will be able to handle it (assuming you have the memory). When dealing with hype-cube tools such as Palo, you really need to leave your relational baggage behind. The actual size of the cube (in memory) will potentially be 10000 * 40 * 1000 ,elements (the actual number will be determined by the sparseness of the data i.e. does each customer have each measure for each time period.

    The Customer “dimension” (a loaded relational term if ever there was) is also not a single dimension in hyper-cube terms. Usually each significant hierarchy-member-attribute of customer would be modelled as a separate dimension (e.g. Country->Sate->County)/ Attribute members of Palo dimensions are also possible but are more intended for “informational” attributes and not for those likely to be members of major hierarchies.

    In short when designing a hyper-cube, think hierarchies and cells, not relational star-schemas.

    (The 1000 measures seems excessive, are you sure there are not hierarchies hiding in there ?)

    Tom

  3. Thank you Tom for your feedback. I’ve never really worked with cubes but just relational databases and schemas. So my thinking is stuck in that realm. Are there any favorite sites you’ve that will help bridge this gap?

    I’ve this data modeled in MicroStrategy (the free version they offer). MSTR is very good tool but the free version is hobbled hence I am looking for open source stuff. I need to make this available over the web. Pentaho will not work well as they have a very limiting interface (and I’m not really looking for “reports” for printing). PALO seems to have what it takes even though I am looking for speed that cube offers.

    As far as the data, there is no sparseness. There are some hierarchies but not very deep. Most are informational attributes. Measures are 1K as it’s financial data (so you have Total Assets and then a bunch of measures that break down these assets and then even more).

    Now to search for how to design this cube (for loading it at leaest Talend and Pentaho both now have PALO tranformations).

    Your blog is very informative and keep up with it whenever you post.

    Sean

  4. Seán,

    Palo is ideal for financial reporting (that, and budgeting, is its original use-case) so as long as you model it correctly you’ll have no problems (you’ll know if you haven’t, it’ll be clunky to use).

    This book http://www.amazon.com/OLAP-Solutions-Building-Multidimensional-Information/dp/0471400300 helped me finally get my relational-trained mind around hyper-cube concepts. Now in its 2nd edition still relevant today.

    In cube terms, Total Asset isn’t a measure as such, its a hierarchy level on a measure (or an “account”). In fact products like Essbase use the term Accounts to denote the Measures dimension (in a pure hyper-cube, which Palo is, a measure is just another dimension). Thinking of your measures as “accounts” may help you conceptualise the model required.

    Tom

  5. Pingback: HAMMER Alongside, as a COM Server | Gobán Saor