Category Archives: excel

Building an Excel 2013 Percentile dashboard without PowerPivot or a PivotTable

At the end of my Playing DAX Percentiles on the mean (or is that median) Streets of Ireland post I suggested that plain old Excel (POE) might be a better fit than PowerPivot for this particular problem. In fact, for many Excel users, PowerPivot will only be an option when using a PC-hosted, client-side workbook. Excel 2013 will allow workbooks to be published on the web and also be accessible via native apps on Windows 8 tablets (and eventually via iPad and Android native apps, if rumours are to be believed).

But, PowerPivot (aka Data Model) functionality will only be available to those with an enterprise-class SharePoint licence (for web publishing) and not at all for native-app-deployed workbooks. So, knowing how to use POE to construct dashboards continues to be a skill worth having.

But how to construct a responsive percentile calculating dashboard sans PowerPivot?

You might think, no problem, I’ll use a traditional PivotTable; alas, like PowerPivot, it lacks the ability to calculate percentiles as standard but, unlike PowerPivot, it offers no method of constructing a DIY measure to do so!

Next up, you might look at the SUMIFS family of “pivot” functions, but they too are limited to the usual aggregates; and SUBTOTAL is likewise limited to the usual suspects.

Luckily, via the (black!) magic of array formulas there is a way.

In fact, the “trick” below, when shown to me by a “civilian” datasmith many years ago, convinced me that I should invest some time getting my head around this powerful “array formula magic”.

If you’re already comfortable with array formulas and are wondering if DAX is too complex to master, don’t worry, you’ll have little trouble mastering DAX. Likewise, if you’re a DAX whiz, you should check out array formulas.

Basing the dashboard on the Property Register converted to an Excel Table (a 2007+ feature that many are still unaware of) enables the use of Slicers in Excel 2013 to quickly give a dashboard feel. It’s still possible to use the Excel Table filters directly on the table (the built-in date and text filters are particularly useful).

Also, in Excel 2013, a chart can be directly “animated” by a range/table without the need for a PivotTable cache, again making the building of “PivotTableLess” dashboards easier.

A lot of this can also be accomplished in pre-2013 versions of Excel (including, most importantly, the array formula “trick”), but Excel 2013 just makes it all so much easier and, of course, the ability for every user (from Home to Pro) to save and/or publish workbooks via “the cloud” is a major advance (big thanks to Google Docs, without you this might never have happened 😉 ).

UPDATE:

See David Hager’s comment: there’s a new 2010+ AGGREGATE function which has many more options than SUBTOTAL, including percentiles, so you can ignore the trick below if using 2010/2013. The trick is still useful for supplying an array to a bespoke formula (I first used it to calculate a particular type of weighted average) or to functions such as IRR or XIRR, so it’s worth knowing even in modern versions of Excel.
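A quick model of what a percentile over only the visible rows should return can help when comparing the two approaches. On a filtered Table I’d expect something like =AGGREGATE(16,5,Prices[Price],0.9) to do the job (16 being PERCENTILE.INC, option 5 ignoring hidden rows), but do check the argument codes against Excel’s own help. The Python below is only a sketch of the PERCENTILE.INC interpolation rule applied to visible rows; the `visible` flags are an assumption standing in for the filter state, which in Excel lives on the sheet rather than in the data.

```python
def percentile_inc(values, k):
    """Inclusive percentile with linear interpolation (the PERCENTILE.INC rule)."""
    if not values:
        raise ValueError("no visible values")
    if not 0 <= k <= 1:
        raise ValueError("k must be between 0 and 1")
    ordered = sorted(values)
    rank = k * (len(ordered) - 1)   # zero-based, possibly fractional, rank
    lower = int(rank)
    frac = rank - lower
    if lower + 1 < len(ordered):
        # Interpolate between the two neighbouring values
        return ordered[lower] + frac * (ordered[lower + 1] - ordered[lower])
    return ordered[lower]


def visible_percentile(prices, visible, k):
    """Apply the percentile only to rows whose (hypothetical) visibility flag is True."""
    return percentile_inc([p for p, v in zip(prices, visible) if v], k)


prices = [100, 250, 175, 300, 90]
visible = [True, False, True, True, True]       # the 250 row is filtered out
print(visible_percentile(prices, visible, 0.5))  # → 137.5
```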

UPDATE:

See TheDataSpecialist’s comment for an even better modern make-over of the SUBTOTAL(3,…) trick.

Below is the formula to calculate a median using only the “visible” rows within a filtered table/range. Note: it’s an array formula, so it must be entered using CTRL+SHIFT+ENTER.

{=MEDIAN(IF(SUBTOTAL(3,OFFSET(Prices[Price],ROW(Prices[Price])-MIN(ROW(Prices[Price])),,1)),Prices[Price],""))}

I used this Excel formula beautifier to make it more readable.


MEDIAN
(
    IF
    (
        SUBTOTAL
        (
            3,
            OFFSET
            (
                Prices[Price],
                ROW
                (
                    Prices[Price]
                ) -
                MIN
                (
                    ROW
                    (
                        Prices[Price]
                    )
                ),
                 ,
                1
            )
        ),
        Prices[Price],
        ""
    )
)

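The logic behind this is explained here, but essentially, the OFFSET() portion returns an array of single-cell ranges, one per row of the column, and SUBTOTAL(3,…) counts each of them: 1 if the row is visible, 0 if it has been filtered out and should be ignored. IF then swaps each 1 for that row’s price and each 0 for an empty string, which MEDIAN skips.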

If this makes no sense (it didn’t to me the 1st time I saw it), then don’t worry, just make a note of it and use the trick in blind faith :).
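For anyone who wants a little more than blind faith, here’s a rough Python model of what the array formula does row by row. It’s a sketch, not how Excel evaluates it internally: the list of visibility flags is an assumption standing in for the table’s filter state, the helper names are made up, and statistics.median stands in for MEDIAN.

```python
from statistics import median


def subtotal3_per_row(values, visible):
    """Mimic SUBTOTAL(3, OFFSET(col, ROW(col)-MIN(ROW(col)), , 1)).

    The per-row offsets make OFFSET yield one single-cell range per row;
    SUBTOTAL(3, ...) is a COUNTA that skips filtered-out rows, so each
    row scores 1 if visible and non-blank, otherwise 0.
    """
    return [1 if vis and val not in (None, "") else 0
            for val, vis in zip(values, visible)]


def visible_median(values, visible):
    flags = subtotal3_per_row(values, visible)
    # IF(flag, value, "") keeps visible values and blanks out hidden ones...
    masked = [val if flag else "" for val, flag in zip(values, flags)]
    # ...and MEDIAN ignores the "" text entries, so only visible rows count
    return median(v for v in masked if v != "")


prices = [100, 250, 175, 300, 90]
visible = [True, False, True, True, True]   # the 250 row is filtered out
print(subtotal3_per_row(prices, visible))   # → [1, 0, 1, 1, 1]
print(visible_median(prices, visible))      # → 137.5
```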

You can see this in action in “the cloud” here as a read-only “Excel Web App”, (you can also download the workbook to see its internals).

If you’re reading this sometime in the future, the above link may not work as it’s published using a beta version of Office 2013; here’s a direct link to the Excel 2013 workbook.

And if you haven’t yet installed Excel 2013, here’s a cut-down 2007/2010 version.


Cross Join Three Tables via DAX Query to seed a Date Dimension.

This is a variation on an old SQL trick to seed a Date (aka Time) dimension. You’ll need such a table in many, if not all, PowerPivot models, as the proper functioning of Time Intelligence functions requires a separate table containing a date column populated with a no-gaps sequence of dates covering the potential time span of the model.

If you’re sourcing your data from an existing data warehouse, most likely you’ll already have access to a date dimension. But if not, you’ll need to generate one. You could use the auto-fill facility in Excel to do so, but it can be awkward to use when generating a many-year table (e.g. pension and mortgage models might need the current year +/- 20 or even 90 years).

To seed the dataset you’ll need three one-column tables; one for DAYs with values of 1 to 31, another for MONTHs with values 1 to 12 and finally YEARs listing the years required e.g. 1995 to 2060 or 2011 to 2013. Next, load these tables into a PowerPivot model, as linked-tables or whatever, and apply the following DAX Query against the model.

EVALUATE (
    ADDCOLUMNS (
        SUMMARIZE (
            ADDCOLUMNS (
                CROSSJOIN ( Years, Months, Days ),
                "FullDate", DATE ( [Years], [Months], [Days] )
            ),
            [FullDate]
        ),
        "Year", YEAR ( [FullDate] ),
        "MonthNo", MONTH ( [FullDate] ),
        "Month", FORMAT ( [FullDate], "MMM" )
    )
)
ORDER BY [FullDate]

You can apply the above DAX query using this trick (Note: this will not work for Excel 2013, but no worries, as in Excel 2013 DAX Queries are a supported data-table source, see here).

Or, you could install the excellent open source DAX Studio Excel add-in (again, it doesn’t work in Excel 2013 yet, but I would think it eventually will).

So how does this work?

Let’s start with the innermost function call:

CROSSJOIN(Years,Months,Days)

This simply cross-joins the three tables. The cross-join creates a table consisting of every single combination of days, months & years.
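In Python terms the cross-join is just a Cartesian product. This sketch uses example YEARs of 2011 to 2013 to show why the raw result has years × 12 × 31 rows before any invalid or duplicate dates are dealt with.

```python
from itertools import product

# Example one-column "tables": YEARs 2011..2013, MONTHs 1..12, DAYs 1..31
years = range(2011, 2014)
months = range(1, 13)
days = range(1, 32)

# CROSSJOIN(Years, Months, Days): every (year, month, day) combination
combos = list(product(years, months, days))

print(len(combos))            # → 1116 (3 * 12 * 31)
print(combos[0], combos[-1])  # → (2011, 1, 1) (2013, 12, 31)
```

Tuples such as (2013, 2, 30) are in there too; turning them into real dates is the job of the DATE step.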

Next:

ADDCOLUMNS(
    CROSSJOIN(Years, Months, Days),
    "FullDate", DATE([Years], [Months], [Days])
)

ADDCOLUMNS is the DAX Query equivalent of the PowerPivot add-in’s “Calculated Columns”. This will add a new column called “FullDate” to the table, using the DATE function to form a valid date.

Now you might think that many cross-join combinations would generate an invalid date (e.g. 30th February), but DATE is a forgiving function and will convert the likes of the 30th of February to the 2nd of March (or the 1st in leap years). In fact, you could use a 1 to 366 DAYs table and a single-value (i.e. 1) MONTHs table! The result will include some duplicate dates, but the next layer in the onion will take care of that:
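Here’s a rough Python stand-in for that forgiving behaviour, handy for checking what a given combination will turn into. It’s a simplified sketch, not Excel’s implementation: it rolls out-of-range months and days forward (or back, for day 0 or below) the way DATE does, but ignores Excel’s special handling of years below 1900 and its 1900 leap-year quirk.

```python
from datetime import date, timedelta


def excel_date(year, month, day):
    """Simplified stand-in for Excel's DATE(year, month, day).

    Months outside 1..12 and days outside the month roll over into
    neighbouring months/years. Excel's pre-1900 year handling and its
    1900 leap-year quirk are NOT modelled.
    """
    # Normalise the month first (13 -> January of next year, 0 -> previous December)
    year += (month - 1) // 12
    month = (month - 1) % 12 + 1
    # Then count days forward (or back, for day <= 0) from the 1st of that month
    return date(year, month, 1) + timedelta(days=day - 1)


print(excel_date(2013, 2, 30))   # → 2013-03-02
print(excel_date(2012, 2, 30))   # → 2012-03-01 (2012 is a leap year)
print(excel_date(2013, 1, 366))  # → 2014-01-01
```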

SUMMARIZE(
    ADDCOLUMNS(CROSSJOIN(Years, Months, Days), "FullDate", DATE([Years], [Months], [Days])),
    [FullDate]
)

SUMMARIZE is like a SQL GROUP BY, and will remove any duplicate dates in the [FullDate] column.
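Continuing the Python analogy, SUMMARIZE on [FullDate] behaves like taking the distinct set of dates. With two example years (2012 being a leap year) the 744 raw combinations collapse to 731 distinct, gap-free dates. The excel_date helper below is a simplified stand-in for DATE that ignores Excel’s pre-1900 and 1900 leap-year quirks.

```python
from datetime import date, timedelta
from itertools import product


def excel_date(year, month, day):
    """Simplified Excel DATE(): out-of-range months/days roll over."""
    year += (month - 1) // 12
    month = (month - 1) % 12 + 1
    return date(year, month, 1) + timedelta(days=day - 1)


# Cross-join plus DATE(): overflow days like 30 Feb land on real March dates
raw = [excel_date(y, m, d) for y, m, d in product([2012, 2013], range(1, 13), range(1, 32))]
# The SUMMARIZE(..., [FullDate]) step: one row per distinct date
distinct = sorted(set(raw))

print(len(raw), len(distinct))  # → 744 731
# No gaps: the first-to-last span in days matches the number of distinct dates
print((distinct[-1] - distinct[0]).days + 1 == len(distinct))  # → True
```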

You could stop now as the minimum requirement for a date dimension has been produced, but in reality you’ll need a few descriptive/selective/sorting attributes to be assigned to each date.

You could do this via an Excel Table or via Calculated Columns when you import the table into a PowerPivot model. Or, as here, add them via a DAX Query:

ADDCOLUMNS(
    SUMMARIZE(ADDCOLUMNS(CROSSJOIN(Years, Months, Days), "FullDate", DATE([Years], [Months], [Days])), [FullDate]),
    "Year", YEAR([FullDate]),
    "MonthNo", MONTH([FullDate]),
    "Month", FORMAT([FullDate], "MMM")
)

Again using ADDCOLUMNS to add calculated columns: a “Year” column (1995,1996 etc.) for selecting, a MonthNo column (1,2,3 …) for sorting and a “Month” column (Jan,Feb) for descriptive/selection purposes.

Finally, add an ORDER BY [FullDate] to return the table in sorted order.
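For anyone who wants to generate the same table outside PowerPivot, the whole query maps onto a few lines of Python: cross-join, DATE-style rollover, distinct dates, the three attribute columns and a final sort. It’s a sketch under simplifying assumptions (no pre-1900 DATE quirks), and the month abbreviations are hard-coded English rather than following a locale the way FORMAT(..., "MMM") would.

```python
from datetime import date, timedelta
from itertools import product

MONTH_ABBR = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
              "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def excel_date(year, month, day):
    """Simplified Excel DATE(): out-of-range months/days roll over."""
    year += (month - 1) // 12
    month = (month - 1) % 12 + 1
    return date(year, month, 1) + timedelta(days=day - 1)


def date_dimension(years):
    """Crossjoin -> DATE() -> distinct dates -> attribute columns -> ORDER BY FullDate."""
    combos = product(years, range(1, 13), range(1, 32))
    full_dates = sorted({excel_date(y, m, d) for y, m, d in combos})
    return [
        {"FullDate": d, "Year": d.year, "MonthNo": d.month,
         "Month": MONTH_ABBR[d.month - 1]}
        for d in full_dates
    ]


rows = date_dimension(range(2011, 2014))
print(len(rows))  # → 1096 (365 + 366 + 365)
print(rows[0])
```

The list of dicts can be written straight to a CSV file with csv.DictWriter.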

The table can then be saved as a CSV file or an Excel workbook, or loaded back into the source database. All, or a filtered range, can then be loaded into future models as required.

SAP RFC_READ_TABLE functionality in HAMMER

The code below is a typical VBA routine used to fetch data from SAP into Excel.

It uses the “SAP.Functions” COM object as exposed by the SAP GUI Client, fetching the data via RFC_READ_TABLE; an automated SE16 in effect.

The credentials required are the same as those you would use to log into your desktop client, and whatever internal tables you can see via SE16, those same tables will be fetchable via RFC_READ_TABLE.

This automated fetching of data is ideal when some self-service reporting is a requirement (you know, standard DW extracts offer most of what you need, but there’s always something missing 🙂 ).

I figured this would be a good candidate as a HAMMER command. Not just to take advantage of HAMMER’s natural table handling but also its multi-threading capability. Being able to spawn one or more background threads (or delegate to HAMMER.exe command line process(es) ) would be very handy for SAP datasmiths.

Problem is, the code below works, and I’ve converted it to VB.NET, made it more generic and added it as a HAMMER command; but I can’t test it, as I no longer have access to a SAP R3 Instance!

The command SAPREADTABLE takes three parameters:

  • a CSV list of SAP logon credentials: System,Client,User,Password,Language
  • a CSV list of table information, 1st argument the table name, the rest field names e.g. KNA1,KUNNR,NAME1,NAME2,LAND1
  • a filter statement (like a SQL WHERE clause) e.g. LAND1 in ('DE','NL')

Example:

"Test SYS,600,tom,pwd,EN","KNA1,KUNNR,NAME1","LAND1 = 'DE'"
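To make the three arguments concrete, here’s a hypothetical Python sketch of how they map onto the RFC_READ_TABLE inputs used in the VBA version (QUERY_TABLE, one FIELDS row per field name, and the OPTIONS where-clause text). parse_sapreadtable_args is my own illustrative name, not part of HAMMER.

```python
def parse_sapreadtable_args(logon_csv, table_csv, where_clause):
    """Hypothetical decomposition of the three SAPREADTABLE arguments."""
    # Exactly five logon parts expected; unpacking raises ValueError otherwise
    system, client, user, password, language = [p.strip() for p in logon_csv.split(",")]
    # First entry is the table name, the rest are the field names
    table, *fields = [p.strip() for p in table_csv.split(",") if p.strip()]
    return {
        "logon": {"System": system, "Client": client, "User": user,
                  "Password": password, "Language": language},
        "QUERY_TABLE": table,        # RFC_READ_TABLE import parameter
        "FIELDS": fields,            # one FIELDNAME row per entry
        "OPTIONS": [where_clause],   # WHERE text, e.g. "LAND1 = 'DE'"
    }


args = parse_sapreadtable_args("Test SYS,600,tom,pwd,EN", "KNA1,KUNNR,NAME1", "LAND1 = 'DE'")
print(args["QUERY_TABLE"], args["FIELDS"], args["OPTIONS"])
# → KNA1 ['KUNNR', 'NAME1'] ["LAND1 = 'DE'"]
```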

UPDATE: April 29, 2012

Could somebody with access to SAP R3 test this out for me? Done, tested (found a small bug, now fixed) and working (thanks to a kind person who allowed me access to a test server, you know who you are, thanks again).

Fetch the modified latest version below (fixed bug that produced an extra blank column and extra blank row in result table, my typical “1 off” bug when converting from VBA to VB.NET, obviously I’ll never learn 🙂 )

If you get a scary “ABEND – SYSTEM FAILURE” error, don’t panic, you haven’t broken the company ERP system; it’s usually due to a malformed filter statement, e.g. LAND1='NL' (no spaces) rather than LAND1 = 'NL'.

To download the latest version of the code, go to this page on my website.

Follow the HAMMER tag on this blog for information on commands and examples (best to start with the oldest and work forward …)

Need a pure VBA version? Here it is:

SAP RFC_READ_TABLE VBA Example:


Option Explicit
Option Base 0

Public Function RFC_READ_TABLE(tableName, columnNames, filter)

Dim R3 As Object, MyFunc As Object, App As Object

' Define the objects to hold IMPORT parameters
Dim QUERY_TABLE As Object
Dim DELIMITER   As Object
Dim NO_DATA     As Object
Dim ROWSKIPS    As Object
Dim ROWCOUNT    As Object
' Where clause
Dim OPTIONS As Object
' Fill with fields to return.  After function call will hold
' detailed information about the columns of data (start position
' of each field, length, etc.
Dim FIELDS  As Object
' Holds the data returned by the function
Dim DATA    As Object
' Use to write out results
Dim ROW As Object

Dim Result As Boolean
Dim i As Long, j As Long, iRow As Long
Dim iColumn As Long, iStart As Long, iStartRow As Long, iField As Long, iLength As Long
Dim outArray, vArray, vField
Dim iLine As Long
Dim noOfElements As Long

'**********************************************
'Create Server object and Setup the connection
'use same credentials as SAP GUI login
On Error GoTo abend:
  Set R3 = CreateObject("SAP.Functions")
  R3.Connection.SYSTEM = ""
  R3.Connection.Client = ""
  R3.Connection.User = ""
  R3.Connection.Password = ""
  R3.Connection.Language = "EN"

  If R3.Connection.logon(0, True) <> True Then
   RFC_READ_TABLE = "ERROR - logon to SAP Failed"
   Exit Function
  End If
'**********************************************

'*****************************************************
'Call RFC function RFC_READ_TABLE
'*****************************************************

  Set MyFunc = R3.Add("RFC_READ_TABLE")
   Set QUERY_TABLE = MyFunc.exports("QUERY_TABLE")
   Set DELIMITER = MyFunc.exports("DELIMITER")
   Set NO_DATA = MyFunc.exports("NO_DATA")
   Set ROWSKIPS = MyFunc.exports("ROWSKIPS")
   Set ROWCOUNT = MyFunc.exports("ROWCOUNT")

   Set OPTIONS = MyFunc.Tables("OPTIONS")
   Set FIELDS = MyFunc.Tables("FIELDS")

   QUERY_TABLE.Value = tableName
   DELIMITER.Value = ""
   NO_DATA.Value = ""
   ROWSKIPS.Value = "0"
   ROWCOUNT.Value = "0"
   OPTIONS.Rows.Add
   OPTIONS.Value(1, "TEXT") = filter ' where filter

    vArray = Split(columnNames, ",") ' columns
    j = 1
    For Each vField In vArray
        If vField <> "" Then
            FIELDS.Rows.Add
            FIELDS.Value(j, "FIELDNAME") = vField
            j = j + 1
        End If
    Next

   Result = MyFunc.CALL

   If Result = True Then
     Set DATA = MyFunc.Tables("DATA")
     Set FIELDS = MyFunc.Tables("FIELDS")
     Set OPTIONS = MyFunc.Tables("OPTIONS")
     R3.Connection.LOGOFF
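   Else
     R3.Connection.LOGOFF
     ' Return the exception text so the caller's TypeName(...) = "String" check catches it
     RFC_READ_TABLE = "ERROR - " & MyFunc.EXCEPTION
     Exit Function
   End If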

  noOfElements = FIELDS.ROWCOUNT
  iRow = 0
  iColumn = 0
  ReDim outArray(0 To DATA.ROWCOUNT, 0 To noOfElements - 1)
  For Each ROW In FIELDS.Rows
    outArray(iRow, iColumn) = ROW("FIELDNAME")
    iColumn = iColumn + 1
  Next

'Display Contents of the table
'**************************************
iRow = 1
iColumn = 1

For iLine = 1 To DATA.ROWCOUNT

       For iColumn = 1 To FIELDS.ROWCOUNT
         iStart = FIELDS(iColumn, "OFFSET") + 1
    '       If this is the last column, calculate the length differently than the other columns
         If iColumn = FIELDS.ROWCOUNT Then
            iLength = Len(DATA(iLine, "WA")) - iStart + 1
         Else
             iLength = FIELDS(iColumn + 1, "OFFSET") - FIELDS(iColumn, "OFFSET")
        End If
    '       If the fields at the end of the record are blank, then explicitly set the value
        If iStart > Len(DATA(iLine, "WA")) Then
             outArray(iRow, iColumn - 1) = Null
        Else
            outArray(iRow, iColumn - 1) = Mid(DATA(iLine, "WA"), iStart, iLength)
        End If

       Next

       iRow = iRow + 1
Next

RFC_READ_TABLE = outArray
Exit Function

abend:

RFC_READ_TABLE = Err.Description

End Function

Public Sub Paste_sheet1()
Dim lArray
Dim lAdjust As Long

lArray = RFC_READ_TABLE("KNA1", "KUNNR,NAME1,NAME2", "LAND1 = 'DE'")
If TypeName(lArray) = "String" Then
    MsgBox "Problem calling RFC: " & CStr(lArray)
Else

    ' adjust if zero based array
        If LBound(lArray, 1) = 0 Then lAdjust = 1 Else lAdjust = 0

    [Sheet1!A1].Resize(UBound(lArray, 1) + lAdjust, UBound(lArray, 2) + lAdjust) = lArray

End If

End Sub
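The trickiest part of the VBA is unpacking the results: each DATA row arrives as one fixed-width string (the WA field), and FIELDS supplies each column’s OFFSET. Here’s a Python sketch of the same slicing logic, with invented sample offsets and rows; Python slices are 0-based, so OFFSET is used directly where the VBA’s Mid needs OFFSET + 1.

```python
def split_wa_rows(fields, wa_rows):
    """fields: list of (FIELDNAME, OFFSET) in the order SAP returned them.

    Returns a header row followed by one list of values per WA string,
    mirroring the VBA loop (values keep their fixed-width padding).
    """
    names = [name for name, _ in fields]
    table = [names]
    for wa in wa_rows:
        values = []
        for i, (_, offset) in enumerate(fields):
            # Each column runs to the next column's OFFSET; the last runs to the row's end
            end = fields[i + 1][1] if i + 1 < len(fields) else len(wa)
            # When trailing fields are blank the WA string can be shorter: use None (VBA Null)
            values.append(wa[offset:end] if offset < len(wa) else None)
        table.append(values)
    return table


sample = split_wa_rows([("KUNNR", 0), ("NAME1", 10)],
                       ["0000001000Acme GmbH", "0000001001"])
for row in sample:
    print(row)
```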

Excel – as a fractional horsepower HTML5 server

You may have been wondering what’s the driving force behind the various changes I’ve made to HAMMER over the last few weeks, namely threading support, a simple HTTP server and JavaScript. The driving force is to better position HAMMER (and through it, Excel) as a fractional horsepower HTTP server (see this post for more on fractional horsepower engines). Features such as threading and JavaScript are useful for many things; threading, for example, makes debugging scripts easier (see the Debug sheet and code in the sample InProcess_oData workbook) and also makes long running ETL processes easier to control and monitor. But, enabling the set-up of simple task-specific behind-the-firewall data servers, with as little ceremony as possible, is the ultimate goal.

But why, what purpose do these mini servers serve?

They’re obviously not intended as beyond-the-firewall public servers; they wouldn’t scale or be secure enough for such a task. Providing in-house feeds to other web-enabled clients would be a more suitable task. For example, providing a feed from a “hub” workbook containing a PowerPivot model to other “spoke” workbooks (PowerPivot enabled or not): a poor man’s alternative to doing the same via a SharePoint farm, if you like.

But it’s another seemingly unrelated technology that’s really sparked my interest in perfecting the fractional horsepower server: HTML5.

Generally when people think of HTML5 (if indeed they think of it at all), it’s mobile platforms that come to mind. (As it’s primarily Apple and Google, through their shared WebKit browser core, that have driven the development and adoption of HTML5). So what has this to do with Excel and the boring, but oh so profitable world, of corporate IT? Well, next time you’re in any spot where the global mobile workforce gathers, airport waiting lounges, hotel lobbies, etc. look at the technology kit that they’re using.

Only a few years back, the vast majority would have had a Windows laptop, if indeed they had any “data processing” device. Now, many, if not all, will either be using a smart-phone or a tablet device (iPhone or iPad but also increasingly Android powered phones/pads). All of these workers are still likely to have a laptop in a carry case or back in the hotel room, and certainly will have a laptop/desktop on their workplace desks. But on the move, mobile is where it’s going.

So how do front-line datasmiths respond to this? Currently, many of us build reporting solutions and really-simple-systems using Excel as the delivery agent; moving all or part of this to a mobile delivery agent will inevitably become increasingly attractive and/or demanded.

MS is already responding to this, e.g. PowerPivot and standard Excel spreadsheets are capable of being rendered via SharePoint’s Excel Services. But what if you don’t have access to a SharePoint farm, or you need a more robust UI, such as could currently be built using a VBA/.NET add-in? This is where HTML5 and fractional horsepower servers come in.

For me, there are two aspects of HTML5 that I think will make developing and deploying such “systems” possible and relatively easy:

It’s HTML5’s local storage that makes a fractional horsepower server scenario possible. In traditional web apps, it’s assumed that:

  • 1st the client is always connected to a server,
  • and that the server provides both the layout (html, javascript, css) and all the data (REST APIs etc.) that the web app consumes.

Now with HTML5 apps, the client doesn’t need to be always connected to its main server or to its data server(s). It can go offline, or it can stay connected to its main server (perhaps a public-internet-facing S3 hosted domain), and every now and then make contact with one or more data servers (which can be safely positioned behind the firm’s firewall).
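To make the idea of a small behind-the-firewall data server concrete, here’s a minimal Python sketch using only the standard library’s http.server module. It is not HAMMER’s server: the /reports/<id> path, the SAMPLE_REPORTS data and the permissive CORS header (needed when the web app itself is served from a different origin, such as an S3-hosted domain) are all assumptions for illustration.

```python
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical prepared reports, keyed by customer id
SAMPLE_REPORTS = {
    "acme": {"customer": "Acme", "last_year_sales": 120000, "target": 135000},
}


def report_payload(customer_id):
    """Return (HTTP status, JSON bytes) for one customer's prepared report."""
    report = SAMPLE_REPORTS.get(customer_id)
    if report is None:
        return 404, json.dumps({"error": "unknown customer"}).encode("utf-8")
    return 200, json.dumps(report).encode("utf-8")


class ReportHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        prefix = "/reports/"
        if self.path.startswith(prefix):
            status, body = report_payload(self.path[len(prefix):])
        else:
            status, body = 404, json.dumps({"error": "not found"}).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        # Permissive CORS so a web app served from another origin can fetch the feed;
        # a real deployment would name the allowed origin instead of "*"
        self.send_header("Access-Control-Allow-Origin", "*")
        self.end_headers()
        self.wfile.write(body)


def serve(host="127.0.0.1", port=8080):
    """Serve reports until interrupted; each request is handled on its own thread."""
    ThreadingHTTPServer((host, port), ReportHandler).serve_forever()
```

Calling serve() and pointing a browser at http://127.0.0.1:8080/reports/acme returns the JSON, which a web app could then keep in localStorage for offline use.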

An example:

A firm’s Sales Reps come into the office every Friday for wash-up meetings, to record sale completions and to get their journey-plans for the following week.

Each rep has a desktop computer, where they interface with the firm’s various systems. One such set of “systems” are PowerPivot based models that report on the year’s forecasts and actual sales. Part of the process of preparing for a sales visit is creating a set of sales reports for each customer to be visited (last year’s sales, this year’s targets, and so on), sourced from the various PowerPivot models. Although the production of the reports is largely automated via Excel macros, currently the resulting sheets have to be printed.

There’s been talk of company supplied laptops for years and the budget for them has now at long last materialised. The reps, however, have expressed a preference for using iPads when customer-facing, mainly because the sales conversation often requires not only presenting the prepared sales reports and charts but also flicking through many of the 100 odd product manuals. Being able to hand around an iPad with high quality glossy images (and videos) of this year’s new products, plus a sales projection chart for the same products, is, they contend, a winner.

A simple mobile sales reporting app is therefore developed (using the JoApp framework and Google’s Chart API and this pure JavaScript columnar database) to cater for the type of sales reports the reps require. The existing Excel automation code is enhanced with a HAMMER server. The web apps on the reps’ new iPads are configured to automatically download prepared and ad-hoc reports when they log in to the office network.

This has worked so well, the reps now want the ability to feed back sales target changes, that they also wish to record on their iPads via another really-simple-system, to their personal Sales Plan workbooks.

Is this as simple as using a “pure” spreadsheet solution? No, but it’s nearly as simple as building a VBA/.NET powered Excel application to do the same. The problem with many Excel “applications” is that they often push Excel beyond its “comfort zone”. The benefit of a hybrid solution like the one above is that Excel gets used where it’s really useful and powerful (reports and models, data gathering and dispersal) while at the same time taking advantage of the freedom and cost-benefits (and fun!) of the emerging mobile web.

When HAMMER met SWF

I use the term “micro ETL” a lot when writing about tools such as HAMMER, but what do I mean by the term?

The ETL bit is easy to explain:

ETL, as all you data-warehousing and business intelligence folks will know, is the Extracting, Transformation and Loading of data from source systems into a reporting/data warehousing system. The techniques of ETL are not unique to the DW/BI worlds but are used anywhere transfers of data are needed between one computer system and another, for example, master data take-on for new systems or transactional interfaces between front and back-office systems – this is often referred to as DI (data integration) but is essentially the same problem domain.

So what’s the “micro” bit about?

You might assume that the micro adjective implies small or indeed tiny datasets, and in many cases you would be correct. Most final-mile data analysis, like politics, is local. Most business decisions along with their implementation and monitoring require ‘localised’ data. That data will be pre-filtered and summarised to some degree, but a fair degree of data shaping will still happen close to the decision makers. Excel is often the tool of choice when data gets to this stage.

HAMMER is optimised for this world, it sees the world how Excel sees it, but also adds the power of SQL and scripting languages to pick up where Excel stops. But enabling better Excel based data shaping is not HAMMER’s only function. It can operate outside of Excel (HAMMER.exe) and it can be used to craft task-specific ETL tools (HAMMER Inside). In both cases I continue to use Excel as my IDE, teasing out a problem before fixing it in code or in an external HAMMER call; and I can also use Excel as the UI for the end products.

In such scenarios, micro applies not so much to the datasets (which can be anything from tiny to very large) but to the concept of deploying simple micro “fractional horsepower” data engines to solve complex ETL, DI or RSS (Really Simple Systems) requirements.

HAMMER is built to take advantage of the distributed grid of powerful data crunchers (be that PCs, laptops, in-house cheap servers or just-in-time pay-as-you-go cloud-based CPUs) that every business, big or small, can now call on.

This revolution in distributed power is similar to what happened with the deployment of fractional horsepower AC-powered electrical motors in the last century. No longer was manufacturing restricted to “dark satanic mills” which had to be built close to natural power sources (water and later coal seams), and had to conform to the multi-story classic mill design to harness that captured power through belts, pulleys and shafts. With the expansion of the AC power grids (and the parallel expansion of roadways carrying internal-combustion-engined vehicles) the factory began to take on the modern single-story (or single story with mezzanine) distributed profile that can be seen everywhere from China to Cork. A similar landscape change is happening in IT.

HAMMER can take advantage of the “distributed engines” easily enough, but the workflow, the actual control and distribution of tasks, data and decisions, requires the ad hoc implementation of either steam-powered or classic centralised server processes. I badly needed a more pre-built, modular approach: micro Workflow to complement micro ETL (and micro BI via PowerPivot?), if you like. Last week I had started to think seriously about how/what to do about this (JSDB powered grids were featuring high on the list) when this appeared.

Perfect timing, Amazon’s SWF (Simple Workflow service) is exactly what I need!

SWF allows for the control and distributed deployment of stateless data processors. HAMMER was designed primarily as a stateless data processor (with state being persisted either in Excel or on disk as simple CSV/JSON flat files). Its default use of in-memory, rather than disk-based, SQLite assumes both abundant CPU and RAM (as is the case with your average 64-bit laptop) and the existence of an external state-machine (which Excel and now SWF provide).
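Here’s a minimal Python sketch of that stateless pattern, assuming state is handed over as CSV text: the step builds a throwaway in-memory SQLite table, runs one SQL statement and hands back CSV, keeping nothing between calls. The function name and sample query are made up; this is not HAMMER’s API and it doesn’t touch SWF itself, it just shows the shape of worker that an external state machine would coordinate.

```python
import csv
import io
import sqlite3


def run_stateless_step(csv_text, table, sql):
    """Load CSV text into an in-memory SQLite table, run one query and
    return the result as CSV text. Nothing survives between calls."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    con = sqlite3.connect(":memory:")   # lives only for this call
    try:
        # NUMERIC affinity turns numeric-looking text such as "100" into numbers
        cols = ", ".join(f'"{c}" NUMERIC' for c in header)
        marks = ", ".join("?" for _ in header)
        con.execute(f'CREATE TABLE "{table}" ({cols})')
        con.executemany(f'INSERT INTO "{table}" VALUES ({marks})', data)
        cur = con.execute(sql)
        out = io.StringIO()
        writer = csv.writer(out, lineterminator="\n")
        writer.writerow([d[0] for d in cur.description])   # column names
        writer.writerows(cur.fetchall())
        return out.getvalue()
    finally:
        con.close()


sales = "rep,amount\nann,100\nbob,50\nann,25\n"
print(run_stateless_step(sales, "sales",
                         "SELECT rep, SUM(amount) AS total FROM sales GROUP BY rep ORDER BY rep"))
```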

I’ve spent any spare time I had this week doing a deep dive into SWF and figuring out how HAMMER can take full advantage of this technology, not just for classic ETL, but for distributed decision control processes and RSS solutions. The result, in Dublin slang, is that I’m both “delira and excira” (delighted and excited). This is, to use that term again, yet another AWS game-changer.