Category Archives: Ireland

Building an Excel 2013 Percentile dashboard without PowerPivot or a PivotTable

At the end of my Playing DAX Percentiles on the mean (or is that median) Streets of Ireland post I suggested that plain old Excel (POE) might be a preferable alternative for this particular problem rather than using PowerPivot. In fact, for many Excel users, PowerPivot will only be an option when using a PC hosted client-side workbook. Excel 2013 will allow workbooks to be published on the web and also be accessible via native apps on Windows 8 tablets (and eventually iPad & Android native apps if rumours are to be believed).

But, PowerPivot (aka Data Model) functionality will only be available to those with an enterprise-class SharePoint licence (for web publishing) and not at all for native-app-deployed workbooks. So, knowing how to use POE to construct dashboards continues to be a skill worth having.

But how to construct a responsive percentile calculating dashboard sans PowerPivot?

You might think, no problem, I’ll use a traditional PivotTable; alas, like PowerPivot, it lacks the ability to calculate percentiles as standard; but, unlike PowerPivot, offers no method of constructing a DIY measure to do so!

Next up, you might look at the SUMIFS family of “pivot” functions, but they too are limited to the usual aggregates; and SUBTOTAL is likewise limited to the usual suspects.

Luckily, via the (black!) magic of array formulas there is a way.

In fact, the “trick” below when shown to me by a “civilian” datasmith many years ago, convinced me that I should perhaps invest some time getting my head around this powerful “array formula magic”.

If you’re already comfortable with array formulas and are wondering if DAX is too complex to master, don’t worry, you’ll have little trouble mastering DAX. Likewise, if you’re a DAX whiz, you should check out array formulas.

Basing the dashboard on the Property Register converted to an Excel Table (a 2007+ feature that many are still unaware of) enables the use of  Slicer selectors in Excel 2013 to quickly give a dashboard feel. It’s still possible to use the Excel Table filters directly on the table (the built-in date and text filters are particularly useful).

Also, in Excel 2013, a chart can be directly “animated” by a range/table without the need for a PivotTable cache, again making the building of “PivotTableLess” dashboards easier.

A lot of this can also be accomplished in sub 2013 versions of Excel (including, most importantly, the array formula “trick”) but Excel 2013 just makes it all so much easier and, of course, the ability for every user (from Home to Pro) to save and/or publish workbooks via “the cloud” is a major advance (big thanks to Google Docs, without you this might never have happened 😉 ).

UPDATE:

See David Hager comment, there’s a new 2010+ AGGREGATE function which has many more options than SUBTOTAL, including Percentiles, so you can ignore the trick below if using 2010/2013. The trick would still be useful to supply an array to a bespoke formula (I used it first to calculate a particular type of weighted average) or to functions such as IRR or XIRR, so still useful to know even in modern versions of Excel.

UPDATE:

See the TheDataSpecialist’s comment for a even better modern make-over of the SUBTOTAL(3, trick.

Below is the formula to calculate a median using only “visible” rows within a filtered table/range. Note: it’s an array formula, so it must be entered using the CTRL SHIFT and ENTER keys.

{=MEDIAN(IF(SUBTOTAL(3,OFFSET(Prices[Price],ROW(Prices[Price])-MIN(ROW(Prices[Price])),,1)),Prices[Price],””))}

I used this Excel formula beautifier to make it more readable.


MEDIAN
(
    IF
    (
        SUBTOTAL
        (
            3,
            OFFSET
            (
                Prices[Price],
                ROW
                (
                    Prices[Price]
                ) -
                MIN
                (
                    ROW
                    (
                        Prices[Price]
                    )
                ),
                 ,
                1
            )
        ),
        Prices[Price],
        ""
    )
)

This logic behind this is explained here, but essentially, the OFFSET() portion returns either a filtered range of one or zero rows (counted by the SUBTOTAL(3,…) part). if one ,it’s a visible row, if zero, it’s not visible, so ignore.

If this makes no sense (it didn’t to me the 1st time I saw it), then don’t worry, just make a note of it and use the trick in blind faith :).

You can see this in action in “the cloud” here as a read-only “Excel Web App”, (you can also download the workbook to see its internals).

If you’re reading this sometime in the future the above link may not work as it’s published using a beta version of Office 2103, here’s a direct link to the Excel 2013 workbook.

And if you haven’t yet installed Excel 2013, here’s a cut-down 2007/2010 version.

Advertisements

Playing DAX Percentiles on the mean (or is that median) Streets of Ireland

A significant event in Ireland’s property market happened this week, there’s now a publicly accessible register of all residential property sales.

This is something that should have always been available but became absolutely essential as Ireland’s property boom unravelled, with prices down 60% since 2007. Not just, to help potential vendors and purchasers find a price point in a collapsed market, but also for all householders, now that a self-accessed property tax is due mid next year.

The database is intended as a search & compare tool rather than a property index and is missing at lot of the dimensionality that a proper index would have (such as no of beds, square footage etc.). As such, it’s not an ideal PowerPivot candidate – only 50000 odd records (due to low market activity – 1% turnover V. the 4/5% of a healthy market) and a large text “high-cardinality” “fact field” (or degenerate dimension if you like) i.e. the house address; limiting xVelocity’s ability to compress the dataset.

Also, “normal” SUM() and AVERAGE() statistic functions are not that useful in analysing such data; percentile and counts tend to be more informative. PowerPivot can handle the counts but lacks a percentile function!

Nevertheless, I decided to let PowerPivot loose on it.

I first added a small amount of dimensional decoration. I grouped the counties into regions such as Dublin, Greater Dublin (Kildare,Meath,Wicklow), Border, Midlands, South Leinster, West, & Munster.

I also added a date dimension, nearly always a good idea. To that I added a MAT (moving annual) attribute as, in general, I wished to look only at the last 12 months (more realistic prices). Before that pricing seems to be all over the place as the market collapsed. This method of updating MAT (or YTD etc.) attributes to suit the “current period” is an old DW trick that’s still useful even now with DAX’s time intelligence functionality (particularly when the base DAX formulas are tending on the complex side, as below).

Next, concoct a Percentile formula.

My first attempt involved using the TOPN function to separate out the portion of the table “above” the percentile, then use MIN() to find the value. Worked, was fast; but then I noticed it only worked where the “top” was less that 5000’ish elements if it was greater than that, the function returned the whole table! A bug, no doubt, and not the use-case envisaged for the function, I suppose 😦

I then went back to using a formula that I had used before based on Marco Russo’s median example. But when I applied it to the model I noticed it was very slow, using 99% of a CPU and using a burst of memory approximately 3 times that of the workbook at rest!

Okay back to the drawing board, this time I decided to use another new PP V2 function, RANK.EQ. This worked and is fast. See below :

percentilePrice:=
if(HASONEVALUE(Percentile_Selector[What_Precentile]),
	MINX(
		FILTER(propertyprices,
				RANK.EQ(propertyprices[Price €], propertyprices[Price €])
				<=
				COUNTROWS(propertyprices)*(1-Values(Percentile_Selector[What_Precentile]))+0.5
		),
		propertyprices[Price €])
,blank())

It’s a real pity that Percentile-like functions are missing from the current version of DAX as the above code, while not terribly complex, is not exactly easy either. And, while perfectly suitable for the type of dataset in this case, it won’t always match Excel’s in-built Percentile functions, for more on how to simulate Excel’s percentiles see http://www.powerpivotpro.com/2012/05/percentile-measures-in-dax-errata/ (but be aware these examples use the bug-prone TOPN).

Maybe the lesson here, is, sometimes good old plain Excel works better (and quicker) than PowerPivot, horses for courses.

Here’s the Irish Property Register in PowerPivot workbook …

Data Wrangler

A few weeks ago I came across (thanks to @lismissData Wrangler; a very promising data cleansing tool from the Stanford Visualization Group. Not only is Data Wrangler a web-service (which the group intend to open source) but it also allows transformations to be “recorded” in either Python or JavaScript (see here). It was this Python scripting feature that really caught my attention; would be very useful to be able to hack away at a dataset using the service, then transfer the script to microETL’s PyScript to adjust and integrate with Excel and SQL.

The demo video and test datasets give a good overview of the tool but the proof of the pudding is in trying out some real world dirty data; I chose a fine example of the art of Freedom Of Information datasets, issued by a Republic of Ireland government department. As an example of how not to do something (unless your intention is to make the recipient regret asking for the FOI in the first place) this is excellent. (I suppose we should be grateful it’s in Excel not Word or PDF or even PowerPoint). You can download it here http://bit.ly/Ireland_FOI_example (the data as released is in the FOI sheet).

As I said, Data Wrangler is promising, but needs some more work (to be fair, the group warns it’s a work in progress). The tool choked on the FOI dataset, too many columns I think, so not ready for the real world yet but I’ll be keeping an eye on its progress. Don’t let my experience put you off, it looks more than capable of handling smaller but still quite messy datasets.

If you’ve downloaded my example workbook, you’ll see how I managed to cleanse the data using microETL’s Python & SQL scripting functionality (the PyScript is in the Python sheet, with the SQLScript in the Control sheet). I could have cleansed the data using pure Excel and some VBA  and perhaps I would have if this was a format requiring parsing on a regular basis; I could then save the transformation as a single file macro-enabled workbook, ideal for sharing, no need for add-ins etc. But it was a once-off, and even if it wasn’t, it’s quite likely the format supplied in answer to a subsequent FOI request would be different. This is the sort of work that microETL’s Python & SQL scripting is designed for; quick and dirty data wrangling, but with the ability ro persist, and modify the resulting transformations if so required.

If you wish to try out this example, there’s a new version of microETL (Alpha1.08) available for download. You’ll notice a new folder structure (the usual sub-folders are now under a single sub-folder call microETL) to make installation of the add-in somewhat neater; and there’s also a setup.xls that’ll do all the hard work of installing (and un-installing) the microETL add-in. Note: you still need to manually install Python 2.7 to enable the PyScript’ing functionality.

If you need help with your Excel, ETL or  data cleansing tasks, I can help.

PowerPivot Powered Budgets & Targets

Last week I used PowerPivot for the first time in a classic budgeting scenario. The existing, Excel based, system was straining at the seams due to recent changes in the organisation (merger, followed by lots of changes, resulting in the replacement of long-established reporting and budgeting hierarchies).

The budget process followed the, more or less, classic lines of using adjusted previous years figures to create high-level targets for coming years, agreeing those targets with various interests and then “driving” these high-level targets down various hierarchies to inform front-line staff what would be expected of them in the years to come.

PowerPivot did a marvellous job of providing the necessary figures required to set the initial budgets and to help inform those who must agree them. This involved lots of moving-annual-totals, percentage-increases and so on. A dream compared to the purely excel methods employed in the past.

The “driving down” logic was however kept outside PowerPivot, as the hierarchies involved were unbalanced in the main and the rules were complex but already existed and worked (and probably more importantly were understood and agreed by the various “interested parties”) in Excel. All that was required was the creation of the numerous “flattened cross-join tables” to support the existing logic. The various SQLite “hierarchy helpers” I detailed in my previous Handling Flat, Parent-Child and Nested Set Hierarchies post did most of the heavy lifting. As the process involved the “cross-join” of  hierarchies at various levels I used a great deal of “nested set” SQL to achieve the required result; simple enough, but did become tedious and made the resulting logic somewhat un-approachable for those with limited SQL. What I needed was another “helper function”. The CROSS_JOIN_HIER function was born!

FUNCTION: CROSS_JOIN_HIER

First argument is the hierarchy table name (see previous post for description of this table). The 2nd is the output table name. This table must already exist and have at least the same number of columns as the number of subsequent arguments.

The remaining arguments specify the source of the tables (single column lists) to cross-join in order to populate the output table. The arguments can be of three types.

  • The name of an existing table. This table must consist of single column named “Name”. This allows for complex (or perhaps, simple lists in a different sort order) to be generated outside the function call.
  • A request for a list of nodes from some level within a hierarchy. Such requests can consist of one of the following:
    • Integer between 0 and 99. If the column name associated with this argument (e.g. arg 1 implies column 1 of output table, arg 2 column 2 etc…) is the same name as a valid hierarchy, fetch all nodes at that level (e.g. if 0, then fetch top level nodes, if 1, 2nd level etc..). Allows for up to 99 levels (i.e max value  98). Level 99 is special, it’ll return all “leaf nodes” i.e. those nodes with no children.
    • aNode,(+ or -)integerValue e.g. “Beer,+2”, in this case fetch all nodes 2 levels below Beer in the hierarchy corresponding with column name. The value before the comma must be a valid node name (assumes names are unique within hierarchies). A value after the comma of “-1” will fetch parent node, a value of “-2” will fetch grandparent. In both “+” and “-” if the level to navigate down/up is greater than the levels available, the last available level is returned.
    • In both cases above, the name of the column can be over-ridden by prepending the hierarchy name such that: “Product,Beer,+1” will drive down 1 level from “Beer” in the Product hierarchy. And, “Product,99” will return a leaf nodes of the Product hierarchy.
  • A comma separated list to create a “manual” level. For example, “Budget,Actual”, “2010,2011,2012” or “Beer”

The resulting single column tables are then cross-joined with the resulting multi-column table (a column for each “source” argument) which is used to populate the output table.

This is a “stored procedure” like function, so should be called using “Select function(arg1,arg2 …);” syntax.

Example: click for larger version.

Download microETL from http://www.gobansaor.com/microetl and unpack to a folder, locate alberto_hier workbook (2007/2010 format) and go to the makeBudgetTable sheet.

The microETL project  is password protected; if you need access to the code just email me I’ll send you on the password.

PowerPivot & The Parable of the Snow Sock

For those of us in North West Europe this winter has been one of the coldest on record; not only cold but early, in this part of the world winter usually only kicks in after Christmas. The beauty of a perfect White Christmas Day did make up for the previous month of extreme cold and inconvenience but only just; and would not have at all if I had not invested in a simple piece of relatively new technology; a pair of snow socks.

For it was this pair of snow socks that enabled our household to continue to get to work and school, to Christmas shop, visit a sick relative and drive on Christmas Day to the winter wonderland that was a frozen Blessington Lake.

As the cold spell continued the main roads (and eventually the motorways) became hazardous to travel on; minor roads and housing estates became close to impassable for many normal cars. As the ability of the county council to keep roads functioning decreased (due to an early winter catching it on the hop with insufficient salt stocks and no Plan B) so my family and our neighbours realised than self-help was our best bet. People started investing in snow shovels, sand and salt, and hiring JCBs; gritting hills, clearing school and community hall carparks, helping dig out neighbours’ and strangers’ cars. And I invested in a pair of Weissenfels WeissSock Snow Socks (from http://www.micksgarage.ie/tyre-snow-socks.aspx).

Snow chains or even winter tyres are not seen as necessary by most drivers in Ireland or the UK; our winters (including this and last year’s nearly as cold one) are not that severe. Most are unaware of the alternative, the snow sock, a relatively new concept; in effect, a fabric snow chain. A fraction of the cost of winter tyres; easier to use (and nearly as effective) as snow chains.

Having a pair meant we had our own self-service road treating device, insuring we could safely get to the local town or nearest motorway and  when the M7 motorway was backed-up being able to use the old un-treated N7 road to bypass it.

PowerPivot is to Excel what a Snow Sock is to a car in a snowy winter. The ideas and a lot of the technology used in both products have been around for some time, but the accessibility, low-cost and “good enough” packaging are game changers.

Like our neighbourhood waiting on an over-stretched, under-funded county council to solve our “last mile” problems, countless small companies, departments and individuals in large organisations are likewise awaiting their IT Godots often in vain and if delivered, often too little and/or too late.

Two technologies that I’ve been introduced to in the last 12 months:

both liberating, both having the ability to make aspects of my life much better and much easier. I expect to see a lot more people adopt both in the year to come.

So if you’re stuck for some New Years’s resolutions this year:

  • Buy some snow socks now; i.e. do NOT wait until the next “snow event”!
  • Learn PowerPivot.

Happy New Year 😉

 

UPDATE: Looks like Marco & Alberto are doing one of their PowerPivot courses in Dublin this March, so no excuses for not learning PowerPivot http://sqlblog.com/blogs/marco_russo/archive/2011/02/15/new-powerpivot-workshop-dates-copenhagen-dublin-and-zurich.aspx