DAX’s back – Create DAX Measures in Excel 2013 without PowerPivot Add-in

I mentioned in my last post that it is possible to create DAX measures (and indeed Calculated Columns, KPIs etc.) without using the PowerPivot add-in, by using AMO instead. The facility has always existed in HAMMER, but only as a hidden option that I often used to iterate quickly through a set of measure variations (or to load standard sets) during model development; I wasn’t 100% convinced it was worth the effort of making it production quality in order to add it to HAMMER’s public interface. But, with the extraordinary removal of the PowerPivot add-in as an option from all retail versions and all O365 SME versions of Office 2013, I decided to revisit the code and make it available as a new HAMMER command, “PPMAKEMEASURES”.

This zip file contains an example of an Excel 2013 workbook (drawing data from a public Northwind OData feed) with DAX measures created by HAMMER rather than via the PowerPivot add-in.

The HAMMER formula and associated “list-of-measures” table can be found within. To use the formula on this, or another, workbook you need to activate HAMMER.xll (or HAMMER-64.xll, if using 64-bit Excel) either by opening it like a normal workbook (it will not be installed as a permanent add-in) or by putting it in your add-ins folder and registering it as per a normal XLL (you can also use VBA’s registerDll functionality for automatic, this-instance-only registration).

The first time you apply this method of DAX measure creation to a workbook, you should save, close & re-open the workbook; otherwise, you run the risk of the created measures being overwritten if you subsequently create an implicit measure (by moving a field to the Values section of a pivot table). Once saved, this is no longer a problem. If you use this in Excel 2010 you’ll always need to save, close & re-open in order to see the new measures. (Also, in 2010 you’ll need to precede any PP commands with the PPCONNECTION command, e.g. =HAMMER("PPCONNECTION",Table2[#All],"PPMAKEMEASURES"). UPDATE: It appears you also need the PPCONNECTION in 2013, although I don’t appear to need it in my O365 version!)

To get a list of DAX measures within a workbook use the PPLISTMEASURES command (e.g. =HAMMERToRange("PPCONNECTION","PPLISTMEASURES","Sheet2!A1") will paste a table to Sheet2 starting at cell A1). The table will be in the format required by the PPMAKEMEASURES command and is intended as a quick way of transferring measures from one workbook (even a 2010 one) to another. (Note: this will list both implicit and explicit measures; it’s best to avoid re-creating implicit measures using this method!)

The table is position-based (i.e. the heading values are not important, just as long as there is a heading line); the columns represent: Table Name, Measure Name, DAX, Format and Visible. If the 3rd column, i.e. the DAX expression, is blank, the measure will be deleted; otherwise, any existing measures of the same name will be deleted and replaced with the new definitions.
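
As a rough sketch (the Sales table, column and measure names below are made-up, and the Format and Visible values are just my guess at acceptable entries – check the example workbook for the real thing), a “list-of-measures” table might look like:

Table Name    Measure Name    DAX                            Format    Visible
Sales         Total Sales     SUM(Sales[SalesAmount])        #,##0     TRUE
Sales         Average Sale    AVERAGE(Sales[SalesAmount])    #,##0     TRUE

With that range set up as an Excel table (Table2, say), the =HAMMER("PPCONNECTION",Table2[#All],"PPMAKEMEASURES") call shown above should create or replace both measures in one pass.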

This has only been tested against O365 versions of Excel 2013; retail versions may be different. I needed to hard-code the location of the AMO & ADOMD dlls (unlike 2010, they’re no longer in the GAC and are not on a PATH – well, likely they are, but only in the virtual file system now used by Office).

The location of the AMO library for 32-bit O365 Home Premium is “C:/Program Files/Microsoft Office 15/root/vfs/ProgramFilesCommonX86/Microsoft Shared/OFFICE15/DataModel/Microsoft.Excel.AMO.dll”; let me know if your setup is different. (UPDATE: As Marco discovered, enterprise and most likely retail versions do not use the VFS; I’ll post a fix to this tomorrow.) UPDATE: PATH issues with non-O365 versions should now be fixed.

If you’re an SME or sole trader and you intend to use PowerPivot extensively you really need to purchase (as I have) the O365 ProPlus subscription for your “datasmiths”. The rest of your crew can use retail or other O365 offerings (as they can still “consume” DAX-powered models). But if, for some reason, you can’t or won’t purchase ProPlus, or you just want to take advantage of the fast build-redesign-build cycle offered by PPMAKEMEASURES, then hopefully this will be of use.

Oh, and Happy St. Patrick’s day!

Looks like Star Schema is back on the menu!

As the release date for Excel 2013 gets nearer (in fact Home Premium has already arrived) the pricing and package offerings are becoming clear(er). For many of us the emerging picture is not pretty, mainly due to a U-turn on Microsoft’s part, i.e. removing the PowerPivot add-in as an option for most low-end Office 365 packages and ALL retail versions.

Rob Collie goes into the likely back story (and politics) behind such an obviously (well, to those of us at the Excel coal face) stupid move. MS not only led some of her most ardent evangelists up the proverbial “garden path” with PowerPivot under Excel 2010 (never before has a major feature been so restricted in a subsequent version) but also gave a false and misleading impression of the new Office 365 packages during the current trial period (i.e. where all versions have the PP add-in (and PowerView!) as an option). Being lied to is the expression that keeps coming to mind.

There is, however, a silver lining: Excel 2013’s Data Model. If I had never seen the power of DAX measures (which is, in essence, what lack of the add-in deprives you of) I would be completely bowled over by the data handling power of the Data Model (it is, in effect, the Vertipaq columnar engine with all its power and glory – minus the ability to create explicit DAX measures & columns, though it can still consume them). But I have, so hence my disappointment 😦

But even without the add-in, the Data Model opens up a whole world of data munging potential.

At its simplest, it makes “VLookup hell” a thing of the past. This alone, as any who have laboured under the tyranny of VLookup (and her cousins Index & Match) will agree, is a huge reason to upgrade to Excel 2013.

Also, all versions of Excel (OK, not all: RT and the basic Web Apps do not support the Data Model) can consume a DAX-rich model produced by somebody else who has access to the add-in. Now before you get too excited, models produced by the free and widely available 2010 version must still be upgraded by the 2013 version before they can be used by 2013’s Data Model. UPDATE: The previous statement is not quite true; it is possible to transfer the binary data file holding the model from 2010 to 2013 (but it must be a V2-generated model; V1s, or V2s upgraded from V1s, will not work). I’ve done a POC and will likely add this facility as a public command to HAMMER in the near future.

It is also possible to create DAX measures and even Calculated Columns against all Data Model-enabled versions using an AMO-powered .NET add-in. Last week, being a hacker at heart, I created a POC using Python within my HAMMER micro-ETL toolset to do just that. No PowerPivot add-in required to populate even the humblest Excel version with sophisticated DAX models!

And there’s more! You can also use DAX Queries to generate tabular output from a plain Data Model; see Chris Webb’s example. Again, a PP-add-in-free zone.
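
To give a flavour, a DAX query is just an EVALUATE statement that returns a table; a sketch such as the one below (the Sales table and its columns are made-up for illustration, not from any workbook mentioned here) could be used as the command text of a table query against the Data Model – see Chris Webb’s post for the exact mechanics of wiring it up.

EVALUATE
SUMMARIZE (
    Sales,
    Sales[Region],
    "Sales Count", COUNTROWS ( Sales ),
    "Total Sales", SUM ( Sales[SalesAmount] )
)
ORDER BY Sales[Region]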

But why the title? Why are Star Schemas back on the menu, and were they ever off the menu?

In this post (and the related posts linked to within) I’ve argued that the strict “laws and practices” of dimensional modelling (as popularised by Kimball et al.) can be relaxed (and often ignored). The world that spawned such ‘best practices’ was not a world of DAX-powered, or even MDX-powered, data models; it was one where the RDBMS’s version of a hyper-cube – the star schema – ruled.

The only tool was SQL (with its limited aggregate functions) and the single-issue star schema was the only viable data structure. But sophisticated summary-table techniques (requiring an army of DBAs) and new RDBMS optimizers were needed to get a reasonable response time. Plus, much of the work (and most of the cost) was in the ETL phase. That’s the world I cut my BI teeth in. How we would have wished for a modern columnar engine! And now, 20 or so years later, such a thing exists, in a spreadsheet! So dust off your Kimball dimensional modelling books, the plain-old ETL’d star schema is back in town!

But what about the ETL required to build such pure stars? Again, an add-in-free DAX Query to the rescue! Like this, to create a Time Dimension (a rough sketch of the idea follows below). Dimensional data could easily be enhanced (e.g. adding ABC-type attributes) using similar techniques.
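
As a rough illustration of the dimension-building side (again using the made-up Sales table from the sketch above, and note it only returns dates that actually occur in the fact data, rather than a full contiguous calendar), a query along these lines returns the distinct order dates plus a few decorations, ready to be loaded back as a date dimension:

EVALUATE
ADDCOLUMNS (
    SUMMARIZE ( Sales, Sales[OrderDate] ),
    "Year", YEAR ( Sales[OrderDate] ),
    "Month No", MONTH ( Sales[OrderDate] ),
    "Month", FORMAT ( Sales[OrderDate], "mmmm" )
)
ORDER BY Sales[OrderDate]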

Fact data could be more of a problem (due to the 1M-row limit of an Excel table – the output medium of Excel-based DAX Queries). But again, if you have access to even the most basic of relational databases (like the built-in SQLite engine in HAMMER) even the shaping of very large fact tables can be handled. For more demanding requirements, maybe use AWS’s Data Pipeline “ETL-in-the-cloud”?

Anybody with experience of using pure stars will of course be aware of the downside, i.e. the restrictive nature of the end result. Such data models tend to deal with a limited number of “questions”, especially compared to DAX-destined data models (which can be very close to their OLTP sources in structure).

But that can also be an upside, depending on the likely end-users of the “cube”. The simple single-issue pure star-schema is often better suited to an environment where most are “consumers” rather than “explorers” of data.

In any case, it would be nice to have the option to service both end-user types. So please Microsoft, at the very minimum, revert back to the 2010 scenario, i.e. PowerPivot as a downloadable add-in. If you do, the phrase “Oh, it’s just Excel” will be one for the history books.

UPDATE: Aug 2013

MS have seen sense! The Excel 2013 standalone retail version now comes with both the PowerPivot and Power Query (aka Data Explorer) add-ins. http://support.microsoft.com/kb/2817425

Níl aon Analytics mar d’Analytics féin.

Nollaig Shona Daoibh

The title is a play on the well-known Irish seanfhocal (saying): “Níl aon tinteán mar do thinteán féin.” – “There’s no hearth (tinteán) like your own hearth” – so the title reads, roughly, “There’s no Analytics like your own Analytics”.

Apt for this time of year when families come together to celebrate Christmas.

Analytics (aka Business Intelligence – even though I much prefer the term Decision Support to either) is like politics and family: local and highly context-sensitive. If it wasn’t, every ERP/CRM or packaged software application seller would have negated the reason for it using their respective “reporting modules”. They haven’t, and they wouldn’t, because although you bend your business to fit around such “best practice” and/or “good enough” solutions, your business (like your family) is unique. The standard patterns are there but the reality on the ground is always, always, different.

So if this Christmas you want to give a gift to that special datasmith in your life, introduce them to PowerPivot by buying them this book.

This is the book we should have had 2 or 3 years ago; it would have saved all of us early adopters a lot of pain and experimentation. Up until this book, any time a “civilian” datasmith asked for a recommendation on a resource to help get started with DAX and PowerPivot I was at a loss (never really a problem when BI professionals asked the same question, as the Italians had that area well covered). But now, no hesitation: if you want to master DAX within Excel, this is the book for you.

Nollaig Shona Daoibh, a Merry Christmas to you all.

Building an Excel 2013 Percentile dashboard without PowerPivot or a PivotTable

At the end of my Playing DAX Percentiles on the mean (or is that median) Streets of Ireland post I suggested that plain old Excel (POE) might be a preferable alternative to PowerPivot for this particular problem. In fact, for many Excel users, PowerPivot will only be an option when using a PC-hosted client-side workbook. Excel 2013 will allow workbooks to be published on the web and also to be accessible via native apps on Windows 8 tablets (and eventually iPad & Android native apps, if rumours are to be believed).

But PowerPivot (aka Data Model) functionality will only be available to those with an enterprise-class SharePoint licence (for web publishing), and not at all for native-app-deployed workbooks. So, knowing how to use POE to construct dashboards continues to be a skill worth having.

But how to construct a responsive percentile calculating dashboard sans PowerPivot?

You might think: no problem, I’ll use a traditional PivotTable. Alas, like PowerPivot, it lacks the ability to calculate percentiles as standard; but, unlike PowerPivot, it offers no method of constructing a DIY measure to do so!

Next up, you might look at the SUMIFS family of “pivot” functions, but they too are limited to the usual aggregates; and SUBTOTAL is likewise limited to the usual suspects.

Luckily, via the (black!) magic of array formulas there is a way.

In fact, the “trick” below, when shown to me by a “civilian” datasmith many years ago, convinced me that I should perhaps invest some time getting my head around this powerful “array formula magic”.

If you’re already comfortable with array formulas and are wondering if DAX is too complex to master, don’t worry, you’ll have little trouble mastering DAX. Likewise, if you’re a DAX whiz, you should check out array formulas.

Basing the dashboard on the Property Register converted to an Excel Table (a 2007+ feature that many are still unaware of) enables the use of Slicer selectors in Excel 2013 to quickly give a dashboard feel. It’s still possible to use the Excel Table filters directly on the table (the built-in date and text filters are particularly useful).

Also, in Excel 2013, a chart can be directly “animated” by a range/table without the need for a PivotTable cache, again making the building of “PivotTableLess” dashboards easier.

A lot of this can also be accomplished in pre-2013 versions of Excel (including, most importantly, the array formula “trick”) but Excel 2013 just makes it all so much easier and, of course, the ability for every user (from Home to Pro) to save and/or publish workbooks via “the cloud” is a major advance (big thanks to Google Docs, without you this might never have happened 😉 ).

UPDATE:

See David Hager’s comment: there’s a new (2010+) AGGREGATE function which has many more options than SUBTOTAL, including percentiles, so you can ignore the trick below if using 2010/2013. The trick is still useful for supplying an array to a bespoke formula (I first used it to calculate a particular type of weighted average) or to functions such as IRR or XIRR, so it’s still worth knowing even in modern versions of Excel.
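
For example (a sketch, assuming the same Prices table used in the formula further down; function 16 is PERCENTILE.INC, and option 5 tells AGGREGATE to ignore hidden, i.e. filtered-out, rows):

=AGGREGATE(16,5,Prices[Price],0.5)   gives the median of the visible rows
=AGGREGATE(16,5,Prices[Price],0.9)   gives the 90th percentile of the visible rows

Unlike the SUBTOTAL trick below, these are ordinary formulas – no CTRL+SHIFT+ENTER required.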

UPDATE:

See TheDataSpecialist’s comment for an even better modern make-over of the SUBTOTAL(3,…) trick.

Below is the formula to calculate a median using only the “visible” rows within a filtered table/range. Note: it’s an array formula, so it must be entered using CTRL+SHIFT+ENTER.

{=MEDIAN(IF(SUBTOTAL(3,OFFSET(Prices[Price],ROW(Prices[Price])-MIN(ROW(Prices[Price])),,1)),Prices[Price],""))}

I used this Excel formula beautifier to make it more readable.


MEDIAN
(
    IF
    (
        SUBTOTAL
        (
            3,
            OFFSET
            (
                Prices[Price],
                ROW
                (
                    Prices[Price]
                ) -
                MIN
                (
                    ROW
                    (
                        Prices[Price]
                    )
                ),
                 ,
                1
            )
        ),
        Prices[Price],
        ""
    )
)

The logic behind this is explained here, but essentially, the OFFSET() portion returns, for each row of the table, a single-row range, which the SUBTOTAL(3,…) part then counts: a count of one means the row is visible, a count of zero means it’s hidden by the filter and so is ignored.

If this makes no sense (it didn’t to me the 1st time I saw it), then don’t worry, just make a note of it and use the trick in blind faith :).
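
And although the example is a median, the same pattern should extend to any percentile by swapping MEDIAN for PERCENTILE (an untested sketch, same Prices table, again entered with CTRL+SHIFT+ENTER):

{=PERCENTILE(IF(SUBTOTAL(3,OFFSET(Prices[Price],ROW(Prices[Price])-MIN(ROW(Prices[Price])),,1)),Prices[Price]),0.9)}

Here the IF simply omits the “not visible” branch; the resulting FALSEs are ignored by PERCENTILE, just as MEDIAN ignores the "" values above.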

You can see this in action in “the cloud” here as a read-only “Excel Web App”, (you can also download the workbook to see its internals).

If you’re reading this sometime in the future the above link may not work as it’s published using a beta version of Office 2013; here’s a direct link to the Excel 2013 workbook.

And if you haven’t yet installed Excel 2013, here’s a cut-down 2007/2010 version.

Playing DAX Percentiles on the mean (or is that median) Streets of Ireland

A significant event in Ireland’s property market happened this week: there’s now a publicly accessible register of all residential property sales.

This is something that should always have been available but became absolutely essential as Ireland’s property boom unravelled, with prices down 60% since 2007. Not just to help potential vendors and purchasers find a price point in a collapsed market, but also for all householders, now that a self-assessed property tax is due mid next year.

The database is intended as a search & compare tool rather than a property index and is missing a lot of the dimensionality that a proper index would have (such as number of beds, square footage etc.). As such, it’s not an ideal PowerPivot candidate – only 50,000-odd records (due to low market activity – 1% turnover vs. the 4-5% of a healthy market) and a large-text, high-cardinality “fact field” (or degenerate dimension if you like), i.e. the house address, limiting xVelocity’s ability to compress the dataset.

Also, “normal” SUM() and AVERAGE() statistic functions are not that useful in analysing such data; percentile and counts tend to be more informative. PowerPivot can handle the counts but lacks a percentile function!

Nevertheless, I decided to let PowerPivot loose on it.

I first added a small amount of dimensional decoration. I grouped the counties into regions such as Dublin, Greater Dublin (Kildare, Meath, Wicklow), Border, Midlands, South Leinster, West & Munster.

I also added a date dimension, nearly always a good idea. To that I added a MAT (moving annual total) attribute as, in general, I wished to look only at the last 12 months (more realistic prices); before that, pricing seems to be all over the place as the market collapsed. This method of updating MAT (or YTD etc.) attributes to suit the “current period” is an old DW trick that’s still useful even now with DAX’s time intelligence functionality (particularly when the base DAX formulas are tending towards the complex side, as below).
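
A minimal sketch of such an attribute, expressed as a calculated column on the date dimension (the Dates table name, its Date column and the fact table’s Date of Sale column are all my guesses at names – the workbook may well do this during load instead):

=IF ( Dates[Date] > MAX ( propertyprices[Date of Sale] ) - 365, "MAT", "Prior" )

In a calculated column MAX() sees the whole fact table, so this simply flags the latest 12 months of available data; a refresh moves the window on as new periods arrive.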

Next, concoct a Percentile formula.

My first attempt involved using the TOPN function to separate out the portion of the table “above” the percentile, then using MIN() to find the boundary value. It worked and was fast; but then I noticed it only worked where the “top” was fewer than 5,000-ish elements – if it was greater than that, the function returned the whole table! A bug, no doubt, and not the use-case envisaged for the function, I suppose 😦
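
Roughly (this is a reconstruction of the idea rather than the actual measure, reusing the table and slicer names from the working version below), that first attempt looked something like:

percentilePrice_TOPN:=
MINX(
	TOPN(
		ROUNDUP(COUNTROWS(propertyprices) * (1 - VALUES(Percentile_Selector[What_Precentile])), 0),
		propertyprices,
		propertyprices[Price €]
	),
	propertyprices[Price €]
)

TOPN defaults to descending order, so taking the MIN of the “top” slice should return the value sitting at the requested percentile boundary – until the 5,000-row issue bites.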

I then went back to using a formula that I had used before, based on Marco Russo’s median example. But when I applied it to the model I noticed it was very slow, using 99% of a CPU and a burst of memory approximately 3 times that of the workbook at rest!

Okay, back to the drawing board; this time I decided to use another new PP V2 function, RANK.EQ. This worked and is fast. See below:

percentilePrice:=
if(HASONEVALUE(Percentile_Selector[What_Precentile]),
	MINX(
		FILTER(propertyprices,
				RANK.EQ(propertyprices[Price €], propertyprices[Price €])
				<=
				COUNTROWS(propertyprices)*(1-Values(Percentile_Selector[What_Precentile]))+0.5
		),
		propertyprices[Price €])
,blank())

It’s a real pity that percentile-like functions are missing from the current version of DAX, as the above code, while not terribly complex, is not exactly easy either. And, while perfectly suitable for the type of dataset in this case, it won’t always match Excel’s in-built percentile functions; for more on how to simulate Excel’s percentiles see http://www.powerpivotpro.com/2012/05/percentile-measures-in-dax-errata/ (but be aware these examples use the bug-prone TOPN).

Maybe the lesson here is that sometimes good old plain Excel works better (and quicker) than PowerPivot – horses for courses.

Here’s the Irish Property Register in PowerPivot workbook …