Category Archives: PowerPivot

Testing a DAX measure using Excel functionality – XIRR example

Power Pivot’s DAX is missing many of the Excel functions we’ve come to take for granted, but it’s often possible to simulate them in pure DAX; see here for my attempt at a DAX Percentile function.

Whether or not the logic behind such attempts is sound, the result is likely to be judged against Excel’s version. And in many cases, due to the limitations of DAX, crafting an acceptable solution may not be possible at all. The attached workbook shows a method to either test, or replace, such a DAX measure using existing Excel functionality.

The example attempts to emulate the XIRR function in DAX and test the result against Excel’s version.

It uses a VBA user-defined function (UDF) which can “hook into” a pivot table that has been converted to static CUBE functions. The UDF extracts the “cross-tab” filters for each cell and generates a DAX query, issued via ADO, to fetch the filtered sub-set from the Power Pivot model. It then applies Excel’s XIRR function to this data.

The DAX version of XIRR uses a brute-force method: it passes a range of rates to an NPV calculation and picks the average of the rate(s) giving the lowest absolute NPV. In the example there are 3,000-odd rates (-100% to 200% to one decimal place, so … 10.1%, 10.2% …).
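In other words, the measures below do a grid search over the RATES table. Here’s a sketch of the arithmetic, where CF_i is each cash flow and d_i its day offset per the [DaysSince] column (ties for the minimum are averaged):

$$\mathrm{NPV}(r)=\sum_i \frac{CF_i}{(1+r)^{d_i/365}}, \qquad \mathrm{XIRR}\approx \operatorname*{arg\,min}_{r \,\in\, \mathrm{RATES}} \bigl|\mathrm{NPV}(r)\bigr|$$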

A smaller range would result in a faster measure, but would be more likely to fail when the yield falls outside that range. Likewise, increasing the number of candidate rates, for example by adding another decimal place, would add to the time taken to calculate.

So, be warned, this is not a generic solution. It would need to be adjusted for the real-world yields likely to be encountered, and perhaps modified to allow pre-calculation at lower levels and/or to try a restricted “likely-rate” range as a first pass, only going to a wider, slower range for outliers.

You’ll notice a definite lag when filtering the example, made worse by the use of CUBE functions, which are not as efficient as single Pivot Table callbacks. So, regardless of how correct or not the measure is, it’ll be a performance non-runner unless properly tuned. In fact, using the callback-to-Excel method as the solution, rather than as the test, may end up being the only practical way to do this.

Here’s the Excel 2013 version, and here’s an Excel 2010 one.

NOTE: CUBE formulas used with pre-2013 PP models reference “PowerPivot Data”; this will not be updated to “ThisWorkbookDataModel” when the workbook is subsequently auto-upgraded to the Excel 2013 Data Model.

Below is the [xirr] measure, with one supporting [npv] measure and a supporting calculated column, [DaysSince]. There’s also a calculated column, [StandardisedDate], for use by the VBA UDF.

xirr:=IF(
    ISBLANK( SUM( Cashflow[Cashflow] ) ),
    BLANK(),
    CALCULATE(
        AVERAGE( RATES[Rate] ),
        FILTER(
            RATES,
            ABS( Cashflow[npv] )
                = MINX( RATES, ABS( Cashflow[npv] ) )
        )
    )
)

npv:=SUMX(
    SUMMARIZE(
        Cashflow,
        Cashflow[DaysSince],
        "grouped CashFlow", SUM( Cashflow[Cashflow] )
    ),
    [grouped CashFlow]
        / IF(
            Cashflow[DaysSince] = 0,
            1,
            POWER( 1 + AVERAGE( RATES[Rate] ), Cashflow[DaysSince] / 365 )
        )
)

Cashflow[DaysSince] = TRUNC( [Event Date] )
    - TRUNC(
        MINX(
            FILTER( Cashflow, [Asset] = EARLIER( [Asset] ) && [Cashflow] < 0 ),
            [Event Date]
        )
    )

Cashflow[StandardisedDate] = DATE( 1901, 1, 1 ) + [DaysSince]

Have fun.


A frown, or two – Data Explorer to PowerPivot gotchas.

As I continue my discovery of all things Data Explorer and M, I’m not only becoming more impressed with the product but also, as I try more ambitious tasks, hitting some, hopefully, beta-edition teething problems. I would use the “frown” button, but unfortunately it’s determined to force me to use and/or set up an Outlook profile; too many steps, easier to blog about it 🙂

The example I picked this time is a spreadsheet response to an Irish Government FOI (Freedom of Information) request on expenses paid within a department. The result is typical of many reporting tools: repeating column values are blanked out to make the report “easier to read”, but this also makes it difficult to re-analyse. In essence, the report contains a flattened view of four tables: Payee, Claim, Claim Items & Claim Item Details. The task I set myself was to separate out the four tables and link them back together using a PowerPivot Data Model.

And here’s the M code (and here’s an Excel 2013 workbook version of the example):

[
 // load the flattened FOI report from the workbook table "foitable"
 Source = Excel.CurrentWorkbook(){[Name="foitable"]}[Content],
 SourceTable = Table.AddIndexColumn(Source,"RowNo",1),
 // a non-null marker column flags the first row of each Payee/Claim/Claim Item;
 // tag that row with its RowNo as an ID, then FillDown propagates the ID to the rows below
 AddPayeeID = Table.FillDown(Table.AddColumn(SourceTable,"PayeeID", each if [Full Name] = null then null else [RowNo]),"PayeeID"),
 AddClaimID = Table.FillDown(Table.AddColumn(AddPayeeID,"ClaimID", each if [Claim Date] = null then null else [RowNo]),"ClaimID"),
 modSource = Table.FillDown(Table.AddColumn(AddClaimID,"ClaimItemID", each if [Start Date] = null then null else [RowNo]),"ClaimItemID"),
 // slice the tagged flat table into the four normalised tables
 PayeeTable = Table.SelectRows(Table.SelectColumns(modSource,{"Full Name","Grade","PayeeID"}), each [Full Name] <> null),
 ClaimsTable1 = Table.SelectRows(Table.SelectColumns(modSource,{"Claim Date","Payment Date","ClaimID","PayeeID","Status","Total Claimed","Amount Paid"}), each [Claim Date] <> null),
 ClaimsTable = Table.TransformColumns(ClaimsTable1,{{"Total Claimed",each if _ = "NULL" then 0 else _},{"Amount Paid",each if _ = "NULL" then 0 else _}}),
 ClaimsItemsTable = Table.SelectRows(Table.SelectColumns(modSource,{"Start Date","ClaimID","PayeeID","ClaimItemID","Expense Type","Euro Line Amount"}), each [Start Date] <> null),
 ClaimsItemsDetailsTable = Table.SelectRows(Table.SelectColumns(modSource,{"ClaimID","PayeeID","ClaimItemID","Quantity","Unit","Net Total (EUR)"}), each [#"Net Total (EUR)"] <> "NULL")
]

…again, a fraction of the time this would take using, say, Python, and much more approachable for those with limited programming skills. To create a single-table query from the above “foitables” query, simply create a new DE query with “source = foitables[tableNameRequired]”.
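For example, to pull out the payee table as a standalone query (a minimal sketch; the let/in wrapper is the usual query boilerplate, and PayeeTable is one of the record fields defined above):

let
    // reference the "foitables" record query and extract one of its table-valued fields
    Source = foitables[PayeeTable]
in
    Source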

But you need to be careful how you do this; otherwise you can easily lose work and have to repeat some import tasks again & again. So here’s a list of gotchas I encountered:

Gotcha No. 1

Don’t connect your PowerPivot internal table to a DE Workbook connection directly. At least, not until you’re sure you’ve no further changes to make to the DE “code”.

If you modify the code, in any way, the connection is no longer valid (it looks like the code is attached to the connection as a Base64-encoded string) and all your hard(ish) work within PowerPivot will be lost. Even without DAX formulas to lose, you will have to redo work on all imported DE tables: not just re-creating relationships but, as all fields are imported as text, dates & currencies etc. will need to have types re-applied.

Only code changes, not normal refreshes, cause this destroy-the-connection behaviour. Although I’d isolated the 4 for-import tables from the likely-to-change multi-table source “record”, it was no matter: change the one-step-removed query or a for-import query, and the connection is toast.

How to get around this? Well, as stated above, you could just wait until the M “model” is finalised/tested before doing anything outside DE. If you’re using Excel 2010, then creating a “linked table” rather than using the connection directly will get around this, but you’ll end up with the usual “linked table” refresh awkwardness.

If you have 2013, then the solution is much better. Again, don’t use the connection directly; instead use the “load directly to Data Model” option at DE design stage. This seems to create a different type of internal PowerPivot connection, one obviously related to the DE output table, which continues to be valid even after code changes are applied.

Gotcha No. 2

The other no-no is messing with the DE connection’s “SQL”; in fact, even looking at it can cause problems (that might be an exaggeration, it was late). The default “Select * from [Query1]” is about the limit of it, and even that, if “touched” but left as the same code, causes the query to return no columns, or in some cases only one column, go figure? This strange behaviour can be seen in both the 2010 and 2013 versions of PowerPivot. In Excel 2010, direct connections using a workbook table appear not to work at all. Again, Excel 2013 seems more stable; for example, picking the “All DE Queries” connection as a table source will crash Excel 2010, while 2013 gives an error message.

Gotcha No. 3

The Refresh button on the Table Query ribbon (or an implicit refresh when leaving M design mode) breaks the Excel calculation graph. If you’ve referenced the DE table in a formula (such as using Table1[#All]), the formula will be “nuked” with the dreaded #REF!, meaning you’ll have to go back and amend each one. This doesn’t happen if you use the Data ribbon’s Refresh or RefreshAll buttons. So again, make sure the M code is “working” before linking to it in a formula, and even then, do not use the now-working table’s Table Query Refresh button, ever, ever again!

M’s back – Create a PowerPivot Time Dimension using Excel’s PowerQuery

When I first heard of PowerQuery (aka Data Explorer in the past), the new ETL-like add-in for Excel 2010/2013, I was only so-so impressed; likely to be yet another UI-based ETL tool that does the easy bits, and one or two “packaged” hard bits, but chokes when a real hard case needs to be tackled!

That was before it became clear that behind the UI facade lay a powerful and elegant functional language called M (yep, the same M that first appeared in Project Oslo). If data shaping & cleansing is what you do, this you’ve got to see.

To try it out, I picked on that perennial of BI data models, the Date (aka Time) dimension. If you do any serious work with PowerPivot you’ll at some stage need such a table; for why, see here. I used the same variation on an old SQL trick as in creating a Date Dimension using DAX, but this time in M.


let
    days={1..31},
    months={1..12},
    years={2009..2015},
    monthLU={"Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov","Dec"},
    dayT=Table.FromList(days,each {_},{"Day"}),
    monthT=Table.FromList(months,each {_},{"Month"}),
    yearT=Table.FromList(years,each {_},{"Year"}),
    dayJ = Table.AddColumn(dayT, "joiner", each 1),
    monthJ = Table.AddColumn(monthT, "joiner", each 1),
    yearJ = Table.AddColumn(yearT, "joiner", each 1),
    yearmonth=Table.Join(yearJ,"joiner",monthJ,"joiner"),
    yearmonthday=Table.Join(yearmonth,"joiner",dayJ,"joiner"),
    addDate=Table.AddColumn(yearmonthday,"Date",each let x = try #date([Year],[Month],[Day])
                                                     in if x[HasError]
                                                     then #date(1,1,1)
                                                     else x[Value]),
    removeErrors=Table.SelectRows(addDate,each not([Date]=#date(1,1,1))),
    cleanCols=Table.SelectColumns(removeErrors,{"Date","Year","Month"}),
    renameCols = Table.RenameColumns(cleanCols,{"Month","MonthNo"}),
    addtextDate = Table.AddColumn(renameCols,"TextDate",each Date.ToText([Date],"YYYYMMDD")),
    addQtr = Table.AddColumn(addtextDate,"Qtr",each "Qtr" & Text.From(Date.QuarterOfYear([Date]))),
    DateDim = Table.AddColumn(addQtr,"Month",each monthLU{[MonthNo]-1})
in
    DateDim

PowerQuery’s Table function library appears not to have a cross join, which is why I needed to add a “joiner” column to each of the three tables (see the sketch below). It also took me a little while to get my head around M’s error handling; but other than that, even with my current elementary knowledge of M and DE’s function library, it was a quick & friction-free exercise (helped by the step-by-step try/feedback build-loop, very similar to working a problem in Excel).
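For what it’s worth, the joiner trick can be wrapped up as a reusable function. Here’s a minimal sketch (CrossJoin is my own name, not a library function; I’ve given the two helper columns distinct names to avoid a name clash on the join):

let
    // cross join: join both tables on a constant helper column, then drop the helpers
    CrossJoin = (left as table, right as table) as table =>
        let
            leftJ  = Table.AddColumn(left,  "joinerL", each 1),
            rightJ = Table.AddColumn(right, "joinerR", each 1),
            joined = Table.Join(leftJ, "joinerL", rightJ, "joinerR")
        in
            Table.RemoveColumns(joined, {"joinerL", "joinerR"})
in
    CrossJoin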

If Data Explorer (aka PowerQuery) is new to you, have a look at Chris Webb’s excellent series of articles on the subject.

The Model of a very modern PowerPivot – without the add-in

My last post demonstrated a method by which explicit PowerPivot DAX measures could be created in any version of Excel 2013 without the PowerPivot add-in. Here’s another method, this time using Excel 2010’s PowerPivot add-in (available for all versions of 2010) to design the model, and the OpenXML SDK to copy the resulting model to an Excel 2013 workbook.

I had originally thought that this wasn’t possible; when I first tried it, the model copied OK but the workbook connection couldn’t access it. It turned out this was due to using a V1 model that, although upgraded to V2, still used the “Sandbox” cube name rather than the now-expected “Model” name. So, all models must be created in PP V2 for this to work.

Also, the “receiving” Excel 2013 workbook must be primed with a data model (the easiest way without the add-in is to create two tables and link them via the relationships option, or just use the Model_2013_template workbook provided).

You can download a POC workbook Example_CopyModels_no_Addin demonstrating this method from here. 

Note:

You’ll need to modify the paths of the example workbooks to suit your environment.

Also, if you copy the workbook out of the downloaded folder, make sure to copy the datasmith xll of the right “bitness”, and copy the DocumentFormat.OpenXml.dll or ensure one is available on the GAC or PATH.

Before using the datasmith xll for the first time, right-click it and unblock it. This can also be a good idea for workbooks with external connections; otherwise Excel can sometimes keep certain functionality “locked” the first time you access such files.

Having copied the model into the Excel 2013 workbook, save & re-open before making any further changes.

If you look at the code you’ll see it’s also possible to copy models from 2010 to 2010, from 2013 to 2013, or indeed from 2013 back to 2010. This general copying of models from workbook to workbook was the use-case I had in mind when I first discovered this (not as a way of getting around the lack of the add-in). It’s very useful during model development, allowing different views to be tried against a first-developed (or commonly used) data model. It could also be used in production to allow a single model-refresh to serve many “view” workbooks.

Have fun.

Looks like Star Schema is back on the menu!

As the release date for Excel 2013 gets nearer (in fact Home Premium has already arrived), the pricing and packaging are becoming clear(er). For many of us the emerging picture is not pretty, mainly due to a U-turn on Microsoft’s part, i.e. removing the PowerPivot add-in as an option for most low-end Office 365 packages and ALL retail versions.

Rob Collie goes into the likely back story (and politics) behind such an obviously (well, to those of us at the Excel coal face) stupid move. MS not only led some of her most ardent evangelists up the proverbial “garden path” with PowerPivot under Excel 2010 (never before has a major feature been so restricted in a subsequent version) but also gave a false and misleading impression of the new Office 365 packages during the current trial period (i.e. where all versions have the PP add-in (and PowerView!) as an option). Being lied to is the expression that keeps coming to mind.

There is, however, a silver lining: Excel 2013’s Data Model. If I had never seen the power of DAX measures (which is, in essence, what lack of the add-in deprives you of) I would be completely bowled over by the data-handling power of the Data Model (it is, in effect, the Vertipaq columnar engine with all its power and glory – minus the ability to create, though not to consume, explicit DAX measures & columns). But I have, so hence my disappointment 😦

But even without the add-in, the Data Model opens up a whole world of data munging potential.

At its simplest, it makes “VLookup hell” a thing of the past. This alone, as anyone who has laboured under the tyranny of VLookup (and her cousins Index & Match) will agree, is a huge reason to upgrade to Excel 2013.

Also, all versions of Excel (OK, not all: RT and the basic Web Apps do not support the Data Model) can consume a DAX-rich model produced by somebody else who has access to the add-in. Now, before you get too excited, models produced by the free and widely available 2010 version must still be upgraded by the 2013 version before they can be used by 2013’s Data Model. UPDATE: the previous statement is not quite true; it is possible to transfer the binary data file holding the model from 2010 to 2013 (but it must be a V2-generated model; V1s, or V2s upgraded from V1, will not work). I have done a POC and will likely add this facility as a public command to HAMMER in the near future.

It is also possible to create DAX measures, and even calculated columns, against all Data-Model-enabled versions using an AMO-powered .NET add-in. Last week, being a hacker at heart, I created a POC using Python within my HAMMER micro-ETL toolset to do just that. No PowerPivot add-in required to populate even the humblest Excel version with sophisticated DAX models!

And there’s more! You can also use DAX Queries to generate tabular output from a plain Data Model, see Chris Webb’s example. Again, a PP-add-in-free zone.

But why the title? Why are star schemas back on the menu, and were they ever off the menu?

In this post (and the related posts linked to within) I’ve argued that the strict “laws and practices” of dimensional modelling (as popularised by Kimball et al.) can be relaxed (and often ignored). The world that spawned such ‘best practices’ was not a world of DAX-powered, or even MDX-powered, data models; it was one where the RDBMS’s version of a hyper-cube – the star schema – ruled.

The only tool was SQL (with its limited aggregate functions) and the single-issue star schema was the only viable data structure. Sophisticated summary-table techniques (requiring an army of DBAs) and new RDBMS optimizers were needed to get reasonable response times. Plus, much of the work (and most of the cost) was in the ETL phase. That’s the world I cut my BI teeth in. How we would have wished for a modern columnar engine! And now, 20 or so years later, such a thing exists, in a spreadsheet! So dust off your Kimball dimensional modelling books, the plain old ETL’d star schema is back in town!

But what about the ETL required to build such pure stars? Again, add-in-free DAX Queries to the rescue! Like this, to create a Time Dimension. Dimensional data could easily be enhanced (e.g. adding ABC-type attributes) using similar techniques.

Fact data could be more of a problem (due to the 1M-row limit of an Excel table – the output medium of Excel-based DAX Queries). But again, if you have access to even the most basic of relational databases (like the built-in SQLite engine in HAMMER), even the shaping of very large fact tables can be handled. For more demanding requirements, maybe use AWS’s Data Pipeline “ETL-in-the-cloud”?

Anybody with experience of using pure stars will, of course, be aware of the downside, i.e. the restrictive nature of the end result. Such data models tend to deal with a limited number of “questions”, especially compared to DAX-destined data models (which can be very close to their OLTP sources in structure).

But that can also be an upside, depending on the likely end-users of the “cube”. The simple single-issue pure star-schema is often better suited to an environment where most are “consumers” rather than “explorers” of data.

In any case, it would be nice to have the option to service both end-user types. So please, Microsoft, at the very minimum revert back to the 2010 scenario, i.e. PowerPivot as a downloadable add-in. If you do, the phrase “Oh, it’s just Excel” will be one for the history books.

UPDATE: Aug 2013

MS have seen sense! Excel 2013 Standalone Retail version, now comes with both PowerPivot and Power Query (aka Data Explorer) add-ins. http://support.microsoft.com/kb/2817425