Why use a dimensional data model for PowerPivot? This is not about why one should, or should not, convert an OLTP model into a dimensional one, this post looks at the why from another direction i.e. morphing a “flat” table model as used to feed traditional Excel Pivot Tables (and other pivoting platforms) into a more PowerPivot/DAX friendly format.
Many of the applications of PowerPivot I come across are simply barely modified traditional Pivot Tables migrated to PowerPivot. I probably come across a lot these due to automate PowerPivot refresh code, as having make the cross-over it then dawns that the Excel Object model doesn’t support PowerPivot automation (something that Excel 2013 thankfully addresses), so many come to me looking for help automating the process.
But why move such models to PowerPivot? The usual reasons are PowerPivot’s ability to handle very large datasets and to take advantage of all that DAX offers, not just for measures, but also for ETL purposes.
Migrants from Pivot Table land, quickly come to appreciate the in-built relationship functionality as a relief from the drudgery (and error potential) of VLOOKUP and INDEX/MATCH but often simply use it to continue to produce single flat tables. Eventually, the visual benefit of having separate table “containers” for different types of data is noticed. But this separation of data into dimensions is not just a categorisation trick it’s much more.
The Time Intelligence Functions are particularly good at bringing DAX window-shoppers into the store. And it’s here that the first real need for a “dimension” raises its head. You can get away without a Date (aka Time) Dimension and many did/do (mainly due to the fact that in the early days of PowerPivot the “rules” weren’t very clearly spelled out). But, to make best use of these functions, you need to follow these rules:
- Create a Date Dimension, see here for a definition and the DAX to create one.
- Ensure any relationships between this table and others are based on Date columns (not Date +Time columns!). Or if using >V1, the option to mark the table as a date table is available; this allows non date columns to be used as Prime/Foreign keys (as would be the norm in traditional Data Warehouses).
- Never (well hardly ever) use the Date column on the many side of the relationship (usually a “fact” table”) as the date parameter for Time Intelligence functions; use the Date Dimension’s date column instead.
The need for a no-gaps list of dates is due to the fact that in most models the Date dimension is a so-called sparse dimension i.e. not every date will have a transaction recorded against it.
So why does that matter?
By using this guaranteed no gaps list the Time Intelligence functions (mainly “windowing” functions like Year To Date (YTD), Moving Annual Totals (MAT) etc.) can be sure to have valid data point to work against no matter what other selections are in force.
So, for example, if a particular combination of selectors showed sales for Jan through May but no sales for the rest of the year, a Year To Date (YTD) measure would show the cumulative YTD figures up to May but would also continue to show values for the remaining months of the year as long as the date referenced is on a Date Dimension and not on a “fact” table. In SQL terms, the date dimension table is outer joined to the fact table, always ensuring a date value (usually the max date value in the current selection) even if no fact values. If however, the date on the fact table is used, those months with no sales (and no rows) would show YTD as blank.
Also, the Time Intelligence functions perform a hidden trick. When such a function realises that the date column fed to it is the unique date column key of a table it implicitly applies an ALL() against that table. This has the effect of removing any other date filters so that the “window” can be extended outside the current date context.
The ability to use an ALL() filter against a set of known and not yet known attributes grouped by a logical “dimension” is also the primary reason for ensuring other dimensional grouping such as customer, product etc. are separated out into their own tables. It is of course possible to produce the same result on a flat table by specifying ALL() against each column that represents a particular dimension within the base table. The problem with this approach is, as a model matures or is copied and modified it is very common to add new attributes to a dimension, without the “ALL() against a table” trick, each measure using that dimension would need to be checked and possibly modified every time an attribute is added (or deleted).
Most major dimensions are sparse like the Date Dimension. Their members will not always be represented within all possible slices of the dataset. Being able to create measures that, for example, measure the ratio of customers that purchased this month against total number of customers is much easier with separate dimensional tables.
YTD being the goody-goody following all the rules works as expected.
YTD_BasedOnFactDate tries to be smart, uses the date on the Orders table and explicitly applies ALL(DateDim) to expand its time window. Works but not when the month contains no transactional data.
YTD_BadVersion ignores all the rules and doesn’t work.