Tag Archives: Date Dimension

M’s back – Create a PowerPivot Time Dimension using Excel’s PowerQuery

When I first heard of PowerQuery (aka Data Explorer in the past), the new ETL-like addin for Excel 2010/2013, I was only so-so impressed; likely to be yet another UI based ETL tool that does the easy bits, and one or two “packaged” hard bits, but  chokes when a real hard-case needs to be tackled!

That was before it became clear, that behind the UI facade lay a powerful and elegant functional language called M (yep, the same M that first appeared in Project Oslo). If data shaping & cleansing is what you do, this you’ve got to see.

To try it out, I picked on that perennial of BI data models, the Date (aka Time) Dimension. If you do any serious work with PowerPivot you’ll at some stage need such a table, for why, see here. I used the same variation on an old SQL trick as in creating a Date Dimension using DAX, but this time in M.


let
    days={1..31},
    months={1..12},
    years={2009..2015},
    monthLU={"Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov","Dec"},
    dayT=Table.FromList(days,each {_},{"Day"}),
    monthT=Table.FromList(months,each {_},{"Month"}),
    yearT=Table.FromList(years,each {_},{"Year"}),
    dayJ = Table.AddColumn(dayT, "joiner", each 1),
    monthJ = Table.AddColumn(monthT, "joiner", each 1),
    yearJ = Table.AddColumn(yearT, "joiner", each 1),
    yearmonth=Table.Join(yearJ,"joiner",monthJ,"joiner"),
    yearmonthday=Table.Join(yearmonth,"joiner",dayJ,"joiner"),
    addDate=Table.AddColumn(yearmonthday,"Date",each let x = try #date([Year],[Month],[Day])
                                                     in if x[HasError]
                                                     then #date(1,1,1)
                                                     else x[Value]),
    removeErrors=Table.SelectRows(addDate,each not([Date]=#date(1,1,1))),
    cleanCols=Table.SelectColumns(removeErrors,{"Date","Year","Month"}),
    renameCols = Table.RenameColumns(cleanCols,{"Month","MonthNo"}),
    addtextDate = Table.AddColumn(renameCols,"TextDate",each Date.ToText([Date],"YYYYMMDD")),
    addQtr = Table.AddColumn(addtextDate,"Qtr",each "Qtr" & Text.From(Date.QuarterOfYear([Date]))),
    DateDim = Table.AddColumn(addQtr,"Month",each monthLU{[MonthNo]-1})
in
    DateDim

PowerQuery’s Table function library appears not to have a cross join, that’s why I needed to add a “joiner” column to the three tables, and it took me a little while to get my head around M’s error handling; but other than that, even with my current elementary knowledge of M and DE’s function library, it was both a quick & a friction-free exercise (helped by the step by step try/feedback build-loop, very similar to working a problem with Excel).

If Data Explorer (aka PowerQuery) is new to you have a look at Chris Webb’s excellent series of articles on the subject.

Advertisements

Cross Join Three Tables via DAX Query to seed a Date Dimension.

This is a variation on an old SQL trick to seed a Date (aka Time) dimension. You’ll need such a table in many, if not all, PowerPivot models as the proper functioning of Time Intelligence functions require a separate table containing a date column populated with a no-gaps sequence of dates covering the potential time span of the model.

If you’re sourcing your data from an existing data warehouse, most likely you’ll already have access to a date dimension. But if not, you’ll need to generate one. You could use the auto-fill facility in Excel to do so, but it can be awkward to use when generating a many-year table (e.g. pension & mortgage models might need current year +/-  20 or even 90 years).

To seed the dataset you’ll need three one-column tables; one for DAYs with values of 1 to 31, another for MONTHs with values 1 to 12 and finally YEARs listing the years required e.g. 1995 to 2060 or 2011 to 2013. Next, load these tables into a PowerPivot model, as linked-tables or whatever, and apply the following DAX Query against the model.

Evaluate(
ADDCOLUMNS(
SUMMARIZE(
ADDCOLUMNS(
CROSSJOIN(Years,Months,Days),
“FullDate”,DATE([Years],[Months],[Days])),[FullDate]
),
“Year”,YEAR([FullDate]),
“MonthNo”,MONTH([FullDate]),
“Month”,FORMAT([FullDate],”MMM”)
)
)
ORDER BY [FullDate]

You can apply the above DAX query using this trick (Note: this will not work for Excel 2013, but no worries, as  in Excel 2103, DAX Queries are a supported data-table source, see here).

Or, you could  install the excellent open source DAX Studio Excel Add-in (again doesn’t work in Excel 2013 but will eventually, I would think).

So how does this work?

Let’s start with the most inner function call:

CROSSJOIN(Years,Months,Days)

This simply cross-joins the three tables. The cross-join creates a table consisting of every single combination of days, months & years.

Next:

ADDCOLUMNS(
CROSSJOIN(Years,Months,Days),
“FullDate”,DATE([Years],[Months],[Days])
)

ADDCOLUMNS is the equivalent to the PowerPivot add-in’s “Calcuated Columns” in a DAX Query. This will add a new column call “FullDate” to the table using the DATE function to form a valid date.

Now you might think that many cross-join combinations would generate an invalid date (e.g. 30th March) but DATE is a forgiving function and will convert the likes of 30th of Match to the 2nd of April (or 1st in leap years). In fact you could use a 1 to 366 DAYs table and a single value (i.e. 1 ) MONTHs table! The result will include some duplicate dates but the next layer in the onion will take care of that:

SUMMARIZE(
ADDCOLUMNS(CROSSJOIN(Years,Months,Days),”FullDate”,DATE([Years],[Months],[Days])),
[FullDate]
)

SUMMARISE is like a SQL Group By, and will remove any duplicate dates in the [FullDate] column.

You could stop now as the minimum requirement for a date dimension has been produced, but in reality you’ll need a few descriptive/selective/sorting attributes to be assigned to each date.

You could do this via an Excel Table or via Calculated Columns when you import the table into a PowerPivot model. Or, as here, add them via a DAX Query:

ADDCOLUMNS(
SUMMARIZE(ADDCOLUMNS(CROSSJOIN(Years,Months,Days),”FullDate”,DATE([Years],[Months],[Days])),[FullDate]),
“Year”,YEAR([FullDate]),
“MonthNo”,MONTH([FullDate]),
“Month”,FORMAT([FullDate],”MMM”)
)

Again using ADDCOLUMNS to add calculated columns: a “Year” column (1995,1996 etc.) for selecting, a MonthNo column (1,2,3 …) for sorting and a “Month” column (Jan,Feb) for descriptive/selection purposes.

Finally, add a SORT BY [FullDate] to return the table in sorted order.

The table can then be saved as a CSV or an Excel worbook or loaded back into the source database. All, or a filtered range, can then be loaded into future models as required.