A significant event in Ireland’s property market happened this week, there’s now a publicly accessible register of all residential property sales.
This is something that should have always been available but became absolutely essential as Ireland’s property boom unravelled, with prices down 60% since 2007. Not just, to help potential vendors and purchasers find a price point in a collapsed market, but also for all householders, now that a self-accessed property tax is due mid next year.
The database is intended as a search & compare tool rather than a property index and is missing at lot of the dimensionality that a proper index would have (such as no of beds, square footage etc.). As such, it’s not an ideal PowerPivot candidate – only 50000 odd records (due to low market activity – 1% turnover V. the 4/5% of a healthy market) and a large text “high-cardinality” “fact field” (or degenerate dimension if you like) i.e. the house address; limiting xVelocity’s ability to compress the dataset.
Also, “normal” SUM() and AVERAGE() statistic functions are not that useful in analysing such data; percentile and counts tend to be more informative. PowerPivot can handle the counts but lacks a percentile function!
Nevertheless, I decided to let PowerPivot loose on it.
I first added a small amount of dimensional decoration. I grouped the counties into regions such as Dublin, Greater Dublin (Kildare,Meath,Wicklow), Border, Midlands, South Leinster, West, & Munster.
I also added a date dimension, nearly always a good idea. To that I added a MAT (moving annual) attribute as, in general, I wished to look only at the last 12 months (more realistic prices). Before that pricing seems to be all over the place as the market collapsed. This method of updating MAT (or YTD etc.) attributes to suit the “current period” is an old DW trick that’s still useful even now with DAX’s time intelligence functionality (particularly when the base DAX formulas are tending on the complex side, as below).
Next, concoct a Percentile formula.
My first attempt involved using the TOPN function to separate out the portion of the table “above” the percentile, then use MIN() to find the value. Worked, was fast; but then I noticed it only worked where the “top” was less that 5000’ish elements if it was greater than that, the function returned the whole table! A bug, no doubt, and not the use-case envisaged for the function, I suppose 😦
I then went back to using a formula that I had used before based on Marco Russo’s median example. But when I applied it to the model I noticed it was very slow, using 99% of a CPU and using a burst of memory approximately 3 times that of the workbook at rest!
Okay back to the drawing board, this time I decided to use another new PP V2 function, RANK.EQ. This worked and is fast. See below :
percentilePrice:= if(HASONEVALUE(Percentile_Selector[What_Precentile]), MINX( FILTER(propertyprices, RANK.EQ(propertyprices[Price €], propertyprices[Price €]) <= COUNTROWS(propertyprices)*(1-Values(Percentile_Selector[What_Precentile]))+0.5 ), propertyprices[Price €]) ,blank())
It’s a real pity that Percentile-like functions are missing from the current version of DAX as the above code, while not terribly complex, is not exactly easy either. And, while perfectly suitable for the type of dataset in this case, it won’t always match Excel’s in-built Percentile functions, for more on how to simulate Excel’s percentiles see http://www.powerpivotpro.com/2012/05/percentile-measures-in-dax-errata/ (but be aware these examples use the bug-prone TOPN).
Maybe the lesson here, is, sometimes good old plain Excel works better (and quicker) than PowerPivot, horses for courses.