Tag Archives: drill-thru

PowerPivot Show Details

Judging by the continuing interest in my “PowerPivot – Show Detail not allowed!” post, a lot of people are still obviously surprised when their attempts to “show details” on a PowerPivot cell fail. I still believe this is a major omission, but I’ve learned to live with it and have a semi-acceptable replacement in Flattened PivotTables.

So, for example, you might have a Sales fact on which you have some detail-level columns (such as Invoice No. or Customer Telephone) that you occasionally wish to drill down to in order to investigate pivot results further. If you never wished to drill to that level of detail, there’s a case for not including them on your “fact” table, as by their nature they have a high “cardinality” and will not compress as well as low-cardinality columns (such as Invoice Type). But of course, columns such as Invoice No can also be used for measures (e.g. count(Invoice No)), thus “justifying” this memory overhead.
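
As a rough sketch of what I mean (the Sales table and column names here are my own invention, not anything PowerPivot prescribes), such measures might look like this in DAX:

    -- Hypothetical Sales table and columns, for illustration only.
    -- A simple count of invoice lines, using the high-cardinality column:
    Invoice Line Count := COUNTA(Sales[Invoice No])

    -- A count of distinct invoices:
    Invoice Count := COUNTROWS(DISTINCT(Sales[Invoice No]))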

Back to the drill-down. The double-click “show details” we’ve come to expect doesn’t work, but it’s still possible to generate a table containing the detail-level attributes by using a new feature: Flattened PivotTables.

This is a new PivotTable design option found only on the PowerPivot menu. (It’s also possible to flatten PivotTables, both standard and PowerPivot-based, using the ‘PivotTable Tools’ > ‘Report Layout’ menu, but the new method is somewhat easier.)

The trick is to select the most restrictive, low-cardinality columns first (after selecting at least one measure), set the necessary filters (on the columns themselves, or by using “non-table” slicers) to achieve the view required, and only then add the detail-level columns to the table.

If you do this in reverse, the workbook is likely to choke on you if the detail-level columns have tens of thousands (or indeed millions) of distinct values.

Another method of avoiding this premature “explosion” of rows is to:

  • Click the Field List button on the PivotTable Options menu; this will display the “normal” field list alongside PowerPivot’s list.
  • Select the Defer Layout Update option, lay out the table, then click Update.

Not perfect, I’ll admit; hopefully the next version of PowerPivot will at least allow workbook-based cubes to show details (I can see the logic of disallowing or severely restricting this option for SharePoint-hosted cubes), but until then this is as good as it gets.

Update: Seems the next version of PowerPivot, due this year, will remedy this!

Update: And so it has …


PowerPivot – Show Detail not allowed!

Last week, I at long last set aside some time to put PowerPivot through its paces, triggered by my purchase of Excel 2010 (in itself a momentous occasion, as without the attraction of PowerPivot I would have followed my, and most other Office users’, usual pattern of waiting 3-5 years or so before investigating the ‘latest’ edition).

The verdict?

In general, I’m still impressed; the concept demoed to me over a year ago has evolved into an impressive first version. Some things, such as hierarchies, did not make it (hierarchies are constructed from cross-joins of field/attribute sets, as per a normal pivot table), so no hierarchy rules as would be the case with standard OLAP cubes; but perhaps for many end users the ‘traditional’ construct-on-the-fly hierarchies will be more approachable.

The DAX functionality is better than I expected; easier for non-techies than MDX but still powerful.
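
To give a flavour (a sketch only, with made-up table names), compare how little ceremony a year-to-date figure needs in DAX with the calculated member it would take in MDX:

    -- Hypothetical Sales and Dates tables, for illustration only.
    Total Sales := SUM(Sales[Amount])

    -- The same figure year-to-date: one time-intelligence function call,
    -- rather than an MDX calculated member.
    Sales YTD := TOTALYTD(SUM(Sales[Amount]), Dates[Date])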

Importing data is hassle-free and intuitive, the VertiPaq engine does a wonderful job of compressing the imported data, and the resulting in-memory column-store is certainly very fast. I like the ‘linked-table’ option, which allows normal Excel tables (the one useful new feature that Excel 2007 introduced) to be added to the PowerPivot star-schema. Being able to import any datasource publishable in AtomPub format (such as Google Docs spreadsheets!) is also nice.

So it’s all good then? Yes, except for one really annoying omission: no drill-through (aka Show Details, aka drill-thru) allowed.

What? Surely some mistake!

Afraid not. I initially thought the measures I attempted to Show Details on were too complex (as the error message “Show Details cannot be completed on a calculated cell” suggested). Then I assumed it had been cut to meet a delivery deadline and would appear in a subsequent version. But no, Donald Farmer confirmed that it was intentionally removed, as the feedback from IT organisations was not to allow end-users the ability to drill through to potentially millions of rows when running under a SharePoint server. As for those of us running PowerPivot on the client, we already have all the data, so no need for show-detail!

Okay, I can understand IT’s reluctance to allow a multi-million-row drill-through, but surely that should be a decision for individual IT groups: whether to allow it and, if allowed, how to limit the amount of data returned.

A million-row result is of limited use; most Show Details results are a few thousand rows at most, and typically under 1,000, so sensible limits can easily be enforced.

The client-side (the side that matters most to me) is a very different story. Here the excuse that the data is already there is exactly that: an excuse. By the same logic, the drill-through on a normal pivot-table should be unnecessary. Yet, if you watch end-users construct pivots, they use it constantly; not just to discover the detail behind a figure but, as often as not, as a way of validating that the model they’ve constructed is correct.

This spot-checking of figures is the main ‘test methodology’ used in the wild. Spreadsheet ‘developers’ do not construct sophisticated test harnesses and procedures. You might argue they should, but they don’t and likely never will. And as for the ‘multi-million-row result’ problem, end-users are not idiots, they’re just end-users; they’ll do it once and learn to be more careful the next time (or they’ll use the limit-rows option).

This lack of drill-through will definitely mean I will continue to use normal pivot tables for situations that would otherwise be better solved using PowerPivot. As many such models will be based on relatively small datasets (sub-100,000 ‘facts’), it might be suggested there’s no need for PowerPivot. But this is to miss the ‘intellectual’ power (as opposed to the massive data-crunching power of VertiPaq) at the heart of PowerPivot: the star-schema.

Most of the commentary on PowerPivot has focused on its ability to handle really large datasets, but this emphasis on ‘big-data’ (something the rest of the BI industry shares) often ignores the power of human-scale small-data (i.e. the world of the spreadsheet jockeys). The power of a star-schema to model BI problems (be they small or large) is something I’ll return to in a later post. (UPDATE: http://blog.gobansaor.com/2010/07/09/star-schemas-to-boldly-go-where-no-excel-spreadsheet-has-gone-before/)

Star-schema models (particularly when worries about speed of access are removed by an in-memory column-store) are superior in many respects to the fully denormalised flattened tables that we currently build to support pivot-tables, and are also more flexible than the multi-dimensional cell approach of pure MOLAP cubes. Combining such data models with the user-friendliness of spreadsheets, alongside the added magic of a modelling language such as DAX (and some MDX where necessary), on a datasmith’s laptop is the true beauty of PowerPivot.
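
As a hedged sketch (the fact and dimension table names are invented): once the star-schema’s relationships are in place, a DAX measure can reach across them rather than requiring the data to be pre-flattened into a single wide table:

    -- Hypothetical fact and dimension tables joined by a PowerPivot relationship.
    -- SUMX iterates the fact table row by row; RELATED follows the relationship
    -- to the product dimension, so no flattened single-table layout is needed.
    Sales Amount := SUMX(FactSales, FactSales[Quantity] * RELATED(DimProduct[Unit Price]))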

So, lack of drill-through aside, “Well done Microsoft!”

UPDATE:

For a semi-acceptable alternative to no drill-thru see http://blog.gobansaor.com/2011/03/22/powerpivot-show-details/

Palo OLAP and sparse dimensions

Last week I tried out both the latest Palo 2.5 release and its sister product, ETL-Server. Although I’ve not done any proper benchmarks, 2.5 does appear to be faster than the previous release, and the Excel add-in also behaves better when co-habiting with other add-ins and macros (the previous release’s use of, and response to, application-level events meant it often caused the entire Excel session to grind to a halt when other macros were processing data).

There is, however, a major memory-leak problem when using cubes with sparse dimensions (such as the Biker database’s Invoice dimension); at one stage my Excel session had a working-set size of 750MB!

The failure of MOLAP cubes to effectively handle sparse datasets has always been something that ROLAP star-schema advocates (myself included) have pointed to as a major shortcoming. Over time, products such as Essbase have managed to get around this limitation (but often only with careful up-front cube design by skilled staff). Palo hasn’t quite made it to that level yet. But then, maybe it shouldn’t; perhaps simplicity of setup and use should be Palo’s goal, not the ability to handle a telco’s grain-at-the-level-of-call fact table.

Two of the new features in Palo 2.5 that I’d been looking forward to, Zero Suppression and Drill-through, are both relevant to the handling of cube sparsity.

  • Drill-through: reduces the sparseness of a dimension by moving the ‘grain’ up a level in the consolidation hierarchy, e.g. having a base level of Invoice Type instead of Invoice Number, or indeed dropping the dimension entirely.
  • Zero Suppression: filters out excess elements, e.g. showing only the invoice numbers raised this month (relying on a non-zero value to indicate that an invoice belongs to this month) rather than every invoice number ever raised.

It was my testing of Zero Suppression using the Biker database that appears to have caused Excel to go on a memory binge. It doesn’t worry me that much, as the right design can reduce the need for zero suppression, e.g. in a customer dimension, ‘hide’ customers who trade with you infrequently under an “Others” consolidation. Also, the use of drill-through would eliminate the need for many sparse dimensions, but, alas, the drill-through functionality is severely nobbled in Palo 2.5.

Firstly, the functionality is only available in Excel if you purchase the €8,000 Palo Supervision Server, which is all fine and dandy if you have a need for all the good things that Supervision Server offers. Having said that, I did manage to bypass this requirement by calling the ETL-Server’s SOAP API directly from Excel with the help of Simon Fell’s superb PocketSoap library, so all is not lost on that front. But …

… the other reason I’m not overly impressed with the feature is its implementation, particularly for something Jedox is asking you to pay €8,000 for! Here’s what you get.

Within the ETL-Server’s cube ‘export’ you can specify “<info>” fields, such as invoice number, that will not form part of the final cube coordinate list. These coordinates and the info fields, along with the ‘value’ field, are then written to a Derby (aka JavaDB) database table, where the schema is the ETL project name and the table name matches that of the cube. So you end up with, in essence, a ‘fact table’ at a finer ‘grain’ than the cube, with the role of degenerate dimensions being provided by the “info” fields. The resulting table is not indexed, so large datasets will be a problem, and for it to work, a cube can only be ‘owned’ by a single ETL project.

The other thing to note is that drill-thru only works for ‘N’ coordinates, i.e. no consolidated elements. This is unlike Excel’s pivot table, which allows you to double-click any cell to reveal the underlying dataset.

On the plus side, the new ETL-Server is actually very good and well put together. You’re better off not thinking of it as a generic ETL tool but as a specialist Palo loader, like Oracle’s SQL*Loader. And like SQL*Loader’s role on many Oracle projects, for a large percentage of Palo projects ETL-Server will be the only ETL tool needed.