In my Handling Flat, Parent-Child and Nested Set Hierarchies & PowerPivot Powered Budgets & Targets posts I discussed my use of Nested Sets as a means of modelling unbalanced and ragged hierarchies. My first instinct when confronted with a non-trivial hierarchy, flattened or not, is to reach for my trusty Nested Set builders. However, it’s not just as an ETL re-modelling method that a datasmith is likely to encounter nested sets; they also exist, as variations on the theme, in production system data models. One of the most common uses is in the modelling of systems’ ACLs (Access Control Lists). This use of range queries (often “positive ranges” MINUS “negative ranges”, plus or minus “ragged lists”) can be very complex and would prove difficult to model using the flattened cross-join options available in PowerPivot (point solutions that “hard code” a fixed set of relationships are possible, but a generic “handle all comers” solution would be very difficult).
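Nested-set ACL checks of this kind boil down to interval containment. As a minimal sketch (the `node` table, its `lft`/`rgt` bounds and the node names are all hypothetical; SQLite, which I mention later, serves as the SQL engine), an effective permission set can be computed as a positive range MINUS a negative range:

```python
import sqlite3

# A tiny nested-set hierarchy: each node's [lft, rgt] interval
# sits wholly inside the interval of every one of its ancestors.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE node (name TEXT PRIMARY KEY, lft INT, rgt INT);
INSERT INTO node VALUES
  ('Root', 1, 14),
  ('A',    2,  7), ('A1',  3,  4), ('A2',  5,  6),
  ('B',    8, 13), ('B1',  9, 10), ('B2', 11, 12);
""")

def visible(granted, denied):
    """ACL-style range query: the granted subtree MINUS the denied subtree."""
    sql = """
    SELECT n.name FROM node n JOIN node g ON g.name = ?
      WHERE n.lft BETWEEN g.lft AND g.rgt
    EXCEPT
    SELECT n.name FROM node n JOIN node d ON d.name = ?
      WHERE n.lft BETWEEN d.lft AND d.rgt
    """
    return sorted(r[0] for r in conn.execute(sql, (granted, denied)))

# Grant the whole tree, then deny B's subtree:
print(visible('Root', 'B'))  # ['A', 'A1', 'A2', 'Root']
```

Real ACL models layer several such grant/deny ranges (and the odd ragged list of exceptions) on top of each other, which is exactly what resists flattening into PowerPivot relationships.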
But should such ACL-like models be used in PowerPivot cubes? In general I would say no. For two reasons:
- The people building and utilising self-service BI solutions, like PowerPivot, tend to be high on the “need to know” hierarchy. They tend to have, and need, full visibility of an organisation’s data. Such security as is needed will tend to be applied at the data source level, i.e. which full datasets they have access to, or what filtering of such data is required, and is usually controlled by an IT-staffed gateway service of some sort.
- As PowerPivot is, in the main, about just-in-time task-specific analysis, the simplification (via flattening or ‘hard coding’ of relationships) will usually do the trick.
But what if you really needed to model such a thing in PowerPivot? Is it possible? The simple answer is yes. See below for an example of a Nested Set Measure:
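The measure itself is DAX, but the underlying logic is plain range-query SQL, and can be sketched as such (SQLite via Python; the sales hierarchy, the `node`/`fact` tables and their names are hypothetical): sum a fact over every node whose bounds fall inside the chosen node’s range.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE node (name TEXT PRIMARY KEY, lft INT, rgt INT);
INSERT INTO node VALUES
  ('Sales', 1, 8),
  ('Team1', 2, 5), ('Alice', 3, 4),
  ('Bob',   6, 7);
CREATE TABLE fact (name TEXT, amount INT);
INSERT INTO fact VALUES ('Alice', 100), ('Bob', 250), ('Team1', 50);
""")

def me_and_mine(name):
    """Downwards range query: total for a node plus all its descendants."""
    sql = """
    SELECT COALESCE(SUM(f.amount), 0)
    FROM node me
    JOIN node n ON n.lft BETWEEN me.lft AND me.rgt
    JOIN fact f ON f.name = n.name
    WHERE me.name = ?
    """
    return conn.execute(sql, (name,)).fetchone()[0]

print(me_and_mine('Team1'))  # Team1's own 50 + Alice's 100 = 150
print(me_and_mine('Sales'))  # the whole tree: 400
```

A PowerPivot Nested Set Measure expresses the same BETWEEN-style filter in DAX against the hierarchy table’s lft/rgt columns.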
The example shows a simple “get me and mine” downwards range query, but the full power of Nested Set queries is available. The downside of this approach is the “flatness” of the views achievable with Excel pivot tables. Adding a “display column”, in which node names are indented by a number of spaces matching their depth, can help. In any case, in the scenarios where I see this approach being useful, the flat pivot table representation would not be a major drawback.
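The display column can be sketched the same way (again with a hypothetical `node` table of `lft`/`rgt` bounds): a node’s depth is simply the count of ancestors whose range strictly contains it, so no stored depth column is needed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE node (name TEXT PRIMARY KEY, lft INT, rgt INT);
INSERT INTO node VALUES
  ('Root', 1, 14),
  ('A',    2,  7), ('A1',  3,  4), ('A2',  5,  6),
  ('B',    8, 13), ('B1',  9, 10), ('B2', 11, 12);
""")

def display_column():
    """Names indented by depth; ordering by lft gives tree order."""
    sql = """
    SELECT n.name,
           (SELECT COUNT(*) FROM node a
             WHERE a.lft < n.lft AND a.rgt > n.rgt) AS depth
    FROM node n ORDER BY n.lft
    """
    return [('  ' * depth) + name for name, depth in conn.execute(sql)]

for row in display_column():
    print(row)
# Root
#   A
#     A1
#     A2
#   B
#     B1
#     B2
```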
This type of cube would most likely be implemented as a “hub” cube or accessed via Excel’s CUBE formulas. In other words, the presentation of the data would be through other cubes or simply via “Excel reports”.
An example of where this approach might be useful would be in preparing data for generating employee performance reports. Employees might have complex KPIs that depend on their visibility of the organisation, e.g. temporary tele-sales staff might only see, and be measured against, their “own data”, while established sales staff might be measured against their team’s performance. Staff such as marketing support, although embedded in sales teams, might instead be measured against sales of a brand within a specific geographic region.
Is this easy, or is it indeed self-service BI? In the scenario above there would still be a lot of self-service, i.e. the Excel reports and the “spoke” cubes; even the Nested Set measures could be coded by “civilians” once they were provided with a template, perhaps with some re-modelling of the source hierarchies by professional ETL’ers.
But as I keep saying, “ETL is hard, get over it!“. Modern operational systems can be devilishly complex, and even when they are not, the correct reading of a system’s data model is a skill that takes some time to master; basic SQL skills are essential.
That’s why SQL is at the heart of my microETL tool: it is a serious ETL tool. If SQL (and I’m talking very basic SQL here, the sort SQLite provides) is beyond you, or you have better things to be doing with your time 🙂 , then you should assign the back-end ETL to others.
The good news is that once the ETL back-end is “sorted” the new breed of self-service BI tools really do work; just don’t expect them to magic away the inherent complexity of source systems. Maybe one day they will, but that day is not today.