Slowly Changing Dimensions – time to stop worrying.

Anybody who has worked on a major data warehouse project has war stories about that most awkward of BI concepts, the slowly changing dimension (see http://en.wikipedia.org/wiki/Slowly_changing_dimension).

The problem is not so much the need to keep a history of changes to master data attributes nor how to model these in the underlying data warehouse but how, or if, to model SCDs on end-user facing cubes.

SCDs have the potential to make end-user interaction with a cube more complex and error prone. A similar problem arises when a dimension can have several “roles” such as a sales fact with an order date,  dispatch date and invoice date.

Where the nature of the problem requires the exposure of dimensional change in a cube model then there’s no alternative. But, when the justification is that there might be a need in the future or that occasionally in the past that need existed then the argument that front-end cubes should solve specific not generic problems should apply. Holding that line was not always possible when the cost of developing cubes was high due to professional IT involvement and the well founded fear that the money might be wasted as requirements changed over time. The result was often cubes that were more generic (and long-lived, often to the point of irrelevancy) compared to a design that was more tightly focused.

This is where I think self-service BI tools such as PowerPivot or Palo changes things; if those who pay for BI need no longer insist that a design is future proof or that a single model be capable of handling all the current views of the business they can concentrate on building just-in-time highly focused (and hence usually simpler) models.

Different groups in the same business can have variations that suit them e.g. HR/Payroll might was to see a sales rep who has changed sales area reported against the rep’s actual sales while Sales/Marketing might be interested in the potential/history of the rep’s current sales area. Likewise, with date roles, call centre management might wish to report against order date, while the warehouse guys focus on dispatch date, and Accounts only have interest in those transactions with an invoice date.

But what about single-version-of-the-truth worries, what about reconciliation? This is where professional IT come back into the picture. Self service BI doesn’t mean do-it-all-by-yourself BI, no more than a self-service sausage and bacon hotel breakfast means you first catch a pig. The marshalling and delta management of data remains an IT responsibility, very small organisations and/or skilled end-users may be the exception to the rule, but ETL and data repository design are non-trivial problems and look likely to remain so. But with more and more of the front-end been delegated to (or snatched by) business end-users, the nature of back-end ETL and data modeling will change. Many of the resources once applied to a fully IT owned BI project will be focused on speedy, cost effective and accurate data provisioning, a less glamorous and largely invisible role compared to putting pretty dashboards on the CEO’s desktop. As a result, many IT staff with front-of-house skills who are now regarded as IT “assets” will more and more see themselves (and be seen) as front-line business ones.

So is this end of the road for the all-rounder, the catcher of pigs, frier of bacon and breakfast waiter rolled into one? No, I think not, there will continue to be a need for those individuals (or small teams) who can quickly and effectively tackle the once-offs, the emergencies and the need-to-knows that “normal” business practice continually brings to the surface. Small business (or isolated departments in larger concerns) will also continue to need tailored “end-to-end solutions”.

While I do see the continuing need for all-rounders their background (and hence the tools and product features to support them) will change from IT folks with business skills  to end-users with IT skills. Tools such as PowerPivot are an important first step on that journey, more end-user focused ETL tools (and services) is the next step. In the mean time, the world will have to depend on the likes of me (IT guy with business skills armed with a hodge-podge of ETL tools and techniques to bridge the raw-to-actionable data gap) for most of their BI all-rounders.

Advertisements