Excel In Data

Latest blog post: Slowly Changing Dimensions – time to stop worrying?

My speciality is data, more specifically, data at the edges. Data that has been extracted from, or needs to be generated outside of, formal systems. Such data tends to be the extremely valuable and useful, but only if shaped into a format that has meaning to those who can use and analyse it. I’ve been extracting, marshalling and shaping such data for most of my 30 years in the IT business, I’m a datasmith.

The technical term for what I do is ETL (Extract, Transform & Load). It’s the name given to the skill-set and tools associated with the movement of datasets. It’s the unglamorous, but essential backbone, to a whole range of IT deliverables: Data Warehousing, BI (Business Intelligence), Planning & Forecasting, New System Commissioning, Systems Integration.

In the past my skills would only have been of use to large IT departments, as major “data shaping” activities such as BI & data warehousing were “owned” by IT; small businesses and departmental workgroups were outside the loop except as providers of requirements (and money) and as long-suffering “end users”.

This alienation from the process of data shaping and the need to simply “get things done” led to the evolution of “shadow IT”, usually involving Excel is some form or another. Although many attempts have been made to control and tame this wild-west of data creation and consumption, they’ve largely failed. With good reason, the best people to get value out of data are those who depend on and generate the data in the first place.

Microsoft has long recognised this and has over the years improved Excel’s ability to handle and pivot lists (aka tables, the primary data artefacts of business). Excel’s PivotTable is undoubtedly the world’s primary BI tool.

Microsoft has, with Excel 2010, introduced an even more powerful end-user pivoting tool, the free PowerPivot add-in. The world of self-service data analysis has changed for ever; not only can PowerPivot handle enormous amounts of data but it enables end-users and data analysts alike to build highly complex analytical models at a fraction of their previous cost.

PowerPivot offers the greatest returns to those organisations or individuals who have:

  • access to report-friendly data sources, such as those provided by existing data warehouses.
  • The resources to purchase and maintain an Enterprise SharePoint farm (backed by Enterprise SQLServer) allowing PowerPiviot dashboards to be shared (and controlled) via a web interface. Such infrastructure, although ideal, comes with an expensive price tag.

So, if your organisation or your team is missing one or both of the above is PowerPivot not for you?

Far from it, PowerPivot is still extremely useful without either. Data that’s not in the ideal format can still provide useful analyses, and both Excel and PowerPivot can help cleanse such data. Likewise, just like an ordinary workbook, a PowerPivot workbook can be emailed or placed on a shared drive (all the supporting data is embedded in the .xlsx file). The only problem is, the time and resources associated with this activity can be substantial if you don’t have the ‘micro ETL’  and general Excel VBA skills to help automate it. This is where I can help.

I’ve both the skills and the tools to build a rock-solid Excel-based ETL back-end for your PowerPivot projects. This can be as simple as a “good enough” data shaping automation for a single PowerPivot cube or a sophisticated ‘micro-Data Warehouse’ which would act as a foundation for a more long-term self-service BI strategy. The only software product you’ll need to purchase is Excel, I’ll provide the rest. Most of the time Excel 2010’s  & Powerpivot’s in-built ETL capabilities, automated with some VBA, will be all that is required but when more ‘firepower’ is required:

  • I’ve assembled my own Excel based ETL tool specifically to enable me to quickly and cheaply build such micro-ETL solutions, it’s a collection of best-of-breed open-source tools (such as SQLite) driven by, and integrated into Excel (versions 2000 – 2010, 32bit & 64bit).
  • I can call on Palo OLAP (an open source, Excel focused budgeting & forecasting tool)
  • and on Talend, a leading open source ETL and data-management suite.

I can also automate the work-flows and non-ETL repetitive tasks associated with such projects. Tasks such as: generating questioners, marshalling and recording the resulting replies, generating non-PowerPivot workbook reports/pivots (remember, unlike previous versions, Excel 2010 can exist side-by-side with older versions of Office, only need to provide it to those who need its full power).

As for building PowerPivot cubes, I’ve a long background in the design & delivery of ROLAPMOLAP solutions. PowerPivot is essentially a 3rd way of producing OLAP cubes, heavily influenced by the dimensional modelling technique commonly know as the star-schema, but with the presentational simplicity of the Excel’s PivotTable and an Excel-like formula language (DAX) bolted on. PowerPivot models built with an understanding of the usefulness of the star-schema approach will allow for richer and simpler data analystics. I can provide that knowledge or help you master it.

My involvement can be either on a “deliverable” basis, i.e. I deliver the required solution, usually fixed-price for a fixed deliverable or on a consultancy basis, where the “deliverable” is more an ongoing process and would often involve a degree of skills transfer.

I excel in data, with tools such as PowerPivot so can you.