Tag Archives: IronPython

The Model of a very modern PowerPivot – without the add-in

My last post demonstrated a method by which explicit PowerPivot DAX measures could be created in any version of Excel 2013 without the PowerPivot add-in. Here's another method, this time using Excel 2010's PowerPivot add-in (available for all versions of 2010) to design the model, and the OpenXML SDK to copy the resulting model to an Excel 2013 workbook.

I had originally thought this wasn't possible; when I first tried it, the model copied OK but the workbook connection couldn't access it. It turned out this was due to using a V1 model that, although upgraded to V2, still used the "Sandbox" cube name rather than the now-expected "Model" name. So all models must be created in PowerPivot V2 for this to work.

Also, the "receiving" Excel 2013 workbook must be primed with a data model (the easiest way without the add-in is to create two tables and link them via the Relationships option, or just use the Model_2013_template workbook provided).

You can download a POC workbook, Example_CopyModels_no_Addin, demonstrating this method from here.

Note:

You’ll need to modify the paths of the example workbooks to suit your environment.

Also, if you copy the workbook out of the downloaded folder, make sure to copy the datasmith xll of the right "bitness" and the DocumentFormat.OpenXml.dll, or ensure one is available on the GAC or PATH.

Before using the datasmith xll for the first time, right-click it and unblock it. This can also be a good idea for workbooks with external connections; otherwise Excel can sometimes keep certain functionality "locked" the first time you access such files.

Having copied the model into the Excel 2013 workbook, save & re-open before making any further changes.

If you look at the code you'll see it's also possible to copy models from 2010 to 2010, from 2013 to 2013, or indeed from 2013 back to 2010. This general copying of models from workbook to workbook was the use-case I had in mind when I first discovered this (not as a way of getting around the lack of the add-in). It's very useful during model development, allowing different views to be tried against a first-developed (or commonly used) data model. It could also be used when live, to allow a single model-refresh to be used by many "view" workbooks.
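The copying itself is plain OpenXML plumbing. Here's a rough IronPython sketch of the idea (not the POC workbook's actual code; it assumes the V2 model sits in a CustomDataPart reachable from the WorkbookPart, and that the receiving workbook has been primed as described above):

import clr
clr.AddReference("DocumentFormat.OpenXml")
from DocumentFormat.OpenXml.Packaging import SpreadsheetDocument, CustomDataPart

def copy_model(src_path, dst_path):
    src = SpreadsheetDocument.Open(src_path, False)   # source, read-only
    dst = SpreadsheetDocument.Open(dst_path, True)    # destination, read-write
    try:
        # the first (and usually only) custom data part holds the model
        src_part = list(src.WorkbookPart.GetPartsOfType[CustomDataPart]())[0]
        dst_part = list(dst.WorkbookPart.GetPartsOfType[CustomDataPart]())[0]
        dst_part.FeedData(src_part.GetStream())       # stream the model across
    finally:
        src.Close()
        dst.Close()

Because the receiving workbook was primed with a data model, the destination part already exists and the copy reduces to streaming one part's contents into the other.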

Have fun.


Exposing an Excel PowerPivot model as a Web Service

In my last post I demonstrated a simple way to interrogate an Excel-based PowerPivot model using MDX, DMV or the new DAX Query Table commands. This is a great way to dig deep into a PowerPivot model; for example, the new Denali DMV discover_calc_dependency can be used to describe the dependencies between a model's DAX measures.

Being able to directly access the PowerPivot model opens up all sorts of interesting options; one such option is a web service.

In the past I’ve written about exposing Excel as a simple DIY web service, see here for a JavaScript example and here for a Python version. Both were capable of serving PowerPivot data but they both had to trigger events in Excel to do so.

With this newly discovered ability to query the model directly, I figured a third attempt at a PowerPivot Web Service was overdue. This time I'm using the new kid on the block, HAMMER, and in particular its in-built IronPython interpreter.

I could have built a simple server using VB.NET or C# and added it as a HAMMER command, but the nature of a web service makes it hard to frame as a parametrised command. It was for this type of requirement that I added the ability to script using Python, i.e. situations where a generic command would always come up short. For example:

  • Should the service allow only local connections?
  • Should it use authentication, and if so, what type?
  • Should it allow free-format commands or offer only packaged commands?
  • Should it use HTTPListener (which is in-built and very powerful, but requires Admin privileges) or a more basic socket-based utility that can run without Admin privileges?
  • Should it block or handle asynchronous requests? (For services that access the Excel Object Model, it has to be blocking, but perhaps if the service simply accessed the AS engine, it might support multi-threaded queries? Must try it sometime.)

For this example I decided to use HTTPListener (in blocking mode and without authentication) and to expose services that accept free-format MDX, DMV or (if Denali) DAX Table Queries.

See the PPWebService workbook example in the latest (V0.5.0) version of HAMMER. To activate, make sure the latest version of HAMMER is installed (or use the activate button on the Hammer sheet) and click the Expose PowerPivot Model as Web Service button (again on the Hammer sheet). This will start a server on port 8070.

The service exposes 6 endpoints:
  • /exit – this will shutdown the service.
  • /html – this will return data as an HTML table (ideal for importing into Excel using a Web Query).
  • /xml – this returns a ‘canonical’ XML table (ideal for importing to Excel as an XML Map).
  • /json – returns a JSON table (as a list of lists).
  • /odata – returns an ATOM-based OData feed suitable for direct import into PowerPivot using its Data Feed import facility.
  • /csv – returns in CSV format.

Each service (except /exit) expects to be followed by an MDX, DMV or DAX command, e.g. /html/evaluate(InvoiceHeaders). In the case of /csv an optional file name may be specified, e.g. /csv/invhead.csv/evaluate(InvoiceHeaders).

It’s also possible to issue a POST request with the command in the message body.

As with all URL requests, the commands may need to be URL encoded (use the HAMMER command “_URLENCODE” if need be).
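The actual PPWebService code ships in the workbook; purely to illustrate the blocking HTTPListener loop and the /format/command routing described above, here's a stripped-down IronPython sketch (query_model is a stand-in for HAMMER's model query, and only the /html and /exit endpoints are handled):

import clr
clr.AddReference("System")
from System import Uri
from System.Net import HttpListener
from System.Text import Encoding

def serve(query_model, port=8070):
    listener = HttpListener()
    listener.Prefixes.Add("http://localhost:%d/" % port)   # needs Admin rights
    listener.Start()
    while True:
        ctx = listener.GetContext()        # blocks: one request at a time
        fmt, _, cmd = ctx.Request.Url.AbsolutePath.lstrip("/").partition("/")
        if fmt == "exit":
            ctx.Response.Close()
            break
        rows = query_model(Uri.UnescapeDataString(cmd))    # MDX/DMV/DAX text
        html = "<table>%s</table>" % "".join(
            "<tr>%s</tr>" % "".join("<td>%s</td>" % c for c in row)
            for row in rows)
        data = Encoding.UTF8.GetBytes(html)
        ctx.Response.ContentType = "text/html"
        ctx.Response.OutputStream.Write(data, 0, data.Length)
        ctx.Response.Close()
    listener.Stop()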

For the latest versions and articles on HAMMER follow the HAMMER tag on my blog …


VBA Multithreading, .NET integration via HAMMER

In a previous post I urged all potential datasmiths to learn a scripting language (I suggested Python). But what of VBA, the granddaddy of the scripting world? Well yes, if you have a need to automate Excel then you must learn VBA. VBA is to Excel as JavaScript is to the modern browser; its tight integration with Excel's Object Model and its superb debugging facilities make it the optimal choice for automating Excel.

VBA is now a one trick pony (Office automation), but Python opens all sorts of API scripting and product automation doors. My use of IronPython as HAMMER’s scripting language is one such door, a door to the otherwise mainly-closed-to-VBA world of multi-threading and easy .NET library integration. [UPDATE: Feb 2012 – HAMMER now also offers JavaScript as a scripting option]

The HAMMER function can be called from VBA using the Application.Run command like so…

retArray = Application.Run("DATASMITH.HAMMER", inArray, "Select dept,count(*) from table1", "SQL")

… the 1st parameter is the function name, parameter two is the 1st argument to the function, parameter three is the 2nd and so on.

By utilising HAMMER's IronPython functionality (requires the .NET 4.0 runtime), VBA routines can access the full power of the .NET platform, with data passed back and forth as tables. Admittedly, for many complex .NET utilities VB.NET or C# may be a better approach (due to the better IDE and debugging features of those languages), but for standard library calls IronPython is an ideal option. It also has the benefit that the "code" can be stored within the workbook.
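How HAMMER binds tables to its Python scripts isn't shown here, but to give a flavour of the kind of .NET call an embedded IronPython step can make against a table (a list of rows, header row first; the function name and scenario are purely illustrative):

import clr
clr.AddReference("System")
from System.Net import Dns

def add_ip_column(rows):
    # table-in, table-out: append a column resolved via the .NET framework
    out = [rows[0] + ["ip"]]
    for row in rows[1:]:
        ip = Dns.GetHostAddresses(row[0])[0].ToString()
        out.append(row + [ip])
    return out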

HAMMER also offers the power of multi-threading to VBA via its internal threading functionality (requires Excel >= 2007 and the .NET 4.0 runtime). The multi-threading example in the hammerThreads.xlsx workbook could easily be wrapped in VBA code, perhaps to allow it to be controlled by a user-form.

I've added two new commands specially designed for use within VBA-scripted HAMMER scenarios:

  • APPDB – Opens an application-wide shared in-memory database. This allows tables (and Python objects) created in one function call to be accessed in another function call (assuming both issue the APPDB command as their 1st command). This replicates the functionality of microETL, which by default exposes an application-wide SQLite in-memory instance and a common Python workspace.
  • CLOSEAPPDB – This will close and clear the shared C#-SQLite and IronPython instances. Equivalent of microETL's xLiteReset() function.
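As a concept sketch of what APPDB provides (in plain-Python sqlite3 terms, not HAMMER's actual C#-SQLite internals): one in-memory database held at application level and handed back to every call that asks for it:

import sqlite3

_appdb = None

def appdb():
    # APPDB equivalent: create once, then share across function calls
    global _appdb
    if _appdb is None:
        _appdb = sqlite3.connect(":memory:")
    return _appdb

def close_appdb():
    # CLOSEAPPDB equivalent: close and clear the shared instance
    global _appdb
    if _appdb is not None:
        _appdb.close()
        _appdb = None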

Be careful, though, if you …

  • use the APPDB instance from in-cell UDF calls to HAMMER that are likely to be scheduled as multi-threaded (the helper functions HammerToSheet & HammerToFit are safe, as they are always single-threaded),
  • or use it within "internal threaded" HAMMER commands

as although C#-SQLite is thread-safe, the implementation logic is not.

UPDATE Feb 2012: I've added new threading features to HAMMER (see here) and my implementation is now thread-safe (well, I think it is!); however, the warning still applies: think twice before updating a single database instance (such as APPDB) across multiple threads, though read-only access should work fine.

Here’s a list of the HAMMER commands implemented so far …

You can download the latest version of HAMMER here …

Those with a datasmith’s hammer, see every problem as a table.

I picked the name HAMMER for my new micro ETL tool to capture the spirit of how it's intended to be used. It's an everyday tool, sufficient for many of the tasks a datasmith faces; when it's not, it will help with (or at least not get in the way of) the marshalling of data that needs to be processed by more powerful tools.

The saying "If you only have a hammer, you tend to see every problem as a nail" could be rephrased as "If you only have a datasmith's HAMMER, you tend to see every problem as a table"! But that's OK, as nearly every commercial IT problem can be expressed as a set of tables, and in any case HAMMER is not intended to be a datasmith's only tool. Its close integration with Excel recognises the prime importance of spreadsheets to most data wranglers, and its use of CSV and the SQLite database format as its persistence and transport mechanism (à la "SQLite as the MP3 of data") recognises that datasets will often need to be shared with other tools.

Finding the perfect tool for the job is an IT obsession that most of our customers care little for; it's only when our choice of tool affects their bottom line (i.e. excessive cost or wasted time) that end users take much notice of the backroom technology. The most important skill for a datasmith is the ability to understand the structures and forms that data can take, and to be able to derive and build business value from that data. The technology is an aid, nothing more; technology (and indeed applications) will come and go, but data lives on.

HAMMER encapsulates the minimum set of technical skills an aspiring datasmith should learn:

  • Spreadsheets, in particular Excel. No need to know every nook and cranny, nor expect it to handle every task. But do learn about pivot tables and array formulas; if you have Excel >= 2007, learn about Excel tables (aka Lists). If you have Excel 2010, make sure to download PowerPivot. Become comfortable with "formula programming"; don't expect a ribbon command to solve every problem.
  • SQL – learn the basics of selecting, joining and group-by; again, no need to become a master. SQLite is not only an excellent way to learn SQL, it's also a powerful tool to have once you've picked up the basics.
  • Learn a scripting language – Python is one of the best and is relatively easy to learn. Again, mastery is nice but not essential; learn the basics of IF-THEN-ELSE logic, loops and iterations, array and list handling, and string manipulation. Your efforts at coding do not have to be pretty or elegant, just functional. Python skills will also transfer across platforms: CPython (the original and the best), IronPython (.NET and HAMMER) and Jython (JVM; here's a cool example of Python as a scripting language to automate GUIs).

All this talk of picking the right tools brings to mind the old golf pro-am story where the amateur was constantly telling the pro (Jack Nicklaus, I think) which club to play. At a particularly tricky shot to the green, the pro had had enough when his choice of club was again "criticised". So he proceeded to take out every club, including woods and putters, placed a ball for each one, and hit every ball onto the green.

We're not all as talented as Jack Nicklaus, so having at least a good-enough tool for the job at hand is important. But it does show that focusing on the end-game is what matters, not becoming fixated on a particular way of doing things.

Enough of the moralising, and talking of being fixated on a particular tool 😉 here's this week's list of new features to be found in HAMMER:

New commands:

TXT – will load a text file such that each line becomes a new row in a single-column table (the column is named "text").

LOADDB – like OPENDB, opens the named database, but only if the database file exists (OPENDB will create a new empty database if the file is not found). Intended primarily as the end-point of a request-process-response loop; see DELEGATE/SUBMIT below.

SAVEDB – saves the current "main" database using the previous argument as its name (shorthand for … "main","filename.db","SAVENAMEDDB").

SUBMIT – same as SAVEDB, but the value of the prior argument has a ".request" appended to make a file name in the format "databasename.request" to save to. Also, if the previous argument is "" or "{GUID}", a globally unique name will be generated using a GUID. The main use-case for this command is to send the saved database for processing by an external program, maybe a Talend or Kettle ETL job, or a HAMMER.EXE job.

DELEGATE – same as SUBMIT, but doesn't do any processing (i.e. all commands other than DELEGATE, which is expected to be the last command, are ignored); instead it packages the request in the saved database with the expectation that an external HAMMER.EXE, or another Excel-based HAMMER call, will do the processing.

Changes and helper functions to support DELEGATE/SUBMIT processing:

The DELEGATE and SUBMIT commands are how HAMMER implements its version of steam-powered servers.

The intention is that databases named with extensions of ".request" are picked up and processed by "servers" waiting for such files. (The transport of such files between server and served is a separate issue; it might simply be a matter of placing files on a shared drive, or DropBox!) These servers may then populate that database with new data (or not, e.g. they might generate a CSV). When finished, a text file of the same name but with the ".request" replaced by a ".response" is written to indicate that processing is complete.

Both the Excel UDF HAMMER and HAMMER.EXE (the non-Excel command-line version of the UDF) have been changed such that when a single argument is passed, that argument is expected to be a DELEGATE-generated ".request" database. The database's "pickled" arguments and data will then be unpacked and the "delegated request" processed.

HAMMER.EXE, if started with no arguments, implements a simple DELEGATE "server", i.e. it will wait for ".request" files in its default folder and process each file as above.
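The wait-and-process loop is simple; here it is sketched in plain Python (the names are illustrative, and process_request stands in for whatever work the server does on the SQLite database):

import glob
import os
import time

def serve_requests(folder, process_request, poll_secs=5):
    while True:
        for req in glob.glob(os.path.join(folder, "*.request")):
            marker = req[:-len(".request")] + ".response"
            if os.path.exists(marker):
                continue                  # already processed
            process_request(req)          # work on the SQLite db in place
            open(marker, "w").close()     # empty marker file signals done;
                                          # the data stays in the .request db
        time.sleep(poll_secs)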

Three RTD helper functions have been added to the HAMMER UDF:

waitOnHammerResponse(database (without the .response), [folder name]) – will return "Waiting" until it finds a database.response file, and will then return the original .request filename (i.e. not the response file, as that's simply a marker; the database.request will contain the original data and any new data generated).

waitOnHammerRequest(database (without the .request), [folder name]) – will return "Waiting" until it finds a database.request file, and will then return the request filename.

waitOnFile(filename, [folder name]) – like the functions above, but without the response/request expectations.
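The gist of waitOnFile, in plain Python (the real versions are RTD functions, so Excel recalculates automatically when the file turns up rather than polling in a formula):

import os

def wait_on_file(filename, folder="."):
    # report "Waiting" until the file exists, then return its name
    return filename if os.path.exists(os.path.join(folder, filename)) else "Waiting"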

Here’s a list of the HAMMER commands implemented so far …

Get the latest version of HAMMER here …

HAMMER a new Excel ETL tool for the PowerPivot age …

So, why am I developing this new datasmithing tool when I already have microETL to do the same task?

To be honest, when I started, my aim was simply to port a lot of my useful framework code to .NET, as I'm doing more and more work in pure .NET these days.

It's a lot easier to do so with a project driving the work rather than the drudgery of line-by-line conversions simply for the sake of conversion.

But over the last few months of investigating the process of moving some of my more useful code over to .NET (discovering C#-SQLite as a near drop-in replacement for SQLite, realising that IronPython 2.7 marks the coming-of-age of Python under .NET, discovering the ease of multi-threaded programming under .NET 4.0), I began to see the potential of building a tool more suited to this new emerging world of technological plenty, of 64-bit, multi-core, high-RAM machines playing host to ground-breaking tools such as PowerPivot, than the 32-bit, single-threaded world that microETL was born into.

I wanted to bring microETL back to its roots as an in-cell function; to make it sit more easily with the spreadsheet functional style of programming, a style of programming that has been more successful at attracting "civilians" to programming than any other method; to make the tool more in tune with the Excel way of doing things.

At the same time, I also wanted the tool to be “detachable” from Excel, so that it could perform ETL functions server-side without the need for Excel. Ideally capable of being run from the command-line from a single no-install-required executable.

And so, the Datasmith’s HAMMER was born.

So where does this fit in with PowerPivot? When I design something like HAMMER I normally craft it around a series of "use cases". Below is one of my PowerPivot & HAMMER use cases:

The Data Quality Team

The sales & marketing department of a wholesaling company have discovered PowerPivot and are excited about using it to help manage an upcoming merger with a recently purchased competitor, also in the wholesaling business. There's a large overlap between customers and products supplied, and the sales structures, pricing and KPIs used by both companies are very different. The transition phase from two separate companies to a single operating unit will need detailed planning and constant monitoring. Oh, and there's also the problem of the brand new ERP system that'll replace the existing systems currently in use.

The group can see how PowerPivot models will help with this task but are worried about sourcing and managing the data. They decide to appoint a small data quality team, with a mixture of IT and business experience, which will be responsible for mapping old to old and old to new, and for managing the data deltas (business still goes on: new customers, new products etc.).

Most of this work revolves around "master data". Transactional data may be much larger in volume, but get the master data right and the transactional data will be easy to handle, as long as the processing capacity to handle the volumes is available (which, thanks to PowerPivot and really fast high-RAM PCs, it is).

In golf, there's a saying: "You drive for show, but you putt for dough". Likewise in data: a lot of emphasis is put on the big-data transactional datastores, but the real analytical benefit comes from the less sexy master data, aka the dimensions. As a result, the most important output from the data quality team will be a set of conformed dimensions.

Each week (and at month-end) the team will get extracts of the master and transactional data in various formats from both legacy systems, and will also get the WIP master datasets from the team working on the new ERP system. From these they'll construct point-in-time conformed dimensions combining old with old and old with new, and will rebuild transactional feeds with new conformed keys alongside the existing keys. These datasets are then sent to the various analysis teams to enable them to build Excel and PowerPivot models that will hopefully all "sing from the same hymn sheet".

And how will this be accomplished? With the datasmith's HAMMER, of course (well, it is my day-dream use-case!). Both in-Excel and command-line HAMMER "scripts" will be used to wrangle the datasets, and the complete sets will be packaged as SQLite database files.

When the analysis groups receive these files, they too will use HAMMER to extract the datasets they require, either straight to CSV for loading into PowerPivot, or into Excel for some further transformation work prior to modelling. To handle slowly-changing-dimension scenarios, many of the teams will squirrel away each week's data in order to be able to model yet-unknown-but-very-likely changes of organisational structures.

Although this is a use-case, it's based on reality; a reality that would have been so much easier to manage if I, and the teams I worked with, had had access to technology such as HAMMER and PowerPivot.

To download the latest version of the code, go to this page on my website.

Follow the HAMMER tag on this blog for information on commands and examples (best to start with the oldest and work forward …)