Python Powered PowerPivot

In my previous post I described a method of exposing PowerPivot (or indeed any Excel model) as a simple Web Service. As it goes, it’s elegant enough, but the requirement to have two processes to handle the web service (Excel itself and a JSDB.exe sub process) adds a degree of fragility, So, for example, not “playing” with the serving Excel process (as it will likely lock the service) or having to clean up the sub process window after shut down. It can also be slow, especially noticeable for quick requests (such as a AJAX validating request), as the process flows consists of:

  • JSDB.exe stores the URL supplied parameters into a SQLite WebService.db’s table called paraTable and commits the data to disk.
  • A COM request from JSDB to Excel to set the _Status cell to “Waiting”
  • Another COM request to set the <servicename> cell to a value (any value will do, I usually send the URL supplied parameters in JSON format).
  • The  microETL.SQLScript function associated with the service is activated by the cell’s change event.
  • The SQLScript runs and drops if exists, the WebService.db’s outTable; it then creates and fills and new outTable to suit the service’s tabular output and commits the data to disk.
  • The SQLScript writes back a value (any value other than “Waiting”) to the _Status cell. The JSDB.exe has, in the meantime, been looping making COM calls to check the value of this cell and when eventually set to not “Waiting” will read the values stored in the outTable and serve the data back to the waiting requesting agent.

As you can see, a lot of inter-process COM calls (slow) and writing/data from disk (SQLite is fast, but you’re still limited by the speed of your PC’s disk). Also, although JSDB’s built-in web framework is very good and compact, it doesn’t offer the simplicity and speed of development offered by the compact web frameworks available in languages such as Ruby (e.g. Sinatra) or Python (e.g Bottle). But, as you may or may not be aware, microETL has the ability to use Python as a scripting language, which opens up the possibility of using something like Bottle to web enable Excel. Which is exactly what I’ve done!

But first; people often ask why embed a Python interpreter into microETL? Can I not just use VBA as I quite obviously know it inside out 😉 Well of course for all things Excel-focused that is what I do, and for most other transformation or calculations that require some programming I would also opt for VBA. No reason not to, it works and works quickly. However, for some situations, VBA, even if it can handle them, is tough going. A good example would be the loading of a large XML feed; a common enough problem for users of PowerPivot as PowerPivot itself doesn’t offer a XML loader (except for oData feeds) so most folks fall back on Excel to do the heavy lifting.

For small XML documents Excel’s XML Map feature is ideal and again no reason not to use it (unless you’re using <“Excel 2003 Pro”, in which case some MSXML2.DOMDocument VBA code would be required). If however, the document is large (a few tens of thousands of XML “rows” for example) XML Map (and the VBA DOM object) will start to strain. Often you’ll get a result but only after a very long time, other times you or Excel will give up the ghost and quit.

You could then turn to MSXML’s implementation of SAX2 and use VBA to do “just-in-time stream” parsing of the XML document (will save on memory usage and should also result in less CPU abuse). See here for an example to get you started. But for real speed of handling you need a no nonsense non-validating streaming parser such as the venerable Expat. (I’ve posted before of using Expat to create a SQLite XML parsing Virtual Table.).

Yeah right, I hear you say, I’m going to start writing C code to parse my XML?

No, this is where Python comes in. C and Python are first cousins, C is the silent, unapproachable but very powerful one and Python is the friendly one who helps you get things done (and saves you from having to engage with the fearsome C). If there’s something useful out there written in C somebody will have wrapped it in Python and so it is with Expat (in fact it’s part of standard Python).

Download microETL from http://www.gobansaor.com/microetl for an example of a microETL powered Python script to download, parse, and insert into SQLite’s in-memory database, a large XML document. The resulting table is then pasted to an Excel range (Data!A1).

The XML document in the Expat example is in turn provided by an example of Excel Web Service using Python’s Bottle as the server.

To try out both examples:

  1. Unzip download into a folder; 1st run the PyWebService97-2003.xls workbook (not the WebService97-2003.xls workbook, it’s the JSDB example server). Locate the WebServiceControl sheet and click the Start Server button; the Excel process will go into “Not Responding” mode while listening on port 8080 on 127.0.0.1. You may be asked by your firewall to unblock Excel, you can safely do so as the port will only be accessible by processes running on the same machine.
  2. Open an other Excel process and load the Expat.xls workbook; locate ExpatControl sheet and click the Process XML  button. The button macro will issue a http://127.0.0.1:8080/xml call. Having downloaded and parsed the XML the resulting table is pasted to the Data sheet.
  3. If you wish, you could try using XML Map Import (find it under Developer Tab in 2007/2010) using the same end point to compare and contrast. (Warning: depending on the PC, this could take anything from a while to forever). For another comparison try a Web Query using the /table end-point (which returns the Currency table in  HTML table fromat) or loading a text file from the /CSV/Currency end-point which serves up a comma-separated file.
  4. To close down the PyWebService server, issue a http://127.0.0.1:8080/exit via your browser  (or click the Close Server button in the Expat.xls workbook).

Note:

You must have Python 2.7 (note the 2.7) installed on you PC. You can safely install different versions of Python side-by-side. It is possible to package the required python dlls and modules along with the rest of the code, but I haven’t gotten around to automating this yet and I keep forgetting how to do it, so I haven’t done it, so I’m lazy, so sue me 😉

If you’re running Excel 64bit this will not work as the 64bit bindings are not included but it will work if you’re running Excel 32bit  on a 64bit OS. I’m in the process of merging the 64bit and 32 bit versions of microETL and will republish the link when it’s 64bit enabled. But even then, you must have the 64bit version of Python 2.7 installed otherwise it’ll get very messy.

Python standard error and log files can be found in the /Python folder, each spreadsheet gets its own set. If the server on port 8080 doesn’t start, check the error log, if you see lots of errors to do with “sockets” then most likely some other service on your machine is using port 8080. In this case, go to the Python sheet in the PyWebService97-2003.xls workbook, find the mention of 8080 and change to 8081 or whatever.

Like the previous example, this, being Excel, is single threaded. Bottle may offer you the option of using the likes of Apache as your server, but you must stick with the standard Python Built-in HTTP development server. Each call to the server will/must block. Yes I know Excel 2010 now has multi-threaded  capabilities but not within VBA macros, which is what ultimately drives all this.

UPDATE:

For another variation on using Python (this time IronPython) within Excel/VBA see http://blog.gobansaor.com/2011/07/18/vba-multithreading-net-integration-via-hammer/

UPDATE:

For another variation on a Python powered Excel Web Service (this time IronPython) see http://blog.gobansaor.com/2011/09/21/exposing-an-excel-powerpivot-model-as-a-web-service/

 

 

 

Advertisements