I’ve been extending microETL’s functionality to make using it as a “Steam Powered” server easier and more automated. The major addition this week is an Excel friendly JSON parser. This will allow me to use JSON objects for interchange messages and for configuration files. I was going to use XML as there’s already an element of support within Excel for the format via XML Maps and the like, but I decided that JSON was too common a format to ignore, so I modified this existing VB6 example for my purposes. The major change, other than VBA-enabling it, is the use of detached ADO Recordsets to handle list-of-lists structures (aka tables).
So what’s the likely use for my brand new shiny JSON objects? (By the way don’t be put off by the talk of objects and the like, JSON is essentially a simple text based format see here for a description.)
I’ve demonstrated that Excel with the help of microETL can function as a Web Service endpoint; whereby, a web request is send to the TCP/IP port Excel is listening on; Excel then processes that request (in the meantime blocking any further requests as Excel is single threaded); and then issues a response with the necessary data and/or error messages.
This is not ideal. Excel is not a server product and while it can be used as one in very low-traffic environments it has the potential to be a serious bottle-neck. But, if you approach this problem using Excel as responder to a queue rather than as a always-on TCP/IP port listener, it becomes surprisingly fit-for-purpose as a server.
What do I mean by a queue-responder? An example will best explain.
Say, you’ve developed a very useful PowerPivot model that you wish to share with your co-workers. Unfortunately they’re a mixed bunch, some have access to Excel 2010 and potentially PowerPivot but most are still on Excel 2003, and another set are outside contractors who use a web gateway to access your internal systems.
Now when I say model, I mean the data relationships, cleansed data and measures you’ve developed not the imagery created on the workbook rendered via pivot tables, charts or the like. If you need to share at this level you can do so by sending the workbook to those colleagues with Excel 2010 (0r 2007 for non-dynamic view) or by investing in a SharePoint farm. Neither option is a runner in this case, as you don’t have the money nor technical resources to set-up SharePoint and the detail data behind the model is not suitable for sharing. So what to do?
Well if you had microETL “server edition” installed (remember,this is vapour-ware but most of the building blocks are in place) you would:
- Create one or more flattened PivotTables based on your super-duper model. Static elements would be “column” fields while “client selectable” elements would be assigned to either “pages” or “slicers“.
- Run the “service discovery” microETL function which would create one or more “service manifests” based on the flattened pivots and associated pagers and slicers. This manifest would be published as a JSON object and would be “discoverable” via a standard “discovery service”. The function would auto-build the SQLScripts necessary to handle the discovery service and the “services” associated with each flattened pivottable.
- Start a TIMER on the now microETL-server-enabled workbook. It would check a “resource” for any “requestMD5hash.request” JSON messages. If either an equivalent “requestMD5hash.busy” ( & not older than a certain time) or “requestMD5hash.response” exists then ignore; if not, create a “requestMD5hash.busy” file; open the .response file; process the required service using the data parameters supplied and write back a .response file.
- Repeat until the TIMER is disabled.
So what’s this “resource” and what’s the requestMD5hash stuff? The “resource”, in its simplest implementation, would be a “client accessible” folder; so it could be a Windows or Samba shared folder, or a folder on the same machine as the Excel “server” with the JSON messages being push/pulled to/from that folder by a simple web server. But it could also be an AWS S3 bucket or an AWS SQS queue. In fact, the .request->.busy->.response cycle is a simple queue implementation.
The requestMD5hash stands for the MD5 hash of the .request JSON message. This will uniquely indentify the message (i.e. the hash of the the service, the parameters and optionally any security IDs or time stamps) and will be used to both track the request and to provide a simple type of request caching.
Also as the MD5 hash can be calculated on the client-side so those situations where the “resource” is a “dumb folder” can be handled, as it provides a server-free means of generating an Unique ID (rather than the more usual server powered “here’s my request and a server sends back an ID” cycle).
So what about the clients? Those with Excel, would use an cut-down version of the microETL add-in (or macro-enabled workbook) to handle: The Request (encode in JSON) -> Wait for a response or come back later -> Response (decode JSON) cycle.
Those who access via a web gateway would delegate this cycle to a web app. The app might repackage the JSON response as a pretty web page or simply as a basic HTML table or XML Map (for import into un-macro’d Excel or other spreadsheets) or pass on the JSON as is.
The web app could also handle the wait period by doing some AJAXy “please wait” magic with a timeout message like “Excel servers not responding – Please resubmit at a later stage”. In this case if the server managed to catch up before the resubmit the response would be waiting and recognised via its “.request” MD5 hash.
What about security?
The “discovery” service would have the option of linking in with a simple User/Role facility, which would:
- Filter the “slicer” elements at the discovery stage (a manifest would default to sending a list of single column tables each one associated with a slicer, listings all the possible elements that can be filtered through that slicer). A security filter could restrict such lists.
- Validate the filters requested to ensure that the “allowable” slicer elements where not doctored to include blocked elements. (The slicer lists provided by the discovery service would ideally be used by the client workbooks or web apps to populate lookup lists to provide an element of client-side validation.)
Would it scale?
If you mean would it scale to handle your growth as you went from a small firm to a mega world-wide empire, no it wouldn’t. But it could scale to handle linear growth by simply making a copy of the workbook and starting it on another Excel process, or on another machine if front-ended by a simply round-robin assigner. The MD5 identifiers would also enable a simple caching mechanism enabling the handling of traffic spikes. Any client who contracts to interact with such servers would operate in a two message cycle (send the request, then send a separate series of one or more “is it ready” requests). This overhead is handled by the client and/or dumb-hubs and/or intermediate web-services, allowing Excel to concentrate on churning the model.
Anyway enough, it’s 20C outside which is very unusual for early April in Ireland, I’m off to act as a “server” of different sort, on barbecue duty …