Category Archives: Steam Powered Server

PowerPivot for Excel as an XML/A Server

Over the past three months or so I’ve taken a deep dive into the heretofore unknown world of AMO. I had up until then concentrated on using ADOMD to interact with PowerPivot but as the realisation dawned on me that the xVelocity engine in PowerPivot Excel was, in essence, the same engine powering SQLServer 2012’s tabular instances, I decided AMO needed some quality time. And, Oh Boy, have I been rewarded with a host of new and exciting possibilities; e.g. partitioned update of large tables, changing data sources at runtime, automated creation of Measures and Calc Columns  and …

…the ability to communicate with the PowerPivot engine using a SOAP Enveloped XMLA request, to which PowerPivot will respond in like fashion.

But Excel isn’t a HTML server, so what use is that?

Well, HAMMER can easily enable an Excel instance as a single or multi-threaded server and that’s what I did, and then tried accessing the PowerPivot Model cube from a variety of XMLA clients, including the excellent Saiku and a new exciting XML/A client, Roland Bouman’s XMLA4js JavaScript library.

Both worked, including both the DISCOVERY and EXECUTE command sets.

Obviously to use this in production would most likely require a reverse-proxy of some sort to front-end it, and even then, would only be suitable for a small client base. It’s a bit like the circus dog riding a bike, it’s not how well he rides, it’s that he rides at all.

But this scenario of a classic web server was not what really interested me, it was XMLA’s potential role in a steam-powered server setup that piqued my interest.

I’ve been mulling in my mind a simple means of allowing a internet-enabled client (most like a HTML5 mobile focused  web app) to query a PowerPivot model from anywhere, no VPNs, no dedicated servers. By using HAMMER to enable a PowerPivot workbook to act as a “pull server”, and having come across a neat trick to allow a Google Spreadsheet to operate as a proxy between such Excel “steam-powered servers” and remote JavaScript clients, I figured I could do just that.

Getting data out of PowerPivot in a tabular format via ADOMD DAX  queries was my initial approach; the draw back with this was, that the JavaScript client would then have to handle the pivoting of such tables. Seemed a pity when PowerPivot is a powerful and efficient pivot engine. Then I discover this XMLA feature, which in turn reminded me of Roland’s XMLA4js project; I had myself a means to specify a pivot (MDX via XMLA) and the means to render, in JavaScript, the resulting multidimensional XMLA delivered dataset.  Game on!

Obviously the interaction between the client and the “server” would be asynchronous in the extreme (with, how often the “server” searched for new requests and how many “servers” were assigned to a request tunnel, determining  how responsive the interaction would be). But HTML5 with its ability to store state locally (keeping track of outstanding requests, firing notification events when responses arrive etc.) and the understanding of the users of such a service that the tool in the background is “just” an Excel instance (maybe running on their own PC back in the office) would hopefully mitigate against a “I must have a response NOW” mentality.

This method of building simple focused web-apps would not only suit standard reporting needs but could also support more sophisticated “field” applications, such as “on-site” budgeting or complex pricing/discount calculations. The PowerPivot “server” (and its powerful hand-maiden, good ol’ fashioned Excel formulas) could remotely provide the back-in-the-office heavy lifting.

So is this a poor-man’s alternative to a SharePoint PowerPivot deployment?

Yes I guess, but I prefer to think of it as an alternative approach perhaps more suited to operational BI (rather than SharePoint’s BI dashboard approach – which in any case can be cost-effectively delivered by  this  PowerPivot hosted service ).

Also, remember models developed and initially exposed from Excel via this method could as the need (or the money) appeared be hosted directly in SQL Server 2012, while still retaining the client and/or proxy elements.

I’ve modified the InProcess_oData.xlsm example in the latest HAMMER version to demonstrate XML/A serving (see the XMLA Server Stuff sheet). The service will start on port 8479 and listen for the /xmla end-point.

To download the latest version of HAMMER, go to this page on my website.

Advertisements

Excel as a Google Visualization API Data Source

Google’s Visualization API is impressive and very easy to use. Okay, it’s closed source and must be served from Google’s servers, but if you’re happy using say, Google Spreadsheets, that’s unlikely to concern you.

The Guardian’s Miso Project, might one day provide us with a truly open visualisation alternative, so worth keeping an eye on it.

If you’ve not seen Google Charts in action do check out their “playground” and also, this Building Interactive Dashboards video, demonstrating some of the newer, and even easier to use, controls.

Utilising Google Charts API could be an alternative method of publishing PowerPivot generated datasets when the option to use SharePoint is not available (or perhaps not affordable).

One way to do this would be to publish “tabular reports” to a Google Docs account using a “steam-powered server” approach. This actually could be a very powerful method of disseminating PowerPivot generated reports, particularly if mobile devices are the target (most Google Charts are now HTML5 enabled); and I’ll come back to this in a future post.

But, Google Charts can consume data from any server, and can do so very easily if that server implements its Data Source Protocol. So, as a POC I’ve added such a server protocol to my InProcess-oData server example. The new end-point is /tq and like the /range endpoint expects to be followed by a valid range pointing at a table. See the example index.html file (no need for a server  just open in browser, having first started the InProcess-oData server on port 8081).

The protocol is not fully implemented (only supports JSONP, and only supports one request-at-a-time from any client), but it gives a flavour of what’s possible.  To use this you’ll need to download the latest version of HAMMER.

Ah, but what if your heart is set on all the SharePoint PowerPivot goodness (and the sight of all that JavaScript doesn’t appeal), but IT refuses to upgrade your ancient SharePoint farm  (or maybe even refuses to let SharePoint in any shape or form anywhere near their servers). Is all lost? Not at all, check this out, a PowerPivot hosted service (not just any old service, @powerpivotpro‘s service), now offering 30-day free trials.

To download the latest version of HAMMER, go to this page on my website.

Follow the HAMMER tag on  this blog for information on commands and examples (best start with the oldest and work forward …)

Excel – as a fractional horsepower HTML5 server

You may have been wondering what’s the driving force behind the various changes I’ve made to HAMMER over the last few weeks,  namely threading support, a simple HTTP server and JavaScript. The driving force is to better position HAMMER (and through it, Excel) as a fractional horsepower HTTP server (see this post for more on fractional horsepower engines). Features such as threading and JavaScript are useful for many things; threading, for example, makes debugging scripts easier (see the Debug sheet and code in the sample InProcess_oData workbook) and also makes long running ETL processes easier to control and monitor. But, enabling the set-up of simple task-specific behind-the-firewall data servers, with as little ceremony as possible, is the ultimate goal.

But why, what purpose do these mini servers serve?

They’re obviously not intended as beyond-the-firewall public servers, they wouldn’t scale or be secure enough for such a task. Providing in-house feeds to other web enabled clients would be a more sutable task. For example, providing a feed from a “hub” workbook containing a PowerPivot model to other “spoke” workbooks (PowerPivot enabled or not) – a poor man’s alternative to doing the same via a SharePoint farm, if you like.

But it’s another seemingly unrelated technology that’s really sparked my interest in perfecting the fractional horsepower server: HTML5.

Generally when people think of HTML5 (if indeed they think of it at all), it’s mobile platforms that come to mind. (As it’s primarily Apple and Google, through their shared WebKit browser core, that have driven the development and adoption of HTML5). So what has this to do with Excel and the boring, but oh so profitable world, of corporate IT? Well, next time you’re in any spot where the global mobile workforce gathers, airport waiting lounges, hotel lobbies, etc. look at the technology kit that they’re using.

Only a few years back, the vast majority would have had a Windows laptop, if indeed they had any “data processing” device. Now, many, if not all, will either be using a smart-phone or a tablet device (iPhone or iPad but also increasingly Android powered phones/pads). All of these workers are still likely to have a laptop in a carry case or back in the hotel room, and certainly will have a laptop/desktop or their workplace desks. But on the move mobile is where it’s going.

So how do front-line datasmiths respond to this? Currently many of us build reporting solutions and really-simple-systems using Excel as the delivery agent, moving all or part of this to a mobile delivery agent will inevitably become increasingly attractive and/or demanded.

MS is already responding to this, e.g. PowerPivot and standard Excel spreadsheets are capable of being rendered via SharePoint’s Excel Services. But what if you don’t have access to a SharePoint farm, or you need a more robust UI, such as could currently be built using a VBA/.NET add-in? This is where HTML5 and fractional horsepower servers come in.

For me, there are two aspects of HTML5 that I think will make developing and deploying such “systems” possible and relatively easy:

It’s HTML5’s local storage, that’s makes a fraction horsepower server scenario possible. In traditional web apps, it’s assumed that:

  • 1st the client is always connected to a server,
  • and that the server provides both the layout (html, javascript, css) and all the data (REST APIs etc.) that the web app consumes.

Now with HTML5 apps, the client doesn’t need to be always connected to its main server or to its data server(s). It can go offline, or it can stay connected to its main server (perhaps a public-internet-facing S3 hosted domain), and every now and then make contact with one or more data servers (which can be safely positioned behind the firm’s firewall).

An example:

A firm’s Sales Reps come into the office every Friday for wash-up meetings, to record sale completions and to get their journey-plans for the following week.

Each rep has a desktop computer, where they interface with the firm’s various systems. One such set of “systems” are PowerPivot based models that report on the year’s forecasts and actual sales.  Part of the process of preparing for a sales visit is creating a set of sales reports for each customer to be visited, last year sales, this year’s targets, and so on, sourced from the various PowerPivot models. Although the production of the reports is largely automated via Excel macros, currently the resulting sheets have to be printed.

There’s been talk of company supplied laptops for years and the budget for them has now at long last materialised. The reps however, have expressed a preference for using iPads when customer-facing, mainly because the sales conversation often require not only presenting the prepared sales reports and charts but also flicking through many of the 100 odd product manuals. Being able to hand around an iPad with high quality glossy images (and videos) of this year’s new products, plus a sales projection chart for the same products, is, they contend, a winner.

A simple mobile sales reporting app is therefore developed (using the JoApp framework and Google’s Chart API and this pure JavaScript columnar database) to cater for the type of sales reports the reps require. The existing Excel automation code is enhanced with a HAMMER server. The reps new iPads’ web-apps are configured to automatically download prepared and ad-hoc reports when they log-in to the office network.

This has worked so well, the reps now want the ability to feed back sales target changes, that they also wish to record on their iPads via another really-simple-system, to their personal Sales Plan workbooks.

Is this as simple as using a “pure” spreadsheets solution, no, but it’s nearly as simple as building a VBA/.NET powered Excel application to do the same. The problem with many Excel “applications” is that they often push Excel beyond its “comfort zone”. The benefit of a hybrid solution like above, is that Excel gets used where it’s really useful and powerful (reports and models, data gathering and dispersal) while at the same time taking advantage of the freedom and cost-benefits (and fun!) of the emerging mobile web.

Excel as a book of record.

In the past I’ve talked about Excel as a tool to develop Really Simple Systems. Such “systems” usually occupy the middle ground between continuing to do a task by hand or  investing time/money in using a packaged/bespoke “proper system”.

When such systems are primarily reporting in emphasis, the justification for using Excel is usually straight forward and compelling (and getting even more compelling with the appearance on the scene of Excel 2010’s  PowerPivot.) But, alarm bells sound across the world of professional IT when Excel is proposed as a “book of record” i.e. when it’s to be used to store and maintain a business critical dataset. And, with some considerable justification, the nightmare that is linked (or even worse, shared) workbooks is very real indeed. But yet, businesses continue to do so, and do so quite successfully.

I myself record my business as a series of Excel documents (Invoices, Timesheets, Expenses) in a set of folders (Financial Years subdivided into major customers).  Essentially a document-oriented database.

In the past I simply then used a VBA powered workbook to open the required “documents” and extracted the data necessary for whatever report I required (VAT, year-end etc.).  To better automate (i.e. less bespoke VBA) this task I’ve have made changes to HAMMER to help with this and with similar tasks for clients.

The following list of commands will be added to the next release of HAMMER. (In the meantime these new functions can be previewed here …)

LISTOFFILES

This command takes a single argument, the folder to search, and will return a table of files in that folder and in any sub-folders. The result can then be used to select a list of files for further processing.

Example:

=HAMMER(“C:\a\rss”,”LISTOFFILES”)

_XLTOJSONDOC

This command takes a list of workbooks, opens each one, checks for a list of named ranges and generates a JSON document. The command is intended to be called from within a VBA macro (as opening and closing workbook breaks the “no side effects” rule of UDFs). Most “_” commands such as “_MD5” etc. are likewise intended for “programming use”, but any command beginning with “_XL” must be restricted to macro (i.e. non-UDF) use.

See the example workbook FetchInventory for an example of this function in action. The function takes one argument (the name of the document to load) and expects a table where the last column is the full name of the workbook to open. Any columns in the source table will be copied to the new “JSON document” table with an additional column called “Document” which will hold a JSON document representing the key-name pairs and table(s) extracted from the workbook.

On opening a workbook, it is searched for a named range with the name of document concatenated with “_fields” (e.g. INVENTORY_fields). The value of this range is expected to be a CSV list of fields and tables to load. A single Excel “document” could contain multiple logical documents (each specified by its own “_fields” list) .

See the PartsInventory_bin4 for an example of a multi-document workbook (INVENTORY and EXAMPLE). The EXAMPLE document in this workbook also demonstrates the various types of tables handled.

Example:

lJSONObjects = oHammer.HAMMER(“C:\a\rss\StockTake1″,”LISTOFFILES”,”Select name,fullname from table2 limit 1″,”SQL”,”INVENTORY”,”_XLTOJSONDOC”)

lReturn = oHammer.HAMMERToRange(lJSONObjects,”Sheet2!A27″)

… will output

JSONDOCVIEW

This command is where the previous commands are leading to, i.e. extracting some real information value from your documents. It converts JSON documents into Excel friendly tables. It is, in essence, a Map function as in MapReduce. In a previous example I used a Python Map and a SQL Reduce, here, both Map and Reduce are via SQL (the command uses a series of SQL commands to perform its task).

Before I describe the function let me explain why I use an intermediate JSON format. I could just extract the data directly from each document and either store directly in Excel or create tables in SQLite of Access to hold this data. And in fact, that’s what I would have done in the past (seeExcel as a document-oriented NoSQL database). Now , however, I tend to favour using a free-format (i.e. no need for a fixed database schema) structure like a JSON document, so as the source documents evolve over time (which tends to happen not just during design stages but as the system matures) this will not break older documents.

So, for example, original Invoice workbooks might not have a backing time-sheet while newer Invoices do. As long as new and old documents share a core sub-set of data fields they can continue to be analysed together.

The command takes 5 arguments and a driving table (a record so far for HAMMER commands, most have a max of two arguments). The driving table’s last column is assumed to contain the JSON document to process, columns prior to this (if any) will be output unchanged for each resulting row.

The first argument specifies the name of the “inner” table to fetch (if any). Most real life documents consist of header details (the “outer”document) and one or more tables (“inner” details). Invoices, time-sheets,  stock-takes, all tend to follow this pattern. This command will effectively join each document’s outer details to a single inner table (if more than 1 inner table, a call for each one is required).

The second (field list in SQL format) and third (SQL where predicate format) arguments specify what inner fields to extract (if blank, then all) and what restrictions to impose (if any). So “InvNo, Date”,”InvNo > 12″ would only fetch documents where the InvNo > 12 and only include the InvNo and Date fields.

The fourth and fifth arguments do the same for the outer table (i.e. Header data).

If any of the columns specified  (inner or outer) can not be found, or if the predicates (inner or outer) result in no selection, no error is returned, the document simply returns no rows. Likewise if an inner table is specified and no such table exists, then no rows are returned for that document – in other words this is not an outer join, which is not usually a problem as in most cases a “header” without detail lines is meaningless. If an outer join is required, then extract the headers (outers) and details (inner tables(s)) separately and join using SQL.

Example:

=HAMMER(“Select Name,FullName,Document from invoice_docs”,”SQL”,”table_2″, “[PART NUMBER],QTY”, “QTY >30”, “Bin_Number”, “Bin_Number > 1”, “JSONDOCVIEW”)

would result in:

For more complex JSON objects use the JSON command to incrementally parse the text or use the VBA JSON module within microETL.  [UPDATE: Feb 2012 – … or use the JAVASCRIPT command]. But for most situations (especially if you control the expected format) JSONDOCVIEW should handle it.

As JSON is fast becoming the preferred transport format for web and mobile applications having the ability to parse and produce JSON form within Excel is very useful. It is possible, for example, to use a simple web technology such as http://robla.net/jsonwidget/ to craft another type of Really Simple System. This time with the collection happening on the web (most likely using AWS S3 pre-signed forms, so no HTML server required – keep it simple) but with the control and reporting remaining within Excel (a variation on my Steam Powered Server idea).

For an example of a really simple system  download this.

Latest version of HAMMER including the above commands now released …

PowerPivot mini-me server!

I’ve been extending microETL’s functionality to make using it as a “Steam Powered” server easier and more automated. The major addition this week is an Excel friendly JSON parser. This will allow me to use JSON objects for interchange messages and for configuration files. I was going to use XML as there’s already an element of support within Excel for the format via XML Maps and the like, but I decided that JSON was too common a format to ignore, so I modified this existing VB6 example for my purposes. The major change, other than VBA-enabling it, is the use of detached  ADO Recordsets to handle list-of-lists structures (aka tables).

So what’s the likely use for my brand new shiny JSON objects? (By the way don’t be put off by the talk of objects and the like, JSON is essentially a simple text based format see here for a description.)

I’ve demonstrated that Excel with the help of microETL can function as a Web Service endpoint; whereby, a web request is send to the TCP/IP port Excel is listening on; Excel then processes that request (in the meantime blocking any further requests as Excel is single threaded); and then issues a response with the necessary data and/or error messages.

This is not ideal. Excel is not a server product and while it can be used as one in very low-traffic environments it has the potential to be a serious bottle-neck. But, if you approach this problem using Excel as responder to a queue rather than as a always-on TCP/IP port listener, it becomes surprisingly fit-for-purpose as a server.

What do I mean by a queue-responder? An example will best explain.

Say, you’ve developed a very useful PowerPivot model that you wish to share with your co-workers. Unfortunately they’re a mixed bunch, some have access to Excel 2010 and potentially PowerPivot but most are still on Excel 2003, and another set are outside contractors who use a web gateway to access your internal systems.

Now when I say model, I mean the data relationships, cleansed data and measures you’ve developed not the imagery created on the workbook rendered via pivot tables, charts or the like. If you need to share at this level you can do so by sending the workbook to those colleagues with Excel 2010 (0r 2007 for non-dynamic view) or by investing in a SharePoint farm. Neither option is a runner in this case, as you don’t have the money nor technical resources to set-up SharePoint and the detail data behind the model is not suitable for sharing. So what to do?

Well if you had  microETL “server edition”  installed (remember,this is vapour-ware but most of the building blocks are in place) you would:

  1. Create one or more flattened PivotTables based on your super-duper model. Static elements would be “column” fields while “client selectable” elements would be assigned to either “pages” or “slicers“.
  2. Run the “service discovery” microETL function which would create one or more “service manifests” based on the flattened pivots and associated pagers and slicers. This manifest would be published as a JSON object and would be “discoverable” via a standard “discovery service”. The function would auto-build the SQLScripts necessary to handle the discovery service and the “services” associated with each flattened pivottable.
  3. Start a TIMER on the now microETL-server-enabled workbook. It would check a “resource” for any “requestMD5hash.request” JSON messages. If either an equivalent “requestMD5hash.busy” ( & not older than a certain time) or “requestMD5hash.response” exists then ignore; if not, create a “requestMD5hash.busy” file; open the .response file; process the required service using the data parameters supplied and write back a .response file.
  4. Repeat until  the TIMER is disabled.

So what’s this “resource” and what’s the requestMD5hash stuff? The “resource”, in its simplest implementation, would be a “client accessible” folder; so it could be a Windows or Samba shared folder, or a folder on the same machine as the Excel “server” with the JSON messages being push/pulled to/from that folder by a simple web server. But it could also be an AWS S3 bucket or an AWS SQS queue. In fact, the .request->.busy->.response cycle is a simple queue implementation.

The requestMD5hash stands for the MD5 hash of the .request JSON message. This will uniquely indentify the message (i.e. the hash of the the service, the parameters and optionally any security IDs or time stamps) and will be used to both track the request and to provide a simple type of request caching.

Also as the MD5 hash can be calculated on the client-side so those situations where the “resource” is a “dumb folder” can be handled, as it provides a server-free means of generating an Unique ID (rather than the more usual server powered “here’s my request and a server sends back an ID” cycle).

So what about the clients? Those with Excel, would use an cut-down version of the microETL add-in (or macro-enabled workbook) to handle: The Request (encode in JSON) -> Wait for a response or come back later -> Response (decode JSON) cycle.

Those who access via a web gateway would delegate this cycle to a web app. The app might repackage the JSON response as a pretty web page or simply as a basic HTML table or XML Map (for import into un-macro’d Excel or other spreadsheets) or pass on the JSON as is.

The web app could also handle the wait period by doing some AJAXy “please wait” magic with a timeout message like “Excel servers not responding – Please resubmit at a later stage”. In this case if the server managed to catch up before the resubmit the response would be waiting and recognised via its “.request” MD5 hash.

What about security?

The “discovery” service would have the option of linking in with a simple User/Role facility, which would:

  1. Filter the “slicer” elements at the discovery stage (a manifest would default to sending a list of single column tables each one associated with a slicer, listings all the possible elements that can be filtered through that slicer). A security filter could restrict such lists.
  2. Validate the filters requested to ensure that the “allowable” slicer elements where not doctored to include blocked elements. (The slicer lists provided by the discovery service would ideally be used by the client workbooks or web apps to populate lookup lists to provide an element of client-side validation.)

Would it scale?

If you mean would it scale to handle your growth as you went from a small firm to a mega world-wide empire, no it wouldn’t. But it could scale to handle linear growth by simply making a copy of the workbook and starting it on another Excel process, or on another machine if front-ended by a simply round-robin assigner. The MD5 identifiers would also enable a simple caching mechanism enabling the handling of traffic spikes. Any client who contracts to interact with such servers would operate in a two message cycle (send the request, then send a separate series of one or more “is it ready” requests). This overhead is handled by the client and/or dumb-hubs and/or intermediate web-services, allowing Excel to concentrate on churning the model.

Anyway enough, it’s 20C outside which is very unusual for early April in Ireland, I’m off to act as a “server” of different sort, on barbecue duty …