Expand Excel’s horizons – look to the cloud

http://www.flickr.com/photos/donabelandewen/470780785/I’ve added a new facility to microETL’s SQLScript  – the TIMER command.




TIMER takes up to 4 arguments:

  • 1st, the SQLScript to call.
  • 2nd, an optional SQLScript to call when the primary script finishes or is cancelled.
  • 3rd, the number of  milliseconds secondsto wait between each call to the primary script, defaults to 1.
  • 4th, the mode of the controlling user form; MODELESS (the default) users may continue to navigate about the workbook or MODAL, the user must exit the form before any access is allowed to the workbook. Note: in MODELESS mode, if the user attempts to change any workbook cells, the TIMER (and any other TIMERs running) will stop.

So how to demonstrate TIMER in action?

I thought I’d set my sights to a further horizon, leave the humble laptop behind; I’d go all modern and use the cloud.

A conversation with Giles Thomas of  Resovler One and now Dirigible fame (the first a desktop Python-power spreadsheet, the other a cloud-based one), lead to my discovery of PiCloud. This is an Amazon EC2 based service that allows the execution of Python code remotely; handling all the messy business of provisioning of servers and marshalling of code and results.

It is not only easier than using EC2 directly but can also be much cheaper to use, as PiCloud charge by the second (rather than EC2s “use any portion of an hour, pay for the full hour” payment terms).

I thought I’d give it a go.

I used the same data and web site as Daniel Ferry’s VBA ‘threading’  example; a list of properties to check against the realestateabc.com “What’s your home worth?” site; returning various bits of information such as estimated value, size, last sale date and price achieved etc.

Being Python I used the wonderful Beautiful Soup package to extract the needed data from the site’s HTML. I also submitted a separate request for each property, but if this were for real, I would most likely submit fewer jobs to handle multiple properties as most of the cost of each job in this example was probably due to set-up and tear down rather than the url fetching and parsing. Not to mention the time to submit 100 jobs in the first place.

Above you can see the two SQLScripts involved. The LaunchCloud script submits the jobs and makes a call to TIMER to run the CloudChecker script every second. The CloudChecker script checks for results from PiCloud and pastes them back to the source table. When no more results due, it issues a STOP command which forces the TIMER to stop.

Above is the pyGetDetails script which is passed to PiCloud (via the cloud.call() statement) to do the actual fetching and parsing of the URL associated with each property.

Download latest version of microETL here. The workbook is called PiCloud (2007/2010 format), requires Python 2.7 and a subscription to PiCloud.