These last two weeks, Python has been on my mind. First off, last week I decided to make time to fully investigate Picalo, an open-source Python-based data analysis tool, and then, this week, Google announced their long awaited cloud-computing offering, Google Apps Engine, with the language at its core.
But of course the scripting language that continues to power most of my datasmithing activities is Excel VBA. That’s why I was so excited to see a tool such as Proto utilise VBA as its scripting language. But, Microsoft has abandoned VBA, there will be no more Protos.
Also, Excel VBA is now a Windows only language. Windows, however, is no longer the ‘only’ business client OS (see how many Apple laptops you can spot the next time you’re in a business-class airport lounge, a few years ago it would have been zero, not any more), and is currently nowhere to be seen as a cloud computing platform (but that’ll change).
I’m at heart a table-oriented programmer, and I, like Picalo’s author Conan Albrecht, believe “data analysis is best done through scripting”; but not just data analysis, the T in ETL (Extract, Transform and Load) and the I in DI (Data Integration) and SI (Systems Interfacing) also benefit from a scripting approach.
It looks like it’ll be Python because it’s …
- Windows friendly (via Py2Exe),
- …but also runs and is installed by default on MacOS and most Linux distros,
- …and being Linux friendly means I can use it to power Amazon EC2 hosted “batch processing“.
- Cross-platform “native” GUI support using wxWidgets via wxPython.
- Google Apps Engine support, I now have a replacement for the late Zimki.
- It’s table-oriented thanks to Picalo’s Table Object which provides a “good enough” alternative to Excel VBA’s powerful Range Object.
- Picalo’s Table Object being an extension of a Python “list of lists” is of course memory-bound, but large-scale memory mapped datasets are no longer a problem, they’re an oppurtunity …
- Iron Python offers powerful .NET integration, whether through the innovative Resolver Spreadsheet product or via ExcelDNA.
- Via Google Apps Engine, it looks like becoming the Macro language for Google Spreadsheets.
- It’s powerful, fun to use, easy to learn, function orientated and I even like the use of human-eye-friendly indentation rather than braces and delimiters as a code grouping mechanism.
- It’s easy to wrap C/C++ libraries to make them usable from Python via Pyrex and cTypes.
- Python 2.5 now has embedded SQLite3 (ultimate datasmithing tool!) support (note PHP5 developers, SQLite3 not SQLite2!).
- Oh, and did I mention the Google Apps Engine service and its equally important but generally overlooked SDK.
The runner up is of course Ruby, but its poor integration with Windows is a major problem and the datasmithing “prior art” of Picalo and Resolver makes Python hard to beat.
UPDATE Jan 2010:
To experience the best of both worlds, VBA & Python, my xLite (Excel combined with SQLite) datasmithing platform now allows Python to be used in conjunction with VBA. Check it out here
UPDATE: July 2011:
For another method of integrating Python (this time .NET’s IronPython) with Excel/VBA see http://blog.gobansaor.com/2011/07/18/vba-multithreading-net-integration-via-hammer/
Also, as Dan pointed out in the comments below, I’d not included Jython in my list of reasons for embracing Python. I must add it to my list of things to try out particularly as both my “classic” ETL tools, Talend and Kettle are JVM based.
Another thing to add to the (ever growing) list is Mike Pitarro’s SnapLogic python-based ETL tool. They have …
…just released a 2.0 Beta version with some major architectural enhancements. The SnapLogic model is very different from traditional ETL systems. It takes an approach that’s more like the web, based on loose coupling and HTTP interactions. We model data source, sinks, and transformations as URI addressable endpoints, and have a model where than can be chained together in pipelines to build transformation logic. We use a plugin architecture to make it easy to add custom components.