The 64-bit question, and the birth of MicroETL

Like the original $64 question, I have for a long time being pondering the Excel 64-bit question. Whether “To take it, or leave it”?

When first announced, I believed 64bit Excel would only be of interest to a minority of demented quants in investment banks hacking their way to yet another evil model to bankrupt the world. The problem of incompatible add-ins, COM controls etc. would also make its widespread adoption less likely.

But three things changed my mind:

  • first, the appearance of Windows 7 powered 64-bit PCs on many worker’s desks
  • second, my embedding of SQLite into Excel which enables me to effectively use the extra memory available to a 64-bit instance to hold, analyse and transform a huge amount of raw relational data (in the form of a “:memory:” database)
  • and finally, the arrival on the scene of PowerPivot.

Although a PowerPivot workbook doesn’t allow more than 4Gig of memory to be addressed (this is a SharePoint restriction, as a 4G workbook is likely to compress down to <= the 2G upload limit imposed by SharePoint), this represents a doubling of the 32bit limits in memory terms, but much more in raw data terms as PowerPivot can achieve up to 10x-20x times data compression.

So, I decided to bite the bullet; fired up an AWS 64-bit image and started to upgrade my xLite code-base to handle this new world.

I decided on the TDN-GCC MinGW-w64 distro to re-compile 64-bit versions of my SQLite wrapper  libraries, and set about converting  my VBA code to handle both 64-bit and 32-bit dlls. So far, easy peasy.

The problems started when I went to convert my JavaScript & Python wrappers. I couldn’t get JSDB to compile to either 32-bit or 64-bit using MinGW-w64, so I abandoned the attempt as my main use of JSDB is as a “command-line data-crunching tool” similar to, and used along side, CSVFix and SFK (Swiss File Knife). Having the ability to call JavaScript in-process was relatively new and I could live without it.

More worrying was the problem with Python embedding, in-process Python would be harder to live without. The APSW SQLite library which I had used is 32-bit only. Luckily the Python 2.7 version of its SQLite3 standard module  (aka pysqlite) now allows the loading of SQLite C extensions. This enables me to load databases under Python and extract the C handle using a specially written extension, thus enabling Excel/VBA to attach to a Python opened SQLite shared memory.

Without this new  loaded extension trick I would have had to modify the SQLite3 library itself as the previous 2.6 version would not divulge SQLite’s handle, while APSW did, which was  the reason I picked APSW in the first place.  As a result, xLite’s Python functionality now requires Python 2.7.

So the end result is a somewhat reduced-functionality xLite  (no in-process JavaScript, missing APSW functionality such as Python-based SQLite virtual tables), but running on the ultimate data-smithing platform:

  • Excel 64-bit combining with
  • the relational power of SQLite 64bit,
  • the fast and easy scripting power of Python 64-bit,
  • and topped off with the king of pivots, 64-bit PowerPivot.

A week well spent, I think!

Oh, I’ve also decided to rename the add-in to MicroETL. This is mainly to reflect the fact that xLite (SQLite embedded in Excel) is just one element within the tool. The  pure-VBA detached-ADO recordset functionality, embedded Python and xLiteScript elements can exist independently of xLite.  That, and I’m fed-up getting enquiries about X-Lite http://www.counterpath.com/x-lite.html and I own the http://www.microETL.com URL 😉

Advertisements

3 responses to “The 64-bit question, and the birth of MicroETL

  1. Nikolaus Pohle

    Didn’t know about the existence of CSVFix and SFK so far. Thanks!

  2. @Nikolaus

    Yep, two very useful tools. If you’ve not tried JSDB, do give it a try; and of course the mighty SQLite3.exe, the King of all command-line datasmithing tools.

    Tom

  3. Pingback: Manipulating Excel tables with SQL | Gobán Saor