Manipulating Excel tables with SQL

microETL SQL Window

microETL SQL Window

Having upgraded xLite to be 64bit compatible and as a result having to re-test it, I decided to make some significant changes to how it works. Although xLite was, and is, SQLite embedded in Excel, having access to SQLite databases is not the reason I’ve combined the two products. The manipulation of Excel tables using a SQL dialect, all the while operating in memory, is my goal.

Having the ability to access external disk-based SQLite databases can be useful (particularly when dealing with very large fact-like tables) but it’s a perk not the primary use-case. The main aim is to bring the data-handling power of a SQL engine  to  bear on Excel tabular datasets; if Excel offered this by default, I would not be embedding SQLite. SQLite is a means to an end.

So what changes have I made?

  • First off, I’ve added a SQL helper form (right-click on cell context menu, and pick SQL). The form not only makes issuing SQL statements easier but also holds a history of commands if required.
  • Secondly, I’ve introduced a pre/post processor for the SQL() function.
  • I’ve also added two SQLScript (new name for xLiteScript) helper forms. One to run a SQLScript from the right-click context menu, plus a lookup form to help select the required microETL function along with a description of the parameters required.

Examples of the new SQL() pre-processor in action:

=SQL(“Select * from TABLE(Sheet2!A1)”,”Sheet1!B17″)

This will paste the table found at position Sheet2!A1 to Sheet1!B17.

=SQL(“Insert into TABLE(Sheet1!B17) Select * from TABLE(Sheet2!A1))

This will do the same as above, difference being that for this to work, an existing Excel table at Sheet1!B17 of the same structure (i.e. same headings) as Sheet2!A1 must first exist.

The TABLE(…) command tells SQL() to go fetch the relevant Excel ranges, loads the “tables” into SQLite; when the SQL command finishes it writes back the data (if changed) to the originating Excel address, and cleans up any temporary SQLite-side tables. Nearly all SQL commands can be issued against the resulting tables, including UPDATE, DELETE, DROP and CREATE TRIGGERS etc.

Two other pre-processor commands are CELL(…) and CELLS(…). Similar to TABLE(…), data is loaded from the Excel addresses specified and placed in the SQL  statement. CELL takes a single cell reference while CELLS will iterate through a multi-cell range and insert a comma-separated list into the SQL statement (in a format suitable for IN (…) expressions).

SQLScript is a simple table-based command sequencer  (ETL is primarily a step-by-step process) which can be used as an alternative to VBA  when a simple set of events needs to happen in sequence. The optional  SQL history generated by the SQL helper form is held in a SQLScript format and written back to an Excel table on exit, allowing for the creation of repeatable scripts based on previous SQL commands.

I still need to fully test the new functionality and will do so over the next few weeks before I publish a beta version. If anybody wants to try the un-tested alpha version let me know and I’ll send you a copy. you can find it here.

Update:

See http://blog.gobansaor.com/2011/03/09/microetls-sql-function/ for a more detailed description of the =SQL() function.

Update:

The table() command now supports virtual tables e.g. “select * from table(vtCSV,C:\data\myData.csv)” see http://blog.gobansaor.com/2011/05/05/excel-document-oriented-database-with-python-map-sql-reduce/

Advertisements

3 responses to “Manipulating Excel tables with SQL

  1. Pingback: MicroETL Alpha Release | Gobán Saor

  2. Tom this looks great – thanks very much. I thought I had reached a new level of Excel goodness when I came across the morefunc add-in and in particular the VSORT.IDX array function, allowing me to make a ‘view’ of sorted data without affecting the underlying data, but this looks as helpful (in a different way). It sounds like SELECT copies data rather than ‘refers to’ it – apologies if I have misread. What do you do if you want to imitate a view in Excel? I have used Excel tables (containing formulas only) with some VBA code to auto-extend them as the base table grows.

  3. John,

    You’re correct, when Table(..Excel REF..) is used it takes a copy of the referenced data into the SQLite in-memory database; does whatever is required by the SQL statement(s) and if the table has changed writes back the (entire) table back to the Excel referenced location and then deletes the copy of the data from SQLite (see http://blog.gobansaor.com/2011/03/09/microetls-sql-function/ for more details).

    So although you can create a SQLite view based on a TABLE() “table”, it will be of no use outside the set of SQL statements that created it as the view’s base table will be no more. I thought of creating cross-function-call references – like I do within a single call; a single call can consist of multiple SQL statements (separated by “;”) and those can in turn “fire” SQL triggers and SQLScripts which will all access the same TABLE() data) – with the addition of an event handler to keep the data in-sync but figured it added a layer of complexity greater than the benefits.

    To persist a “view” of data across function calls do something like “CREATE TABLE tableSheet1A1 AS SELECT …. FROM TABLE(Sheet3!A1) WHERE … ORDER BY …” in effect creating a “materialised view”. To keep that view in-sync add the SQL call to an event class “watching” the Excel source of the data.

    I’m in the process of adding “virtual table” functionality to the pre-processor (in fact, that’s what it already is), examples so far are:

    “from TABLE(vtCSV,C:\data\inv_fact.csv)” which loads a CSV file, which if changed during the function call will write back the CSV data back to the file.

    “from TABLE(vtFILES,C:\imported\,*.csv)” which will list a set of files from a folder.

    These are based on standard VBA functions, rather than C as is normal for SQLite virtual tables. Each VBA function registered as a VT, must simply take an argument that passes in the name of a SQLite table which it then populates with whatever data is relevant.

    Tom