microETL’s SQL function

At the heart of microETL is the SQL() command (or more correctly microETL.SQL()). The primary purpose of microETL is to allow the power of SQL to be used within Excel and SQL() is how that is delivered. The command can either be issued from a cell, like a standard excel formula, or by using the microETL SQL Menu option or as a command in a SQLScript. It can be called from a VBA macro (using Application.Run(“microETL.SQL” … ) but not if that macro is to be used as a User Defined Function (aka user defined formula). It can also be called from within another SQL statement using the equivalent SQLite user defined function also called SQL().

Example: click for a larger image (animated GIF)

=SQL() in action

SQL(sqlStatement,selectDestination(OR arg1), arg2 …. argn)

The 1st argument is the SQL statement(s) to execute. These can be any valid SQLite statements (see SQL as understood by SQLite); exceptions are the ATTACH statement and in some circumstances the DROP TABLE statement. The statement will be issued against the default microETL in-memory database i.e. an empty database that is instantiated in memory at workbook start-up and any tables or data created during the workbook session are discarded at shutdown.

So, if the database is ephemeral where will my data be stored? In the spreadsheet most likely. The other alternatives  are to use the SQLite UDFs load_CSV() and write_CSV() to load/save CSV files, or see http://blog.gobansaor.com/2011/05/11/accessing-sqlite-databases-from-excel-via-microetl/ for other options.

In order to make transferring data between Excel and the SQL environment easier and simpler a pre-processor has been added to the function. There are three types of pre-processor commands:

  • The TABLE(…) command tells SQL() to go fetch the relevant Excel range, loads the “table” into SQLite; when the SQL command finishes it writes back the data (if changed) to the originating Excel address, and cleans up the temporary SQLite-side table. Nearly all SQL commands can be issued against the resulting tables, including UPDATE, DELETE, DROP and CREATE TRIGGERS etc.
  • Two other pre-processor commands are CELL(…) and CELLS(…). Similar to TABLE(…), data is loaded from the Excel addresses specified and placed in the SQL  statement. CELL takes a single cell reference while CELLS will iterate through a multi-cell range and insert a comma-separated list into the SQL statement (in a format suitable for IN (…) expressions).
  • Positional substitution tokens in the form :1 to :n

Note these are pre-processor commands not SQLite commands and can only be accessed using the SQL() function. The pre-processor “functions” may appear to work like a normal SQLite function call but are less flexible in some ways e.g. no spaces allowed, parser expects “TABLE(” not “TABLE  ( ” but more flexible in others so a function requiring a table name can use the pre-processor to provide it e.g. =SQL(“Select load_CSV(‘TABLE(Sheet!A1)’,’c:\data\test.csv’);)”).

The sequence of events when a TABLE() pre-processor command is detected in a SQL statement is as follows:

  • SQLite SAVEPOINT is set if the SQL()  statement is not already running in the context of a previous SQL() function call. This wrapping in a transaction is the reason for ATTACH commands not been allowed. The purpose of the transaction is two-fold; 1st it will clean up any temporary tables and other updates in the case of a function failure, 2nd, SQLite inserts/updates are much faster when executed within a transaction, especially if writing to a disk-based database.
  • If the text between “TABLE(” and the next “)” is a valid range representation (or a named range pointing at such) a check is carried out to see whether that same range has been referenced before in this context (same context being within the same set of SQL statements or those of a prior “wrapping” SQL() function call). If already “handled” replace the TABLE(…) with the name of the temporary table associated with this range. If not, load the table data (as long as the range is an Excel table, or points to the top left-hand corner of a potential table) into a new temporary SQLite table and replace the TABLE(…) with its name.
  • When the last SQL statement within the current context is executed and if the temporary table has been modified (INSERT’ed, UPDATEd,DELETEd or DROPped ) the table is written back to the source range (or range cleared if a DROP statement).
  • The SAVEPOINT is released.

The remaining arguments to the SQL() function are also processed by the pre-processor, the arguments passed replace positional substitution tokens in the form :1 to :n where :1 is replaced by 2nd argument and :n is replaced by the n+1th argument.

The 2nd argument (that is the arg corresponding to the :1 token) is special. If no “:1” token is found within the SQL Statement, the argument is assumed to be a destination for a SQL SELECT statement (think of it as a SELECT INTO). It can be either a string representation of a workbook range or a named range pointing at one. The range can be either a full range, top left-hand corner range or an Excel Table. (If none of the above, the SELECT will populate an ADO detached RecordSet and use the argument as the name of that recordset).

A number of helper functions have been added to SQLite within microETL. Note: these are not preprocessor commands and so can be used by any SQL issued within microETL. (There are a number of SQL issuing functions other than SQL() and there’s also the SQL calls issued within Python functions.)

Using the Menu option SQL

The main SQL helper functions are …

x(formula,p1,p2,p3…)

Where formula is a spreadsheet formula or built-in function, followed by a variable list of parameters. The formula can take advantage of microETL “placeholders” e.g. x(“20+ :1 + :2”,20,40) will return 80, x(“upper(:1:)”,”abc”) returns ABC.

The formula’s text, with any placeholders replaced, is passed to Excel’s Application object’s Evaluate method, so not withstanding some of the limitations associated with this function (see here) it’s possible to use most Excel formulas within a SQL statement.

Only built-in functions (i.e not User Defined Functions) can be called in this manner.  To call a UDF use the udf() (or u() /mdf())  function.

When specifying the formula do not include the leading “=”, this will ensure formula is evaluated inline along with the rest of the SQL statement. If preceded by “=”, the formula is passed back as text. This can sometimes be useful either for testing purposes or to get around Application.Evaluate limitations.

Example: click for larger view:

udf(functionName,p1,p2,p3…)

This allows any VBA public function (i.e. a UDF) in the active workbook  to be called from within a SQLite SQL statement. The u() and mdf() functions are similar but the mdf() function will only run functions in the microETL project while u() will default to the microETL project but will also take references to UDFs in other workbooks.

Py(address,p1,p2,p3…)

The address can be either a string representation of an Excel range e.g. “Sheet2!A3:Sheet2!D6” or a named range. If the range is a single cell the function will expect to find the full Python script within, if a multi-cell range then the script will utilise cells as Python indents.

The scripts are in fact anonymous parameter-less Python functions. Parameters are passed via xLite “placeholders” which are replaced before the script is passed to the Python interpreter e.g.

If in  this case 2nd parameter is less than 40, return 1st parameter increased by 10% otherwise return unchanged.

SQLScript(address,p1,p2,p3…)

See previous post http://blog.gobansaor.com/2011/03/04/sqlscript-microetls-sql-sequencer-utility/

SQL(sqlStatement,p1,p2,p3…)

This allows the microETL.SQL() function (along with its pre-processor goodness) to be called from an environment where the pre-processor is not supported e.g. within Python. It’s also useful to process a series of SQL statements held in a table.

Various Hierarchy functions

See this Handling Flat, Parent-Child and Nested Set Hierarchies and this.

write_CSV(table,filename)

This will write out the contents of the table in classic CSV format (UTF-8 encoded). Useful for external SQLite tables that may be too big to fit in memory but need transformation before loading as a CSV into the likes of PowerPivot.

load_CSV(table,filename)

Will load UFT-8 encoded CSV files into a table. Again, if a CSV source is too big to load into memory (thinking fact tables here) but needs some SQL love’n’care can be loaded into an external table (i.e. in a disk-based database) bypassing the need to fit in Excel’s working memory.

sqliteDate(dateTimeString)

Will attempt to convert a string to an ISO DateTime string ( the format used internally by SQLite i.e. “YYYY-MM-DD hh:mm:ss”). Usually microETL will handle date conversions between Excel and SQlite and back again (but when a date is pasted back it’ll be in Excel numeric representation, the column may need to be formatted to the required date/time representation). Sometimes however the date format of imported “string dates” may not be recognised (for example, Excel loaded CSVs often fail to recognise dates correctly). This function will usually solve the problem.

ISODate(Year,Month,Day)

Will take a date passed as 3 numbers (e.g. ISODate(2010,12,31)) and convert to SQLite date format.

REGEXP

I’ve also added REGEXP (regular expression) support so you can do stuff like so…

SELECT * FROM Foo WHERE account_id REGEXP ‘[0][0-9]*[x]’

or

=SQL(“SELECT regexp(‘[0][0-9]*[x]’,’0987f85x’);”) -> returns 0 for false (should be all digits between leading zero and final ‘x’).

Advertisements