SQLScript – microETL’s SQL & Sequencer utility

If you’ve downloaded the examples in my previous posts you’ll have come across SQLScript in action. You’ll normally find it in either cell A1 of each example sheet, something like =SQLScript(“scriptToRun”) or in the VBA code of macros associated with buttons e.g. Application.Run(“MicroETL.SQLScript”, “Scripts!A1”).

SQLScript can also be called from within another SQLScript and from the “Run SQLScript” option on the microETL add-in menu.

I term SQLScript a “SQL and Sequencer” utility as its primary use-case is the issuing of SQL statement in a defined sequence to support ETL-like processing. Excel formulas (I keep wanting to write formulae, but have been told I’m just showing my age, Latin is so yesterday) operate in what can appear a random fashion (in fact it’s following a highly sophisticated precedence graph). Such apparent randomness is not ideal for tasks that need to happen in a set sequence. To ensure a set sequence in Excel requires the use of VBA, which of course is what lies behind SQLScript, it being simply another layer of abstraction on top to enable ETL tasks to be data driven.

The SQL in SQLScript’s name highlights the fact that most steps will typically be SQL statements, but any public function in the various modules of microETL (microETL, xLite, Python, ADO etc.) can be executed. In fact, any accessible VBA function can be called by pre-pending the function’s workbook address.

SQLScript’s 1st argument is the address of a “table” that contains the steps to execute. The address is a “string representation of an address” i.e. it must be enclosed in double quotes (e.g. “Sheet!A1”, “myNamedRange”,”someTableName”)

The address can be of several formats:

  • The address of the top left-hand cell of the table, e.g. “Sheet2!B20”. This will allow the table to grow and contract without having to use dynamic ranges
  • A range of cells containing the table e.g. “Sheet3!A1:Sheet3!H4”
  • A “named range” pointing at either of the above.
  • The “data range” name of an Excel Table.
  • A SQLite table name.
  • An ADO detached RecordSet name (if created using ADO module functions).
  • If called in a macro, a two-dimensional Variant() array.

The 1st argument points to the “table” holding the list of steps to be executed in sequence, any subsequent arguments are passed on to the script. I quote “table” as it’s genesis was as a two-dimensional array (I’ve been using some variation of this for nearly a decade) so the constituent  columns are positional not named. In other words, the first line is ignored and the column headings can be anything but must simply extend to at least the last used “p1 to Pn” parameter column to help identify the “width” of the script. The following are the columns required to define a step:

  • Column 1 “Step”: This must be non-blank otherwise step is skipped. If non-blank its value is used as a key to a hash array that will hold the return value of the step’s function call. This stored result value can then be referenced by subsequent steps. Each step doesn’t require a unique name, but only the last result value of the duplicate steps can then be accessed.
  • Column 2 “Command”: Identifies the function to execute. The most common command is SQL. Other common commands would be SQLScript (for a script to call another script), PyScript (to call an in-workbook Python script), LOG (to output a log of step activity for debugging), ALERT (to display a windows Message Box).
  • Column 3 “FailIfFalse”: Columns 3 through 5 are usually not needed for most simple scripts, but offer a measure of control and branching when needed.  All 3 depend on the return call from related steps returning either a boolean TRUE or FALSE, or an informational string  starting with either OK (for TRUE) or ABEND (for FALSE). In the case of “FailIfFalse” if the value TRUE is set in this column then the current step will be the last step executed if the step’s command returns a FALSE (yeah I know a bit confusing, just think if TRUE then abort script if this STEP fails).
  • Column 4 “DoIFTrue”: If a value in this column then the “prior results” hash array is searched for a step result with that name. If the result of that prior step was TRUE then the step is executed, otherwise it is skipped.
  • Column5 “DoIfFalse”: As above but this time the prior step must be FALSE to execute the step. Using a combination of this and the previous column it is possible to simulate an IF THEN ELSE scenario. For both columns, alongside the names of previous steps, a special value of RETURN will access the result value of the last executed step. Also, any parameters passed into the script can be accessed via the P1 to Pn names (i.e. P1 is 1st parameter, P2 is 2nd and so on).
  • Column 6 to 6+n(max n of 20) (p1 to p20):  This contains the arguments to be passed to the step’s Command. If an end argument is to be left empty use “” (two double quotes) as the value. If the value begins with ‘:’ (colon) then it is assumed to be a name for accessing the “prior results” hash array. The special RETURN name and the p1 to p20 names (i.e. the “variables” holding the arguments passed into the current script) may also be accessed by this method.

Values supplied for each column should be actual values and not the result of a formula as the evaluation of the formula may not happen in the sequence you expect (remember the primary role of SQLScript is to get around Excel’s “randomness”).

This method of “scripting” could be termed table-oriented programming (other examples would be the “config” of ERP applications such as SAP). Being data driven rather than code-driven allows for flexibility in the control of ETL-like processes, for example the scripts could be maintained centrally and passed to remote workbooks via a web service call or even an email. Also, it should be possible for those with limited programming experience to understand the sequence of a transformation and modify (or write new scripts) as needed without having to learn VBA.

Download microETL from here http://www.gobansaor.com/microetl

The microETL project  is password protected; if you need access to the code just email me I’ll send you on the password.

Advertisements