Tag Index – SQLite Star Query Part IV

In a previous post I put forward an approach to using SQLite as the basis for a Tag Cube i.e an alternative pivoting mechanism using a ‘tag’ index to  enable multi-column star-queries against a central fact table. Although the appearance on the scene of PowerPivot has negated the raison d’être for “Tag Cubes”, the concept of a “Tag Index” still has relevance.

A tag index differs from a normal b-tree index in that it is optimised for multi-columnar access (OLAP type queries) rather than single-column primary/foreign key access (OLTP type queries). In essence it’s somewhat similar to columnar indexes over relational databases such as will be available in the next version of SQLServer (see here, for this and other exciting news re the future of PowerPivot in the enterprise).

SQLite doesn’t offer  columnar indexes but it does have something similar; FTS3 – Full Text Search virtual tables; and this is what I use to implement TAG indexes.

The basic idea is to create an FTS3 virtual table that indexes the primary search columns of a fact table (or very large dimensional table). This virtual table is then used as a first level filter ‘index’ for the fact table. To make this simpler to use (for example, to make it possible to be used as a fact table in a SQLPower Wabit Mondrian schema) I wrap the fact table and FTS3 ‘index’ table using a virtual table of my own. An example will best explain it:

If you have a fact table such as this:

CREATE TABLE orderFacts (CustomerID, ProductID, CustomerType, InvoiceDate, Qty,Value);

To create a Tag index:

CREATE VIRTUAL TABLE orderFactsFTS using FTS3();

Populate this ‘index’ with the rowid and a space delimited concatenation of the relevant ‘key’ columns (docid & content are the default columns of an FTS3 virtual table):

INSERT into orderFactsFTS (docid,content) select rowid,CustomerID||’ ‘||ProductID||’  ‘||InvoiceDate from orderFacts;

Finally, create a ‘wrapping’ virtual table using my TAGindex virtual table function:

CREATE VIRTUAL TABLE  orderFactsTAGs using TAGindex (orderFacts, orderFactsFTS,3,CustomerID,ProductID,InvoiceDate, CustomerType,Qty,Value);

The TAGIndex parameters are:

  • The fact table name.
  • The FTS3 ‘index’
  • The number of columns that the TAGIndex virtual table will use as part of the tag index. In this case 3, as we’ve didn’t index customerType (too low a cardinality to bother, perhaps) nor the two ‘measure’ columns of Qty and Value.
  • The column list from the fact table to expose via the TAGIndex virtual table. Columns that are part of the index must appear at the beginning of the list and must match in number the value specified by the 3rd argument above.

We can then access the resulting table like so:

SELECT InvoiceDate, sum(Value)
from orderFactsTAGs
where CustomerId=2098 and ProductID=’FX-010-23′ and Qty >10
group by InvoiceDate;

The same result could be achieved without the TAGindex virtual table using :

SELECT InvoiceDate, sum(Value)
from orderFacts
where CustomerId=2098 and ProductID=’FX-010-23′ and Qty >10
and rowid in
(SELECT docid from orderFactsFTS
where orderFactsFTS match “2098 FX-010-23”)
group by InvoiceDate;

There are a number of potential problems with this approach:

  • What happens if there’s a product and a customer with the same 2098 key? The FTS3 filter will be less effective but the result will still be correct due to the ‘filter twice’ provided by the standard SQL WHERE predicates. Obviously it would be better in that case to add new columns to the table to ensure uniqueness of keys across key domains but it’s not essential.
  • If a key value begins with a “-” the FTS3 engine will treat that as “not =”  resulting in an incorrect result. Unlikely, but if likely, same solution as above, new column to doctor the key to suit.
  • High cardinality amongst key values, e.g. Invoice number, causing a very large, potentially slow, index. This is the same problem that columnar-store indexes face. Best not index such columns if index bloat occurs but it most cases (e.g. such columns being a minority) this shouldn’t be a problem.

So what are the potential use cases for TAGindex tables?

Here’s the code of the TAGindex virtual table function (implemented as a SQLite extension):

TAGindex.c

And, here’s a already built dll (in SQLite3.exe issue “.load TAGindex.dll” command before using):

TAGindex.dll

For  another example of a SQLite virtual table (this time combining Expat XML streaming parser with SQLite) see http://blog.gobansaor.com/2010/06/16/sqlite-xml-streaming-virtual-table-via-expat/

Advertisements

One response to “Tag Index – SQLite Star Query Part IV

  1. Pingback: SQLite XML Streaming Virtual Table via Expat | Gobán Saor