Category Archives: Palo

Handling Flat, Parent-Child and Nested Set Hierarchies

Alberto Ferrari describes a method of handling un-balanced hierarchies in PowerPivot. As usual, Alberto comes up with an elegant solution to something that looks near impossible. For more of the same, check out his book, co-authored with Marco Russo, Microsoft PowerPivot for Excel 2010: Give Your Data Meaning. It's the best book out there for those from a data modeling background trying to make sense of this new form of ROLAP, or indeed for those from a spreadsheet background who wish to gain a better insight into PowerPivot-oriented data modelling.

PowerPivot has problems with un-balanced hierarchies (aka variable-depth parent-child hierarchies) as it’s essentially a relational engine. Relational databases have traditionally sucked at handling such hierarchies which are usually implemented using an Adjacency List Model.

When I moved from network CODASYL databases to relational databases all those years ago, this was my biggest complaint about the 'new order' of the RDBMS. I guess it's my network database heritage that makes working with MOLAP engines such as Palo seem so natural. Such databases are kind to hierarchies and to the users of hierarchies.

Most of the major commercial databases have now extended their SQL offerings to handle hierarchy navigation; you can see SQL Server's approach in Alberto's post. But SQLite offers no such help, so, to handle what are in fact very common requirements for reporting and analysis datasets, I've extended the SQLite SQL available within microETL with 4 useful helper functions.

All 4 work with tables of the following structure:

Create table whatEver (HierarchyName text, Parent text, Child text, Name text, Level int, setID int, setLeft int, setRight int);

Where:

  • HierarchyName allows for multiple hierarchies to be stored in the one table.
  • Parent is the parent node for “this Child”. Can be text or a numeric surrogate key. Top level nodes will be set to null (or = “”).
  • Child is the node identifier.
  • Name is the textual identifier. If not populated, uses Child value.
  • Level is the depth from the top of the current path. Level = 0 indicates a top-level node. Not strictly required, but so useful it's worth populating.
  • setID, like Level, is not required to be populated but can be useful in Nested Set queries.
  • setLeft is the left-hand side of the Nested Set range associated with this node.
  • setRight is the right-hand side of the Nested Set range.

What's all this Nested Set stuff? As I pointed out above, SQL engines tend not to agree on a method of hierarchy navigation, if indeed they offer any method at all. Nested Sets is the classic data modelling method to get around this problem (if you look at hierarchy tables in cross-RDBMS packaged applications you'll often find a variation on this method).

See this post Managing Hierarchical Data in MySQL for a good description of the technique.

There's no need to populate these 'set columns' unless required; they're there just in case.
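To give a feel for why the set columns can be handy, here's the sort of query a Nested Set allows once setLeft/setRight are populated (a sketch only; the whatEver table is as above and the 'Products' hierarchy name is just a placeholder):

-- count the descendants under each node of the 'Products' hierarchy
Select p.Name as Node, count(c.Child) as Descendants
From whatEver p
Join whatEver c
  on c.HierarchyName = p.HierarchyName
 and c.setLeft > p.setLeft
 and c.setRight < p.setRight
Where p.HierarchyName = 'Products'
Group by p.Name;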

Function: make_pc_hier

This function is for handling the opposite problem to that which Alberto faced. It takes an existing flattened hierarchy and turns it into a “proper” parent-child one (hence the pc element of the name). Why would you do that? Well, MOLAP tools such as Palo not only handle unbalanced hierarchies, they usually store hierarchies using an adjacency list model. Also, if you wish to use Nested Set SQL, you’ll first need the model in parent-child format.

The function’s 1st argument is the name of the hierarchy table; if it exists it must be in the format specified above, otherwise a table of that name with the required format will be created by the function.

The 2nd argument is the hierarchy’s name.

The remaining arguments are the source columns for the hierarchy in the sequence of level 0 to level n.

The function is in fact an aggregate function (like SUM()) so it can be used to group the required columns from a table where they are repeated (such as a fact table onto which the hierarchy has been denormalised).

Example:
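As a rough sketch of a call (the SalesFacts table and its Sector, Brand and Product columns are invented purely for illustration; the real arguments follow the rules above):

-- build the 'Products' parent-child hierarchy in table ProductHier,
-- levels supplied top-down, from a denormalised fact table
Select make_pc_hier('ProductHier', 'Products', Sector, Brand, Product)
From SalesFacts;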

Function: make_pc_hier_from_tree

This is similar to make_pc_hier() except the source table must be in a ‘tree format’. It is also an aggregate function but the use of group by would, in this case, not make sense. The use case for this is the easy creation of hierarchy dimensions from a tree structure for use in the likes of Palo.

Example:

Function: flatten_pc_hier

This takes four mandatory arguments and an optional fifth. The first two are the usual hierarchy table name and hierarchy name. The 3rd is the output table to receive the flattened hierarchy. The first column of this output table will be populated with the base element Child IDs (i.e. the deepest level in the nodes' paths); the number of remaining columns must match the next argument. This 4th argument specifies the maximum depth of the flattened hierarchy, so if the hierarchy is to be, for example, Sector, Brand, Product, then this would be 3.

The optional 5th argument, if set to 'Y', will leave unused columns blank in an unbalanced hierarchy; otherwise the final nodes are filled out to the right with the last non-blank Name. This is a "stored procedure"-like function, so it should be called using "Select function(arg1,arg2 …);" syntax.

Example:
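A sketch of a call, assuming the same illustrative ProductHier table and a three-level Sector/Brand/Product hierarchy (the output table name ProductFlat is also just a placeholder):

-- flatten the 'Products' hierarchy into ProductFlat, 3 levels deep,
-- leaving unused columns blank on unbalanced branches
Select flatten_pc_hier('ProductHier', 'Products', 'ProductFlat', 3, 'Y');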

Function: nested_set_hier

This takes the two usual arguments, hierarchy table and hierarchy name. Its function is to populate the “nested set” fields of the supplied table to make a valid Nested Set. Again, it’s a “stored procedure” function so should be called using the “Select function(arg1,arg2 …);” syntax.

Example:
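Again as a sketch, using the same placeholder names as above:

-- populate setID, setLeft and setRight for the 'Products' hierarchy
Select nested_set_hier('ProductHier', 'Products');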

Download Example:

Download microETL from http://www.gobansaor.com/microetl and locate the hier97-2003.xls file.

The microETL project is password protected; if you need access to the code just email me and I'll send you the password.

Update:

There’s now an Alberto_hier workbook (in 2007/2010 format) with a worked example of building Alberto’s modified dimension. This workbook also includes a PowerPivot cube demonstrating the required measure (plus an equivalent Nested Set SQL Query).

For more on handling nested sets in PowerPivot see http://blog.gobansaor.com/2011/03/10/powerpivot-nested-set-measures/


Slowly Changing Dimensions – time to stop worrying.

Anybody who has worked on a major data warehouse project has war stories about that most awkward of BI concepts, the slowly changing dimension (see http://en.wikipedia.org/wiki/Slowly_changing_dimension).

The problem is not so much the need to keep a history of changes to master data attributes nor how to model these in the underlying data warehouse but how, or if, to model SCDs on end-user facing cubes.

SCDs have the potential to make end-user interaction with a cube more complex and error prone. A similar problem arises when a dimension can have several "roles", such as a sales fact with an order date, dispatch date and invoice date.

Where the nature of the problem requires the exposure of dimensional change in a cube model, there's no alternative. But when the justification is that there might be a need in the future, or that occasionally in the past that need existed, then the argument that front-end cubes should solve specific, not generic, problems should apply. Holding that line was not always possible when the cost of developing cubes was high due to professional IT involvement and the well-founded fear that the money might be wasted as requirements changed over time. The result was often cubes that were more generic (and long-lived, often to the point of irrelevancy) compared to a design that was more tightly focused.

This is where I think self-service BI tools such as PowerPivot or Palo change things; if those who pay for BI need no longer insist that a design is future-proof, or that a single model be capable of handling all the current views of the business, they can concentrate on building just-in-time, highly focused (and hence usually simpler) models.

Different groups in the same business can have variations that suit them, e.g. HR/Payroll might want to see a sales rep who has changed sales area reported against the rep's actual sales, while Sales/Marketing might be interested in the potential/history of the rep's current sales area. Likewise, with date roles, call centre management might wish to report against order date, while the warehouse guys focus on dispatch date, and Accounts only have an interest in those transactions with an invoice date.

But what about single-version-of-the-truth worries, what about reconciliation? This is where professional IT comes back into the picture. Self-service BI doesn't mean do-it-all-by-yourself BI, no more than a self-service sausage and bacon hotel breakfast means you first catch a pig. The marshalling and delta management of data remains an IT responsibility; very small organisations and/or skilled end-users may be the exception to the rule, but ETL and data repository design are non-trivial problems and look likely to remain so. But with more and more of the front-end being delegated to (or snatched by) business end-users, the nature of back-end ETL and data modeling will change. Many of the resources once applied to a fully IT-owned BI project will be focused on speedy, cost-effective and accurate data provisioning, a less glamorous and largely invisible role compared to putting pretty dashboards on the CEO's desktop. As a result, many IT staff with front-of-house skills who are now regarded as IT "assets" will more and more see themselves (and be seen) as front-line business ones.

So is this the end of the road for the all-rounder, the catcher of pigs, fryer of bacon and breakfast waiter rolled into one? No, I think not; there will continue to be a need for those individuals (or small teams) who can quickly and effectively tackle the once-offs, the emergencies and the need-to-knows that "normal" business practice continually brings to the surface. Small businesses (or isolated departments in larger concerns) will also continue to need tailored "end-to-end solutions".

While I do see a continuing need for all-rounders, their background (and hence the tools and product features to support them) will change from IT folks with business skills to end-users with IT skills. Tools such as PowerPivot are an important first step on that journey; more end-user-focused ETL tools (and services) are the next step. In the meantime, the world will have to depend on the likes of me (an IT guy with business skills, armed with a hodge-podge of ETL tools and techniques to bridge the raw-to-actionable data gap) for most of its BI all-rounders.

Excel in Data

My speciality is data, more specifically, data at the edges: data that has been extracted from, or needs to be generated outside of, formal systems. Such data tends to be extremely valuable and useful, but only if shaped into a format that has meaning to those who can use and analyse it. I've been extracting, marshalling and shaping such data for most of my 30 years in the IT business; I'm a datasmith.

The technical term for what I do is ETL (Extract, Transform & Load). It's the name given to the skill-set and tools associated with the movement of datasets. It's the unglamorous but essential backbone of a whole range of IT deliverables: Data Warehousing, BI (Business Intelligence), Planning & Forecasting, New System Commissioning, Systems Integration.

In the past my skills would only have been of use to large IT departments, as major “data shaping” activities such as BI & data warehousing were “owned” by IT; small businesses and departmental workgroups were outside the loop except as providers of requirements (and money) and as long-suffering “end users”.

This alienation from the process of data shaping and the need to simply "get things done" led to the evolution of "shadow IT", usually involving Excel in some form or another. Although many attempts have been made to control and tame this wild-west of data creation and consumption, they've largely failed, and with good reason: the best people to get value out of data are those who depend on and generate the data in the first place.

Microsoft has long recognised this and has over the years improved Excel’s ability to handle and pivot lists (aka tables, the primary data artefacts of business). Excel’s PivotTable is undoubtedly the world’s primary BI tool.

Microsoft has, with Excel 2010, introduced an even more powerful end-user pivoting tool, the free PowerPivot add-in. The world of self-service data analysis has changed for ever; not only can PowerPivot handle enormous amounts of data but it enables end-users and data analysts alike to build highly complex analytical models at a fraction of their previous cost.

PowerPivot offers the greatest returns to those organisations or individuals who have:

  • Access to report-friendly data sources, such as those provided by existing data warehouses.
  • The resources to purchase and maintain an Enterprise SharePoint farm (backed by Enterprise SQL Server) allowing PowerPivot dashboards to be shared (and controlled) via a web interface. Such infrastructure, although ideal, comes with a hefty price tag.

So, if your organisation or your team is missing one or both of the above is PowerPivot not for you?

Far from it, PowerPivot is still extremely useful without either. Data that's not in the ideal format can still provide useful analyses, and both Excel and PowerPivot can help cleanse such data. Likewise, just like an ordinary workbook, a PowerPivot workbook can be emailed or placed on a shared drive (all the supporting data is embedded in the .xlsx file). The only problem is, the time and resources associated with this activity can be substantial if you don't have the 'micro ETL' and general Excel VBA skills to help automate it. This is where I can help.

I've both the skills and the tools to build a rock-solid Excel-based ETL back-end for your PowerPivot projects. This can be as simple as a "good enough" data shaping automation for a single PowerPivot cube or a sophisticated 'micro-Data Warehouse' which would act as a foundation for a more long-term self-service BI strategy. The only software product you'll need to purchase is Excel; I'll provide the rest. Most of the time, Excel 2010's & PowerPivot's in-built ETL capabilities, automated with some VBA, will be all that is required, but when more 'firepower' is needed:

  • I've assembled my own Excel-based ETL tool specifically to enable me to quickly and cheaply build such micro-ETL solutions; it's a collection of best-of-breed open-source tools (such as SQLite) driven by, and integrated into, Excel (versions 2000 – 2010, 32bit & 64bit).
  • I can call on Palo OLAP (an open source, Excel focused budgeting & forecasting tool)
  • and on Talend, a leading open source ETL and data-management suite.

I can also automate the work-flows and non-ETL repetitive tasks associated with such projects. Tasks such as: generating questionnaires, marshalling and recording the resulting replies, and generating non-PowerPivot workbook reports/pivots (remember, unlike previous versions, Excel 2010 can exist side-by-side with older versions of Office, so you only need to provide it to those who need its full power).

As for building PowerPivot cubes, I've a long background in the design & delivery of ROLAP/MOLAP solutions. PowerPivot is essentially a 3rd way of producing OLAP cubes, heavily influenced by the dimensional modelling technique commonly known as the star-schema, but with the presentational simplicity of Excel's PivotTable and an Excel-like formula language (DAX) bolted on. PowerPivot models built with an understanding of the usefulness of the star-schema approach will allow for richer and simpler data analytics. I can provide that knowledge or help you master it.

My involvement can be either on a "deliverable" basis, i.e. I deliver the required solution, usually fixed-price for a fixed deliverable, or on a consultancy basis, where the "deliverable" is more an ongoing process and would often involve a degree of skills transfer.

I excel in data; with tools such as PowerPivot, so can you.

Star Schemas – to boldly go where no Excel spreadsheet has gone before


One of the many things that delights me about PowerPivot is the central role played by the Star Schema. Those of you reading with a data-warehousing background would shrug your shoulders and say: “So what, what else would you expect to find at the core of a BI tool?”.

Those from an Excel PivotTable background would ask: "What's a Star Schema, why do we need one, what's wrong with the good old-fashioned single flattened table?".

Those from a classic MOLAP background (Essbase, TM1, Palo) might also ask: “Why do we need this extra layer? Load the cube directly from the operational data model and get on with it!”.

A quick Q&A is perhaps the best way for me to explain why star schema design is a powerful skill in a datasmith’s toolset.

First off, what’s a Star Schema?

A Star Schema (also known as the dimensional model) is a denormalised (flattened) data model used to simplify an operational (OLTP) data model to better accommodate reporting and what-if analysis.

At its simplest, it consists of a central fact table with links back to a "surrounding" set of dimensional tables, hence the star name. A variation is the snow-flake schema, where the dimensional tables are not fully denormalised (e.g. Product Category->Product->Fact instead of Product->Fact).

The role of the fact table (besides being the table that hosts most of the measure fields) is to create linkages between dimensions (such as Customer, Product, Date), usually based on an actual transactional event (e.g. an Invoice) or a proposed event (such as a Budget or Forecasted Sale). In effect, it simplifies the often complex work-flow-driving connections of a typical operational system by using a single many-to-many relationship (modern ERP/CRM systems' data models consist of scores of configurable many-to-many relationships).
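To make that shape concrete, here's a minimal star-schema sketch in SQLite (all table and column names are invented for illustration):

-- dimension tables "surrounding" a central fact table
Create table DimCustomer (CustomerKey int primary key, CustomerName text, Region text);
Create table DimProduct (ProductKey int primary key, ProductName text, Brand text, Sector text);
Create table DimDate (DateKey int primary key, CalendarDate text, Month text, Year int);
-- the fact table links the dimensions together and carries the measures
Create table FactSales (CustomerKey int, ProductKey int, DateKey int, Quantity int, SalesValue real);
-- a typical "star query": slice the measures by two dimensions
Select d.Year, p.Sector, sum(f.SalesValue) as Sales
From FactSales f
Join DimDate d on d.DateKey = f.DateKey
Join DimProduct p on p.ProductKey = f.ProductKey
Group by d.Year, p.Sector;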

Many wrongly believe the star-schema was adopted for performance reasons, and that now that in-memory OLAP is becoming the norm it's no longer necessary to use dimensional modelling techniques. In fact, in the early days of data-warehousing, RDBMSs had great difficulty efficiently handling star-queries (and some, such as MySQL and SQLite, still do).

The original primary purpose of the star schema was to simplify the SQL required to access reporting data; to make the model more approachable to non-technical users. Of course, even simple SQL was beyond the knowledge or interest of most end-users, but a sizeable proportion were happy to write it (often helped by SQL "generators" such as MicroStrategy or Business Objects). But even in situations where SQL-wielding civilians were not to be found, the simplicity of the dimensional models proved to be a valuable aid when establishing and developing the warehouse data requirements. PowerPivot requires no SQL knowledge to manipulate the dimensional model, which brings the original concept full circle, but this time opening its possibilities to a much wider audience.

But surely, concentrating on the actual reports would be a more valuable requirements gathering exercise?

A so-called "bottom-up approach" is often the best way to approach a reporting request, particularly if the reports are simple one-off "traditional" reports. But for self-service BI, this needs to be combined with a top-down dimensional design. The idea is not to build out each and every report, or indeed cube, but to build a structure that'll support likely queries. The process of building a star schema provides both a logical model and a physical implementation of that model against which potential queries can be tested. I've worked on several POCs destined for implementation in Essbase where the star-schema was built and potential cubes mocked up using Excel PivotTables, but which subsequently never went any further (except for the star-schema ETL process). The end-users derived sufficient value from the denormalised star-schema pivoted and reported in Excel.

In traditional ROLAP data-warehouses where the cubes were built directly against star-schemas, the pure logical approach to the data model often had to take a back-seat to the necessity of fine-tuning it to make response times (be that ETL or user-pivoting) acceptable. This is why I much preferred situations where the star acted as a logical model from which MOLAP cubes were built.

With PowerPivot, ROLAP has a new champion. The column-oriented, high-compression, in-memory architecture means that the compromises of the past are no longer necessary. The fact table reverts to its primary role as a many-to-many connector. In a pure hypercube, measures are just another dimension (the approach that Palo takes); this is also now true of PowerPivot models: measures can be sourced from dimension tables and dimensions from fact tables, as logically they should be, but without the performance hit of old.

But what’s the advantage of a star schema over a flattened table when using PowerPivot?

It is true that the same flattened-table model used to back an ordinary PivotTable can be used within PowerPivot. But doing so would limit the potential of the DAX language to construct measures such as average sales spread over potential customers (rather than the actual customers that would typically be represented on a flattened table). Also, by creating "conformed dimensions" (single cross-business views of Customer, Product etc.) and using such tables as dimensional sources for multiple fact tables, "virtual cubes" that combine values from multiple fact tables can be built.

If you're new to dimensional modelling I'd recommend the books & articles of Ralph Kimball as a good starting point. You do have to be aware that some of the advice regarding efficiency trade-offs, surrogate keys etc. does not apply in a PowerPivot scenario (even though other performance issues still apply), but the logical design tips still hold.

Star Schemas: to explore strange new conformed dimensions, to seek out new measures, to boldly go where no Excel spreadsheet has gone before.

 


TAG Cubes – SQLite Star Query Part III

It's no secret that I'm a huge fan of SQLite and Excel, particularly when used in combination. I also greatly admire the open source BI engines Palo and Mondrian. Mondrian appeals because of its "ROLAP with a cache" architecture and its implementation of MS's excellent MDX language. When I say MDX is excellent I'm talking with my professional programmer's hat on; as an end-user tool it's a non-runner. This is where Palo comes in: building on the hypercube concepts pioneered by the likes of TM1 and Essbase, it presents a design view that's approachable by a vastly greater percentage of "civilians" than is the case with ROLAP-based solutions.

The trick behind TM1, Essbase, Palo etc. is the extension of the spreadsheet metaphor from two to multiple dimensions, while still binding the interface closely to the familiar spreadsheet (which for most of the business world is still Excel).

So where does SQLite come in all this?

At first glance, SQLite lacks the sophisticated join functionality to support star-queries, but of course, if the dataset is small then a full-table scan of the fact table, or better still, loading the fact table into memory, negates any such shortcomings.

In fact, the relational engines behind all traditional ROLAP tools have problems with dimensional models, particularly when you reach the point of needing summary tables or query re-writes; that's why the emerging SQL-speaking columnar databases are such a godsend for ROLAP data warehouses.

It was SQLite combined with Excel, acting as a data prep platform, that was originally my main interest, so for pivoting, Excel's own pivot table would have to do. Nevertheless, I felt the tool was incomplete without the ability to directly pivot the underlying SQLite database.

Why not use Palo or Mondrian as a pivot tool? Well yes, where a fixed, permanent "solution" is required, the extra moving parts of either approach would be justified and indeed necessary, but that is to miss the essence of what I call datasmithing.

Datasmithing is not data warehousing nor is it the provision of solutions (which, for example, Palo superbly enables in multi-user budgeting situations). Datasmithing, as a skill, is of course part of the process of both, but it’s on the edges, at perhaps the planning or consumption stages.

Datasmiths deal in the unknown, in change, in disaster recovery, in systems’ commissioning, in the never-ending barely-repeatable processes thrown up by daily business life.  For that, the toolset required must be as simple as possible (but no simpler), self-contained, document-oriented, secureable (is that a word?) and easily archived and retrieved. Excel and file-based DBMSs such as MSAccess or SQLite fit the bill nicely, server-based technologies such as DBA controlled database servers or IT installed “solutions”, less so.

Jedox has made Palo relatively easy to install (and likewise, Canada's SQLPower has made Mondrian setup a painless exercise via their excellent Wabit reporting tool), but the zero-install, email-friendly document approach that spreadsheets are famous (and infamous) for is preferable in many situations. This is something that Microsoft has recognised in its Gemini add-in for Excel 2010, but Excel 2010 is not here yet and it's likely to be five years or more before it's as common as Excel 2003 is today.

The inclusion of FTS full-text searching with SQLite triggered an aha moment with regards to pivot-enabling SQLite.

The usual method that hypercube-like, Excel-friendly OLAP tools use to return data is via a UDF like so…

=DATA("CubeName","value1","value2",…)

…where valueN represents dimensional elements, so…

=DATA("SalesCube","Beer","Profit","Jan 09","Actual")

…is the Actual Profit for Beer sales in Jan 09. The dimensional elements act as "tags" to locate a particular value; there is of course much more to tools like Palo (hierarchies, intra-cube rules etc.) but in essence most OLAP tools are like www.delicious.com for number crunchers. This method of retrieving data fits well with how people use Excel, not just for pivots, but also for embedding OLAP aggregated cells in lists. For example, a CRM scenario: a Sales Rep makes a list of her 'best' (subjective) customers, but needs hard (objective) stats to be placed alongside the list to convince the boss or to track actuals against expectation.

Dimensional elements as tags; FTS3 virtual tables as fact table indexes; the concept of a TAG Cube was born.
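To sketch the idea (this is not the actual xLite implementation, just the concept; FactSales and its columns are invented, and it assumes an SQLite build with FTS3 enabled):

-- an FTS3 virtual table acting as a "tag index" over the fact table,
-- with the dimensional elements of each fact row stored as tags
Create virtual table TagIndex using fts3(tags);
Insert into TagIndex(rowid, tags)
Select rowid, Product || ' ' || Period || ' ' || Version From FactSales;
-- locate and aggregate the fact rows tagged Beer / Jan 09 / Actual
Select sum(f.Value)
From FactSales f
Where f.rowid in (Select rowid From TagIndex Where tags match 'beer "jan 09" actual');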

In the above example "Profit" would most likely be described as a measure (Palo, a near-pure hypercube, does not distinguish between Measure and other Dimensional coordinates). Dimensions, measures and attributes are in reality interchangeable (a Customer ID can act as a dimension or an attribute, but by applying a Count Distinct to it, it becomes a measure), but most OLAP solutions treat "Measure Dimensions" as different, and so do TAG Cubes.

By using the default fact table structure (a single-columned table) and querying using the default measure (which translates to the SUM() of that single value), a 'pure' approach can be used. But ROLAP is tightly bound to the concept of a fact table, and since SQLite is relational, TAG Cubes offer the ability to use a wide fact table approach, and I think they gain considerably in flexibility by doing so.

The above example of using Count Distinct, or the simple creation of calculated measures, are examples of this flexibility. Another is a measure based on SQLite's group_concat aggregate function to provide a drill-down facility, e.g.

=DATA("SalesCube","ROWIDList","Beer","Jan 09","Actual")

…where "ROWIDList" would be set up as group_concat(rowid,',') and will return a comma-separated list of the underlying fact table ROWIDs.
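Behind such a measure the SQL against the fact table would look something like this (fact table and column names again invented for illustration):

-- return the fact table row ids behind the Beer / Jan 09 / Actual cell as a CSV list
Select group_concat(rowid, ',') as ROWIDList
From FactSales
Where Product = 'Beer' and Period = 'Jan 09' and Version = 'Actual';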

A major reason for rolling my own pivot engine was to add a concept of "namespaces" and to separate the implementation of these namespaces from the actual pivot. When a tag (or a predefined hierarchy of tags) is assigned to a cube, it's also assigned to a namespace; in many cases namespace and cube would be synonymous, but in some cases a more sophisticated approach is required:

  • Multiple cubes sharing the same set of conformed dimensions would be best served by such cubes sharing a common namespace, and so they can.
  • Different consumers of the pivot may require the use of a different language, be that a spoken language or a different ‘business language’ e.g. Manufacturing Product Codes V Consumer Product Names. Again, easily done.
  • Sometimes identifying data can't be shared with the datasmith or the numerical analyst working on a problem; in such cases being able to replace the actual namespace with an obfuscated one can be very useful. Or, for added security, the namespace might only be issued to approved PCs while the tag index and fact table are stored on a shared drive. This needs some more work to make managing such scenarios secure and easy to use, but the structure is there.

As hinted at above, the three elements of a TAG Cube, the namespace, tag index and fact table, can be assigned to different databases (i.e. files). Thanks to the wonders of SQLite's ATTACH statement and the backup API's ability to quickly load/unload databases in/out of memory, it's possible, for example, to load the namespace and tag index (i.e. the 'dimensions') into a memory database, while a very large (i.e. too big to fit in memory) fact table remains on disk. Fast and cheap SSDs will add further configuration options.
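A minimal sketch of that split, with purely illustrative file and table names (and the same invented TagIndex/FactSales shape as earlier):

-- dimensions (namespace and tag index) live in an in-memory database, facts stay on disk
ATTACH DATABASE ':memory:' AS dims;
ATTACH DATABASE 'facts.db' AS facts;
-- once the tag index has been (re)built in dims, queries join across the two
Select sum(f.Value)
From facts.FactSales f
Where f.rowid in (Select rowid From dims.TagIndex Where tags match 'beer "jan 09" actual');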

Although most of the TAG Cube functionality is available only within Excel, I've built a C-based SQLite virtual table (cFact) to allow the tag index to be used outside xLite. This means that SQLite drivers for ODBC (for use as a Pivot Table source, for example) or JDBC (for use in SQLPower Wabit perhaps) can efficiently access data models built using xLite.

I had to revert to using C rather than my preferred Python (did I mention that xLite now embeds Python in Excel? No? Well it does, Python the new VBA?), having failed to get around multi-threading issues with callbacks to Python in both the ODBC and JDBC drivers. I'd made a career promise to myself many years ago not to have anything to do with printers or threads, and I think I'll stick with it 🙂

TAG Cubes are the latest addition (still WIP actually) to xLite, joining:

  • VBA coded SQLite SQL functions.
  • Worksheet Functions; call out to a ‘function’ built using Excel formula, passing a parameter list and returning a value.
  • Workbook Functions; like Worksheet Functions, but loading a new Workbook, passing in parameters, passing back a value (or tables) and closing the Workbook when finished.
  • XLiteScript; xLite exposes its functionality via VBA coded UDFs, which can be called like any other formula, but data prep activities often require sequential procedural logic, xLiteScript is a table-oriented scripting mechanism offering basic flow-control logic.
  • pyScript; I embedded Python into xLite to take advantage of Python's speed in developing Virtual Tables, SQL Functions and extensions to SQLite, and to tap into the wonderful world of Python code. I've also added the ability to use Python from scripts defined within Excel (to indent, tab to the next cell!).
  • Fast load/unload to/from CSV.
  • Load from any ADO source.
  • Remove xLite formulae and rename and save Workbook, very handy when used via Workbook Functions to mass produce Excel “reports”.
  • Other WIP items are: load from SAP, load/unload to/from Amazon S3, use Palo cubes as TAG Cube "facts", slot in/out Palo for TAG Cubes, auto-generate Mondrian XML based on TAG Cubes, write-back and splash, Python & VBA TAG Cube "rules".

I’ve started the process of releasing the beta code here …

Why not join me on Twitter at gobansaor?