In Memory OLAP

The consolidation within the BI market continues, this time with the purchase of Applix by Cognos. As Timo Elliott points out, the interesting bit is the Applix TM1 memory-centric OLAP product. For the vast majority of OLAP users (i.e. the millions of Excel Pivot table jockeys) in-memory OLAP is nothing new, but traditionally big-ticket OLAP was disk-centric, be that ROLAP star-schemas or MOLAP products like Essbase. Until recently, in-memory was a non-runner for large datasets as the cost of memory was high and, more importantly, 32-bit programs could only map up to 2G (or, with a bit of trickery, 3G) of the stuff. But with the appearance of cheap 64-bit servers, and even cheaper RAM, the memory-mapping of large datasets is now possible. While the focus of the big guns in BI turns towards 64-bit servers, the cheapness of RAM makes even a 32-bit machine (i.e. the majority of business PCs) useful for in-memory OLAP, especially for SMEs or departmental data-marts – 2 or 3 gigs of MOLAP data is a lot of data.

This is where tools such as the open source PALO MOLAP server come into play. Although Palo is both 32- and 64-bit enabled, its status as a free open-source tool, its ability to run client-side, and its close integration with Excel make it ideal for adoption in those smaller organisations that previously would not have considered OLAP technologies.

But it’s not just MOLAP that can benefit from cheap client-side RAM, loading data into in-memory relational databases such as SQLite‘s :memory: can also be useful, particularly when that memory is shared with Excel. I’ve already experimented with embedding SQLite into Proto; as Proto’s scripting language is VBA I based it on code I had used in the past for integrating SQLite into Excel. I currently use SQLite/Excel as a micro-ETL tool and I’m looking at the possibility of front-ending a SQLite star-schema with Excel, in effect, a micro-BI environment.
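The VBA code behind the Excel/SQLite integration isn't shown here, but the basic idea of an in-memory SQLite micro-ETL step can be sketched in Python's built-in sqlite3 module (the table and data below are made up for illustration):

```python
import sqlite3

# Open an in-memory SQLite database; everything lives in RAM and
# disappears when the connection closes.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")
con.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("EMEA", "widgets", 120.0),
     ("EMEA", "gadgets", 80.0),
     ("APAC", "widgets", 200.0)],
)

# A typical micro-ETL step: aggregate in SQL, then hand the rows
# back to the host application (Excel, via VBA, in the original setup).
rows = con.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
print(rows)  # [('APAC', 200.0), ('EMEA', 200.0)]
```

The attraction is that the whole staging area sits in the same process memory as the spreadsheet, so there's no server to install or file to manage.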


9 responses to “In Memory OLAP”

  1. I have tried PALO and struggled with it quite a bit – it may have just been me. If you have a large-scale working model of it up and running reliably, or know of one, I would like to see a little presentation or WebEx-style recording if available; please let me know – it’s a very interesting software offering. Do you really think open source in-memory OLAP technology is the way to go, with the limited support, limited continuous development funding, and typical high frequency of “bugs”? Or do you recommend going with one of the few corporate in-memory OLAP vendors that are out there? I ask with a bias of course, because I work for a RAM-based / in-memory OLAP vendor.

    Thanks,

    Jim

  2. Hi Jim,

    Yep, I too struggled a bit with Palo, it didn’t quite gel with my Essbase and ROLAP background. Those who have used TM1 seem to find it very easy to use. That’s a problem with MOLAP in general, no standard way of doing things, no SQL-like standard (MDX anybody?). But I’m now comfortable with it and find it “good enough” for many problem areas.

    As for large datasets, I’m not convinced the current V2.0 could handle complex large models, but Jedox are promising significant improvements in speed and in the capabilities of the rule engine with V2.5. And, I’ve found when Jedox promise something they tend to deliver.

    I’m an IT professional, with a long history (close on 30 years) working with “big ticket” software; the FUD of “limited support, limited continuous development funding, etc…” no longer works on me. I’ve seen the likes of Linux, PostgreSQL, MySQL et al. develop into outstanding products, and I think Palo has that same potential.

    Palo is closer to the MySQL model, in that Jedox offers full support and training to those who need it and is also actively investing in the product (e.g. a new drill-back capable ETL engine to be added in V2.5).

    Which to go for, closed or open source? It depends on an organisation’s attitude to risk, its skills profile, and whether it’s an end-user or an IT services provider. The main thing is to keep an open mind; any CIO/CTO who approves a major new project without at least reappraising the open/closed/cloud/out- or in-sourced “balance” isn’t, in my opinion, acting in the best interests of their organisation.

    Tom

  3. Pingback: Python the new VBA ? « Gobán Saor

  4. Pingback: Why Larry hates the cloud, and my data trinity. « Gobán Saor

  5. Hi Tom,
    I found your post looking for a clear specification of what in-memory OLAP really is all about. I couldn’t find anything online, and Wikipedia doesn’t say anything about it. So maybe it would be helpful to discuss what in-memory really means.

    – Does in-memory apply to both ROLAP & MOLAP? You mentioned SQLite as an in-memory RDBMS.

    – Does in-memory need to have write-back to be a “real” in-memory solution? (TM1, Alea and Palo have write-back.)

    – If a database uses caching, would you say it is already in-memory? The more memory I give MySQL for caching, the more data will be “in memory”. So if I use Mondrian and MySQL, do I have in-memory OLAP?

    Just a few questions to think about; I would like to hear what you think about it.

  6. @Ben,

    All databases, and in fact all computing as we know it, use caching, so almost every process could be described as in-memory.

    When I talk of in-memory OLAP I’m referring to an OLAP model where the whole “database” is loaded into memory, all reading and writing is applied against the in-memory model (with perhaps a redo-log written to disk).
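    That distinction can be sketched in a few lines of Python (illustrative only – this is not how Palo or TM1 are actually implemented): cells keyed by a tuple of dimension members, every write applied to the in-memory model, with a redo-log journalled on the side so the model could be rebuilt after a restart.

    ```python
    from collections import defaultdict

    # The whole "database": cells keyed by (year, region, product).
    cube = defaultdict(float)
    redo_log = []  # stand-in for the redo-log written to disk

    def write(coords, value):
        cube[coords] = value              # write hits the in-memory model...
        redo_log.append((coords, value))  # ...and is journalled for recovery

    def total(dim_index, member):
        # Reads aggregate over the in-memory cells directly; no disk involved.
        return sum(v for k, v in cube.items() if k[dim_index] == member)

    write(("2007", "EMEA", "widgets"), 120.0)
    write(("2007", "APAC", "widgets"), 200.0)
    write(("2008", "EMEA", "widgets"), 150.0)

    print(total(1, "EMEA"))  # 270.0
    ```

    A cache-centric product like Essbase inverts this: the disk is the operational store and memory merely speeds up access to it.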

    Palo is such a tool; Essbase (Hyperion) is not. Essbase of course uses extensive and sophisticated caching, but it was developed before the days of cheap and abundant memory and uses the disk as its operating store. Palo (and Excel, another in-memory “database”) uses disk as a backup-and-restore facility, not as an operational store.

    The ability to write back (or not) has nothing to do with in-memory “status”; e.g. both Palo and Essbase allow write-backs.

    SQLite can run either in-memory or in conventional cached-disk mode.

    The big problem facing in-memory OLAP is the ability to scale to handle very large and sparse datasets; Essbase, for example, can deal with these while Palo V2.5 cannot (or at least not as well). But Jedox is to release Palo V3.0 next month, which promises to address this issue.

    Tom

  7. Hi Tom,

    Thanks for your fast answer. It makes sense and helps to get a clear picture of it.

    Ben

    PS: I can see the Jedox logo out of my office 😉

  8. Hi Tom

    Do you have any idea of Qlikview? How does it cope with very large databases (like banks’)?

  9. @Swati,

    Never used Qlikview, but from what I know of it, its main claim to fame is that it eliminates the need to pre-build “cubes”, instead lazy-loading data directly from OLTP data structures into internal memory-based structures (a bit like how Business Objects operated when I last used it).

    I wouldn’t think it would be suitable for large banking or telco datasets (at least not without some dimensional pre-processing, which would eliminate one of its selling points), but as I said, I’ve never used it, so I might be wrong.

    Tom