… and that’s good. That’s how I like my databases: boring, reliable, consistent, easy to use.
SimpleDB, on the other hand, is not boring. It’s an exciting new shiny thing that opens up a myriad of new possibilities, but first I, and the rest of the developer community, need to tool up, cast aside some of our cherished database design patterns (oh, like 3rd normal form, strong typing, joins, nothing major) and embrace a slightly different way of thinking. However, as much as I like a challenge, I also like to get things done.
That’s where EnterpriseDB’s new Postgres Plus Cloud Edition comes in. This is an Amazon EC2/S3-hosted edition of their Oracle-compatible, PostgreSQL-based product that offers the scalability of SimpleDB but the familiarity of a traditional relational database. The “magic” is supplied by Elastra, who are also offering the same functionality against MySQL and standard PostgreSQL databases.
A Talend ETL job, which I had been developing for a client, had been tested against a “normal” EnterpriseDB instance. This ETL job was part of a BI prototype trialling Postgres Plus Cloud Edition (the new name for EnterpriseDB’s cloud offering) as the back-end database. So I exported the job as a Java executable, fired up an EC2 instance, copied up the generated JAR files, changed the database’s hostname to that of the Postgres Plus “cloud” database and ran the ETL job. It worked. As I said: boring, nothing to report, it just worked.
Now you may be wondering what’s so special about these Elastra-powered databases. Surely EC2 is no different from any other Linux virtual machine, so why not simply install a standard database? The problem with EC2, and it is a problem for those of us (i.e. practically every IT pro on the planet) who have come to expect highly reliable RAID-backed disk storage, is the non-permanence of its disk systems.
When an EC2 instance is powered down or fails, the disk system is wiped!
That, combined with fixed (if generous) disk sizes (160GB, 850GB or 1690GB), means that a clustered database environment is often a necessity, adding considerably to the complexity. It’s this sort of complexity that SimpleDB and Elastra address.
The obvious use case for both Elastra and SimpleDB is as a data store for OLTP applications, but Elastra’s ability to handle massive S3-backed databases also considerably strengthens the case for using EC2 as a data warehousing platform. Although it is not obvious at first glance, SimpleDB could also act as an OLAP data store: SimpleDB’s massively indexed tuples would serve as “sparse dimensions” pointing to S3 objects (SQLite databases?) that hold the fact data combined with the dense/“partitioning” dimensions (e.g. Time). Possible? Yes. Fun to do? Yes. A solution that I can apply tomorrow? No. That’s why I’m glad EnterpriseDB and Elastra are delivering such a boring product!
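To make the OLAP idea a little more concrete, here is a rough sketch of how I imagine it hanging together. It is only a local simulation under loud assumptions: a plain Python list stands in for a SimpleDB domain, a dict of in-memory SQLite connections stands in for S3 objects, and every name in it (`sparse_index`, `object_store`, `find_fact_objects`, `total_sales`, the `product`/`region` attributes and the object keys) is made up for illustration. Nothing here calls a real Amazon API.

```python
import sqlite3

# Stand-in for a SimpleDB domain (assumption): each item describes one
# combination of sparse dimensions and points at the object holding its facts.
sparse_index = [
    {"product": "widget", "region": "EU", "fact_object": "facts/widget-eu.db"},
    {"product": "widget", "region": "US", "fact_object": "facts/widget-us.db"},
    {"product": "gadget", "region": "EU", "fact_object": "facts/gadget-eu.db"},
]

# Stand-in for S3 (assumption): object key -> SQLite database, in memory here.
object_store = {}


def load_facts(key, rows):
    """Create one small fact database, keyed by the dense Time dimension."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE fact (month TEXT, sales REAL)")
    conn.executemany("INSERT INTO fact VALUES (?, ?)", rows)
    object_store[key] = conn


load_facts("facts/widget-eu.db", [("2008-01", 100.0), ("2008-02", 120.0)])
load_facts("facts/widget-us.db", [("2008-01", 200.0), ("2008-02", 50.0)])
load_facts("facts/gadget-eu.db", [("2008-01", 10.0)])


def find_fact_objects(**criteria):
    """Step 1: filter on the sparse dimensions to find the relevant objects."""
    return [
        item["fact_object"]
        for item in sparse_index
        if all(item.get(name) == value for name, value in criteria.items())
    ]


def total_sales(month_from, month_to, **criteria):
    """Step 2: slice each matching fact database on the dense Time dimension."""
    total = 0.0
    for key in find_fact_objects(**criteria):
        (subtotal,) = object_store[key].execute(
            "SELECT COALESCE(SUM(sales), 0) FROM fact "
            "WHERE month BETWEEN ? AND ?",
            (month_from, month_to),
        ).fetchone()
        total += subtotal
    return total
```

Calling `total_sales("2008-01", "2008-02", product="widget")` walks both widget databases and adds up their rows. In a real version the first step would be a SimpleDB query and the second would involve fetching each SQLite file from S3, which is exactly the sort of plumbing that makes this fun rather than quick.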
The other big gap in EC2, non-permanent IP addresses, has at last been addressed. EC2 now offers “Elastic IP Addresses”: addresses associated with an account, not an instance. If the instance fails or is shut down, the IP address can either be immediately re-assigned to a new instance (no more waiting for Dynamic DNS propagation) or “reserved” for future use at a cost of USD 0.01 per hour. Also, the new “multiple locations” facility puts the API changes in place to allow for location selection, hopefully a sign that we here in Europe will have “local” EC2 instances to match our European S3 buckets!
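The difference between an address that belongs to an account and one that belongs to an instance is easy to show with a toy model. This is a sketch of the concept only, not of the EC2 API: the class `ElasticIpAccount`, its methods, the instance ids and the example address (taken from a documentation-only range) are all hypothetical.

```python
class ElasticIpAccount:
    """Toy model: addresses are owned by the account, not by an instance."""

    def __init__(self):
        # address -> instance id, or None while the address is only reserved
        self._attached_to = {}

    def allocate(self, address):
        """Reserve an address for the account without attaching it."""
        self._attached_to[address] = None

    def associate(self, address, instance_id):
        """Point an address the account already owns at an instance."""
        if address not in self._attached_to:
            raise KeyError(f"{address} is not allocated to this account")
        self._attached_to[address] = instance_id

    def instance_terminated(self, instance_id):
        """The instance goes away; its addresses stay with the account."""
        for address, owner in self._attached_to.items():
            if owner == instance_id:
                self._attached_to[address] = None

    def attached_instance(self, address):
        return self._attached_to[address]


account = ElasticIpAccount()
account.allocate("203.0.113.10")
account.associate("203.0.113.10", "instance-a")

# instance-a fails: the address falls back to "reserved" but is not lost...
account.instance_terminated("instance-a")
reserved_state = account.attached_instance("203.0.113.10")

# ...and can be re-pointed at a replacement instance straight away.
account.associate("203.0.113.10", "instance-b")
```

Clients keep talking to the same address throughout, which is why there is no DNS propagation to wait for.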
It looks like IBM have invested in EnterpriseDB, possibly as a counter-weight against Sun’s acquisition of MySQL (EnterpriseDB’s targeting of Oracle’s customer base would also be an added benefit!).