Pentaho Data Integration (Kettle) V Talend Benchmark

Pentaho’s Matt Casters has just published a benchmarking exercise comparing Kettle and Talend. In it he admits he’s not a Talend expert, and he advises that people should perform their own benchmarks where possible, as requirements differ. Nevertheless, unlike most other benchmarks we’ve seen on the subject, he publishes not just the results but the actual transformation “code” used in the tests.

For many people these benchmarks are of no real interest; as long as the product does what is required within the time and resources available, they’re content. But it would be a mistake to think that benchmarks don’t matter. They do: people have made, and will make, that final decision based on them. Remember, ETL is not life and death. The decision on which tool (if any) to go with may not get the level of investigation that the developers behind such products expect of their potential clientele, and this is particularly true of open source. Busy people will use such reports to direct them down a path or to confirm their existing prejudices. So I’m really glad to see Matt responding and, in particular, responding in the manner he has.

Database vendors have for years played the benchmarking game, setting and breaking records either via real technological advances or by simply gaming the process. We as purchasers and users knew in many cases to take the results with a large dose of salt, but purchasing decisions were nevertheless made on the back of these benchmarks.



6 responses to “Pentaho Data Integration (Kettle) V Talend Benchmark”

  1. FYI, I’m not opposed at all to putting the source files and transformations in an open source benchmarking project.

    Remember that the enemy here is neither Talend nor Pentaho. We are still fighting the proprietary ETL tools out there. If benchmarking can improve the situation for all open source tools involved, then that’s good for all, customers included. If we can expose the transformations, jobs, mappings, source files, etc. that were used, then that strengthens us all.

    Any other approach is IMHO counterproductive, and that includes making blanket statements concerning things you know very little about… if you know what I mean.

    By the way, Nick is kinda biased. He has seen Kettle process massive amounts of data on clustered SAN systems. As such I think his requirements for “time and resources” are a bit different from the average user. 🙂

    Matt

  2. Matt,

    I, like Nick, but from the other end of the scale (micro-ETL), care little about marginal differences between tools. And even when a product under-performs for me, throwing more hardware (or more time) at the problem usually solves it.

    As for people making blanket statements about things they know little about: end users and purchasers of IT products do it all the time. That is why they use benchmarks (biased or otherwise) and reports (Gartner comes to mind) to help them pick and choose between offerings.

    Benchmarking is not a scientific discipline, it’s a marketing one.

    Tom

  3. Pingback: Alex’s Personal Blog » More on Kettle Performance Issues

  4. First, I have to confess I am responsible for the CloverETL tool (www.cloveretl.org), a competitor of both Talend and Pentaho.
    We have just recently conducted a fairly comprehensive performance test of our tool, Clover, alongside Talend and Pentaho (it can be downloaded from: http://www.cloveretl.org/_upload/clover-etl/Comparison%20CloverETL%20vs%20Talend%20and%20Pentaho.pdf)

    The reason for conducting this test was a simple one: we wanted to know where we stand. We took the TPC-H Q1 and Q3 tests and implemented them using the aforementioned tools. We used the TPC dbgen utility to generate 1GB and 10GB of data.
    The results we obtained are quite interesting and I would like anyone experienced with Talend or Kettle to comment on them. All tools were able to cope with the 1GB data set. When it came to 10GB, Talend failed completely and Kettle just took too long, approximately 3 times slower than Clover.

    We had a chance to do the same tests with Informatica and DataStage. They both scored quite well; they were actually faster than our tool, but not by much.

    David.

  5. Pingback: 16.4. ETL Comparison: Talend vs Pentaho Data Integration (Kettle). « El Rincon del BI

  6. Pingback: Comparing Talend Open Studio and Pentaho Data Integration (Kettle). « El Rincon del BI