• ETL effort estimation: Points to factor-in

    Posted on September 23rd, 2008 biexplorer No comments

    Estimation of ETL effort is not always fun (as with any estimation).

    There are several ways to estimate the effort needed to complete an ETL job. Work Breakdown Structure (WBS) is popular. And so is Function Point Analysis (FPA).

    But the most widely used is the one that factors in complexity based on the understanding of things like source, target, resources on project, etc.

    Though I haven’t really seen anyone use this method to perfection, it is a good place to start with. Some people argue against this method, but I see this as a complementary option to whatever method you have.

    So, here is a list of points that I think would be useful when you do any ETL effort estimation. I have grouped it under 5 heads: Source, target, transformations, resources, other.

    Source based:

    1. No of different sources & types
    2. Incremental extraction needs
    3. Profiling of data sources
    4. Cleansing / de-duplication dirty data sources
    5. Availability of documentation / transition of knowledge of source data
    6. Access control & management, if needed
    7. Data volumes for unit testing

    Stage / Target based:

    1. Proper database design available (primary keys defined, right columns indexed, etc)
    2. Familiarity of the team with the various source, staging and target systems
    3. No of different target types
    4. Truncate / append / merge options
    5. Fact / dimension / stage load

    Transformation / features / mechanisms based:

    1. No of transformations and their complexity
    2. Usage of reusability feature
    3. Usage of features like CDC, SCD, etc
    4. Need for error / failure handling and recovery mechanisms
    5. Auditing and validation needs of ETL
    6. Complex functions / calculations for measures
    7. Parallelism / DOP / etc

    Resource based :

    1. Which is the tool used? What is the skill level of the resources (fresh, medium skilled, expert)?
    2. Availability of the resources (part time, full time, etc)

    Other :

    1. Are there any people, task or decision dependencies?
    2. Unit / system & integration testing needs
    3. Migration / deployment needs
    4. Performance and tuning needs
    5. Project management / status reporting needs
    6. How clear is the requirement? Is scope properly defined?
    7. How well is the project being managed?
    8. Have you factored in time for rework?

    Few interesting links on this topic :
    1. Vincent McBurney’s article on estimating ETL development time
    2. FPA
    3. FPA based estimation for a Data warehouse

    Leave a reply

    You must be logged in to post a comment.