-
ETL effort estimation: Points to factor-in
Posted on September 23rd, 2008 No commentsEstimation of ETL effort is not always fun (as with any estimation).
There are several ways to estimate the effort needed to complete an ETL job. Work Breakdown Structure (WBS) is popular. And so is Function Point Analysis (FPA).
But the most widely used is the one that factors in complexity based on the understanding of things like source, target, resources on project, etc.
Though I haven’t really seen anyone use this method to perfection, it is a good place to start with. Some people argue against this method, but I see this as a complementary option to whatever method you have.
So, here is a list of points that I think would be useful when you do any ETL effort estimation. I have grouped it under 5 heads: Source, target, transformations, resources, other.
Source based:
- No of different sources & types
- Incremental extraction needs
- Profiling of data sources
- Cleansing / de-duplication dirty data sources
- Availability of documentation / transition of knowledge of source data
- Access control & management, if needed
- Data volumes for unit testing
-
Migration of a DW solution from Oracle to DB2
Posted on April 26th, 2008 No commentsRecently we migrated an entire DW module from Oracle to DB2. And our ETL was done using Business Objects Data Integrator.
Below is a sequence of steps that we followed to migrate the entire solution (including data structures, ETL code and universe)
Assumption:
–>All estimates done for 4 DB2 databases, 3 federated systems, 62 tables, 33 indexes, 6 DI jobs, 110 data flows, 151 transformations, 16 SQL transforms, 34 SQL() scripts, 220 objects in 23 classes and 2 reports.
–>Add 20% time as buffer for unexpected issues.
–>Some tasks can be executed in parallel, while some have strong dependency on completion of previous tasks.Work breakdown of migration and rough estimation:
1. Initiation & planning : 40 Hrs
Establish objectives and goals
Requirements gathering
Establish scope
Plan resources
Identify development environment2. Assessment : 40 Hrs
Assess tool and technology needs
Assess and understand technical requirements
Establish technical guidelines
Finalize tools and development environmentBODI, BODS, Business Intelligence, DB2, Data Integration, Data Modelling, Data warehouse, Database, Dimensional modelling, EIM, ETL, Metadata, Oracle, SAP BODI, BODS, Business objects, Data Integration, Data Modelling, Data Quality, Database, DB2, Dimensional modelling, DWH, Effort estimation, EIM, ETL, Metadata, Oracle, SAP-BO -
The maximum size of a fact table.
Posted on July 11th, 2007 No commentsWell, this is just a workout. It may or may not make much sense!
Here I have detailed how we can calculate the maximum size of a fact table.
Assume our datamart has 5 dimensions: Time_dim, customer_dim, product_dim, product_warehouse_dim, supplier_dim. And our fact table has 5 foreign keys, 15 measures, 2 degenerate columns
Time_dim has 20 years of rows on daily grain: so 365 X 20 = 7300 rows
There are 4000 customers in customer_dim.
200 products.
10 warehouses.
12 suppliers.So the maximum possible fact table rows is
7300 x 4000 x 200 x 10 x 12 = 7008000000000 (7008000 million or 7008 billion)Now this is the maximum no of rows possible in the fact table.(Assuming that every customer brought every product from every store supplied by every supplier on every day…..etc… it goes on … permutations and combinations)
Assuming that the 22 columns have an average size of 5 bytes, we need 110 bytes per row.
So now, 110 x 7008 billion = 71794 GB
The maximum possible size for this fact table (assuming that dimension count stays same) is 71794 GB.
The maximum possible increase per year could be 3590 GB.


