Exploring Business Intelligence
RSS icon Email icon Home icon
  • Home
  • About
  • Links & resources
  • Favourite articles
  • Recommended books
  • Contact
  • Disclaimer
Gear

Welcome to my technical blog!

This is where I jot down my learning notes on various topics like BI, DW, ETL, database, SAP BI, SAP CRM, etc.

  • Future of BODI / BODS?

    Posted on February 4th, 2009 biexplorer No comments

    What do you think is the future of BODI / BODS?

    I have almost stopped working on BODS. Well, almost… except for a fix here or there… once in a while. Most of my focus is now on SAP BI.  

    But I have a soft side towards BODI/BODS. It is a tool that I know very well. It is also a tool that has a lot of potential, but is underestimated a lot. It has undergone a sea of changes since the ACTA days. And it equates well with the Informatica’s and IBM Information Server’s of the world.

    Personally, I feel that this tool has a good future. It has shaped up pretty well, has added more functionality, and integrates well with SAP R/3, but more importantly is non-SAP in focus. It should do well in the next few years.

    Okay, here is an interesting discussion on the same. And Werner’s comments are promising. Need to keep a tab on the developments.

    BODI, BODS, Business objects, Data Integration, Data Quality, Data warehouse, EIM, ETL, SAP BODI, BODS, Business objects, Data Integration, Data Quality, DWH, EIM, ETL, SAP, SAP-BO
  • Logic behind implementing SCD 2

    Posted on October 7th, 2008 biexplorer No comments

    Lets talk about the logic behind SCD type 2 today.

    We know that SCD 2 is about preserving all the changes in the dimension records. Let us see the logic behind how we can implement it.

    NOTE: The steps below assume SCD 2 having a Begin_date and End_date and a Current_Flag column

    1. Check if the incoming row is already present in the target table (dimension) using the source primary key
    2. If it doesn’t exist in the target dimension
      1. Generate a surrogate key
      2. Enter source record’s date as the Begin_date
      3. Enter the default end date (which could be 31/12/2099) as the End_date
      4. If you have a Current_flag column, set it as ‘Y’ or ’1′ (or whatever you want)
      5. Insert into the dimension
    3. If the row exists in the target
      1. Check if the incoming and target current record are different (at least for one chosen attribute)
      2. If they are same, do nothing
      3. If they are different, do the following
        1. For the record in the target table, change (update) the End_date to source record’s date and set the Current_flag to ‘N’ or ’0′ or whatever
        2. Take the incoming record, generate a surrogate key, enter source record’s date as the Begin_date and the default date as the End_date. Also set the Current_flag to ‘Y’ or ’1′. Insert into the dimension

    NOTE: The End_date of the previous record and the Begin_date of the current record are assumed to be the same. But some people prefer them to be different dates ie) End_date is 1 day lesser than the next records Current_date.

    Data Integration, Data Modelling, Data warehouse, Dimensional modelling, EIM, ETL Data Integration, Data Modelling, Dimensional modelling, DWH, EIM, ETL, SCD
  • Staging area: Necessary or overhead?

    Posted on September 23rd, 2008 biexplorer No comments

    In this article, let us see what a staging area is, its types and the reason to have one in your data warehouse.

    Ok, what is a stage area?

    It is that part of a data warehouse where data is stored physically (in database or in files), but as an intermediate step before loading the target data warehouse / data marts. It is where activities like cleansing, de-duplication, etc take place. It is like a pit stop for a racing car before reaching the destination.

    Some characteristics of the staging area are

    1. accessible to and owned by ETL / DW team
    2. OLAP / reporting teams do not have access to it
    3. indexed very little
    4. ETL developers are usually free to create / drop tables, controlled though (by the architect or modeling team)

    Types of staging areas:

    1. Persistent staging – stage data is not deleted, if you want to maintain history.
    2. Transient staging – stage data is deleted after each ETL load

    Most data warehouses have one or more staging areas, the types being either persistent or transient or both.

    But should you really have a stage area? Can’t you do without it? After all these days, ETL tools are more capable of handling more data in memory fully.

    Is staging necessary or is it an overhead?

    Read the rest of this entry »

    Data Integration, Data Modelling, Data warehouse, Dimensional modelling, EIM, ETL Data Integration, Data Modelling, Dimensional modelling, DWH, EIM, ETL, Staging
  • ETL effort estimation: Points to factor-in

    Posted on September 23rd, 2008 biexplorer No comments

    Estimation of ETL effort is not always fun (as with any estimation).

    There are several ways to estimate the effort needed to complete an ETL job. Work Breakdown Structure (WBS) is popular. And so is Function Point Analysis (FPA).

    But the most widely used is the one that factors in complexity based on the understanding of things like source, target, resources on project, etc.

    Though I haven’t really seen anyone use this method to perfection, it is a good place to start with. Some people argue against this method, but I see this as a complementary option to whatever method you have.

    So, here is a list of points that I think would be useful when you do any ETL effort estimation. I have grouped it under 5 heads: Source, target, transformations, resources, other.

    Source based:

    1. No of different sources & types
    2. Incremental extraction needs
    3. Profiling of data sources
    4. Cleansing / de-duplication dirty data sources
    5. Availability of documentation / transition of knowledge of source data
    6. Access control & management, if needed
    7. Data volumes for unit testing

    Read the rest of this entry »

    Data Integration, Data warehouse, EIM, ETL Data Integration, DWH, Effort estimation, EIM, Estimation, ETL
  • Decfloat datatype not supported in BODI / BODS

    Posted on July 12th, 2008 biexplorer No comments

    Recently I was working with a few DB2 UDB 9.5 tables and discovered that decfloat datatype columns were not recognized even though I had checked the option to recognize unsupported datatypes as varchar.

    Read the rest of this entry »

    BODI, BODS, Business objects, DB2, Data Integration, Data Modelling, Data warehouse, Database, Dimensional modelling, EIM, ETL, SAP BODI, BODS, Business objects, Data Integration, Data Modelling, Database, DB2, Dimensional modelling, DWH, EIM, ETL, SAP-BO
  • Why surrogate key?

    Posted on July 4th, 2008 biexplorer No comments

    Why should we use surrogate keys in a data warehouse?

    Here is why.

    1. To separate the DWH from the operational environment
    Think of this. You have a dimension loading from transactional tables. All of a sudden, the OLTP people decide to re-use all closed / inactive account id’s. For them, it wont matter a bit. For the DWH team, it will.
    Read the rest of this entry »

    Data Integration, Data Modelling, Data warehouse, Dimensional modelling, EIM, ETL Data Integration, Data Modelling, Dimensional modelling, DWH, EIM, ETL
  • Migration of a DW solution from Oracle to DB2

    Posted on April 26th, 2008 biexplorer No comments

    Recently we migrated an entire DW module from Oracle to DB2. And our ETL was done using Business Objects Data Integrator.

    Below is a sequence of steps that we followed to migrate the entire solution (including data structures, ETL code and universe)

    Assumption:
    –>All estimates done for 4 DB2 databases, 3 federated systems, 62 tables, 33 indexes, 6 DI jobs, 110 data flows, 151 transformations, 16 SQL transforms, 34 SQL() scripts, 220 objects in 23 classes and 2 reports.
    –>Add 20% time as buffer for unexpected issues.
    –>Some tasks can be executed in parallel, while some have strong dependency on completion of previous tasks.

    Work breakdown of migration and rough estimation:

    1. Initiation & planning : 40 Hrs
    Establish objectives and goals
    Requirements gathering
    Establish scope
    Plan resources
    Identify development environment

    2. Assessment : 40 Hrs
    Assess tool and technology needs
    Assess and understand technical requirements
    Establish technical guidelines
    Finalize tools and development environment

    Read the rest of this entry »

    BODI, BODS, Business Intelligence, DB2, Data Integration, Data Modelling, Data warehouse, Database, Dimensional modelling, EIM, ETL, Metadata, Oracle, SAP BODI, BODS, Business objects, Data Integration, Data Modelling, Data Quality, Database, DB2, Dimensional modelling, DWH, Effort estimation, EIM, ETL, Metadata, Oracle, SAP-BO
  • BO Data Services XI 3.0

    Posted on March 18th, 2008 biexplorer No comments

    BO Data Services is a new offering from Business Objects.

    Well, it looks like a new bundling and not a new product by itself.

    BO Data quality + BO Data Integrator = BO Data Services.

    I tried to gather some information regarding this. Here is what I have found so far.

    Read the rest of this entry »

    BI News, BODI, BODS, Business objects, Data Integration, Data Quality, Data warehouse, EIM, ETL BODI, BODS, Business objects, Data Integration, Data Quality, DWH, EIM, ETL, News, SAP-BO
  • Times.. they are changing.

    Posted on March 3rd, 2008 biexplorer No comments

    Looks like traditional data warehousing will soon be taken over by a sea of changes.

    (Click on underlined words to open the links)

    Integration of OLTP with DW? Read about it here.

    Here is a discussion on new approaches in DW.

    If you want to know Werner’s opinion on the future of Business Objects Data Integrator (BODI), see here.

    Appears like BO Data quality + BO Data Integrator = BO Data Services. A discussion on it here.

    And is Informatica the next acquisition target? Some say maybe… some say no.

    BI News, BODI, BODS, Business Intelligence, Business objects, Data Integration, Data Modelling, Data Quality, Data warehouse, Dimensional modelling, EIM, ETL, Informatica, SAP BODI, BODS, Business objects, Data Integration, Data Modelling, Data Quality, Dimensional modelling, DWH, EIM, ETL, Informatica, News, OLTP, SAP-BO
  • Semi-additive facts

    Posted on May 31st, 2007 biexplorer No comments

    What are semi-additive facts?

    Well, facts that can be aggregated across a few dimensions, but not all.

    Take the case of an inventory fact table.

    We track inventory every day for every product at every warehouse.

    To get the inventory on a given day for all products, I aggregate the inventory fact measures across the product dimension filtered for that particular day. The following query will give the inventory for every product for today.

    Select product_dim.name, sum(Inv_fact.inventory_qty)
    from
    product_dim, inv_fact, date_dim
    where
    date_dim.date_key= inv_fact.date_key and
    date_dim.calendar_dt=sysdate and
    product_dim.product_key = inv_fact.product_key
    group by product_dim

    However, I cannot add all the inventory quantities every day for a product, say for a month and get anything meaningful. Aggregation across time is meaningless. (of course, you can use it to calculate the average inventory per day)

    Such facts that can be aggregated across certain dimensions but not across all dimensions are called Semi-additive or partial facts.

    Business Intelligence, Data Integration, Data Modelling, Data warehouse, Dimensional modelling, EIM, ETL, OLAP BI, Data Integration, Data Modelling, Dimensional modelling, DWH, EIM, ETL, OLAP
  • « Older Entries

Search

Tags

BI BODI BODS Business objects Database Data Integration Data Modelling Data Quality DB2 Dimensional modelling DWH Effort estimation EIM Estimation ETL Events Gartner Informatica Metadata News OLAP OLTP Open Source Oracle Performance Tuning SAP SAP-BO SAP BI SAP CRM SCD SQL Staging Training
Copyright © 2006-9 Prem Sagar. All Rights Reserved.
Powered by WordPress. Designed by My Mobiles