• Back after a hiatus

    Posted on November 3rd, 2009 biexplorer No comments

    Am back! Its been a long time since I wrote something technical. Currently am focusing on the Business Objects suite and its integration with SAP suite. Also, in the near future, I expect to do a bit of research on the open source BI / DW tools like Talend, Pentaho Kettle, etc. Stay tuned!

  • Gartner BI Summit summary

    Posted on February 3rd, 2009 biexplorer No comments

    Please visit this site for an excellent summary on the recent Gartner BI Summit.

  • Migration of a DW solution from Oracle to DB2

    Posted on April 26th, 2008 biexplorer No comments

    Recently we migrated an entire DW module from Oracle to DB2. And our ETL was done using Business Objects Data Integrator.

    Below is a sequence of steps that we followed to migrate the entire solution (including data structures, ETL code and universe)

    Assumption:
    –>All estimates done for 4 DB2 databases, 3 federated systems, 62 tables, 33 indexes, 6 DI jobs, 110 data flows, 151 transformations, 16 SQL transforms, 34 SQL() scripts, 220 objects in 23 classes and 2 reports.
    –>Add 20% time as buffer for unexpected issues.
    –>Some tasks can be executed in parallel, while some have strong dependency on completion of previous tasks.

    Work breakdown of migration and rough estimation:

    1. Initiation & planning : 40 Hrs
    Establish objectives and goals
    Requirements gathering
    Establish scope
    Plan resources
    Identify development environment

    2. Assessment : 40 Hrs
    Assess tool and technology needs
    Assess and understand technical requirements
    Establish technical guidelines
    Finalize tools and development environment

    Read the rest of this entry »

  • Times.. they are changing.

    Posted on March 3rd, 2008 biexplorer No comments

    Looks like traditional data warehousing will soon be taken over by a sea of changes.

    (Click on underlined words to open the links)

    Integration of OLTP with DW? Read about it here.

    Here is a discussion on new approaches in DW.

    If you want to know Werner’s opinion on the future of Business Objects Data Integrator (BODI), see here.

    Appears like BO Data quality + BO Data Integrator = BO Data Services. A discussion on it here.

    And is Informatica the next acquisition target? Some say maybe… some say no.

  • Yahoo pipes vs ETL

    Posted on March 2nd, 2008 biexplorer No comments

    If you are an ETL developer, you would be handling sources like Oracle, DB2, flat files, etc.

    What if your source was a website? Or a Yahoo search result? And what if you could play around with multiple websites, blogs, flickr, custom searches… mash it up together, edit and transform it to create your own version of what you want to read.

    Welcome to Yahoo Pipes!

    Read the rest of this entry »

  • Semi-additive facts

    Posted on May 31st, 2007 biexplorer No comments

    What are semi-additive facts?

    Well, facts that can be aggregated across a few dimensions, but not all.

    Take the case of an inventory fact table.

    We track inventory every day for every product at every warehouse.

    To get the inventory on a given day for all products, I aggregate the inventory fact measures across the product dimension filtered for that particular day. The following query will give the inventory for every product for today.

    Select product_dim.name, sum(Inv_fact.inventory_qty)
    from
    product_dim, inv_fact, date_dim
    where
    date_dim.date_key= inv_fact.date_key and
    date_dim.calendar_dt=sysdate and
    product_dim.product_key = inv_fact.product_key
    group by product_dim

    However, I cannot add all the inventory quantities every day for a product, say for a month and get anything meaningful. Aggregation across time is meaningless. (of course, you can use it to calculate the average inventory per day)

    Such facts that can be aggregated across certain dimensions but not across all dimensions are called Semi-additive or partial facts.

  • Rapidly changing dimensions

    Posted on May 25th, 2007 biexplorer No comments

    Dimensions can be loaded in different ways. Slowly changing dimension loading is one. Even in this there are
    1. Type 1 - any changes in existing data is updated. Used for correction handling.

    2. Type 2 - change is not updated, but a new record is inserted with change and the old data is kept as such and flagged as old version. For example, Prem is a record in Customer dimension. Today Prem lives in Hyderabad. So the current record in customer dim says
    #1 — Prem — Hyderabad — Male

    Then I move to Bangalore. Now how do I update this? So I say

    # no — Name — Place — Sex — Current version
    #1 — Prem — Hyd — M — No
    #2 — Prem — Blr — M — Yes

    Why should I have the old record?
    There might be a case where you want to know all the sales that was done by customers in Hyderabad last year. And Prem had a purchase in Hyderabad last year. I want this to be tracked. I dont want the sale to point to Prem living in Bangalore, but in Hyderabad.

    This way, if Prem moves 10 times, we will have 10 records in customer each pointing to a different Prem as per his location. This will keep the data accurate.

    3. Type 3
    Here instead of having all the history, we keep selected history data. Like 3 versions. The last 3 location moves of Prem will be tracked, not more!

    Okay, what are rapidly changing dimensions?
    Any dimension whose attributes change too rapidly. For example, credit rating of Prem. Assume that your rating agency rates people every 3 months (assume for example purpose), then every 3 months Prem can have a possibly different rating!

    # no — Name — Place — Sex — Cr rate –Current version
    #1 — Prem — Hyd — M — AAA –No
    #2 — Prem — Blr — M — AA –No
    #2 — Prem — Blr — M — A –No
    #2 — Prem — Blr — M — B –Yes

    So if you have around 3000000 customer records and if they keep changing every quarter as they do here, customer dimension will grow exponentially. That defeats the purpose of a warehouse, effective querying.

    What do we do?
    Remove the rapidly changing attributes and make a new dimension out of it. Pull the Credit rating and make a dimension out of it and call it credit rating behaviour dimension.

    So the customer will not grow exponentially. The rating dimension will. But this will be used only by queries that need rating info. Any query that doesnt need rating info and only other customer info will go to the original customer dimension.

  • What is a data warehouse?

    Posted on May 25th, 2007 biexplorer 1 comment

    What is a datawarehouse?

    In one line, it is a repository of several years of data using which reports can be created for business purposes (to take decisions).

    It is a database specifically modeled and fine-tuned for analysis and decision making.

    Read the rest of this entry »