• Rapidly changing dimensions

    Posted on May 25th, 2007 biexplorer No comments

    Dimensions can be loaded in different ways. Slowly changing dimension loading is one. Even in this there are
    1. Type 1 – any changes in existing data is updated. Used for correction handling.

    2. Type 2 – change is not updated, but a new record is inserted with change and the old data is kept as such and flagged as old version. For example, Prem is a record in Customer dimension. Today Prem lives in Hyderabad. So the current record in customer dim says
    #1 — Prem — Hyderabad — Male

    Then I move to Bangalore. Now how do I update this? So I say

    # no — Name — Place — Sex — Current version
    #1 — Prem — Hyd — M — No
    #2 — Prem — Blr — M — Yes

    Why should I have the old record?
    There might be a case where you want to know all the sales that was done by customers in Hyderabad last year. And Prem had a purchase in Hyderabad last year. I want this to be tracked. I dont want the sale to point to Prem living in Bangalore, but in Hyderabad.

    This way, if Prem moves 10 times, we will have 10 records in customer each pointing to a different Prem as per his location. This will keep the data accurate.

    3. Type 3
    Here instead of having all the history, we keep selected history data. Like 3 versions. The last 3 location moves of Prem will be tracked, not more!

    Okay, what are rapidly changing dimensions?
    Any dimension whose attributes change too rapidly. For example, credit rating of Prem. Assume that your rating agency rates people every 3 months (assume for example purpose), then every 3 months Prem can have a possibly different rating!

    # no — Name — Place — Sex — Cr rate –Current version
    #1 — Prem — Hyd — M — AAA –No
    #2 — Prem — Blr — M — AA –No
    #2 — Prem — Blr — M — A –No
    #2 — Prem — Blr — M — B –Yes

    So if you have around 3000000 customer records and if they keep changing every quarter as they do here, customer dimension will grow exponentially. That defeats the purpose of a warehouse, effective querying.

    What do we do?
    Remove the rapidly changing attributes and make a new dimension out of it. Pull the Credit rating and make a dimension out of it and call it credit rating behaviour dimension.

    So the customer will not grow exponentially. The rating dimension will. But this will be used only by queries that need rating info. Any query that doesnt need rating info and only other customer info will go to the original customer dimension.

  • What is a data warehouse?

    Posted on May 25th, 2007 biexplorer 1 comment

    What is a datawarehouse?

    In one line, it is a repository of several years of data using which reports can be created for business purposes (to take decisions).

    It is a database specifically modeled and fine-tuned for analysis and decision making.

    Read the rest of this entry »