• Back after a hiatus

    Posted on November 3rd, 2009 by biexplorer

    I am back! It's been a long time since I wrote something technical. Currently I am focusing on the Business Objects suite and its integration with the SAP suite. Also, in the near future, I expect to do a bit of research on open source BI / DW tools like Talend, Pentaho Kettle, etc. Stay tuned!

  • Introduction to SAP CRM

    Posted on April 10th, 2009 by biexplorer

    Here are a few links that give a very brief introduction to SAP CRM.

  • Future of BODI / BODS?

    Posted on February 4th, 2009 by biexplorer

    What do you think is the future of BODI / BODS?

    I have almost stopped working on BODS. Well, almost… except for a fix here or there… once in a while. Most of my focus is now on SAP BI.  

    But I have a soft spot for BODI/BODS. It is a tool that I know very well. It is also a tool that has a lot of potential, but is often underestimated. It has undergone a sea change since the Acta days. And it compares well with the Informaticas and IBM Information Servers of the world.

    Personally, I feel that this tool has a good future. It has shaped up pretty well, has added more functionality, and integrates well with SAP R/3 but, more importantly, is not SAP-specific in focus. It should do well in the next few years.

    Okay, here is an interesting discussion on the same topic. And Werner’s comments are promising. I need to keep tabs on the developments.

  • Gartner BI Summit summary

    Posted on February 3rd, 2009 by biexplorer

    Please visit this site for an excellent summary of the recent Gartner BI Summit.

  • ROOSOURCE

    Posted on December 8th, 2008 by biexplorer

    The table ROOSOURCE holds information on the DataSources. You can view it by running transaction SE16 (or SE11) and looking up ROOSOURCE. (I am new to the world of SAP BI, and I am posting what I learn here.)

    Typically, it holds the name of the DataSource, its type (transaction data, attribute, text or hierarchy), the extraction method, the extract structure, etc.

    Instead of writing about something that I am not good at (yet :-)), let me point you to a link that explains ROOSOURCE. Please see the explanation here.

  • SAP BI training notes

    Posted on October 21st, 2008 by biexplorer

    Recently, I attended a 5-day training programme on SAP BI 7.0 (it was good). I thought I would share my training notes here.

    I am not sure if these notes will help you at all. They are mainly for my own use. If they help you in the process, I will be happy.

    I would also be happy to receive your comments on them.


  • Logic behind implementing SCD 2

    Posted on October 7th, 2008 by biexplorer

    Let's talk about the logic behind SCD type 2 today.

    We know that SCD 2 is about preserving all the changes in the dimension records. Let us see the logic behind how we can implement it.

    NOTE: The steps below assume an SCD 2 dimension with Begin_date, End_date and Current_flag columns.

    1. Check if the incoming row is already present in the target table (dimension) using the source primary key
    2. If it doesn’t exist in the target dimension
      1. Generate a surrogate key
      2. Enter the source record’s date as the Begin_date
      3. Enter the default end date (which could be 31/12/2099) as the End_date
      4. If you have a Current_flag column, set it to ‘Y’ or ‘1’ (or whatever you want)
      5. Insert into the dimension
    3. If the row exists in the target
      1. Check if the incoming record and the current target record differ in at least one of the chosen attributes
      2. If they are the same, do nothing
      3. If they are different, do the following
        1. For the current record in the target table, update the End_date to the source record’s date and set the Current_flag to ‘N’ or ‘0’ (or whatever you chose)
        2. Take the incoming record, generate a surrogate key, enter the source record’s date as the Begin_date and the default end date as the End_date. Also set the Current_flag to ‘Y’ or ‘1’. Insert into the dimension

    NOTE: The End_date of the previous record and the Begin_date of the current record are assumed to be the same. But some people prefer them to be different dates, i.e. the End_date is one day earlier than the next record’s Begin_date.
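
    The steps above can be sketched in a few lines of Python. This is only an illustration under some assumptions: the dimension is an in-memory list rather than a database table, the names DimRow, apply_scd2 and tracked are made up for this example, and the surrogate key is simply the highest existing key plus one. It follows the first convention in the note, where the old row's End_date equals the new row's Begin_date.

```python
from dataclasses import dataclass
from datetime import date

# Default end date for open-ended (current) rows, as in the steps above.
DEFAULT_END_DATE = date(2099, 12, 31)


@dataclass
class DimRow:  # hypothetical row layout for this sketch
    surrogate_key: int
    source_key: str
    attributes: dict
    begin_date: date
    end_date: date
    current_flag: str


def apply_scd2(dimension, source_key, attributes, record_date, tracked):
    """Apply one incoming source row to the dimension (a list of DimRow).

    Returns the newly inserted row, or None when nothing changed.
    """
    # Step 1: look for the current row with the same source primary key.
    current = next(
        (r for r in dimension
         if r.source_key == source_key and r.current_flag == "Y"),
        None,
    )
    if current is not None:
        # Steps 3.1 / 3.2: compare only the chosen (tracked) attributes.
        if all(current.attributes.get(c) == attributes.get(c) for c in tracked):
            return None
        # Step 3.3.1: close the existing current row.
        current.end_date = record_date
        current.current_flag = "N"
    # Steps 2 and 3.3.2: insert a new current row with a fresh surrogate key
    # (max + 1 is an assumption; a real system would use a sequence).
    new_key = max((r.surrogate_key for r in dimension), default=0) + 1
    new_row = DimRow(new_key, source_key, dict(attributes),
                     record_date, DEFAULT_END_DATE, "Y")
    dimension.append(new_row)
    return new_row


# Usage: one customer whose city changes once.
dim = []
apply_scd2(dim, "C001", {"city": "Chennai"}, date(2008, 1, 1), ["city"])
unchanged = apply_scd2(dim, "C001", {"city": "Chennai"}, date(2008, 6, 1), ["city"])
apply_scd2(dim, "C001", {"city": "Pune"}, date(2008, 10, 7), ["city"])
```

    After the three calls, the list holds two rows for C001: the closed Chennai row and the current Pune row. For the one-day-gap convention, the closing step would set the End_date to the day before record_date instead.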

  • Staging area: Necessary or overhead?

    Posted on September 23rd, 2008 by biexplorer

    In this article, let us see what a staging area is, its types and the reasons to have one in your data warehouse.

    OK, what is a staging area?

    It is the part of a data warehouse where data is stored physically (in a database or in files) as an intermediate step before loading the target data warehouse / data marts. It is where activities like cleansing, de-duplication, etc. take place. It is like a pit stop for a racing car before reaching the destination.

    Some characteristics of the staging area are:

    1. It is accessible to and owned by the ETL / DW team
    2. OLAP / reporting teams do not have access to it
    3. It is indexed very little
    4. ETL developers are usually free to create / drop tables, subject to control by the architect or modeling team

    Types of staging areas:

    1. Persistent staging - stage data is not deleted after a load, which is useful if you want to maintain history
    2. Transient staging - stage data is deleted after each ETL load

    Most data warehouses have one or more staging areas, which may be persistent, transient, or both.

    But should you really have a staging area? Can’t you do without it? After all, these days ETL tools are capable of handling much more data entirely in memory.

    Is staging necessary or is it an overhead?


  • ETL effort estimation: Points to factor-in

    Posted on September 23rd, 2008 by biexplorer

    Estimation of ETL effort is not always fun (as with any estimation).

    There are several ways to estimate the effort needed to complete an ETL job. Work Breakdown Structure (WBS) is popular, and so is Function Point Analysis (FPA).

    But the most widely used approach is the one that factors in complexity based on an understanding of things like the sources, the targets, the resources on the project, etc.

    Though I haven’t really seen anyone use this method to perfection, it is a good place to start. Some people argue against this method, but I see it as a complement to whatever method you already use.

    So, here is a list of points that I think would be useful when you do any ETL effort estimation. I have grouped them under 5 heads: source, target, transformations, resources and other.

    Source based:

    1. Number of different sources & types
    2. Incremental extraction needs
    3. Profiling of data sources
    4. Cleansing / de-duplication of dirty data sources
    5. Availability of documentation / transition of knowledge of source data
    6. Access control & management, if needed
    7. Data volumes for unit testing


  • Is an open source database a viable solution?

    Posted on August 8th, 2008 by biexplorer

    Please read the article at this link.

    What do you think? Is an open source database a viable solution?

    The link says (quoting Forrester) that the market size for

    • Open source databases is $850 million
    • Commercial databases is $16 billion
