• Is open source database a viable solution?

    Posted on August 8th, 2008 biexplorer No comments

    Please read the article at this link.

    What do you think? Is open source database a viable solution?

    The link says (as quoted by Forrester) that the market share for

    • Open source database is at $850 million
    • Commercial databases is $ 16 billion

    Read the rest of this entry »

  • Tuning: Min and Max in same query

    Posted on July 12th, 2008 biexplorer No comments

    Please read this article.

    Karen explains why having min and max in the same query impacts performance and how to tackle the same.

  • Decfloat datatype not supported in BODI / BODS

    Posted on July 12th, 2008 biexplorer No comments

    Recently I was working with a few DB2 UDB 9.5 tables and discovered that decfloat datatype columns were not recognized even though I had checked the option to recognize unsupported datatypes as varchar.

    Read the rest of this entry »

  • Migration of a DW solution from Oracle to DB2

    Posted on April 26th, 2008 biexplorer No comments

    Recently we migrated an entire DW module from Oracle to DB2. And our ETL was done using Business Objects Data Integrator.

    Below is a sequence of steps that we followed to migrate the entire solution (including data structures, ETL code and universe)

    Assumption:
    –>All estimates done for 4 DB2 databases, 3 federated systems, 62 tables, 33 indexes, 6 DI jobs, 110 data flows, 151 transformations, 16 SQL transforms, 34 SQL() scripts, 220 objects in 23 classes and 2 reports.
    –>Add 20% time as buffer for unexpected issues.
    –>Some tasks can be executed in parallel, while some have strong dependency on completion of previous tasks.

    Work breakdown of migration and rough estimation:

    1. Initiation & planning : 40 Hrs
    Establish objectives and goals
    Requirements gathering
    Establish scope
    Plan resources
    Identify development environment

    2. Assessment : 40 Hrs
    Assess tool and technology needs
    Assess and understand technical requirements
    Establish technical guidelines
    Finalize tools and development environment

    Read the rest of this entry »

  • NULL and NOT IN (Oracle)

    Posted on March 26th, 2008 biexplorer No comments

    Here is a very interesting scenario.

    I have an employee table called EMP.

    1

    And I have a DEPT table.

    2

    Now, I write a query to find out all the employees who are part of any department. (DEPT_ID is properly updated and found in DEPT table)

    3

    Now, I want to know if any employee is not part of any department.

    4

    What??? No results? I know that employee with EMP_ID 9 doesn’t belong to any department. Where did he go?

    Is Oracle drunk? Is it giving me wrong results? Wait let me check with another query.

    Read the rest of this entry »

  • Truncate and delete

    Posted on July 25th, 2007 biexplorer No comments

    Truncate:
    1. Moves the high water mark down to beginning.
    2. Doesnt need commit. It is a DDL command.
    3. No condition needed. Removes all rows.
    4. Extents are deallocated. (If you specify the reuse storage clause, then the extents are not deallocated)
    5. No rollback.

    Delete:
    1. Doesnt change the high water mark.
    2. Needs commit. It’s a DML command.
    3. Can specify a condition and remove specific rows.
    4. Extents are not deallocated.
    5. Changes can be rolled back.

  • Why index doesnt access null values?

    Posted on July 17th, 2007 biexplorer No comments

    Oracle indexes do not contain entries that point to rows with null values on all the indexed columns.

    Therefore any query that has a column IS NULL or IS NOT NULL will not use the index.

  • The maximum size of a fact table.

    Posted on July 11th, 2007 biexplorer No comments

    Well, this is just a workout. It may or may not make much sense!

    Here I have detailed how we can calculate the maximum size of a fact table.

    Assume our datamart has 5 dimensions: Time_dim, customer_dim, product_dim, product_warehouse_dim, supplier_dim. And our fact table has 5 foreign keys, 15 measures, 2 degenerate columns

    Time_dim has 20 years of rows on daily grain: so 365 X 20 = 7300 rows
    There are 4000 customers in customer_dim.
    200 products.
    10 warehouses.
    12 suppliers.

    So the maximum possible fact table rows is
    7300 x 4000 x 200 x 10 x 12 = 7008000000000 (7008000 million or 7008 billion)

    Now this is the maximum no of rows possible in the fact table.(Assuming that every customer brought every product from every store supplied by every supplier on every day…..etc… it goes on … permutations and combinations)

    Assuming that the 22 columns have an average size of 5 bytes, we need 110 bytes per row.

    So now, 110 x 7008 billion = 71794 GB

    The maximum possible size for this fact table (assuming that dimension count stays same) is 71794 GB.

    The maximum possible increase per year could be 3590 GB.

  • Index usage

    Posted on July 5th, 2007 biexplorer No comments

    What do you think? Will an index be used in these cases?

    1. Table returns 2,500,000 (2.5 million) records from a table having 10,000,000 rows (10 million).

    2. You use a value !=’45′ condition in your where clause. ( Value column is indexed)

    3. You say Value=’%ABC’ condition in where clause (value column is indexed)

    4. You say Value=’ABC%’ condition in where clause (value column is indexed)

    5. Where Value IS NULL (value column is indexed)

    6. Where Value IS NOT NULL (value column is indexed)

    In all these cases, do you think the index will be used or not?

    Read the rest of this entry »

  • Oracle analytic functions

    Posted on June 28th, 2007 biexplorer No comments

    I recently purchased O’Reilly’s SQL cookbook. Its good. I came to know about the Oracle Window functions (analytic functions).

    Some examples are
    1. row_number() over()
    2. dense_rank() over()
    3. max() over()
    4. Sum() over()
    5. Lead() over()
    6. Lag() over()

    See here for a detailed explanation.

    Check this example

    Now, you have the trade_val column. But you want a running count of the same. How do you achieve it? See the query below. I have used the sum() over() function to get the sum. But hey! Its wrong. Its giving me the entire total and not the running total.

    Yes, since I didnt specify anything in the over() section, it gave an entire total. So now I say order by stk_nam_key, date_key so that the running count is done as per the original order. See results below.

    But hey! Again there is something different I need. I need the running count for each stk_nam_key and not for the entire table. See in the upper query, I have indicated with red arrow. When the stk_nam_key changed to 2, it gave me the running count. But I want to start fresh for every new grouping.

    So what do I do? I say over(partition by stk_nam_key order by stk_nam_key,date_key). Check the results now.

    Dont get confused by the first value for the new grouping 3. The date_key for both 3′s are same and hence the sum is same for both.