<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	>

<channel>
	<title>Exploring Business Intelligence</title>
	<atom:link href="http://www.premsagar.net/techblog/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.premsagar.net/techblog</link>
	<description></description>
	<pubDate>Tue, 10 Nov 2009 18:53:31 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.7.1</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Back after a hiatus</title>
		<link>http://www.premsagar.net/techblog/2009/11/03/back-after-a-hiatus/</link>
		<comments>http://www.premsagar.net/techblog/2009/11/03/back-after-a-hiatus/#comments</comments>
		<pubDate>Mon, 02 Nov 2009 19:01:50 +0000</pubDate>
		<dc:creator>biexplorer</dc:creator>
		
		<category><![CDATA[Business Intelligence]]></category>

		<category><![CDATA[Business objects]]></category>

		<category><![CDATA[Data warehouse]]></category>

		<category><![CDATA[Open Source]]></category>

		<category><![CDATA[SAP]]></category>

		<category><![CDATA[SAP BI]]></category>

		<guid isPermaLink="false">http://www.premsagar.net/techblog/?p=376</guid>
		<description><![CDATA[I am back! It has been a long time since I wrote something technical. I am currently focusing on the Business Objects suite and its integration with the SAP suite. Also, in the near future, I expect to do a bit of research on open source BI / DW tools like Talend, Pentaho Kettle, etc. Stay tuned!
]]></description>
			<content:encoded><![CDATA[<p>I am back! It has been a long time since I wrote something technical. I am currently focusing on the Business Objects suite and its integration with the SAP suite. Also, in the near future, I expect to do a bit of research on open source BI / DW tools like Talend, Pentaho Kettle, etc. Stay tuned!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.premsagar.net/techblog/2009/11/03/back-after-a-hiatus/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Introduction to SAP CRM</title>
		<link>http://www.premsagar.net/techblog/2009/04/10/introduction-to-sap-crm/</link>
		<comments>http://www.premsagar.net/techblog/2009/04/10/introduction-to-sap-crm/#comments</comments>
		<pubDate>Fri, 10 Apr 2009 09:06:37 +0000</pubDate>
		<dc:creator>biexplorer</dc:creator>
		
		<category><![CDATA[SAP]]></category>

		<category><![CDATA[SAP CRM]]></category>

		<category><![CDATA[Training]]></category>

		<guid isPermaLink="false">http://www.premsagar.net/techblog/?p=363</guid>
		<description><![CDATA[Here are a few links that give a very brief introduction to SAP CRM.

Introduction to SAP CRM - part 1
Introduction to SAP CRM - part 2
SAP CRM details

]]></description>
			<content:encoded><![CDATA[<p>Here are a few links that give a very brief introduction to SAP CRM.</p>
<ul>
<li><a href="https://www.sdn.sap.com/irj/scn/weblogs?blog=/pub/wlg/13201" target="_self">Introduction to SAP CRM - part 1</a></li>
<li><a href="https://www.sdn.sap.com/irj/scn/weblogs?blog=/pub/wlg/13702" target="_self">Introduction to SAP CRM - part 2</a></li>
<li><a href="http://www.sap.com/solutions/business-suite/crm/index.epx" target="_self">SAP CRM details</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.premsagar.net/techblog/2009/04/10/introduction-to-sap-crm/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Future of BODI / BODS?</title>
		<link>http://www.premsagar.net/techblog/2009/02/04/future-of-bodi-bods/</link>
		<comments>http://www.premsagar.net/techblog/2009/02/04/future-of-bodi-bods/#comments</comments>
		<pubDate>Wed, 04 Feb 2009 04:27:13 +0000</pubDate>
		<dc:creator>biexplorer</dc:creator>
		
		<category><![CDATA[BODI]]></category>

		<category><![CDATA[BODS]]></category>

		<category><![CDATA[Business objects]]></category>

		<category><![CDATA[Data Integration]]></category>

		<category><![CDATA[Data Quality]]></category>

		<category><![CDATA[Data warehouse]]></category>

		<category><![CDATA[EIM]]></category>

		<category><![CDATA[ETL]]></category>

		<category><![CDATA[SAP]]></category>

		<category><![CDATA[DWH]]></category>

		<category><![CDATA[SAP-BO]]></category>

		<guid isPermaLink="false">http://biexplorer.wordpress.com/?p=157</guid>
		<description><![CDATA[What do you think is the future of BODI / BODS?
I have almost stopped working on BODS. Well, almost&#8230; except for a fix here or there&#8230; once in a while. Most of my focus is now on SAP BI.  
But I have a soft spot for BODI/BODS. It is a tool that I know very well. It [...]]]></description>
			<content:encoded><![CDATA[<blockquote><p>What do you think is the future of BODI / BODS?</p></blockquote>
<p>I have almost stopped working on BODS. Well, almost&#8230; except for a fix here or there&#8230; once in a while. Most of my focus is now on SAP BI.</p>
<p>But I have a soft spot for BODI/BODS. It is a tool that I know very well. It is also a tool that has a lot of potential but is often underestimated. It has undergone a sea change since the ACTA days, and it compares well with the Informaticas and IBM Information Servers of the world.</p>
<p>Personally, I feel that this tool has a good future. It has shaped up pretty well, has added more functionality, and integrates well with SAP R/3, but more importantly it is non-SAP in focus. It should do well in the next few years.</p>
<p>Okay, <a href="http://www.forumtopics.com/busobj/viewtopic.php?t=103139&amp;postdays=0&amp;postorder=asc&amp;start=15" target="_blank">here is an interesting discussion</a> on the same topic. Werner&#8217;s comments are promising. I need to keep tabs on the developments.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.premsagar.net/techblog/2009/02/04/future-of-bodi-bods/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Gartner BI Summit summary</title>
		<link>http://www.premsagar.net/techblog/2009/02/03/gartner-bi-summit-summary/</link>
		<comments>http://www.premsagar.net/techblog/2009/02/03/gartner-bi-summit-summary/#comments</comments>
		<pubDate>Tue, 03 Feb 2009 11:32:39 +0000</pubDate>
		<dc:creator>biexplorer</dc:creator>
		
		<category><![CDATA[Business Intelligence]]></category>

		<category><![CDATA[Data warehouse]]></category>

		<category><![CDATA[Forrester]]></category>

		<category><![CDATA[Gartner]]></category>

		<category><![CDATA[BI]]></category>

		<category><![CDATA[DWH]]></category>

		<category><![CDATA[EIM]]></category>

		<category><![CDATA[ETL]]></category>

		<category><![CDATA[Events]]></category>

		<category><![CDATA[OLAP]]></category>

		<guid isPermaLink="false">http://biexplorer.wordpress.com/?p=154</guid>
		<description><![CDATA[Please visit this site for an excellent summary of the recent Gartner BI Summit.
]]></description>
			<content:encoded><![CDATA[<p>Please visit <a href="http://aristippus303.wordpress.com/2009/01/29/gartner-bi-summit-part-2/" target="_blank">this site</a> for an excellent summary of the recent Gartner BI Summit.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.premsagar.net/techblog/2009/02/03/gartner-bi-summit-summary/feed/</wfw:commentRss>
		</item>
		<item>
		<title>ROOSOURCE</title>
		<link>http://www.premsagar.net/techblog/2008/12/08/roosource/</link>
		<comments>http://www.premsagar.net/techblog/2008/12/08/roosource/#comments</comments>
		<pubDate>Mon, 08 Dec 2008 13:39:49 +0000</pubDate>
		<dc:creator>biexplorer</dc:creator>
		
		<category><![CDATA[SAP]]></category>

		<category><![CDATA[SAP BI]]></category>

		<guid isPermaLink="false">http://biexplorer.wordpress.com/?p=149</guid>
		<description><![CDATA[The table ROOSOURCE holds information on the datasources. You can view it by running transaction SE16 (or SE11) and looking up ROOSOURCE. (I am new to the world of SAP BI, and I am posting what I learn here.)
Typically, it holds the name of the Datasource, its type (attribute, text or hierarchy), the extract method, extract structure, [...]]]></description>
			<content:encoded><![CDATA[<p>The table ROOSOURCE holds information on the datasources. You can view it by running transaction SE16 (or SE11) and looking up ROOSOURCE. (I am new to the world of SAP BI, and I am posting what I learn here.)</p>
<p>Typically, it holds the name of the Datasource, its type (attribute, text or hierarchy), the extract method, extract structure, etc.</p>
<p>Instead of writing about something that I am not good at yet :-), let me point you to a link that explains ROOSOURCE. Please see the explanation <a href="https://www.sdn.sap.com/irj/scn/weblogs?blog=/pub/wlg/2282" target="_blank">here</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.premsagar.net/techblog/2008/12/08/roosource/feed/</wfw:commentRss>
		</item>
		<item>
		<title>SAP BI training notes</title>
		<link>http://www.premsagar.net/techblog/2008/10/21/sap-bi-training-notes/</link>
		<comments>http://www.premsagar.net/techblog/2008/10/21/sap-bi-training-notes/#comments</comments>
		<pubDate>Tue, 21 Oct 2008 06:58:17 +0000</pubDate>
		<dc:creator>biexplorer</dc:creator>
		
		<category><![CDATA[Training]]></category>

		<guid isPermaLink="false">http://biexplorer.wordpress.com/?p=103</guid>
		<description><![CDATA[Recently, I attended a 5-day training programme on SAP BI 7.0 (it was good). I thought I would share my training notes here.
I am not sure if these notes will help you at all. They are mainly for my own use. If they help you in the process, I will be happy.
And also, I would be [...]]]></description>
			<content:encoded><![CDATA[<p>Recently, I attended a 5-day training programme on SAP BI 7.0 (it was good). I thought I would share my training notes here.</p>
<p>I am not sure if these notes will help you at all. They are mainly for my own use. If they help you in the process, I will be happy.</p>
<p>I would also be happy to receive your comments on them.</p>
<p><span id="more-103"></span></p>
<p><strong>Morning Day 1</strong></p>
<ol>
<li>BW history ( 2.1 -&gt; 3.0 -&gt; 3.1 -&gt; 3.5 -&gt; BI 7.0 and 7.1 now)</li>
<li>Mainly deployed where R/3 is used</li>
<li>In R/3 database tables are of 2 types : Master and transaction</li>
<li>Infoobjects - characteristic IO is for master data</li>
<li>Infoobjects - Key figure IO is for trans data</li>
<li>RSA1 - transaction code for the Data Warehousing Workbench</li>
<li>RSA6 - to check where DS is created</li>
<li>RSA3 - extract checker</li>
</ol>
<p><strong>Noon Day 1</strong></p>
<ol>
<li>Architecture overview</li>
<li>PSA is a temp storage area</li>
<li>Loading to PSA is mandatory in 7.0</li>
<li>BEx is for excel based reporting</li>
<li>Web Application Designer - WAD is for web based reporting</li>
<li>Mobile Intelligence is also used but not much</li>
<li>If R/3 is customized, SAP BI 7.0 would also need customization</li>
<li>SAP provided content is called business content</li>
<li>Activate BI content to create definitions in the database</li>
<li>Transaction data is stored in infocubes and each field is known as an infoobject</li>
<li>Key figure - facts</li>
<li>Characteristics - dimensional attributes</li>
<li>InfoArea is a folder where infoobject or infocube is stored</li>
<li>Infocatalog - stores characteristics or key figures</li>
<li>Steps
<ol>
<li>Create info area</li>
<li>Create info catalog (contains chars and key figures IOs)</li>
<li>Create IO - chars and key figures</li>
</ol>
</li>
<li>Time and unit are mandatory dimensions</li>
</ol>
<p><strong>Morning Day 2</strong></p>
<ol>
<li>To track time (time-dependent attributes), data is stored in Q tables. Otherwise it is stored in P tables</li>
<li>Master data is NOT in the dimension table</li>
<li>SID links characteristic to the dimension table</li>
<li>Modelling techniques
<ol>
<li>ERM</li>
<li>MDM - multi dim model</li>
</ol>
</li>
<li>By default 3 dimensions are created in SAP BI - Time, unit and data package</li>
<li>Maximum of 16 dimensions, including the 3 above (so 13 are left for your own dimensions)</li>
<li>Max 248 chars in one dimension</li>
</ol>
<p><strong>Noon Day 2</strong></p>
<ol>
<li>P is only attributes except name</li>
<li>H table is for hierarchy</li>
<li>T table is for text</li>
<li>Mapping is done in a transformation</li>
<li>Infopackage - pulls data from a file into PSA</li>
<li>DTP - extract from PSA, transform and then load to Infoobjects and Infoproviders</li>
<li>Flat file support is only for CSV &amp; ASCII</li>
<li>Steps in loading a cube
<ol>
<li>Create info area</li>
<li>Create info catalog</li>
<li>Create info object - characteristics</li>
<li>Create info object - key figures</li>
<li>Activate them all</li>
<li>Create the Datasource</li>
<li>Specify source and edit / copy the fields (Ctrl Y for copy)</li>
<li>Create infopackage to load data into PSA</li>
<li>Go to info provider and select &#8220;Insert Characteristic as infoprovider&#8221;</li>
<li>Choose the characteristic given in the Infoobject</li>
<li>Create transformation</li>
<li>Create DTP</li>
<li>Execute Infopackage and DTP job requests</li>
</ol>
<p><a href="http://biexplorer.files.wordpress.com/2008/10/screen1.jpg"><img class="aligncenter size-medium wp-image-126" title="screen1" src="http://biexplorer.files.wordpress.com/2008/10/screen1.jpg?w=300" alt="" width="300" height="151" /></a></li>
</ol>
<p>The flow of data into Infocube</p>
<p><a href="http://biexplorer.files.wordpress.com/2008/10/infocube-flow.jpg"><img class="aligncenter size-full wp-image-124" title="infocube-flow" src="http://biexplorer.files.wordpress.com/2008/10/infocube-flow.jpg" alt="" width="153" height="256" /></a></p>
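<p>The Infopackage, PSA and DTP flow from the notes above can be pictured with a small Python toy. This is only a conceptual sketch and not SAP code: the function names, the source field names (matnr, qty) and the target names (MATERIAL, QUANTITY) are all made up for illustration.</p>

```python
import csv
import io


def run_infopackage(csv_text, psa):
    """Infopackage step: pull rows from a flat file into the PSA unchanged."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    psa.extend(rows)
    return len(rows)


def run_dtp(psa, transformation, target):
    """DTP step: read the PSA, apply the transformation, load the target."""
    for row in psa:
        target.append(transformation(row))
    return len(target)


def material_transformation(row):
    """Hypothetical mapping from source fields to infoobject names."""
    return {
        "MATERIAL": row["matnr"].strip().upper(),
        "QUANTITY": int(row["qty"]),
    }
```

<p>The point of the split is that the PSA keeps the rows exactly as they arrived, while all mapping and cleanup happens in the transformation that the DTP applies on the way to the target.</p>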
<p><strong>Morning Day 3</strong></p>
<ol>
<li>R/3 datasource has 2 structures - extract and transfer</li>
<li>Extract structure - need to specify the source. It will show all the fields in the source</li>
<li>Transfer structure -</li>
<li>Length of R/3 sources - 18 minimum</li>
<li>RSA6 - to check where DS is created</li>
<li>RSA3 - extract checker</li>
<li>Datasources of 3.X are different from 7.0</li>
<li>Steps loading from R/3
<ol>
<li>Create infoobject</li>
<li>In Info area, insert it as char</li>
<li>Use RSO2 to create DS</li>
<li>Select trans, master or text option and give it a name (like DS_MAT_ATR)</li>
<li>Select the application component (R/3 -&gt; MM -&gt; MMIO)</li>
<li>Enter short, medium and long description</li>
<li>Enter table name (like MARA)</li>
<li>Click on save</li>
<li>From the window that pops up, select local object</li>
<li>Select the needed fields and hide unwanted and click on save</li>
<li>Datasource is created</li>
<li>Go to main window and hit RSA1</li>
<li>Go to Datasources tab and under right infoarea (like Material Management MM-IO), right click and select replicate metadata</li>
<li>Select as Datasource (and not 3.x option)</li>
<li>Select background</li>
<li>You will get the DS here</li>
<li>Make changes to the DS if needed and activate</li>
<li>Create infopackage and make necessary changes</li>
<li>Start job</li>
</ol>
</li>
</ol>
<p><strong>Noon Day 3</strong></p>
<ol>
<li>DTP is available in 7.0 onwards only.</li>
<li>Update rules is the name for transformations in 3.5</li>
<li>RSA5 - to see SAP predefined datasources from Business Content</li>
<li>In a standard installation, first go to RSA1 -&gt; BI Content, select the cube, choose &#8220;In data flow before&#8221;, then run RSA5 and activate all required datasources</li>
<li>DSO - data store object</li>
<li>ODS in 3.X is now known as DSO</li>
</ol>
<p><strong>Morning Day 4</strong></p>
<ol>
<li>ODS types - standard and transactional</li>
<li>DSO types - standard, write optimized and direct update</li>
<li>DSO supports delta. Cube has no delta support.</li>
<li>DSO stores transaction-level data whereas a cube stores summarized data</li>
<li>DSO is for consolidation &amp; formalization of data</li>
<li>You can report using the DSO too</li>
<li>DSO has overwrite functionality. A cube only supports summation.</li>
<li>Full upload is taken from Active data</li>
<li>Delta upload is taken from change log</li>
<li>Active data, change log, activation queue - see manual page 268</li>
<li>X value in the change log means before image. Blank means after image</li>
</ol>
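<p>To see why the before and after images matter, here is a small Python toy of a DSO feeding an additive cube. It is a sketch under assumptions, not SAP behaviour copied from a manual: I assume the before image carries the old key figure value with reversed sign, and for brand-new keys I write only an after image (a real system may flag new records differently). All names are made up.</p>

```python
def activate(active, change_log, request):
    """Activate a request in a toy DSO: overwrite active data, write change log.

    active     : dict mapping key to key figure value (the active data table)
    change_log : list of (record_mode, key, value) tuples
    request    : list of (key, value) pairs being activated
    """
    for key, value in request:
        if key in active:
            # Before image ("X"): assumed to hold the old value, sign reversed.
            change_log.append(("X", key, -active[key]))
        # After image (blank record mode): the new value.
        change_log.append(("", key, value))
        active[key] = value  # the DSO overwrites


def delta_to_cube(cube, change_log, start):
    """Add change log entries from position start onward to an additive cube."""
    for _mode, key, value in change_log[start:]:
        cube[key] = cube.get(key, 0) + value  # the cube only sums
    return len(change_log)  # the next delta starts here
```

<p>If an order is first loaded with 100 and later changed to 120, the change log gets a before image of -100 and an after image of 120. The cube simply adds both and ends up at 120, even though it cannot overwrite.</p>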
<p><strong>Noon Day 4</strong></p>
<ol>
<li> Loading data to Infocube using DSO
<ol>
<li>Create datasource</li>
<li>Import the structure from flat file and copy the template and activate</li>
<li>Create the infopackage to load data to PSA</li>
<li>Go to infoprovider</li>
<li>Assuming that the infoarea and infocatalog for char and key are already done</li>
<li>Create DSO</li>
<li>Drag and drop key and data fields from the infoobject catalogs and activate</li>
<li>Create the transformation with the target as DSO and source as the Datasource (of flat file)</li>
<li>Create DTP</li>
<li>Create infocube</li>
<li>Create dimensions</li>
<li>Drag and drop characteristics from infoobject catalog</li>
<li>Drag and drop key figures from IO catalog</li>
<li>Activate</li>
<li>Create transformation, this time with target as Infocube and source as DSO</li>
<li>Create DTP</li>
<li>Execute infopackage job and check if flat file data is loaded to PSA</li>
<li>Execute DTP to load DSO</li>
<li>Right click on DSO and select manage<br />
<a href="http://biexplorer.files.wordpress.com/2008/10/manage-dso.jpg"><img class="aligncenter size-medium wp-image-125" title="manage-dso" src="http://biexplorer.files.wordpress.com/2008/10/manage-dso.jpg?w=300" alt="" width="300" height="187" /></a></li>
<li>You will see the DTP job there</li>
<li>Click on contents and see if new data is there</li>
<li>Go back to requests, select the request and activate it</li>
<li>A new window will appear. Select the request and start the job</li>
<li>Now check the data in the content tab in active data</li>
<li>Execute DTP for infocube</li>
<li>Right click on infocube and select Display data.</li>
<li>Check columns needed and if needed, click on Use DB aggregation</li>
<li>See if data is loaded.</li>
</ol>
</li>
<li>Now for the delta load to Infocube
<ol>
<li>Repeat all steps above</li>
<li>Now go and change key figure data in the flat file</li>
<li>Execute Infopackage for loading PSA</li>
<li>Execute DTP to load DSO</li>
<li>Right click on DSO and select manage</li>
<li>Activate the new request</li>
<li>Check the change log</li>
<li>Execute DTP to load infocube</li>
<li>Check the data in infocube and see if the incremental load is done<br />
<a href="http://biexplorer.files.wordpress.com/2008/10/delta-load-to-cube.jpg"><img class="aligncenter size-medium wp-image-128" title="delta-load-to-cube" src="http://biexplorer.files.wordpress.com/2008/10/delta-load-to-cube.jpg?w=300" alt="" width="300" height="161" /></a></li>
</ol>
</li>
</ol>
<p><strong>Day 4 noon</strong></p>
<ol>
<li>Virtualprovider overview</li>
<li>Virtualprovider creation steps
<ol>
<li>Right click on Infoarea and select create Virtualprovider</li>
<li>Give the name and if you want to copy details from a cube, specify it. Activate it</li>
<li>Go to datasources and create a datasource</li>
<li>In the extraction tab, say <strong>Direct access allowed</strong></li>
<li>Specify the flat file and copy template and activate it</li>
<li>Go to Virtual provider and create transformation and DTP</li>
<li>Activate but DO NOT EXECUTE as it is not needed</li>
<li>Infopackage creation for datasource is not needed</li>
<li>Right click on virtual provider and select Activate direct access</li>
<li>Right click on Virtualprovider and select display data</li>
</ol>
</li>
<li>RDA</li>
</ol>
<p><strong>Day 5</strong></p>
<ol>
<li>Aggregates - used to improve query performance</li>
<li>Steps to create an aggregate
<ol>
<li>Right click on infocube and choose Maintain aggregate</li>
<li>Select Create by yourself</li>
<li>Create new aggregate or hit F5</li>
<li>Drag and drop the characteristics for aggregation</li>
<li>If you want to choose Fixed values, select it by right clicking on Characteristic</li>
<li>Click on Aggregate and activate it</li>
<li>Start the job (immediate or later)</li>
<li>Click on the Spectacle symbol to view data or hit SHIFT+F9<br />
<a href="http://biexplorer.files.wordpress.com/2008/10/agg.jpg"><img class="aligncenter size-medium wp-image-135" title="agg" src="http://biexplorer.files.wordpress.com/2008/10/agg.jpg?w=300" alt="" width="300" height="53" /></a></li>
</ol>
<p><a href="http://biexplorer.files.wordpress.com/2008/10/aggregate.jpg"><img class="aligncenter size-medium wp-image-134" title="aggregate" src="http://biexplorer.files.wordpress.com/2008/10/aggregate.jpg?w=300" alt="" width="300" height="116" /></a></li>
<li>Process chains - used to automate loading</li>
<li>Process chains work only with files present on Application server and not with local workstation files
<ol>
<li>Go to transaction window and hit RSPC</li>
<li>Click on create, enter name</li>
<li>Create a start process by clicking on variant and hit OK</li>
<li>Click on Change selection, choose immediate and save</li>
<li>Save again and go back</li>
<li>Click on Continue</li>
<li>Go to process types</li>
<li>Insert the Infopackage for master data datasource</li>
<li>Insert the DTP for masterdata</li>
<li>If all masterdata is done, create an AND</li>
<li>Next, insert the infopackage for transaction datasource</li>
<li>Insert DTP for DSO</li>
<li>Insert Activate datastore object from data target administration</li>
<li>Insert DTP for infocube</li>
<li>Click on schedule and execute</li>
<li>Click on logs and see execution</li>
<li>Refresh to see status</li>
</ol>
</li>
<li><a href="http://biexplorer.files.wordpress.com/2008/10/process-chain.jpg"><img class="aligncenter size-full wp-image-137" title="process-chain" src="http://biexplorer.files.wordpress.com/2008/10/process-chain.jpg" alt="" width="372" height="564" /></a></li>
<li>Creation of master data, cube and loading from scratch
<ol>
<li>Create Infoarea</li>
<li>Create Infocatalog for char</li>
<li>Create Infocatalog for Key fig</li>
<li>Create char Infoobjects, mention data type and length</li>
<li>Assign attributes to master object</li>
<li>Create key fig infoobjects, assign unit and currency (0Currency, 0Unit, etc are default)</li>
<li>Go to datasources</li>
<li>Create datasources, specify source, (for Process chains, the source has to be on application server), copy template in fields tab (use CTRL Y) and activate</li>
<li>Preview data</li>
<li>Create infopackage and activate</li>
<li>Repeat for all master data and transaction data</li>
<li>Go to infoprovider</li>
<li>For masterdata, right click on infoarea and select insert char as infoprovider</li>
<li>Create transformation and DTP</li>
<li>For transaction data, create DSO</li>
<li>Create transformation and DTP</li>
<li>Create infocube</li>
<li>Create transformation and DTP</li>
<li>See Process chain picture above for execution flow</li>
</ol>
</li>
<li>Infoset creation
<ol>
<li>Right click on Infoarea and select create infoset</li>
<li>Choose a start with object</li>
<li>From the screen that appears, choose a cube from the left</li>
<li>Join the objects using the common object</li>
<li>Right click on link and choose required join</li>
</ol>
</li>
</ol>
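<p>Going back to the aggregates from the start of Day 5: the idea of improving query performance by pre-summarizing can be shown with a few lines of Python. This is only a conceptual sketch, assuming a cube is a list of rows; the function name and the field names in the example are made up and have nothing to do with how SAP BI stores aggregates internally.</p>

```python
from collections import defaultdict


def build_aggregate(fact_rows, characteristics, key_figure):
    """Pre-summarize fact rows by a subset of characteristics."""
    totals = defaultdict(int)
    for row in fact_rows:
        group = tuple(row[name] for name in characteristics)
        totals[group] += row[key_figure]
    return dict(totals)
```

<p>For example, building the aggregate over just the region characteristic turns many material-level rows into one row per region, so a query that only asks for sales by region reads far fewer records.</p>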
]]></content:encoded>
			<wfw:commentRss>http://www.premsagar.net/techblog/2008/10/21/sap-bi-training-notes/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Logic behind implementing SCD 2</title>
		<link>http://www.premsagar.net/techblog/2008/10/07/logic-behind-implementing-scd-2/</link>
		<comments>http://www.premsagar.net/techblog/2008/10/07/logic-behind-implementing-scd-2/#comments</comments>
		<pubDate>Tue, 07 Oct 2008 10:11:03 +0000</pubDate>
		<dc:creator>biexplorer</dc:creator>
		
		<category><![CDATA[Data Integration]]></category>

		<category><![CDATA[Data Modelling]]></category>

		<category><![CDATA[Data warehouse]]></category>

		<category><![CDATA[Dimensional modelling]]></category>

		<category><![CDATA[EIM]]></category>

		<category><![CDATA[ETL]]></category>

		<category><![CDATA[DWH]]></category>

		<category><![CDATA[SCD]]></category>

		<guid isPermaLink="false">http://biexplorer.wordpress.com/?p=94</guid>
		<description><![CDATA[Let us talk about the logic behind SCD type 2 today.
We know that SCD 2 is about preserving all the changes in the dimension records. Let us see the logic behind how we can implement it.
NOTE: The steps below assume an SCD 2 dimension with Begin_date, End_date and Current_flag columns

Check if the incoming row is [...]]]></description>
			<content:encoded><![CDATA[<p>Let us talk about the logic behind SCD type 2 today.</p>
<p>We know that SCD 2 is about preserving all the changes in the dimension records. Let us see the logic behind how we can implement it.</p>
<p>NOTE: The steps below assume an SCD 2 dimension with Begin_date, End_date and Current_flag columns</p>
<ol>
<li>Check if the incoming row is already present in the target table (dimension) using the <strong>source </strong>primary key</li>
<li>If it doesn&#8217;t exist in the target dimension
<ol>
<li>Generate a surrogate key</li>
<li>Enter source record&#8217;s date as the Begin_date</li>
<li>Enter the default end date (which could be 31/12/2099) as the End_date</li>
<li>If you have a Current_flag column, set it as &#8216;Y&#8217; or &#8216;1&#8217; (or whatever you want)</li>
<li>Insert into the dimension</li>
</ol>
</li>
<li>If the row exists in the target
<ol>
<li>Check if the incoming record and the target&#8217;s <strong>current</strong> record are different (in at least one chosen attribute)</li>
<li>If they are the same, do nothing</li>
<li>If they are different, do the following
<ol>
<li>For the record in the target table, change (update) the End_date to the source record&#8217;s date and set the Current_flag to &#8216;N&#8217; or &#8216;0&#8217; or whatever</li>
<li>Take the incoming record, generate a surrogate key, enter the source record&#8217;s date as the Begin_date and the default date as the End_date. Also set the Current_flag to &#8216;Y&#8217; or &#8216;1&#8217;. Insert into the dimension</li>
</ol>
</li>
</ol>
</li>
</ol>
<p>NOTE: The End_date of the previous record and the Begin_date of the current record are assumed to be the same. But some people prefer them to be different dates, i.e. the End_date is one day earlier than the next record&#8217;s Begin_date.</p>
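<p>The steps above can be sketched in Python. This is a minimal in-memory illustration and not a production loader: the dimension is just a list of dicts, and the function name apply_scd2, the natural_key and surrogate_key columns and the simple sequence used for surrogate keys are assumptions made for the example.</p>

```python
from datetime import date

DEFAULT_END_DATE = date(2099, 12, 31)  # assumed open-ended end date


def apply_scd2(dimension, incoming, tracked, load_date):
    """Apply one incoming source row to an SCD 2 dimension (a list of dicts).

    dimension : rows with surrogate_key, natural_key, Begin_date,
                End_date, Current_flag plus attributes
    incoming  : dict with natural_key plus attributes
    tracked   : attribute names whose change triggers a new version
    load_date : the source record's date
    """
    # Look up the current version using the source primary key.
    current = next(
        (row for row in dimension
         if row["natural_key"] == incoming["natural_key"]
         and row["Current_flag"] == "Y"),
        None,
    )

    # Row exists and no tracked attribute changed: do nothing.
    if current is not None and all(
        current[name] == incoming[name] for name in tracked
    ):
        return dimension

    # Row exists and changed: expire the existing current version.
    if current is not None:
        current["End_date"] = load_date
        current["Current_flag"] = "N"

    # New key or changed row: insert a new current version.
    new_row = dict(incoming)
    new_row["surrogate_key"] = len(dimension) + 1  # simple sequence
    new_row["Begin_date"] = load_date
    new_row["End_date"] = DEFAULT_END_DATE
    new_row["Current_flag"] = "Y"
    dimension.append(new_row)
    return dimension
```

<p>Here the End_date of the expired row equals the Begin_date of the new row, as in the note above. If you prefer the one-day gap, subtract a day from load_date when expiring the old version.</p>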
]]></content:encoded>
			<wfw:commentRss>http://www.premsagar.net/techblog/2008/10/07/logic-behind-implementing-scd-2/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Staging area: Necessary or overhead?</title>
		<link>http://www.premsagar.net/techblog/2008/09/23/stage-area/</link>
		<comments>http://www.premsagar.net/techblog/2008/09/23/stage-area/#comments</comments>
		<pubDate>Tue, 23 Sep 2008 17:40:51 +0000</pubDate>
		<dc:creator>biexplorer</dc:creator>
		
		<category><![CDATA[Data Integration]]></category>

		<category><![CDATA[Data Modelling]]></category>

		<category><![CDATA[Data warehouse]]></category>

		<category><![CDATA[Dimensional modelling]]></category>

		<category><![CDATA[EIM]]></category>

		<category><![CDATA[ETL]]></category>

		<category><![CDATA[DWH]]></category>

		<category><![CDATA[Staging]]></category>

		<guid isPermaLink="false">http://biexplorer.com/?p=103</guid>
		<description><![CDATA[In this article, let us see what a staging area is, its types and the reason to have one in your data warehouse.
Ok, what is a stage area?
It is that part of a data warehouse where data is stored physically (in database or in files), but as an intermediate step before loading the target data warehouse [...]]]></description>
			<content:encoded><![CDATA[<p>In this article, let us see what a staging area is, its types and the reason to have one in your data warehouse.</p>
<blockquote><p><strong>Ok, what is a stage area?</strong></p></blockquote>
<p>It is that part of a data warehouse where data is stored physically (in database or in files), but as an intermediate step before loading the target data warehouse / data marts. It is where activities like cleansing, de-duplication, etc take place. It is like a pit stop for a racing car before reaching the destination.</p>
<p>Some characteristics of the staging area are</p>
<ol>
<li>accessible to and owned by ETL / DW team</li>
<li>OLAP / reporting teams do not have access to it</li>
<li>indexed very little</li>
<li>ETL developers are usually free to create / drop tables, though under the control of the architect or modeling team</li>
</ol>
<p><strong>Types of staging areas:</strong></p>
<ol>
<li><strong>Persistent </strong>staging - stage data is <strong>not</strong> deleted; use this if you want to maintain history.</li>
<li><strong>Transient </strong>staging - stage data is deleted after each ETL load</li>
</ol>
<p>Most data warehouses have one or more staging areas, the types being either persistent or transient or both.</p>
<p>But should you really have a stage area? Can&#8217;t you do without it? After all, ETL tools these days are capable of handling much more data fully in memory.</p>
<p>Is staging necessary or is it an overhead?</p>
<p><span id="more-69"></span></p>
<p>I think it is very critical to have a staging area, especially if you are moving a lot of data from multiple sources. Here are a few reasons why :</p>
<ol>
<li><strong>Intermediate processing:</strong> Often you need to perform transformations, cleansing and other processing on huge chunks of data from multiple sources. It is not feasible to do all of this purely in memory (extraction from source, processing and direct load to target). We need intermediate steps where data can be stored, so that the ETL tool can take a breather and start again. This also helps in several other ways discussed below.</li>
<li><strong>Auditing:</strong> Having a stage area provides a means of knowing the condition of data at certain points during the ETL cycle. If your ETL job fails midway, you can look at the stage tables and figure out the condition of data as it passes through the various steps. Thus it provides for easy auditing.</li>
<li><strong>Recovery:</strong> If your ETL job fails midway and you do not maintain a stage area, you would need to fetch the data from the source once again and repeat the entire process. With stage areas, the job can be restarted from the point of failure.</li>
<li><strong>Backup: </strong>If your source data gets overwritten, the original data is lost, probably forever. There is then no way to reconcile the DW data against the source. But if you had a persistent stage, your data would be backed up in these intermediate databases.</li>
<li><strong>Cleansing, synchronizing, de-duping data</strong>, etc: If you have multiple sources, stage areas are a neat method to synchronize your data and, at the same time, to cleanse and conform them. (For example, if your customer data is from 3 different sources, it makes sense to pull them all into a stage area and have them synched. This is especially true if the different sources are not available at the same time.)</li>
<li><strong>Helps reduce contention on busy source (OLTP) systems:</strong> Once data is pulled, you can work on it in the stage area and get it transformed to your needs. If you were to apply these transformations on source data directly without a stage, you would be engaging the connections to the source for much longer. This would impact OLTP systems adversely.</li>
<li><strong>Increases availability of data: </strong>If one of your source systems is temporarily down, you will still have access to the data which is staged.</li>
<li><strong>Joins are easier and faster:</strong> It is much easier and faster to join tables that are in the same database schema than tables spread across disparate databases.</li>
</ol>
<p>Source: Several, but notably Ralph Kimball, The Data Warehouse ETL Toolkit.</p>
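<p>The recovery point above can be illustrated with a small sketch. This is not taken from Kimball or from any ETL tool; it is a minimal illustration, assuming SQLite and hypothetical table names (<code>src_customer</code>, <code>stg_raw</code>, <code>stg_clean</code>, <code>dw_customer</code>, <code>etl_checkpoint</code>). Each step writes to its own stage table and records a checkpoint when it commits, so a rerun skips the steps that already finished.</p>

```python
import sqlite3


def setup_demo():
    """Create an in-memory database with hypothetical source, stage and target tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE src_customer (id INTEGER, name TEXT);
        CREATE TABLE stg_raw (id INTEGER, name TEXT);
        CREATE TABLE stg_clean (id INTEGER, name TEXT);
        CREATE TABLE dw_customer (id INTEGER, name TEXT);
        CREATE TABLE etl_checkpoint (step TEXT PRIMARY KEY);
        """
    )
    conn.executemany(
        "INSERT INTO src_customer VALUES (?, ?)",
        [(1, " Ann "), (1, " Ann "), (2, "Bob")],
    )
    conn.commit()
    return conn


def extract(conn):
    # Pull the source rows into the raw stage table unchanged.
    conn.execute("INSERT INTO stg_raw SELECT id, name FROM src_customer")


def cleanse(conn):
    # Trim whitespace and drop exact duplicates while copying to the clean stage.
    conn.execute("INSERT INTO stg_clean SELECT DISTINCT id, TRIM(name) FROM stg_raw")


def load(conn):
    # Move the cleansed rows into the warehouse table.
    conn.execute("INSERT INTO dw_customer SELECT id, name FROM stg_clean")


def run_pipeline(conn, steps):
    """Run (name, function) steps in order, skipping steps already checkpointed.

    Returns the names of the steps that were executed in this run.
    """
    done = {row[0] for row in conn.execute("SELECT step FROM etl_checkpoint")}
    executed = []
    for name, func in steps:
        if name in done:
            continue  # this step's output is already sitting in the stage area
        try:
            func(conn)
            conn.execute("INSERT INTO etl_checkpoint (step) VALUES (?)", (name,))
            conn.commit()  # the step's data and its checkpoint are saved together
        except Exception:
            conn.rollback()  # discard partial work from the failed step
            raise
        executed.append(name)
    return executed
```

<p>If the load step fails on the first run, calling <code>run_pipeline</code> again with the same step list only executes the load, because the extract and cleanse results are already in the stage tables.</p>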
]]></content:encoded>
			<wfw:commentRss>http://www.premsagar.net/techblog/2008/09/23/stage-area/feed/</wfw:commentRss>
		</item>
		<item>
		<title>ETL effort estimation: Points to factor-in</title>
		<link>http://www.premsagar.net/techblog/2008/09/23/etl-effort-estimation-points-to-factor-in/</link>
		<comments>http://www.premsagar.net/techblog/2008/09/23/etl-effort-estimation-points-to-factor-in/#comments</comments>
		<pubDate>Tue, 23 Sep 2008 09:37:44 +0000</pubDate>
		<dc:creator>biexplorer</dc:creator>
		
		<category><![CDATA[Data Integration]]></category>

		<category><![CDATA[Data warehouse]]></category>

		<category><![CDATA[EIM]]></category>

		<category><![CDATA[ETL]]></category>

		<category><![CDATA[DWH]]></category>

		<category><![CDATA[Effort estimation]]></category>

		<category><![CDATA[Estimation]]></category>

		<guid isPermaLink="false">http://biexplorer.com/?p=102</guid>
		<description><![CDATA[Estimation of ETL effort is not always fun (as with any estimation).
There are several ways to estimate the effort needed to complete an ETL job. Work Breakdown Structure (WBS) is popular. And so is Function Point Analysis (FPA).
But the most widely used is the one that factors in complexity based on the understanding of things like [...]]]></description>
			<content:encoded><![CDATA[<p>Estimation of ETL effort is not always fun (as with any estimation).</p>
<p>There are several ways to estimate the effort needed to complete an ETL job. Work Breakdown Structure (WBS) is popular. And so is Function Point Analysis (FPA).</p>
<p>But the most widely used method is the one that factors in complexity, based on an understanding of things like the source, the target, the resources on the project, etc.</p>
<p>Though I haven&#8217;t really seen anyone use this method to perfection, it is a good place to start. Some people argue against this method, but I see it as a complement to whatever method you already use.</p>
<p>So, here is a list of points that I think would be useful when you do any ETL effort estimation. I have grouped them under 5 heads: Source, target, transformations, resources, other.</p>
<p><strong>Source based:</strong></p>
<ol>
<li>Number of different sources &amp; types</li>
<li>Incremental extraction needs</li>
<li>Profiling of data sources</li>
<li>Cleansing / de-duplication of dirty data sources</li>
<li>Availability of documentation / transition of knowledge of source data</li>
<li>Access control &amp; management, if needed</li>
<li>Data volumes for unit testing</li>
</ol>
<p><strong><span id="more-68"></span>Stage / Target based:</strong></p>
<ol>
<li>Proper database design available (primary keys defined, right columns indexed, etc)</li>
<li>Familiarity of the team with the various source, staging and target systems</li>
<li>Number of different target types</li>
<li>Truncate / append / merge options</li>
<li>Fact / dimension / stage load</li>
</ol>
<p><strong>Transformation / features / mechanisms based:</strong></p>
<ol>
<li>Number of transformations and their complexity</li>
<li>Usage of reusability features</li>
<li>Usage of features like CDC, SCD, etc</li>
<li>Need for error / failure handling and recovery mechanisms</li>
<li>Auditing and validation needs of ETL</li>
<li>Complex functions / calculations for measures</li>
<li>Parallelism / degree of parallelism (DOP), etc</li>
</ol>
<p><strong>Resource based:</strong></p>
<ol>
<li>Which tool is used? What is the skill level of the resources (fresh, medium skilled, expert)?</li>
<li>Availability of the resources (part time, full time, etc)</li>
</ol>
<p><strong>Other:</strong></p>
<ol>
<li>Are there any people, task or decision dependencies?</li>
<li>Unit / system &amp; integration testing needs</li>
<li>Migration / deployment needs</li>
<li>Performance and tuning needs</li>
<li>Project management / status reporting needs</li>
<li>How clear is the requirement? Is scope properly defined?</li>
<li>How well is the project being managed?</li>
<li>Have you factored in time for rework?</li>
</ol>
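<p>One way to turn a checklist like the one above into a number is a weighted complexity score. The sketch below is only an illustration: the <code>Factor</code> class, the 1 to 5 scoring scale, the weights and the rule of multiplying a baseline by the index are all assumptions made for this example, not an established estimation formula. You would calibrate them against your own past projects.</p>

```python
from dataclasses import dataclass


@dataclass
class Factor:
    name: str      # a checklist item, e.g. "Number of different sources"
    weight: float  # relative importance (hypothetical, chosen by the estimator)
    score: int     # 1 (simple) to 5 (very complex)


def complexity_index(factors):
    """Weighted average of the factor scores; always between 1 and 5."""
    if not factors:
        raise ValueError("at least one factor is required")
    for f in factors:
        if not 1 <= f.score <= 5:
            raise ValueError(f"score for {f.name!r} must be between 1 and 5")
        if f.weight <= 0:
            raise ValueError(f"weight for {f.name!r} must be positive")
    total_weight = sum(f.weight for f in factors)
    return sum(f.weight * f.score for f in factors) / total_weight


def estimate_days(base_days, factors):
    """Scale a baseline effort (days for the simplest possible job) by the index."""
    return base_days * complexity_index(factors)
```

<p>For example, scoring "Number of different sources" as 4 and "Skill level of the resources" as 2 with equal weights gives an index of 3.0, so a 10 day baseline becomes 30 days. The value of the exercise is less in the final figure than in forcing a discussion of each factor.</p>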
<p><strong>A few interesting links on this topic:</strong><br />
1. Vincent McBurney&#8217;s article on <a href="http://it.toolbox.com/blogs/infosphere/my-wiki-wiki-ways-estimating-etl-development-time-10940" target="_blank">estimating ETL development time</a><br />
2. <a href="http://geekswithblogs.net/Prabhats/archive/2007/03/01/107632.aspx" target="_blank">FPA</a><br />
3. <a href="http://www.ewsolutions.com/resource-center/rwds_folder/rwds-archives/issue.2008-03-01.6090544414/document.2008-03-01.9435972766/view?searchterm=Estimation%20Methodology%20for%20Data%20Warehousing%20Projects" target="_blank">FPA based estimation for a Data warehouse</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.premsagar.net/techblog/2008/09/23/etl-effort-estimation-points-to-factor-in/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Is open source database a viable solution?</title>
		<link>http://www.premsagar.net/techblog/2008/08/08/is-open-source-database-a-viable-solution/</link>
		<comments>http://www.premsagar.net/techblog/2008/08/08/is-open-source-database-a-viable-solution/#comments</comments>
		<pubDate>Fri, 08 Aug 2008 20:16:08 +0000</pubDate>
		<dc:creator>biexplorer</dc:creator>
		
		<category><![CDATA[Database]]></category>

		<category><![CDATA[Open Source]]></category>

		<guid isPermaLink="false">http://biexplorer.com/?p=99</guid>
		<description><![CDATA[Please read the article at this link.
What do you think? Is open source database a viable solution?
The link says (as quoted by Forrester) that the market share for

Open source database is at $850 million
Commercial databases is $ 16 billion

Key take aways from this article:

Many vendors are now open to the idea of a open source [...]]]></description>
			<content:encoded><![CDATA[<p>Please read the article at <a href="http://searchdatamanagement.techtarget.com/news/article/0,289142,sid91_gci1323310,00.html" target="_blank">this link</a>.</p>
<p>What do you think? Is an open source database a viable solution?</p>
<p>The article says (quoting Forrester) that the market for</p>
<ul>
<li>Open source databases is at $850 million</li>
<li>Commercial databases is at $16 billion</li>
</ul>
<p><span id="more-66"></span>Key takeaways from this article:</p>
<ol>
<li>Many vendors are now open to the idea of an open source database. I personally feel small and medium sized enterprises are the ones who would be keen on it.</li>
<li>Forrester says the open source databases available today are capable of handling 80% of applications. That to me is a surprisingly high number.</li>
</ol>
<p>In my opinion, this is an interesting area for the future. Today we have a few reliable open source databases like</p>
<ol>
<li>PostgreSQL</li>
<li>MySQL</li>
</ol>
<p>But open source has its own advantages and disadvantages. Below are a few that I can think of.</p>
<p><strong>Advantages:</strong></p>
<ol>
<li>Cheap / free to use and deploy.</li>
<li>Capable of handling almost all small enterprise applications.</li>
</ol>
<p><strong>Disadvantages:</strong></p>
<ol>
<li>Very difficult to migrate from a commercial database to open source as of today. This could change soon.</li>
<li>Though the database itself is cheap, the cost of maintaining it is sometimes high. And finding resources capable of working on an open source database is a challenge.</li>
<li>Post-implementation support can be a challenge. But it is improving.</li>
<li>The lack of proper case studies and published implementations makes enterprises skeptical of open source.</li>
<li>Awareness and confidence are poor.</li>
</ol>
<p>As a parting thought, the article also rightly says that companies should try the open source way for small to medium sized applications that are NON-MISSION-CRITICAL and evaluate the database first. If found satisfactory, they should migrate to an open source database for their other applications.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.premsagar.net/techblog/2008/08/08/is-open-source-database-a-viable-solution/feed/</wfw:commentRss>
		</item>
	</channel>
</rss>
