Monday, 23 July 2012

Returning a Result Set

CREATE OR REPLACE PROCEDURE scott.proc (v_deptno NUMBER, p_result_set OUT SYS_REFCURSOR)
IS
BEGIN
  OPEN p_result_set FOR
    SELECT * FROM emp
    WHERE deptno LIKE '%' || v_deptno;
END;
/
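To test the procedure from SQL*Plus, a REFCURSOR bind variable can be used. A minimal sketch, assuming the standard SCOTT.EMP table (department 10):

SQL> VARIABLE rc REFCURSOR
SQL> EXECUTE scott.proc(10, :rc)
SQL> PRINT rc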

Sunday, 15 July 2012

Writing data to a text file from Oracle

Using an external table, we can read the data in a text file and put it into the respective table(s) in the schema.

In a similar fashion, we can also read data from regular tables and write the contents of the table into a flat file. The only thing to remember is that, to write data into a text file, we need to use the ORACLE_DATAPUMP access driver.

Up until Oracle 9i, external tables were read-only, meaning we could not write into an external table from the schema.

To load data from a flat file, Oracle uses the ORACLE_LOADER access driver, which is based on the SQL*Loader API.
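For comparison, a read-only external table over a comma-delimited file would look roughly like the sketch below. The table name, column list, and file name are illustrative, and it assumes the dataload directory object we create in a moment:

CREATE TABLE city_load
(
id NUMBER(9),
city VARCHAR2(50),
state VARCHAR2(2)
)
ORGANIZATION EXTERNAL
(
TYPE oracle_loader
DEFAULT DIRECTORY dataload
ACCESS PARAMETERS
(
RECORDS DELIMITED BY NEWLINE
FIELDS TERMINATED BY ','
)
LOCATION ('city_in.txt')
)
/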

Starting with Oracle 10g, Oracle introduced another method to export/import data, called Data Pump. Using the same Data Pump access driver, we can now also write from the database into a text/external file.

To deal with external tables, we need to create an Oracle directory object, which maps to a folder on the operating system. So we will create a directory object that specifies an alias for a folder/directory on the server. Make sure that you have the correct privilege to create the directory object and the correct permissions on the O/S folder, so that there is no error when we write data into the text file.

Again, the path we use here is the one defined by the user_dump_dest Oracle parameter; we can check the value of this parameter using the SHOW PARAMETER command. Following is the command to create the directory. The path could be different on your system, so modify the command before executing it. Let us first create the directory object from SQL*Plus.
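For example:

SQL> SHOW PARAMETER user_dump_dest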

SQL> CREATE OR REPLACE DIRECTORY DATALOAD AS 'D:\oracle\product\10.2.0\admin\orcl\udump';

Directory created.

The Oracle user who issues the CREATE DIRECTORY command automatically gets read and write privileges on the directory. Other Oracle users should be given explicit grants to read and write content through the directory.
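For example, to grant another user access (the user name scott here is just an illustration):

SQL> GRANT READ, WRITE ON DIRECTORY dataload TO scott;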

Now we will create an empty table and populate it with some data. Connect to the user using SQL*Plus and execute the following SQL statements.

CREATE TABLE TEST_TABLE
(
TEST_TABLE_ID NUMBER(9) NOT NULL,
CITY VARCHAR(50),
STATE VARCHAR(2),
ZIP VARCHAR(10),
CNTRY VARCHAR(50),
CONSTRAINT PK_TEST_TABLE PRIMARY KEY(TEST_TABLE_ID)
)
/

INSERT INTO TEST_TABLE(TEST_TABLE_ID,CITY,STATE,ZIP,CNTRY)
VALUES(1,'ATLANTA','GA','30318','USA')
/
INSERT INTO TEST_TABLE(TEST_TABLE_ID,CITY,STATE,ZIP,CNTRY)
VALUES(2,'CALGARY','AB','T6H-5K9','CANADA')
/
INSERT INTO TEST_TABLE(TEST_TABLE_ID,CITY,STATE,ZIP,CNTRY)
VALUES(3,'JOHNS CREEK','GA','30097','USA')
/
INSERT INTO TEST_TABLE(TEST_TABLE_ID,CITY,STATE,ZIP,CNTRY)
VALUES(4,'FOREST HILLS','NY','11798','USA')
/

This time we will create an external table to write data directly into a file from the database. Execute the following statement from the SQL> prompt.

CREATE TABLE external_city
(
ID,CITY,STATE,ZIP,CNTRY
)
ORGANIZATION EXTERNAL
(
TYPE oracle_datapump
DEFAULT DIRECTORY dataload
LOCATION ('external_city.txt')
)
AS
SELECT TEST_TABLE_ID,CITY,STATE,ZIP,CNTRY
FROM TEST_TABLE
/

The above command writes to the external table, i.e., Oracle pushes data from the database to the file when the command is executed. The query (SELECT statement) is shown in its simplest form; we can also use a complex SELECT statement involving joins, functions, etc. to create the external table. One key thing to note is that the access driver (TYPE) is defined as oracle_datapump. Also, we haven't specified a data type for any column in the CREATE TABLE clause; Oracle automatically takes the data types of the columns mentioned in the SELECT clause. We can verify this by describing the table.

SQL> desc external_city
Name        Null?     Type
----------- --------- ------------
ID          NOT NULL  NUMBER(9)
CITY                  VARCHAR2(50)
STATE                 VARCHAR2(2)
ZIP                   VARCHAR2(10)
CNTRY                 VARCHAR2(50)

We can even create the table without specifying any column names. Run the following statement from the SQL> prompt. If you have already created the table earlier, drop the table and delete the file from the O/S folder before executing this statement.

CREATE TABLE external_city
ORGANIZATION EXTERNAL
(
TYPE oracle_datapump
DEFAULT DIRECTORY dataload
LOCATION ('external_city.txt')
)
AS
SELECT TEST_TABLE_ID,CITY,STATE,ZIP,CNTRY
FROM TEST_TABLE
/

Whenever the table is created, Oracle always creates at least two files: the text file specified in the LOCATION clause and a log file. Every time a SELECT is performed, the log file is updated. Some points to keep in mind about the file:
• Dropping the table will not drop the file. One has to delete it manually once the table is dropped.
• The file can be deleted without dropping the table, but when a SELECT is performed later, Oracle will give an error.
• The file cannot be edited manually using an editor such as Notepad, WordPad, or vi.
• Multiple files can be specified in the LOCATION clause, but each entry has to follow the directory:filename syntax (see the sketch below).
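A minimal sketch of the multi-file form, using the same dataload directory. The table and file names are illustrative, and the PARALLEL clause is set to match the number of files:

CREATE TABLE external_city_multi
ORGANIZATION EXTERNAL
(
TYPE oracle_datapump
DEFAULT DIRECTORY dataload
LOCATION ('dataload:external_city_1.txt', 'dataload:external_city_2.txt')
)
PARALLEL 2
AS
SELECT TEST_TABLE_ID,CITY,STATE,ZIP,CNTRY
FROM TEST_TABLE
/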

Once the table is created, we can use SELECT statements against the external table in the same way as against regular tables. The table can be used in join conditions with other tables as well (see the join sketch after the sample query below).

SQL> SELECT ID, City, State, Zip, Cntry FROM External_City;

        ID CITY            ST ZIP        CNTRY
---------- --------------- -- ---------- ----------
         1 ATLANTA         GA 30318      USA
         2 CALGARY         AB T6H-5K9    CANADA
         3 JOHNS CREEK     GA 30097      USA
         4 FOREST HILLS    NY 11798      USA
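And a quick join sketch, joining the external table back to its source table (any other join would work the same way):

SELECT e.ID, e.City, t.CNTRY
FROM External_City e, TEST_TABLE t
WHERE t.TEST_TABLE_ID = e.ID;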

Thursday, 2 June 2011

SCD – Slowly Changing Dimensions

Implementing SCDs is a common concern in data warehouse design. Let me briefly describe the different types of SCDs first; then we can step into SCD design.
Dimensions that change over time are called Slowly Changing Dimensions. For instance, a product price changes over time; people change their names for some reason; country and state names may change over time. These are a few examples of Slowly Changing Dimensions, since changes happen to them over a period of time.
SCD, as everybody knows, is the acronym for ‘Slowly Changing Dimensions’.
We have 4 types of SCDs in a data warehouse.
Let’s say I have a customer dimension with mainly these columns: (Customer Id, Customer First Name, Customer Last Name, Customer Country)

Customer Id  Customer First Name  Customer Last Name  Customer Country
-----------  -------------------  ------------------  ----------------
1            Sudheer              Sharma              India

Now, this guy moved to the US. In the source, the country name has been changed to US; we need to update our target dimension to reflect this change.
SCD Type 1: The new incoming record (the changed/modified data set) replaces the existing old record in the target.
Customer Id  Customer First Name  Customer Last Name  Customer Country
-----------  -------------------  ------------------  ----------------
1            Sudheer              Sharma              US

The old value (India) is overwritten by the new value (US), and there is no way to find out the old version of the data. The dimension holds only the current version of the data.
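A minimal SQL sketch of a Type 1 load; the customer_dim and customer_stage table names are assumptions for illustration, not from the example above:

-- SCD Type 1 (sketch): overwrite matching rows, insert brand-new ones
MERGE INTO customer_dim d
USING customer_stage s
ON (d.customer_id = s.customer_id)
WHEN MATCHED THEN
  UPDATE SET d.customer_country = s.customer_country  -- overwrite in place; no history kept
WHEN NOT MATCHED THEN
  INSERT (customer_id, customer_first_name, customer_last_name, customer_country)
  VALUES (s.customer_id, s.customer_first_name, s.customer_last_name, s.customer_country);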
SCD Type 2: In this case, an additional record is added into the customer dimension. The beauty of this approach is that it maintains two versions: you will find two records, the older version and the current version. In other words, it maintains history. We can implement Type 2 in the following ways (a SQL sketch follows Method 3 below):
  1. Versioning
  2. Effective Dates
  3. Current Flag values/Record Indicators
Method 1: Versioning
Customer Id  Customer First Name  Customer Last Name  Customer Country  Effective Year  Version
-----------  -------------------  ------------------  ----------------  --------------  -------
1            Sudheer              Sharma              India             2008            0
1            Sudheer              Sharma              US                2009            1
Method 2: Effective Dates
Customer Id  Customer First Name  Customer Last Name  Customer Country  Effective Start Date  Effective End Date
-----------  -------------------  ------------------  ----------------  --------------------  ------------------
1            Sudheer              Sharma              India             01/01/2008            12/31/2008
1            Sudheer              Sharma              US                01/01/2009            till date

Method 3: Effective Dates & Current Record Indicators
Customer Id  Customer First Name  Customer Last Name  Customer Country  Effective Start Date  Effective End Date  Current Record IND
-----------  -------------------  ------------------  ----------------  --------------------  ------------------  ------------------
1            Sudheer              Sharma              India             01/01/2008            12/31/2008          N
1            Sudheer              Sharma              US                01/01/2009            till date           Y
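A minimal SQL sketch of a Type 2 load using the current-flag method; customer_dim and its column names are assumed for illustration:

-- SCD Type 2 (sketch): expire the current row, then insert the new version
UPDATE customer_dim
SET effective_end_date = SYSDATE,
    current_record_ind = 'N'
WHERE customer_id = 1
AND current_record_ind = 'Y';

INSERT INTO customer_dim
  (customer_id, customer_first_name, customer_last_name,
   customer_country, effective_start_date, effective_end_date, current_record_ind)
VALUES
  (1, 'Sudheer', 'Sharma', 'US', SYSDATE, NULL, 'Y');  -- NULL end date means the row is current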

SCD Type 3: In this approach, only the information about the previous value of a dimension attribute is written into the database. An ‘old’ or ‘previous’ column is created, which stores the immediately previous value of the attribute.
Product ID  Product Name  Current Year  Current Price  Previous Year  Previous Price
----------  ------------  ------------  -------------  -------------  --------------
1           Close-up      2008          50.00          2007           45.00

The problem with this approach is that, if the product price changes continuously over the years, the complete history is not stored; only the latest change is kept. For example, if the product price changes to 60 in 2009, we would no longer be able to see the 2007 price, since the old values would have been overwritten with the 2008 information.
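A minimal SQL sketch of a Type 3 update; product_dim and its columns are assumed names:

-- SCD Type 3 (sketch): shift the current values into the 'previous'
-- columns, then overwrite the current values with the new ones
UPDATE product_dim
SET previous_year  = current_year,
    previous_price = current_price,
    current_year   = 2009,
    current_price  = 60.00
WHERE product_id = 1;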
SCD Type 4: In this approach, one table holds the current data and another table keeps the historical data for each dimension (a SQL sketch follows the two tables below).
Customer Dimension
Customer Id  Customer First Name  Customer Last Name  Customer Country
-----------  -------------------  ------------------  ----------------
1            Sudheer              Sharma              US

Customer History Table
Customer Id  Customer First Name  Customer Last Name  Customer Country  Effective Year
-----------  -------------------  ------------------  ----------------  --------------
1            Sudheer              Sharma              India             2008
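A minimal SQL sketch of a Type 4 change; customer_dim and customer_history are assumed names:

-- SCD Type 4 (sketch): archive the outgoing row into the history
-- table, then overwrite the row in the current dimension
INSERT INTO customer_history
  (customer_id, customer_first_name, customer_last_name, customer_country, effective_year)
SELECT customer_id, customer_first_name, customer_last_name, customer_country, 2008
FROM customer_dim
WHERE customer_id = 1;

UPDATE customer_dim
SET customer_country = 'US'
WHERE customer_id = 1;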



Wednesday, 1 June 2011

Working with Sessions Overview

A session is a set of instructions that tells the Integration Service how and when to move data from sources to targets. A session is a type of task, similar to other tasks available in the Workflow Manager. In the Workflow Manager, you configure a session by creating a Session task. To run a session, you must first create a workflow to contain the Session task.
When you create a Session task, you enter general information such as the session name, session schedule, and the Integration Service to run the session. You can also select options to run pre-session shell commands, send On-Success or On-Failure email, and use FTP to transfer source and target files.
You can configure the session to override parameters established in the mapping, such as source and target location, source and target type, error tracing levels, and transformation attributes. You can also configure the session to collect performance details for the session and store them in the PowerCenter repository. You might view performance details for a session to tune the session.
You can run as many sessions in a workflow as you need. You can run the Session tasks sequentially or concurrently, depending on the requirement.
The Integration Service creates several files and in-memory caches depending on the transformations and options used in the session. For more information about session output files and caches, see “Integration Service Architecture” in the PowerCenter Administrator Guide.

Repository Architecture

The PowerCenter repository resides in a relational database. The repository database tables contain the instructions required to extract, transform, and load data. Repository clients access the repository database tables through the Repository Service. A repository client is any PowerCenter component that connects to the repository.
The Repository Service manages repository metadata transaction requests from repository clients. Each Repository Service manages a single repository. The Repository Service uses object-locking to ensure the consistency of metadata in the repository.
A Repository Service process is a multi-threaded process that fetches, inserts, and updates metadata in the repository database tables. A Repository Service process is an instance of the Repository Service that runs on a particular machine, or node.
The Repository Service accepts client metadata transaction requests from the following PowerCenter components:
PowerCenter Client tools. Use the Designer to create and store mapping metadata in the repository. Use the Workflow Manager to store workflow metadata and connection object information in the repository. Use the Workflow Monitor to retrieve workflow run status information and session logs written by the Integration Service. Use the Repository Manager to organize and secure metadata by creating folders. You can manage the repository from the PowerCenter Administration Console.
pmrep and infacmd. Use pmrep to perform repository metadata administration tasks, such as listing repository objects. Use infacmd to perform service-related functions, such as creating or removing a Repository Service. (A pmrep sketch appears at the end of this post.)
Integration Service. When you start the Integration Service, it connects to the repository to schedule workflows. When you run a workflow, the Integration Service retrieves workflow task and mapping metadata from the repository. During the workflow run, the Integration Service writes workflow status information to the repository.
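As an illustration of the pmrep usage mentioned above, here is a hedged command-line sketch; the repository, domain, user, and password values are placeholders, and exact flags can vary by PowerCenter version:

pmrep connect -r MyRepository -d MyDomain -n Administrator -x MyPassword
pmrep listobjects -o folder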