Friday, February 15, 2008

How to create informatica transformations: Aggregator, Expression ,Filter ,Joiner Lookup Rank Router Sequence Generator Source Qualifier

Basic INFORMATICA TRANSFORMATIONS

Aggregator
Expression
Filter
Joiner
Lookup
Rank
Router
Sequence Generator
Stored Procedure
Source Qualifier
Update Strategy
XML source qualifier
External Procedure
Advanced External Procedure
Expression Transformation - You can use ET to calculate values in a single row before you write to the target - You can use ET, to perform any non-aggregate calculation - To perform calculations involving multiple rows, such as sums of averages, use the Aggregator. Unlike ET the Aggregator Transformation allow you to group and sort data Calculation To use the Expression Transformation to calculate values for a single row, you must include the following ports. - Input port for each value used in the calculation - Output port for the expression NOTE You can enter multiple expressions in a single ET. As long as you enter only one expression for each port, you can create any number of output ports in the Expression Transformation. In this way, you can use one expression transformation rather than creating separate transformations for each calculation that requires the same set of data.

Sequence Generator Transformation - Create keys - Replace missing values - This contains two output ports that you can connect to one or more transformations. The server generates a value each time a row enters a connected transformation, even if that value is not used. - There are two parameters NEXTVAL, CURRVAL - The SGT can be reusable - You can not edit any default ports (NEXTVAL, CURRVAL) SGT Properties - Start value - Increment By - End value - Current value - Cycle (If selected, server cycles through sequence range. Otherwise, Stops with configured end value) - Reset No of cached values NOTE - Reset is disabled for Reusable SGT - Unlike other transformations, you cannot override SGT properties at session level. This protects the integrity of sequence values generated. Aggregator Transformation Difference between Aggregator and Expression Transformation We can use Aggregator to perform calculations on groups. Where as the Expression transformation permits you to calculations on row-by-row basis only. The server performs aggregate calculations as it reads and stores necessary data group and row data in an aggregator cache. When Incremental aggregation occurs, the server passes new source data through the mapping and uses historical cache data to perform new calculation incrementally. Components - Aggregate Expression - Group by port - Aggregate cache When a session is being run using aggregator transformation, the server creates Index and data caches in memory to process the transformation. If the server requires more space, it stores overflow values in cache files. NOTE The performance of aggregator transformation can be improved by using “Sorted Input option”. When this is selected, the server assumes all data is sorted by group.

Incremental Aggregation - Using this, you apply captured changes in the source to aggregate calculation in a session. If the source changes only incrementally and you can capture changes, you can configure the session to process only those changes - This allows the sever to update the target incrementally, rather than forcing it to process the entire source and recalculate the same calculations each time you run the session. Steps: - The first time you run a session with incremental aggregation enabled, the server process the entire source. - At the end of the session, the server stores aggregate data from that session ran in two files, the index file and data file. The server creates the file in local directory. - The second time you run the session, use only changes in the source as source data for the session. The server then performs the following actions: (1) For each input record, the session checks the historical information in the index file for a corresponding group, then: If it finds a corresponding group – The server performs the aggregate operation incrementally, using the aggregate data for that group, and saves the incremental changes. Else Server create a new group and saves the record data (2) When writing to the target, the server applies the changes to the existing target. o Updates modified aggregate groups in the target o Inserts new aggregate data o Delete removed aggregate data o Ignores unchanged aggregate data o Saves modified aggregate data in Index/Data files to be used as historical data the next time you run the session. Each Subsequent time you run the session with incremental aggregation, you use only the incremental source changes in the session. If the source changes significantly, and you want the server to continue saving the aggregate data for the future incremental changes, configure the server to overwrite existing aggregate data with new aggregate data. Use Incremental Aggregator Transformation Only IF: - Mapping includes an aggregate function - Source changes only incrementally - You can capture incremental changes. You might do this by filtering source data by timestamp. External Procedure Transformation - When Informatica’s transformation does not provide the exact functionality we need, we can develop complex functions with in a dynamic link library or Unix shared library. - To obtain this kind of extensibility, we can use Transformation Exchange (TX) dynamic invocation interface built into Power mart/Power Center. - Using TX, you can create an External Procedure Transformation and bind it to an External Procedure that you have developed. - Two types of External Procedures are available COM External Procedure (Only for WIN NT/2000) Informatica External Procedure ( available for WINNT, Solaris, HPUX etc) Components of TX: (a) External Procedure This exists separately from Informatica Server. It consists of C++, VB code written by developer. The code is compiled and linked to a DLL or Shared memory, which is loaded by the Informatica Server at runtime. (b) External Procedure Transformation This is created in Designer and it is an object that resides in the Informatica Repository. This serves in many ways o This contains metadata describing External procedure o This allows an External procedure to be references in a mappingby adding an instance of an External Procedure transformation. All External Procedure Transformations must be defined as reusable transformations. Therefore you cannot create External Procedure transformation in designer. You can create only with in the transformation developer of designer and add instances of the transformation to mapping. Difference Between Advanced External Procedure And External Procedure Transformation Advanced External Procedure Transformation - The Input and Output functions occur separately - The output function is a separate callback function provided by Informatica that can be called from Advanced External Procedure Library. - The Output callback function is used to pass all the output port values from the Advanced External Procedure library to the informatica Server. - Multiple Outputs (Multiple row Input and Multiple rows output) - Supports Informatica procedure only - Active Transformation - Connected only External Procedure Transformation - In the External Procedure Transformation, an External Procedure function does both input and output, and it’s parameters consists of all the ports of the transformation. - Single return value ( One row input and one row output ) - Supports COM and Informatica Procedures - Passive transformation - Connected or Unconnected By Default, The Advanced External Procedure Transformation is an active transformation. However, we can configure this to be a passive by clearing “IS ACTIVE” option on the properties tab. LOOKUP Transformation - We are using this for lookup data in a related table, view or synonym - You can use multiple lookup transformations in a mapping - The server queries the Lookup table based in the Lookup ports in the transformation. It compares lookup port values to lookup table column values, bases on lookup condition. Types: (a) Connected (or) unconnected. (b) Cached (or) uncached . If you cache the lkp table , you can choose to use a dynamic or static cache . by default ,the LKP cache remains static and doesn’t change during the session .with dynamic cache ,the server inserts rows into the cache during the session ,information recommends that you cache the target table as Lookup .this enables you to lookup values in the target and insert them if they don’t exist.. You can configure a connected LKP to receive input directly from the mapping pipeline .(or) you can configure an unconnected LKP to receive input from the result of an expression in another transformation. Differences Between Connected and Unconnected Lookup: connected o Receives input values directly from the pipeline. o uses Dynamic or static cache o Returns multiple values o supports user defined default values. Unconnected o Recieves input values from the result of LKP expression in another transformation o Use static cache only. o Returns only one value. o Doesn’t supports user-defined default values. NOTES o Common use of unconnected LKP is to update slowly changing dimension tables. o Lookup components are (a) Lookup table. (b) Ports (c) Properties (d) condition. Lookup tables: This can be a single table, or you can join multiple tables in the same Database using a Lookup query override. You can improve Lookup initialization time by adding an index to the Lookup table. Lookup ports: There are 3 ports in connected LKP transformation (I/P,O/P,LKP) and 4 ports unconnected LKP(I/P,O/P,LKP and return ports). If you’ve certain that a mapping doesn’t use a Lookup ,port ,you delete it from the transformation. This reduces the amount of memory. Lookup Properties: you can configure properties such as SQL override .for the Lookup,the Lookup table name ,and tracing level for the transformation. Lookup condition: you can enter the conditions ,you want the server to use to determine whether input data qualifies values in the Lookup or cache . when you configure a LKP condition for the transformation, you compare transformation input values with values in the Lookup table or cache ,which represented by LKP ports .when you run session ,the server queries the LKP table or cache for all incoming values based on the condition. NOTE - If you configure a LKP to use static cache ,you can following operators =,>,<,>=,<=,!=. but if you use an dynamic cache only =can be used . - when you don’t configure the LKP for caching ,the server queries the LKP table for each input row .the result will be same, regardless of using cache However using a Lookup cache can increase session performance, by Lookup table, when the source table is large.

Performance tips:

- Add an index to the columns used in a Lookup condition. - Place conditions with an equality opertor (=) first.

- Cache small Lookup tables .

- Don’t use an ORDER BY clause in SQL override.

- Call unconnected Lookups with :LKP reference qualifier.

Normalizer Transformation

- Normalization is the process of organizing data. - In database terms ,this includes creating normalized tables and establishing relationships between those tables. According to rules designed to both protect the data, and make the database more flexible by eliminating redundancy and inconsistent dependencies.

- NT normalizes records from COBOL and relational sources ,allowing you to organizet the data according to you own needs.

- A NT can appear anywhere is a data flow when you normalize a relational source.

- Use a normalizer transformation, instead of source qualifier transformation when you normalize a COBOL source.

- The occurs statement is a COBOL file nests multiple records of information in a single record.

- Using the NT ,you breakout repeated data with in a record is to separate record into separate records. For each new record it creates, the NT generates an unique identifier. You can use this key value to join the normalized records.

Stored Procedure Transformation

- DBA creates stored procedures to automate time consuming tasks that are too complicated for standard SQL statements.

- A stored procedure is a precompiled collection of transact SQL statements and optional flow control statements, similar to an executable script.

- Stored procedures are stored and run with in the database. You can run a stored procedure with EXECUTE SQL statement in a database client tool, just as SQL statements. But unlike standard procedures allow user defined variables, conditional statements and programming features.

Usages of Stored Procedure

- Drop and recreate indexes.

- Check the status of target database before moving records into it.

- Determine database space.

- Perform a specialized calculation.

NOTE -

The Stored Procedure must exist in the database before creating a Stored Procedure Transformation, and the Stored procedure can exist in a source, target or any database with a valid connection to the server.

TYPES

- Connected Stored Procedure Transformation (Connected directly to the mapping)

- Unconnected Stored Procedure Transformation (Not connected directly to the flow of the mapping. Can be called from an Expression Transformation or other transformations)

Running a Stored Procedure

The options for running a Stored Procedure Transformation:

- Normal

- Pre load of the source

- Post load of the source

- Pre load of the target

- Post load of the target

You can run several stored procedure transformation in different modes in the same mapping. Stored Procedure Transformations are created as normal type by default, which means that they run during the mapping, not before or after the session. They are also not created as reusable transformations.

If you want to: Use below mode

Run a SP before/after the session Unconnected

Run a SP once during a session Unconnected

Run a SP for each row in data flow Unconnected/Connected

Pass parameters to SP and receive a single return value Connected

A normal connected SP will have an I/P and O/P port and return port also an output port, which is marked as ‘R’.

Error Handling

- This can be configured in server manager (Log & Error handling)

- By default, the server stops the session .

Rank Transformation

- This allows you to select only the top or bottom rank of data. You can get returned the largest or smallest numeric value in a port or group.

- You can also use Rank Transformation to return the strings at the top or the bottom of a session sort order. During the session, the server caches input data until it can perform the rank calculations.

- Rank Transformation differs from MAX and MIN functions, where they allows to select a group of top/bottom values, not just one value.

- As an active transformation, Rank transformation might change the number of rows passed through it.

Rank Transformation Properties

- Cache directory

- Top or Bottom rank

- Input/Output ports that contain values used to determine the rank.

Different ports in Rank Transformation

I - Input

O - Output

V - Variable

R - Rank

Rank Index

The designer automatically creates a RANKINDEX port for each rank transformation. The server uses this Index port to store the ranking position for each row in a group. The RANKINDEX is an output port only. You can pass the RANKINDEX to another transformation in the mapping or directly to a target.

Filter Transformation

- As an active transformation, the Filter Transformation may change the no of rows passed through it.

- A filter condition returns TRUE/FALSE for each row that passes through the transformation, depending on whether a row meets the specified condition.

- Only rows that return TRUE pass through this filter and discarded rows do not appear in the session log/reject files.

- To maximize the session performance, include the Filter Transformation as close to the source in the mapping as possible.

- The filter transformation does not allow setting output default values.

- To filter out row with NULL values, use the ISNULL and IS_SPACES functions.

Joiner Transformation

Source Qualifier: can join data origination from a common source database .

Joiner Transformation: Join tow related heterogeneous sources residing in different locations or File systems. To join more than two sources, we can add additional joiner transformations.

SESSION LOGS

Information that reside in a session log:

- Allocation of system shared memory

- Execution of Pre-session commands/ Post-session commands

- Session Initialization

- Creation of SQL commands for reader/writer threads

- Start/End timings for target loading

- Error encountered during session

- Load summary of Reader/Writer/ DTM statistics

Other Information

- By default, the server generates log files based on the server code page.

Thread Identifier

Ex: CMN_1039

Reader and Writer thread codes have 3 digit and Transformation codes have 4 digits.

The number following a thread name indicate the following:

(a) Target load order group number

(b) Source pipeline number

(c) Partition number

(d) Aggregate/ Rank boundary number.

Log File Codes

Error Codes Description

BR - Related to reader process, including ERP, relational and flat file.

CMN - Related to database, memory allocation

DBGR - Related to debugger

EP- External Procedure

LM - Load Manager

TM - DTM

REP - Repository

WRT - Writer

Load Summary

(a) Inserted

(b) Updated

(c) Deleted

(d) Rejected

Statistics details

(a) Requested rows shows the no of rows the writer actually received for the specified operation (b) Applied rows shows the number of rows the writer successfully applied to the target (Without Error) .

(c) Rejected rows show the no of rows the writer could not apply to the target

(d) Affected rows shows the no of rows affected by the specified operation

Detailed transformation statistics

The server reports the following details for each transformation in the mapping

(a) Name of Transformation

(b) No of I/P rows and name of the Input source

(c) No of O/P rows and name of the output target

(d) No of rows dropped

Tracing Levels

Normal - Initialization and status information, Errors encountered, Transformation errors, rows skipped, summarize session details (Not at the level of individual rows)

Terse - Initialization information as well as error messages, and notification of rejected data.

Verbose Init - Addition to normal tracing, Names of Index, Data files used and detailed transformation statistics.

Verbose Data - Addition to Verbose Init, Each row that passes in to mapping detailed transformation statistics.

NOTE : When you enter tracing level in the session property sheet, you override tracing levels configured for transformations in the mapping.

No comments: