Tuesday, February 19, 2008

Informatica Questions and Answers: Real-Scenario Problems and Solutions

Can we do this in Informatica?

I want output like this:

Name  Prd  Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
Ram   Pen   10  10  20  20  56  50  40  70  80 100  50 200
Sunil Book 100  10  50  60  40  40  40 400  40  40  20 110
Gaved Mob   30  50  40  50  10 200 200 200 120 600  10 100

My source is like this:

Name|Prd|Month|Quantity
Ram|Pen|Jan|10
Ram|Pen|Feb|10
Ram|Pen|Mar|20
Ram|Pen|Apr|20
Ram|Pen|May|56
Ram|Pen|Jun|50
Ram|Pen|Jul|40
Ram|Pen|Aug|70
Ram|Pen|Sep|80
Ram|Pen|Oct|100
Ram|Pen|Nov|50
Ram|Pen|Dec|200
Sunil|Book|Jan|100
Sunil|Book|Feb|10
Sunil|Book|Mar|50
Sunil|Book|Apr|60
Sunil|Book|May|40
Sunil|Book|Jun|40
Sunil|Book|Jul|40
Sunil|Book|Aug|400
Sunil|Book|Sep|40
Sunil|Book|Oct|40
Sunil|Book|Nov|20
Sunil|Book|Dec|110
Gaved|Mob|Jan|30
Gaved|Mob|Feb|50
Gaved|Mob|Mar|40
Gaved|Mob|Apr|50
Gaved|Mob|May|10
Gaved|Mob|Jun|200
Gaved|Mob|Jul|200
Gaved|Mob|Aug|200
Gaved|Mob|Sep|120
Gaved|Mob|Oct|600
Gaved|Mob|Nov|10
Gaved|Mob|Dec|100

This process is called denormalisation (or "denormalization").
Basically what you need here are:
- an EXP which "assembles" the records per group (one year of months per Name/Prd),
- one extra record appended after the real input data,
- a FIL which lets through only the last, fully assembled record per group.

This is a VERY rough outline, but it works. So the answer to your
initial question (can this be done in PowerCenter?) is clearly YES.
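Purely as an illustration of that accumulate-and-flush idea (this is plain Python, not PowerCenter code), here is a minimal sketch; the pipe-delimited layout and column names come from the question, while the input file name sales.txt is just a placeholder:

import csv
from collections import OrderedDict

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

def pivot(rows):
    # Collapse one row per (Name, Prd, Month) into one row per (Name, Prd)
    # with twelve month columns -- the job done by the EXP/FIL combination.
    assembled = OrderedDict()
    for name, prd, month, qty in rows:
        assembled.setdefault((name, prd), {m: "0" for m in MONTHS})[month] = qty
    for (name, prd), months in assembled.items():
        yield [name, prd] + [months[m] for m in MONTHS]

with open("sales.txt") as src:                 # placeholder file name
    reader = csv.reader(src, delimiter="|")
    next(reader)                               # skip the Name|Prd|Month|Quantity header
    for record in pivot(reader):
        print(" ".join(record))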
=======================================================================

Hello,

I am currently developing on version 7.1.4 of Informatica.

I need to load flat file targets on a daily basis. I'd like to parameterise the
name of the target file with a timestamp (example: MyFile_YYYYMMDD.txt) so
I can put all the target files in a single directory but have a separate
file for each day of loading.
My sources are Oracle tables.
I am OK with generating the timestamp; my question is specifically about how to
parameterise the name of the target file.


Use one mapping to create the name of the target file and write this
file name to a parameter file.
Have the session for the second mapping utilise this parameter file.
This second session is the one that you want to provide with a
parameterised target file name.
Have a look at the Workflow Administration Guide; there's a section about
the session variables, among which you will find the variables
$OutputFileName...; this is the feature you need here.
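As a rough sketch of that first step (generate today's file name and write it to a parameter file), something along these lines could work; the folder, workflow, session and parameter names below are made up, and the exact parameter-file syntax should be verified against the Workflow Administration Guide for your version:

from datetime import date

# All names below are hypothetical -- adjust them to your repository objects.
param_file = "daily_extract.par"               # placeholder parameter file path
section    = "[MyFolder.WF:wf_daily_extract.ST:s_m_load_daily_file]"
file_name  = "MyFile_" + date.today().strftime("%Y%m%d") + ".txt"

with open(param_file, "w") as f:
    f.write(section + "\n")
    # Hypothetical session parameter name starting with $OutputFile...
    f.write("$OutputFileName_Daily=" + file_name + "\n")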


There is a way to pass the filename of the target file via a special
filename port in the target. Check the documentation


Just a bit of explanation:
Since PowerCenter 7.1.4 a flat file target can have a new port named
FileName; when _creating_ a flat file target (not when importing one)
there are several buttons above the port list, and the rightmost one
of these shows an uppercase F in a dotted square; click this, and you
get a new port named FileName of type String(255).

Beware that a new file name provided via this port will only be
accepted on a transaction boundary; usually this feature is used to
generate several target files with different names during one session run.
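Just to illustrate the effect of that FileName port (again plain Python, not PowerCenter): each row carries the name of the file it should land in, and the writer switches files when that name changes; in PowerCenter the switch is only honoured on a transaction boundary:

def write_with_filename_port(rows):
    # rows: iterable of (file_name, data); route each data value to the
    # file named in the row itself, opening a new file when the name changes.
    current_name, handle = None, None
    for file_name, data in rows:
        if file_name != current_name:          # in PowerCenter: only at a commit point
            if handle:
                handle.close()
            handle = open(file_name, "a")
            current_name = file_name
        handle.write(data + "\n")
    if handle:
        handle.close()

# One "session run" producing two differently named target files.
write_with_filename_port([
    ("MyFile_20080219.txt", "row 1"),
    ("MyFile_20080219.txt", "row 2"),
    ("MyFile_20080220.txt", "row 3"),
])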


=======================================================================


Hi,
Is it possible to write data directly into a file system on an external server using PowerCenter, without creating an intermediate file on the PowerCenter server machine?

Basically, we are trying to extract several gigs of data from an Oracle source, create the target directly as a flat file (FF) on a target server, and delete the data from the source once the target FF has been successfully created. We don't want to land the data on the PowerCenter server.

PC version is 7.1.3 on Solaris 9.

I don't see why not... you would probably need to map the drive of the
server onto the Informatica Server machine and provide the path
accordingly; otherwise I think you can use an FTP connection to write
directly into the FTP target.

You can define your flat-file target as an FTP target and choose not
to stage it.

Informatica will then open an FTP connection and write the data
directly to the remote file.


======================================================================
hi,
I need some advice on using dynamic lookups. This is my scenario.

I get a few employee/dept details from the source
(emp_name I get from the source).

My target is project_details_tbl.

Before inserting emp_name into the target, I look up the employee
details in the employee table.
If there is no row available, I need to insert it into that table, making
the same table a target as well.

For this, I will use a dynamic lookup. But there are 3 fields in this
table (emp_num, emp_name, dept).

The first field is a simple sequence number.
The second I would get from the source.
The third is a null value.

For every new emp_name I get from the source, a unique emp_num is to
be inserted.

I can get this by

select max(emp_num) +1 from emp_table

Hence there are two steps:
one dynamic lookup to check whether the row already exists or not, and,
if it does not exist, another dynamic lookup to check the current max
value of the sequence number.

This sounds very complicated. Is using two dynamic lookups unnecessary?
Am I missing out on any simple, straightforward method?

Please advise.


A dynamic lookup is required only if you are getting duplicate emp
names from the source; otherwise you can do with a simple lookup on the target
table. Check in an expression whether the lookup finds any value or not; if no
value is found, insert into project_details_tbl (you may want to use a filter
here).


Hey, with a simple lookup I cannot see the updated records.

There is a new row from the source and it got inserted. When another
row comes up, that should also get inserted into the lookup table,
right?

When, some time later, this row comes up again, it should be able to get
this value too from the lookup table.

For this, I think a dynamic lookup is required.
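For what it's worth, the behaviour you are after (insert on a cache miss, and have later rows of the same run find that newly inserted value) looks roughly like this, sketched in Python with invented column names; the sequence number is simply incremented in memory instead of a select max(emp_num) + 1 per row:

def load_employees(source_names, cache):
    # cache: dict emp_name -> emp_num, pre-loaded from the employee table
    # (this plays the role of the dynamic lookup cache).
    next_num = max(cache.values(), default=0) + 1
    inserts = []
    for name in source_names:
        if name not in cache:                        # cache miss -> new row
            cache[name] = next_num
            inserts.append((next_num, name, None))   # emp_num, emp_name, dept (NULL)
            next_num += 1
    return inserts

# 'carol' is new and gets emp_num 3; the second 'carol' row later in the
# same run is found in the cache and is not inserted again.
print(load_employees(["alice", "carol", "carol"], {"alice": 1, "bob": 2}))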


======================================================
How many groups (filter conditions) can we use in an RTR (Router) transformation?

I have to use more than 100; will it work?
Please help with this one...

I don't know why this should not work. But to be honest I see a
"slight" problem: maintainability. Personally I would prefer (if that
can be done usefully) to split up this "monster" router into several
cascaded routers; one RTR would split up the records into for example
12 groups, and then every group uses another RTR to finally split up
the records. That's easier to survey and to understand. Of course this
works only if the >100 groups have some things in common.
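If it helps to picture the cascading, here is a small Python analogy (nothing Informatica-specific; the keys are invented): a first coarse split, then a finer split inside each coarse group, instead of one flat 100-plus-way split:

from collections import defaultdict

def cascade_route(rows, coarse_key, fine_key):
    # Two-level split: first by coarse_key (say, 12 groups), then by
    # fine_key inside each coarse group -- the cascaded-RTR idea.
    groups = defaultdict(lambda: defaultdict(list))
    for row in rows:
        groups[coarse_key(row)][fine_key(row)].append(row)
    return groups

rows = [{"region": "EU", "product": "Pen"},
        {"region": "EU", "product": "Book"},
        {"region": "US", "product": "Pen"}]
routed = cascade_route(rows, lambda r: r["region"], lambda r: r["product"])
print(routed["EU"]["Book"])    # rows that fall into the EU/Book group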


A stored procedure might be a good option in cases like that, instead of having a HUGE router or a bunch of routers.

Please be careful giving such advice; none of us knows the least bit
about Naveen's requirements, and giving a "generalised" hint such as
"an SP may be good here" may mislead users. Before giving such advice we
first have to better understand the original cause of the trouble; I agree
that an RTR with >100 groups looks somewhat puzzling at first, but we
cannot know without more information what the best alternative is.

[Nico, have you read this? Sometimes you should be more careful in
this regard, too. ;-) ]

I would say that any mapping with more than 20 objects (router or
otherwise) is a problematic architecture, except when the
mapping is for real time, is using XML, or is normalising COBOL
copybooks. Only in those cases have I seen a lot of objects in a map.

Just be aware that "just because you can do it doesn't mean you
should." In other words, I can guarantee this map will have
performance problems. It probably won't run more than 800 rows per
second if you're lucky.


===========================================

Imagine an expression with more than 50 date/time ports.

I need to create an output port with the most recent date.

Is there an easier way other than using several DATE_COMPARE calls?


It's easier to write a Java Transformation (JTX) under PowerCenter
8.x; in a JTX, date/time ports are always treated as "long" variables
(signed 64-bit integers), so you can simply compare them as numbers.
However, a JTX is always executed in a so-called Java Virtual Machine
(JVM) in a different OS process, meaning you have to take into account
not only the process switch (which costs time) but also the (not too
easy) communication between the JVM and the PowerCenter Integration
Service process.

My suggestion is to "normalise" these >= 50 ports using an NRM, then
sort them per "group" (namely per real input record), and retrieve the
latest date from the sorted group of data, for example using an AGG.
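As a plain-Python illustration of that normalise-then-aggregate idea (the column names are invented, and this is not PowerCenter code):

from datetime import date

def latest_date(record, date_fields):
    # Normalise the ~50 date columns of one record into single values
    # and take the maximum of the non-null ones -- what the NRM plus
    # AGG (or EXP) combination does per input record.
    values = [record[f] for f in date_fields if record[f] is not None]
    return max(values) if values else None

rec = {"created": date(2008, 1, 5), "updated": date(2008, 2, 10), "closed": None}
print(latest_date(rec, ["created", "updated", "closed"]))   # 2008-02-10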


Thanks for your answer.
I forgot to say that we still use PowerCenter 7.1.4.

The easier solution we found was creating an Oracle function that receives
n dates as parameters and returns only one. Oracle has a function called
GREATEST that returns the greatest item of a list of n items.

Anyway, thanks a lot.

I was interested in learning the JTX transformation. Do you think you could provide sample code for a JTX transformation?

Thanks for pointing out the EXP instead of the AGG; this can speed up
the solution.
Besides this detail, this exactly resembles my suggestion. The NRM is
the perfect tool for this task; you can feed the NRM with as many
repetitive ports as you want to, and it will in (almost) every case be
faster than calling a stored procedure.
Not to forget that with an SP you are tied to the database system
which you're going to use. This is one advantage of PowerCenter:
whatever you can do in a mapping, you can do regardless of any
underlying storage system (be it flat files, be it some RDBMS). We are
using a great tool, so why not use all of its potential?

The function GREATEST() is available in INFA 8; I am not sure about INFA 7.
But one can expect the same service from this function as from the Oracle
GREATEST function.
Performance considerations are important here. :)

It's been introduced in 8.0, there was nothing like that available in 7.x.

I don't know about the Oracle GREATEST function, and to be honest I
don't care; I'm happy that it exists in PowerCenter 8. ;-)
The most important thing one should keep in mind is that GREATEST()
returns NULL only if all its input values are NULL.

Check it out, guys. AFAIK, until 8.1.1 Informatica had a
different way of handling GREATEST() and LEAST():

while Oracle will return a null for GREATEST() and
LEAST() if any of the columns used in the function is
null, Informatica chose to ignore the nulls and
provide an output based on the non-null columns.

IMHO, GREATEST() can be used for this max-date problem,
but I would advise caution and using something like NVL to
handle the nulls. This would de-risk your code if Infa
changed these functions at a later date to emulate the
databases.
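The difference being discussed, and the NVL-style guard, look roughly like this in Python (purely illustrative; verify the actual GREATEST() behaviour against your own PowerCenter version):

def greatest_oracle_style(*values):
    # Oracle GREATEST(): NULL as soon as any argument is NULL.
    return None if any(v is None for v in values) else max(values)

def greatest_ignoring_nulls(*values):
    # Behaviour described above for older PowerCenter releases: ignore NULLs.
    non_null = [v for v in values if v is not None]
    return max(non_null) if non_null else None

def greatest_with_nvl(default, *values):
    # The cautious approach: replace NULLs explicitly (NVL-style) first, so
    # the result no longer depends on which convention the tool follows.
    return max(default if v is None else v for v in values)

print(greatest_oracle_style(3, None, 7))     # None
print(greatest_ignoring_nulls(3, None, 7))   # 7
print(greatest_with_nvl(0, 3, None, 7))      # 7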

This all depends on your PMServer configuration setting / PowerCenter
Service setting. You can change the way NULLS are handled there.

The way we control the handling of nulls is through the flag "Aggregator Treat Nulls As" (or something similarly worded); GREATEST() and LEAST() are row-level functions, so this flag would not influence them but only the aggregate functions.

No problem, everyone has their own opinion, but do me a favour: check
the settings for the PowerCenter service, please tell the group what
the results are of changing the service setting for handling nulls
and then running your aggregator function, just to be sure. If
they've changed the behaviour, we want to make sure the group has the
facts, not just opinionated hearsay from each of us.
