1 What is the difference between a data warehouse and a data mart?
It is a collection of data marts. Represents historical data. a data warehouse is a relational database which is specially designed for analysis purpose rather then for transactional purpose.
It is a sub set of data ware housing.It can provide the data to analyze query reporting
& analysis. a datmart is subject oriented database which gives the data about each and every individual department in an organisation.
2 how can one connect two fact tables ? is it possible ? how?
This confirm dimenstion methodology.If a dimension table is connected to more then one Fact table is called confirm dimension.
Fact Tables are connected by confirmed dimensions, Fact tables cannot be connected directly, so means of dimension we can connect
3 suppose data are coming from different locations and those data will not change . is there any need to use surrogate key ?
Yes, We should use the surrogate key, here we are getting data from different locations means every one have one primary key, while transforming the data into target that time more than two key not in use so if you use surrogate key it will identified the duplicate fields in dimensional table.
4 what is the difference between aggregate table and fact table ? how do you load these two tables ?
A fact table typically has two types of columns: those that contain numeric facts (often called measurements), and those that are foreign keys to dimension tables. A fact table contains either detail-level facts or facts that have been aggregated. Fact tables that contain aggregated facts are often called summary tables or aggregated fact. A fact table usually contains facts with the same level of aggregation. Though most facts are additive, they can also be semi-additive or non-additive. Additive facts can be aggregated by simple arithmetical addition. A common example of this is sales. Non-additive facts cannot be added at all. An example of this is averages. Semi-additive facts can be aggregated along some of the dimensions and not along others. An example of this is inventory levels, where you cannot tell what a level means simply by looking at it.
5 What Oracle features can be used to optimize my Warehouse system?
Partition table, bitmap index, sequence ,table function ,sql loader ,function like cube ,roll_up etc.
6 When should you use a STAR and when a SNOW-FLAKE schema?
The snowflake and star schema are methods of storing data which are multidimensional in nature (i.e. which can be analysed by any or all of a number of independent factors) in a relational database .The snowflake schema (sometimes called snowflake join schema) is a more complex schema than the star schema because the tables which describe the dimensions are normalized.Snowflake schema is nothing but one dimension table will be connected to another dimension table and so on.
? If a dimension is very sparse (i.e. most of the possible values for the dimension have no data) and/or a dimension has a very long list of attributes which may be used in a query, the dimension table may occupy a significant proportion of the database and snow flaking may
? A multidimensional view is sometimes added to an existing transactional database to aid reporting. In this case, the tables which describe the dimensions will already exist and will typically be normalized. A snowflake schema will hence be easier to implement.
? A snowflake schema can sometimes reflect the way in which users think about data. Users may prefer to generate queries using a star schema in some cases, although this may or may not be reflected in the underlying organization of the database.
? Some users may wish to submit queries to the database which, using conventional multidimensional reporting tools, cannot be expressed within a simple star schema. This is particularly common in data mining of customer databases, where a common requirement is to locate common factors between customers who bought products meeting complex criteria. Some snow flaking would typically be required to permit simple query tools such as Cognos
Power play to form such a query, especially if provision for these forms of query weren't anticipated when the data warehouse was first designed.
The star schema (sometimes referenced as star join schema) is the simplest data warehouse schema, consisting of a single "fact table" with a compound primary key, with one segment for each "dimension" and with additional columns of additive, numeric facts.The star schema makes multi-dimensional database (MDDB) functionality possible using a traditional relational database. Because relational databases are the most common data management system in organizations today, implementing multi-dimensional views of data using a relational database is very appealing. Even if you are using a specific MDDB solution, its sources likely are relational databases. Another reason for using star schema is its ease of understanding. Fact tables in star schema are mostly in third normal form (3NF), but dimensional tables are in de-normalized second normal form (2NF). If you want to
normalize dimensional tables, they look like snowflakes (see snowflake schema) and the same problems of relational databases arise - you need complex queries and business users cannot easily understand the meaning of data. Although query performance may be improved by advanced DBMS technology and hardware, highly normalized tables make reporting difficult and applications complex.
7 When should one use an MD-database (multi-dimensionaldatabase) and not a relational one?
1 Because More than one dimensions can be shareble for Other Department
2 The Physical Load will be less.
3 Less Complexity of Fact
8 What is the difference between an ODS and a W/H?
An ODS is an environment that pulls together, validates, cleanses and integrates data from disparate source application systems. This becomes the foundation for providing the end-user community with an integrated view of enterprise data to enable users anywhere in the organization to access information for strategic and/or tactical decision support, day-to-day operations and management reporting.
The defination of Data Warehouse is as follows.
? Subject-oriented, meaning that the data in the database is organized so that all the data elements relating to the same real-world event or object are linked together;
? Time-variant, meaning that the changes to the data in the database are tracked and recorded so that reports can be produced showing changes over time;
? Non-volatile, meaning that data in the database is never over-written or deleted, but retained for future reporting;
? Integrated, meaning that the database contains data from most or all of an organization's operational applications, and that this data is made consistent.
Transactions similar to those of an Online Transaction Processing System
Queries process larger volumes of data
Contains current and near current data
Contains historical data.
Typically detailed data only, often resulting in very large data volumes
Contains summarised and detailed data,generally smaller in size than on ODS
Real-time and near real-time data loads
Typically batch data loads
Generally modeled to support rapid data update
Generally dimensionally modeled and tunes to optimise query performance
Updated at the data field leve
Data is appended, not updated
Used for detailed decision making and
Used for ling-term decision making and
Knowledge workers (customer service
representatives, line managers)
Strategic audience (executives, business
9 What is the difference between a W/H and an OLTP application?
Warehouse is used for high level data analysis purpose .It is used for predictions, time series analysis, financial analysis , what -if simulations etc. Basically it is used for better decision making.
OLTP is NOT used for analysis purpose.It is used for transaction and data processing.
Its basically used for storing day-to-day transactions that take place in an organisation.
The main focus of OLTP is easy and fast inputing of data, while the main focus in data warehouse is easy retrieval of data.
OLTP doesnt store historical data.(this is the reason why it cant be used for analysis)
DW stores historical data.
10 What is the difference between OLAP, ROLAP, MOLAP and HOLAP?
ROLAP stands for Relational Online Analytical Process that provides multidimensional analysis of data, stored in a Relational database(RDBMS).
MOLAP(Multidimensional OLAP), provides the analysis
of data stored in a multi-dimensional data cube.
HOLAP(Hybrid OLAP) a combination of both ROLAP and MOLAP can
provide multidimensional analysis simultaneously of data
stored in a multidimensional database and in a relational
DOLAP(Desktop OLAP or Database OLAP)provide multidimensional
analysis locally in the client machine on the data collected
from relational or multidimensional database servers.
11 what are the types of dimension tables
4.Slowly changing Dimensions.
12 What is a Data Cube?
Data cube is the logical representation of multidimensional data .The edge of the cube contains dimentions and the body of the cube contains datas.
13 How does data mining and data warehousing work together?
data warehousing is used to store the historical data.by using dwh bsiness users can analize thier business.data mining is used to predict the future .dwh will act as the source for the data mining
14 What is Bulk Insert?
In informatica ..bulk insert or bulk load does 2 things :-
1) Ignores the commit interval specified at the session level.
2) Do not create a database session log file.
So, advantage is it's very fast as no entry goes into log.
Disadvantage is session cannot be rolled back as no entry exists in the log file .
15 What is surrogate key?
A surrogate key in a database is a unique identifier for either an entity in the modeled world or an object in the database. The surrogate key is not derived from application data.There appear to be two definitions of a surrogate in the literature. We shall call these surrogate (1) and surrogate
(2):Surrogate (1) This definition is based on that given by Hall, Owlett and Todd (1976). Here a surrogate represents an entity in the outside world. The surrogate is internally generated by the system but is nevertheless visible by the user or application.
Surrogate (2) This definition is based on that given by Wieringa and de Jung (1991). Here a surrogate represents an object in the database itself. The surrogate is internally generated by the system and is invisible to the user or
16 What is a diff between joiner and lookup transformation?
Joiner will join the two different data sources based on a
join condition ,and pass only the rows which satisfy that
condition.discards the remaining rows.
Joiner transformation supports 4 types of joins at
Lookup transformation basically for Reference ,based on the
lookup condition .when u want some data based on target
data ,will take lookup on that particular table and
retrieve the corresponding fields from that table.
we can override the lookup transformation using the SQL
17 What is the difference between View and Materialized View ?
A view has a logical existence but a materialized view has
a physical existence .Moreover a materialized view can be
indexed , analisied and so on....that is all the things that
we can do with a table can also be done with a materialized
18 How could we generate the sequence of key values without using sequence generator transformation in the target ?
Do a lookup on the Target table with an Lookup SQl Override
Select MAX(FIELD_NAME), field 1 , field3 from target group
by field1, field2...
In the Expression increment the Max values of the field
which you just got from the lookup by 1.
Here MAX_FIELDNAME) is the Max value of the field you want
to generate the sequence of..
19 why we need to use unconnected transformation?
Unconencted is used when ever u want to call the same
transformation several times and u have one return port.
We use unconnected transformation to use multiple number
of tables or views without physically taking the entity
into mapping.This kind of transformation is also helpful
when single return port is required.
Use dynamic cache if u want to update the case while
updating the target table itself and static is untouched
with the cache.
20 Whats the difference between $, $$, $$$
$ - These are the system variables like $Bad file,$input
file, $output file, $DB connection
$$ - Can any one tell me the scenario with example for
user defined variables
$$$ - $$$SessStartTime
$$$SessStartTime returns the initial system date value on
the machine hosting the PowerCenter Server when the server
initializes a session. $$$SessStartTime returns the session
start time as a string value. The format of the string
depends on the database you are using.
21 What is Factless fact table ???
A Fact table without measures(numeric data) for a column is
called Factless Fact table.
FACT LESS FACT TABLES ARE USED TO CAPTURE DATE TRANSACTION
22 explain the scenario for bulk loading and the normal loading option in Informatica Work flow manager ???
Normal: In this case server manager allocates the
resources(Buffers) as per the parameter settings. It creates
the log files in database.
Bulk: In this case server manager allocates maximum
resources(Buffers) available irrespective of the parameter
settings. It will not create any log files in database.
In first case data loading process will be time taking
process but other applications are not affected. While in
bulk data loading will be much faster but other application
Normal Load: It loads the record one by one and writes log each file.It will take more time to complete.
Bulk Load: Load the number of records at a time ,It wont fallow ant log files or trace levels,It takes less time .
23 Why is meant by direct and indirect loading options in sessions?
we use file type direct when we are loading single file into target. we use Indirect when we want to load multiple files through single session in the mapping
24 What is the method of loading 5 flat files of having same structure to a single target and which transformations I can use?
This can be handled by using the file list in informatica.
If we have 5 files in different locations on the server and
we need to load in to single target table.In session
properties we need to change the file type as Indirect.
am taking a notepad and giving following paths and file
names in this notepad and saving this notepad as
emp_source.txt in the directory /ftp_data/webrep/
In session properties i give /ftp_data/webrep/ in the
directory path and file name as emp_source.txt and file type
25 On a day, I load 10 rows in my target and on next day if Iget 10 more rows to be added to my target out of which 5 are updated rows how can I send them to target? How can I insert and update the record?
the best way to do this use the slowy changing dimension in
the mappings-->wizzard-->slowly changing dimension-->type1
here u need to select the source and target tables.
take a look t/f and a update strategy t/f. basing on look
up , if the record exits in target then reject it ,if not
exit insert it,and if the record exist but it is changed
then update it
26 In real time scenario where can we use mapping parameters and variables?
Before using mapping parameters and mapping variables we
should declare these things in mapping tab of mapping
A mapping parameter cannot change untill the session has
completed unless a mapping variable can be changed in
between the session.
if we declare mapping parameter we can use that parameter
untill completing the session,but if we declare mapping
variable we can change in between sessions.Use mapping
variable in Transcation Control Transformation......
27 By using Filter Transformation,How to pass rows that doesnot satisfy the condition(discarded rows) to another target?
Connect the ports of the filter transformation to the
second target table and enable the 'FORWARD REJECTED ROWS'
in the properties of the filter transformation. the
rejected rows will be forwarded to this table.
Well You can Use Router iif you need rejected rows along
with satisfied rows otherwis you just give the condition
for filter tx as you want it in your target table.
28 A table contains some null values .how to get ( Not Applicable) in place of that null value in target ?
with the help of ISNULL() function of the Informatica
In the column properties sheet, write N/A in the Default
value text box for the particular column
29 WHAT IS FACT TABLES?
Fact table is the primary table in the dimensional
modeling . the numerical performance of measures of the
business stored in fact table
mostly used facts ara numeric and additive
not every numeric is a fact ,but a numeric wich ara of type
key performance indicator is called fact
30 WHAT IS UPDATE OVERRIDE . DIFFERENCE BETWEEN SQL OVERRIDE AND UPDATE OVERRIDE ?
Update Override it is an option available in TARGET instance .By defalut Target table is updated based on Primary key values .To update the Target table on non primary key values u can generate the default Querey and override the Querey according to the requiremnet.Suppose for example u want to update the record in target table When a column value='AAA' then u can include this condition in where clause of default Querey.
Coming to SQL override it is an option available in Source
Qualifier and Lookup transafornmation where u can inlude
joins ,filters,Group by and order by.
31 In a mapping i have three dimensions. If i want to pass a same surrogate key value to all the three dimensions by using one sequence generator is possible?If the mapping is containing single flow? And in the same case if the mapping is contaning 3 flows for the three dimensions then by using one sequence generator can we populate surrogate key (same value) to all the three dimensions?
Use the Sequence and Expresion transfermations.first
genarate the surrogate with Seq trans,then send values to
exp trans,connect the exp trans o/p ports to 3 dimentions.
First Seq generate the surrogate key like 1,2,3,4,5
we wil pass this column to next tran(Exp) from there we
will connect o/p port to dimentions.so '1' wil go all
dimentions,then '2' wil go then '3' .....
32 what r the transformations that r not involved in mapplet?
3.XML Source Qualifier transformations
7.Pre- and post- session stored procedures
33 what is data driven?
data driven is the instruction fed to informatica server
wheather to insert/update/delete row whenever using update
34 what are the transformations that are used in data cleansing ? and how data cleansing takes place ?
Expression tranformation is used in data cleansing.
If ur target table consist of not null column and source
table consisting null columns so assign some value in
expression tr. and then pass data to target.
35 TWO FLAT FILES ARE THERE , EACH HAVING NO MATCHING COLUMNS . HOW CAN U JOIN THESE TWO USING JOINER TRANSFORMATION ?
This can be done by passing all ports to an expression
transformation and then creating a output port say ID=1 in
both the expression transformation of each file and then
join it using a joiner on ID,hope this helps..
36 HOW TO GET THE LATEST DATA IN SCD ?
IN THREE WAYS WE CAN GET THE LATEST RECORD IN SCD ...
1)IN "SCD TYPE 2 TIME STAMP" FOR LATEST RECORD THE "END
DATE" FIELD WILL BE BLANK. THAT MEANS IT IS THE NEW ROW .
2)IN "SCD TYPE 2 FLAG " THE FLAG NUMBER OF THE NEW ROW WILL
BE ONE .
3) IN " SCD TYPE 2 VERSION " THE LATEST RECORD IS HAVING
THE MAXIMUM VERSION NUMBER
37 what is the function of 'F10' informatica ?
F10 and F5 are used in debugging process
By pressing F10, the process will move to the next
transformation from the current transformation and the
current data can be seen in the bottom panel of the window..
whereas F5 will process the full data at a stretch..in case
of F5, u can see the data in the targets at the end of the
process but cannot see intermediate transformation values.
38 two types of data are there . one is mainframe and the other is ascii format . in informatica how can you get both the data in a single format in ascii .
39 WHAT IS UPDATE OVERRIDE . DIFFERENCE BETWEEN SQL OVERRIDE AND UPDATE OVERRIDE ?
Coming to SQL override it is an option available in Source
Qualifier and Lookup transafornmation where u can inlude
joins ,filters,Group by and order by.
Update Override it is an option available in TARGET
instance. By defalut Target table is updated based on
Primary key values.To update the Target table on non
primary key values u can generate the default Querey and
override the Querey according to the requiremnet.Suppose
for example u want to update the record in target table
When a column value='AAA' then u can include this condition
in where clause of default Querey.
40 WHAT IS THE NAME OF THAT PORT IN DYNAMIC CACHE WHICH IS USED FOR INSERT , UPDATE OPRATION ?
New lookup now
41 what are the reusable tasks in informatica ?
42 how the server recognises , if the session fails after loading the 100 records in to the target
Based on commit intervel session commits those many records
into target. suppose if commit intervel is 1000 if session
fails after 100 records it won't insert not a single record
43 How do you take care of security using a repository manager
FOLDER PERMISSION (OWNERS,GROUPS,USERS)
LOCKS (READ,WRITE,EXECUTE.FETCH ,SAVE)
44 if the session fails after 100 records agian we have to starts the session or we go for recovery session
informatcia server has 3 methods to recovering the
1)Run the session again if the Informatica Server has not
issued a commit.
2)Truncate the target tables and run the session again if
the session is not recoverable.
3)Consider performing recovery if the Informatica Server
has issued at least one commit.
Use performing recovery to load the records from where the
45 what r the values tht r passed between informatics server and stored procedure?
There are 3 types of data passing between informatica
server and stored proceduer these are
Input/Output parametors: Stored procedure it receive the
inputs and porvied the outputs.
Return Value:Ever data base to provied return value after
processing of stored procedure.
Status code: It is used for error handling.
46 why do u use shortcuts in informatica.?
Shortcut is a concept of reusability.If there is a mapping
that can be reused across several folders, create it in one
folder and use shortcuts of it in other folders. Thus, if
you have to make change, you can do it in main mapping
which reflects in shortcut mappings automatically.
47 scd methodology?
Type1: No historical datas will be available, when changes
are made the old data will be deleted and the new data will
Type2: Flag:The old data will be denoted as false and the
new data will be denoted as true.
Version:The changes made will be numbered as 0,1,2...so on.
Date:The changes along with the date in which they are made
are clearly mentioned.
Type3:The latest change which is made is alone available.
48 What is the filename which you need to configure in UNIX while installing infromatica?
in informatica 7, under $PMRootDir there is one utility
(script) called pmconfig exist, through it we can configure the inforamtica
49 what is confirmed dimension?
Conformed dimension is a dimension which is connected to or
shared by more than one fact table.
Eg:A business which takes care of both sales and orders of
products then product dimension becomes a conformed
dimension for both sales fact and order fact
50 surrogate keys usage in Oracle and Informatica?
surrogate key is one type of key which is used to maintain
it is used in slowly changing dimension (scd)
If u have mulitiple records then for mentain this records we
need to generate surrogate key in informatica
51 What happens if you increase commit intervals and also decrease commit Explain grouped cross tab?
if you have increased your commit interval to ~25000 rows
the session will run faster but when your session fails at
24000th record you will not have any data in your target.
When you decrease your commit interval to ~10000 rows your
session will be slow when compared to previous but if the
session fails at 24000th record you will lose only 4000
if commit interval is set to high value, performance will
be high. if commit is given for every 1000 rows say for eg,
it will affect the performance badly
52 What is an MDDB? What is the difference between MDDBs and RDBMSs?
MDDB stands for Multi Dimensional Database
MDDB: In MDDB, it views data in mutidimensional
(Perspective)i.e.,through various dimensions at a time with
the help of cubes developed using dimensions and stores
data in multidimensional i.e., stores data in power cubes.
In power cubes, each axis is a dimension and each member of
dimension is column.
In MDDB, at a glance we can see the dimensions and data
present in the dimensions
RDDB: In RDDB, it views data in two dimensional and stores
data in two dimensional i.e., in rows and columns in a
In RDDB, we can just see the rows and columns, but only
after issuing select over that u can see the data.