Tuesday, February 19, 2008

DataWarehousing FAQ

1 DataWarehousing simple what is a Data Warehouse? A data warehouse is a relational database that is designed for query and analysis rather than for transaction processing. It usually contains historical data derived from transaction data, but it can include data from other sources. It separates analysis workload from transaction workload and enables an organization to consolidate data from several sources.
19 DataWarehousing Complex What is the conventional definition of a DWH? Explain each term. Subject-oriented, Integrated, Time-variant, Non-volatile.
17 DataWarehousing Medium Draw the architecture of a Datawarehousing system.
35 What are the goals of the Data warehouse?
18 DataWarehousing Medium What are the approaches in constructing a Datawarehouse and the datamart? Top-down and Bottom-up approaches.
2 DataWarehousing simple Data Mart A data structure that is optimized for access. It is designed to facilitate end-user analysis of data. It typically supports a single, analytic application used by a distinct set of workers.
25 Datawarehousing Medium Can a datamart be independent? Yes.
30 Datawarehousing simple What are the sources for a datawarehouse? OLTP system, Legacy system, Files, Other sources.
14 DataWarehousing simple What the difference is between a database, a data warehouse and a data mart? " A database is an organized collection of information.
-- A data warehouse is A very large database with special sets of tools to extract and cleanse data from operational systems
and to analyze data.
-- A data mart is A focused subset of A data warehouse that deals with A single area of data and is organized for quick
analysis."
4 DataWarehousing simple OLAP (On-Line Analytical Processing) A method by which multidimensional analysis occurs.
5 DataWarehousing average/simple what do you mean by Multidimensional Analysis? The ability to manipulate information by a variety of relevant categories or “dimensions” to facilitate analysis and understanding of the underlying data. It is also sometimes referred to as “drilling-down”, “drilling-across” and “slicing and dicing”
42 What is the difference between OLAP, ROLAP, MOLAP ,DOLAP?
40 Difference between OLAP & OLTP?
20 OLAP Medium What are the different types of OLAP? Give an eg. ROLAP eg.BO, MOLAP eg.Cognos, HOLAP, DOLAP
24 Data modelling Medium Which is the suitable data model for a datawarehouse? Why? MultiDimensional Model. optimized for data warehouse, data mart and online analytical processing (OLAP) applications. The main advantage of this database is query performance.
7 DataWarehousing simple Star Schema A means of aggregating data based on a set of known dimensions. It stores data multidimensionally in a two dimensional Relational Database Management System (RDBMS), such as Oracle.
13 DataWarehousing average/simple What are the benefits of STAR SCHEMA? "Because it’s denormalized structure , i.e., Dimension Tables are denormalized. Why to denormalize means the first (and often only) answer is : speed. OLTP structure is designed for data inserts, updates, and deletes, but not data retrieval. Therefore, we can often squeeze some speed out of it by denormalizing some of the tables and having queries go against fewer tables.
These queries are faster because they perform fewer joins to retrieve the same recordset. Joins are also confusing to many End users. By denormalizing, we can present the user with a view of the data that is far easier for them to understand.

Benefits of STAR SCHEMA:
· Far fewer Tables.
· Designed for analysis across time.
· Simplifies joins.
· Less database space.
· Supports “drilling” in reports.
· Flexibility to meet business and technical needs.
"
16 DataWarehousing complex What are Additive Facts? Or what is meant by Additive Fact? The fact tables are mostly very huge and almost never fetch a single record into our answer set. We fetch a very large number of records on which we then do, adding, counting, averaging, or taking the min or max. The most common of them is adding. Applications are simpler if they store facts in an additive format as often as possible. Thus, in the grocery example, we don’t need to store the unit price. We compute the unit price by dividing the dollar sales by the unit sales whenever necessary.
8 DataWarehousing simple Snowflake Schema An extension of the star schema by means of applying additional dimensions to the dimensions of a star schema in a relational environment.
32 Datawarehousing Complex What is Galaxy schema? A set of fact tables with some mutual dimension tables is called galaxy schema.
36 What is Dimension & Fact ?
41 Different types of Dimensions Confirmed, Degenerate, Junk, Mini, Multivalued, Too-Many, Dirty customer, Demographic mini dimensions
23 Normalization Twisted Are the dimensional tables normalized? If so when? Yes. In snowflake schema, they are normalized.
37 What is Transaction fact table & Centipede Fact table?
38 Different types of Facts? Additive, semi-additive, non-additive
48 What are the types of Factless fact tables? Event tracking & coverage
43 What is Granularity? "Granularity
The level of detail of the facts stored in a data warehouse."
22 Normalization simple Is the Fact table normalized? Yes
12 DataWarehousing average/simple Can 2 Fact Tables share same dimensions Tables? Yes
21 Project-related Complex Give egs. of the fact, dimensional tables, datamarts/DWH used in your project. Explain what data each contains.
31 Datawarehousing Medium What are fact constellations? Multiple fact tables sharing the dimension tables.
15 DataWarehousing simple What is a Fact less fact table ? "A factless fact table captures the many-to-many relationships between dimensions, but contains no numeric or textual facts. They are often used to record events or coverage information.
( A fact which does not have any measures )
Common examples of factless fact tables include:
o Identifying product promotion events (to determine promoted products that didn’t sell)
o Tracking student attendance or registration events
o Tracking insurance-related accident events
o Identifying building, facility, and equipment schedules for a hospital or University.

"
27 Datawarehousing simple What is metadata? Data that describes data and other structures, such as objects, business rules, and processes.
9 DataWarehousing simple What is data quality? Data quality (information quality) is defined as standardizing and consolidating customer and/or business data. By cleansing/enhancing the data and combining related records to avoid duplicate entries, you’re able to create a single record view. Within Informatica, this all takes place prior to the initial load to the target database as well as during the on-going data maintenance and updating processes.
28 ETL simple How do you achieve data quality? By Cleansing - The process of resolving inconsistencies and fixing the anomalies in source data, typically as part of the ETL process.
46 What is Parsing & Data Mining.
29 Datawarehousing Medium What are surrogate keys? System generated, artificial primary keys in alternative to the natural keys like SSN etc.
26 Datawarehousing simple Name a few data modelling tools. Erwin, Embarcedaro, Rational Rose, Oracle Designer.
33 Datawarehousing Medium Materialized views? Views storing pre-computed results are called materialized views.
34 Datawarehousing Medium Can you insert into materialized views? No.
44 Definition of Adhoc Queries?
45 What is ODS (Operational Data Store), DSS (Decision support System), Data Staging Area, Data Presentation Area.
47 What is Market-Basket analysis?
49 DataWarehousing average/simple SCD Types "Type 1 : Keep Most recent Values in Target
Type 2 : Keep a full history of changes in the target
Type 3 : Keep the current and previous values in the target.
"
6 DataWarehousing simple what is a Hypercube? A means of visually representing multidimensional data.

11 DataWarehousing complex/average Explain the performance improvement techniques in DW? Partition, aggregation, indexing..
Explain slice and dice ? To slice and dice is to break a body of information down into smaller parts or to examine it from different viewpoints so that you can understand it better. This term can be compared to drill down, which is the process of dividing an information area up into finer and finer layers in a hierarchy, but with the purpose of narrowing in to one small area or item.

No comments: