Saturday, January 12, 2008

Frequently asked questions on Datawarehousing

What are the Different methods of loading Dimension tables?

Conventional load: Before the data is loaded, all table constraints are checked against it.

Direct load (faster loading): All constraints are disabled and the data is loaded directly. The data is checked against the table constraints afterwards, and the bad data is not indexed.

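What is a conformed fact?

A conformed fact is a measure that has the same definition and units in every fact table where it appears, so it can be compared or combined across Data Marts. Conformed dimensions, likewise, are dimensions that can be used across multiple Data Marts in combination with multiple fact tables.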

What are Data Marts?

Data Marts are designed to help managers make strategic decisions about their business.
Data Marts are subsets of the corporate-wide data that are of value to a specific group of users.
There are two types of Data Marts:

1. Independent data marts – sourced from data captured from OLTP systems, from external providers, or from data generated locally within a particular department or geographic area.
2. Dependent data marts – sourced directly from enterprise data warehouses.

What is a level of Granularity of a fact table?

Level of granularity means the level of detail that you put into the fact table in a data warehouse. For example, depending on the design, you can store sales data for each individual transaction. The level of granularity is then the amount of detail you are willing to keep for each transactional fact: either one row for every product sale as it happens, or sales aggregated up to, say, the minute, with only the aggregated rows stored.
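As a rough illustration of that choice, the sketch below rolls transaction-level sales up to one row per product per minute. The sample rows and the function name are invented for the example.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical transaction-grain rows: (timestamp, product, amount).
transactions = [
    (datetime(2008, 1, 12, 9, 0, 5), "widget", 10.0),
    (datetime(2008, 1, 12, 9, 0, 40), "widget", 15.0),
    (datetime(2008, 1, 12, 9, 1, 10), "widget", 20.0),
    (datetime(2008, 1, 12, 9, 0, 30), "gadget", 7.5),
]

def roll_up_to_minute(rows):
    """Aggregate transaction-grain rows to one row per (minute, product)."""
    totals = defaultdict(float)
    for ts, product, amount in rows:
        minute = ts.replace(second=0, microsecond=0)
        totals[(minute, product)] += amount
    return dict(totals)

minute_grain = roll_up_to_minute(transactions)
```

Four transaction rows become three minute-level rows. The coarser grain is smaller, but the individual transactions can no longer be recovered from it.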

How are the Dimension tables designed?

Most dimension tables are designed using normalization principles up to 2NF. In some instances they are further normalized to 3NF. The design steps are:
1. Find where the data for this dimension is located.
2. Figure out how to extract this data.
3. Determine how to maintain changes to this dimension (see the questions on slowly changing dimensions below).

What are non-additive facts?

Non-Additive: Non-additive facts are facts that cannot be summed up for any of the dimensions present in the fact table.

What type of Indexing mechanism do we need to use for a typical datawarehouse?

On the fact table it is best to use bitmap indexes. Dimension tables can use bitmap indexes and/or other index types: clustered/non-clustered, unique/non-unique. To my knowledge, SQL Server does not support user-defined bitmap indexes, whereas Oracle does.

Why are OLTP database designs not generally a good idea for a Data Warehouse?

In OLTP, tables are normalized, so analytical queries need many joins and query response is slow for the end user. OLTP systems also do not contain years of data, so the history cannot be analyzed.

What is BUS Schema?

A BUS Schema is composed of a master suite of conformed dimensions and standardized definitions of facts.

What are the various Reporting tools in the Market?

1. MS-Excel
2. Business Objects (Crystal Reports)
3. Cognos (Impromptu, Power Play)
4. Microstrategy
5. MS reporting services
6. Informatica Power Analyzer
7. Actuate
8. Hyperion (BRIO)
9. Oracle Express OLAP
10. Proclarity

What is Normalization, First Normal Form, Second Normal Form, Third Normal Form?

1. Normalization is a process for assigning attributes to entities. It reduces data redundancies, helps eliminate data anomalies, and produces controlled redundancies to link tables.

2. Normalization is the analysis of functional dependencies between attributes / data items of user views. It reduces a complex user view to a set of small and stable subgroups of fields / relations.

1NF: Repeating groups are eliminated, dependencies can be identified, and all key attributes are defined.

2NF: The table is already in 1NF and includes no partial dependencies: no attribute depends on only a portion of the primary key. It may still exhibit transitive dependencies, that is, attributes may be functionally dependent on non-key attributes.

3NF: The table is already in 2NF and contains no transitive dependencies.

What is Fact table?

A fact table contains the measurements, metrics, or facts of a business process. If your business process is “Sales”, then a measurement of this business process, such as “monthly sales number”, is captured in the fact table. The fact table also contains the foreign keys to the dimension tables.
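A minimal sketch of this layout, using Python's built-in sqlite3 module. The table and column names (fact_sales, dim_product, dim_date, sales_amount) are assumptions made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_name TEXT);
    CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, month TEXT);
    -- Fact table: foreign keys to the dimensions plus a numeric measure.
    CREATE TABLE fact_sales (
        product_key INTEGER REFERENCES dim_product(product_key),
        date_key INTEGER REFERENCES dim_date(date_key),
        sales_amount REAL
    );
    INSERT INTO dim_product VALUES (1, 'widget'), (2, 'gadget');
    INSERT INTO dim_date VALUES (20080112, '2008-01'), (20080205, '2008-02');
    INSERT INTO fact_sales VALUES
        (1, 20080112, 100.0), (2, 20080112, 50.0), (1, 20080205, 75.0);
""")

# "Monthly sales number": sum the measure, grouped by a dimension attribute.
monthly_sales = conn.execute("""
    SELECT d.month, SUM(f.sales_amount)
    FROM fact_sales f JOIN dim_date d ON f.date_key = d.date_key
    GROUP BY d.month ORDER BY d.month
""").fetchall()
```

The fact table holds only keys and the measure. The descriptive context (product name, month) lives in the dimension tables and is reached through the join.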

What are conformed dimensions?

Answer1:
Conformed dimensions mean exactly the same thing with every possible fact table to which they are joined. Ex: the Date dimension is connected to all facts, such as Sales facts, Inventory facts, etc.

Answer2:
Conformed dimensions are dimensions that are common to the cubes (cubes are the schemas that contain fact and dimension tables).
Consider Cube-1, which contains F1, D1, D2, D3, and Cube-2, which contains F2, D1, D2, D4, where the F are facts and the D are dimensions. Here D1 and D2 are the conformed dimensions.

Explain why and where do we exactly use the lookup transformations.

You can use the Lookup transformation to perform many tasks, including:

o Get a related value. For example, your source includes employee ID, but you want to include the employee name in your target table to make your summary data easier to read.

o Perform a calculation. Many normalized tables include values used in a calculation, such as gross sales per invoice or sales tax, but not the calculated value (such as net sales).

o Update slowly changing dimension tables. You can use a Lookup transformation to determine whether rows already exist in the target.
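The first two uses above can be imitated in plain Python. This is only an analogy for what a Lookup transformation does, not Informatica code; the dictionary, field names, and function are hypothetical.

```python
# Hypothetical lookup source: employee ID -> employee name.
employee_lookup = {101: "Asha", 102: "Ben"}

source_rows = [
    {"employee_id": 101, "gross_sales": 200.0, "sales_tax": 20.0},
    {"employee_id": 103, "gross_sales": 50.0, "sales_tax": 5.0},
]

def apply_lookup(rows, lookup):
    """Enrich each row with a looked-up name and a calculated net_sales value."""
    enriched = []
    for row in rows:
        out = dict(row)
        # Get a related value; None signals that the lookup found no match.
        out["employee_name"] = lookup.get(row["employee_id"])
        # Perform a calculation from values present in the source row.
        out["net_sales"] = row["gross_sales"] - row["sales_tax"]
        enriched.append(out)
    return enriched

target_rows = apply_lookup(source_rows, employee_lookup)
```

A missing match (employee 103 here) is exactly the signal that the third use relies on: if the lookup on the target finds nothing, the row is new and should be inserted rather than updated.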

How do you tell the Aggregator stage that the input data is already sorted?

By enabling the Sorted Input property in the Aggregator properties.

What are push and pull ETL strategies?

Push and pull strategies determine how data comes from the source system to the ETL server.

Push: The source system pushes (sends) data to the ETL server.

Pull: The ETL server pulls (gets) data from the source system.

What is the Difference between an ODS and a Staging Area?

ODS:
An Operational Data Store holds cleansed operational data.
The ODS comes after the staging area.
Example:
Suppose we have day-level granularity in the OLTP system and year-level granularity in the data warehouse.
If the business (a manager) asks for week-level granularity, we would have to go to the OLTP system and summarize the day-level data to week level, which would be painstaking. Instead, we maintain week-level granularity in the ODS for about 30 to 90 days of data.

Note: The ODS contains cleansed data only, i.e. data that has passed through the staging area.

Staging Area:
It is populated once the ETL extract has finished. The staging area consists of:
1. Metadata.
2. The work area where we apply our complex business rules.
3. Space to hold the data and do calculations.
In other words, it is a temporary work area.

How do you capture changes in data if the source system has no date/time field in the source table from which you need to extract the data?

The DW database can be Oracle or Teradata. The requirement is to pull data from the source system, and the ETL needs to devise a mechanism to identify changed or new records. The source system can be a legacy system such as an AS400 or mainframe application. List all such methods of data capture. The ETL can be Informatica, DataStage, or custom ETL code.
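One possible method, sketched here in custom Python code, is to keep the previous extract and compare it with the current one row by row, using a hash of the non-key columns. This is only one option among several and is not tied to any particular tool; the column names and sample rows are hypothetical.

```python
import hashlib

def row_hash(row):
    """Hash all non-key columns so any change in them alters the digest."""
    payload = "|".join(str(row[col]) for col in sorted(row) if col != "id")
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def detect_changes(previous, current):
    """Compare two snapshots keyed by 'id'; return (new, changed, deleted) id lists."""
    prev = {r["id"]: row_hash(r) for r in previous}
    curr = {r["id"]: row_hash(r) for r in current}
    new = sorted(k for k in curr if k not in prev)
    changed = sorted(k for k in curr if k in prev and curr[k] != prev[k])
    deleted = sorted(k for k in prev if k not in curr)
    return new, changed, deleted

yesterday = [{"id": 1, "city": "Pune"}, {"id": 2, "city": "Delhi"}]
today = [{"id": 1, "city": "Mumbai"}, {"id": 3, "city": "Chennai"}]
new_ids, changed_ids, deleted_ids = detect_changes(yesterday, today)
```

The trade-off is that the whole source table has to be read on every run, so this suits sources where a full extract is affordable.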

If a lookup (LKP) is taken on the target table, can we update the rows without an Update Strategy transformation?

Yes, by using a dynamic lookup.

In what scenario is ETL coding preferred over database-level SQL or PL/SQL coding?

When the data scrubbing process is difficult. For example, a file contains a date column with values like 20070823, but the data warehouse requires the date as 08/23/2007; this kind of cleansing is difficult in database-level code.
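For the date example above, a scrubbing step in custom ETL code might look like the following sketch. The function name is hypothetical, and returning None for bad values is just one possible way to flag rejects.

```python
from datetime import datetime

def scrub_date(raw):
    """Convert a YYYYMMDD string to MM/DD/YYYY; return None if it is not a valid date."""
    try:
        return datetime.strptime(raw, "%Y%m%d").strftime("%m/%d/%Y")
    except ValueError:
        return None

converted = scrub_date("20070823")
rejected = scrub_date("20070230")  # February 30 does not exist
```

Here converted is "08/23/2007" and rejected is None, so invalid dates can be routed to a reject file instead of failing the load.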

What is snapshot?

You can disconnect the report from the catalog to which it is attached by saving the report with a snapshot of the data. However, you must reconnect to the catalog if you want to refresh the data.

What is the difference between data warehouse and BI?

Simply speaking, BI is the capability of analyzing the data of a data warehouse to the advantage of the business. A BI tool analyzes the data of a data warehouse and arrives at business decisions based on the results of the analysis.

What are non-additive facts in detail?

A fact may be a measure, a metric, or a dollar value. Measures and metrics can be non-additive facts.

A dollar value is an additive fact. If we want to find the amount for a particular place over a particular period of time, we can add the dollar amounts and come up with the total.

For a non-additive fact, e.g. measured heights for ‘citizens by geographical location’, when we roll up ‘city’ data to ‘state’ level we should not add the heights of the citizens; rather, we may want to use them to derive a ‘count’.
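A small sketch of the height example. It assumes, for illustration only, that each city row stores a citizen count and an average height; the row layout and function name are hypothetical.

```python
# Hypothetical city-level rows: (state, city, citizen_count, average_height_cm).
city_rows = [
    ("StateA", "City1", 100, 170.0),
    ("StateA", "City2", 300, 166.0),
]

def roll_up_to_state(rows):
    """Roll city rows up to state level without summing the height values."""
    states = {}
    for state, _city, count, avg_height in rows:
        total_count, weighted_sum = states.get(state, (0, 0.0))
        states[state] = (total_count + count, weighted_sum + count * avg_height)
    # Counts add up; the height is recombined as a count-weighted average.
    return {s: (c, w / c) for s, (c, w) in states.items()}

state_rows = roll_up_to_state(city_rows)
```

The count (400) is additive across cities, while the heights are combined into a weighted average (167.0) rather than summed to a meaningless 336.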

What is the difference between Datawarehousing and Business Intelligence?

Datawarehousing deals with all aspects of managing the development, implementation and operation of a data warehouse or data mart including meta data management, data acquisition, data cleansing, data transformation, storage management, data distribution, data archiving, operational reporting, analytical reporting, security management, backup/recovery planning, etc. Business intelligence, on the other hand, is a set of software tools that enable an organization to analyze measurable aspects of their business such as sales performance, profitability, operational efficiency, effectiveness of marketing campaigns, market penetration among certain customer groups, cost trends, anomalies and exceptions, etc. Typically, the term “business intelligence” is used to encompass OLAP, data visualization, data mining and query/reporting tools. Think of the data warehouse as the back office and business intelligence as the entire business including the back office. The business needs the back office on which to function, but the back office without a business to support, makes no sense.

What is the difference between OLAP and datawarehouse?

A data warehouse is the place where the data is stored for analysis,
whereas OLAP is the process of analyzing the data, managing aggregations,
and partitioning information into cubes for in-depth visualization.

What is a factless fact table? Where have you used it in your project?

A factless fact table contains only keys; there are no measures in it.

Why is De-normalization promoted in Universe Designing?

In a relational data model, for normalization purposes, some lookup tables are not merged into a single table. In dimensional data modeling (star schema), these tables are merged into a single table called a DIMENSION table, for performance and for slicing data. Because the tables are merged into one large dimension table, complex intermediate joins are avoided, and dimension tables are joined directly to fact tables. Although redundancy of data occurs in the DIMENSION table, its size is only about 15% of that of the FACT table. This is why de-normalization is promoted in Universe designing.

What is the difference between ODS and OLTP?

ODS: A collection of tables created in the data warehouse that maintains only current data.

OLTP, by contrast, maintains data only for transactions; these systems are designed for recording the daily operations and transactions of a business.

Can a dimension table contain numeric values?

Yes, but the data type of those columns will be char (the values themselves can be numeric or character).

What is the difference between view and materialized view?

View - stores the SQL statement in the database and lets you use it as a table. Every time you access the view, the SQL statement executes.

Materialized view - stores the results of the SQL in table form in the database. The SQL statement executes only when the materialized view is created or refreshed; after that, every time you run the query, the stored result set is used. Pros include quick query results.
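The difference can be seen with Python's built-in sqlite3 module. SQLite has no materialized views, so in this sketch a table created with CREATE TABLE ... AS SELECT stands in for one; that substitution and all object names are assumptions of the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (product TEXT, amount REAL);
    INSERT INTO sales VALUES ('widget', 10.0), ('widget', 5.0);
    -- A view stores only the query text.
    CREATE VIEW v_totals AS
        SELECT product, SUM(amount) AS total FROM sales GROUP BY product;
    -- Stand-in for a materialized view: the query result stored as a table.
    CREATE TABLE mv_totals AS
        SELECT product, SUM(amount) AS total FROM sales GROUP BY product;
""")

# New data arrives after both objects were created.
conn.execute("INSERT INTO sales VALUES ('widget', 20.0)")

view_total = conn.execute("SELECT total FROM v_totals").fetchone()[0]
stored_total = conn.execute("SELECT total FROM mv_totals").fetchone()[0]
```

The view re-runs its query and sees the new row (35.0), while the stored result stays at 15.0 until it is rebuilt. That staleness is the price of the faster reads.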

What is meant by metadata in context of a Datawarehouse and how it is important?

Metadata is data about data. A business analyst or data modeler usually captures information about the data - the source (where and how the data originated), the nature of the data (char, varchar, nullable, existence, valid values, etc.), and the behavior of the data (how it is modified / derived, and its life cycle) - in a data dictionary, a.k.a. metadata. Metadata is also presented at the Datamart level, subsets, facts and dimensions, ODS, etc. For a DW user, metadata provides vital information for analysis / DSS.

Differences between star and snowflake schemas?

Star schema
A single fact table with N dimension tables.
Snowflake schema
A schema in which any dimension has extended dimension tables is known as a snowflake schema.

Difference between Snow flake and Star Schema. What are situations where Snow flake Schema is better than Star Schema to use and when the opposite is true?

Star schema
Contains the dimension tables mapped around one or more fact tables.
It is a de-normalized model.
No need to use complicated joins.
Queries return results quickly.
Snowflake schema
It is the normalized form of the star schema.
Contains deeper joins because the tables are split into many pieces. We can easily make modifications directly in the tables.
We have to use complicated joins, since we have more tables.
There will be some delay in processing the query.

What is VLDB?

The perception of what constitutes a VLDB continues to grow. A one terabyte database would normally be considered to be a VLDB.

What are the data types present in BO (Business Objects)? What happens if we implement a view in the designer and report?

Three different data types: Dimension, Measure and Detail.
A view is nothing but an alias, and it can be used to resolve loops in the universe.

What is aggregate table and aggregate fact table? Any examples of both?

An aggregate table contains summarized data. Materialized views are aggregate tables.

For example, in sales we have only date-level transactions. If we want to create a report such as sales by product per year, we aggregate the date values into week_agg, month_agg, quarter_agg, and year_agg tables. To retrieve data from these tables we use the @Aggregate_Aware function (in Business Objects).

What is active data warehousing?

An active data warehouse provides information that enables decision-makers within an organization to manage customer relationships nimbly, efficiently and proactively. Active data warehousing is all about integrating advanced decision support with day-to-day, even minute-to-minute, decision making in a way that increases the quality of those customer touches, which encourages customer loyalty and thus secures an organization’s bottom line. The marketplace is coming of age as we progress from first-generation “passive” decision-support systems to current- and next-generation “active” data warehouse implementations.

What is the main difference between schema in RDBMS and schemas in Datawarehouse?

RDBMS Schema
* Used for OLTP systems
* Traditional and old schema
* Normalized
* Difficult to understand and navigate
* Cannot solve extract and complex problems
* Poorly modeled

DWH Schema
* Used for OLAP systems
* New generation schema
* De Normalized
* Easy to understand and navigate
* Extract and complex problems can be easily solved
* Very good model

What is hybrid slowly changing dimension?

Hybrid SCDs are a combination of both SCD 1 and SCD 2.
It may happen that in a table some columns are important and we need to track changes for them, i.e. capture their historical data, whereas for other columns we don’t care even if the data changes. For such tables we implement Hybrid SCDs, wherein some columns are Type 1 and some are Type 2.
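A minimal sketch of the idea in plain Python, assuming a hypothetical customer dimension in which phone is a Type 1 column and city is a Type 2 column. Surrogate keys are left out to keep it short.

```python
from datetime import date

TYPE1_COLUMNS = {"phone"}    # overwrite in place, no history kept
TYPE2_COLUMNS = {"city"}     # a change creates a new versioned row

def apply_change(dimension, customer_id, updates, effective):
    """Apply updates to the current row of customer_id in a hybrid SCD table."""
    current = next(r for r in dimension
                   if r["customer_id"] == customer_id and r["is_current"])
    if any(current[c] != v for c, v in updates.items() if c in TYPE2_COLUMNS):
        # Type 2: expire the current row and append a new version.
        current["is_current"] = False
        current["end_date"] = effective
        new_row = dict(current, **updates)
        new_row.update(is_current=True, start_date=effective, end_date=None)
        dimension.append(new_row)
    else:
        # Type 1: overwrite the attribute on the current row.
        current.update({c: v for c, v in updates.items() if c in TYPE1_COLUMNS})

customers = [{"customer_id": 1, "phone": "111", "city": "Pune",
              "start_date": date(2007, 1, 1), "end_date": None, "is_current": True}]
apply_change(customers, 1, {"phone": "222"}, date(2008, 1, 12))
apply_change(customers, 1, {"city": "Mumbai"}, date(2008, 2, 1))
```

The phone change overwrites the existing row, so the table still has one row. The city change expires that row and adds a second, current one, so the move from Pune to Mumbai stays visible in history.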

What are the different architectures of datawarehouse?

There are two main approaches:
1. Top down - (Bill Inmon)
2. Bottom up - (Ralph Kimball)

1. What is incremental loading?
2. What is batch processing?
3. What is a cross reference table?
4. What is an aggregate fact table?

Incremental loading means loading only the ongoing changes from the OLTP system.
An aggregate table contains the [measure] values aggregated / grouped / summed up to some level of a hierarchy.

What is a junk dimension? What is the difference between a junk dimension and a degenerate dimension?

Junk dimension: A grouping of random flags and text attributes, moved out of a dimension into a separate sub-dimension.

Degenerate dimension: Control information kept on the fact table. Ex: consider a dimension table with fields like order number and order line number that has a 1:1 relationship with the fact table. In this case the dimension is removed and the order information is stored directly in the fact table, in order to eliminate unnecessary joins while retrieving order information.

What are the possible data marts in Retail sales?

Product information, sales information

What is the definition of normalized and de-normalized view and what are the differences between them?

Normalization is the process of removing redundancies.
De-normalization is the process of allowing redundancies.

What is the datatype of the surrogate key?

The datatype of a surrogate key is integer, numeric, or number.

What is a degenerate dimension table?

Degenerate dimensions: If a table contains values which are neither dimensions nor measures, they are called degenerate dimensions. Ex: invoice id, empno

What is Dimensional Modeling?

Dimensional Modeling is a design concept used by many data warehouse designers to build their datawarehouse. In this design model all the data is stored in two types of tables - Facts table and Dimension table. Fact table contains the facts/measurements of the business and the dimension table contains the context of measurements i.e., the dimensions on which the facts are calculated.

What are the methodologies of Datawarehousing?

Every company has a methodology of its own. To name a few, SDLC methodology and AIM methodology are standard ones; other methodologies include AMM, World Class methodology, and many more.

What is a linked cube?

A linked cube is a cube in which a subset of the data can be analyzed in great detail. The linking ensures that the data in the cubes remains consistent.

What is the main difference between Inmon and Kimball philosophies of data warehousing?

Both differ in their concept of building the data warehouse.
Kimball views data warehousing as a constituency of data marts. Data marts are focused on delivering business objectives for departments in the organization, and the data warehouse is formed by the conformed dimensions of the data marts. Hence a unified view of the enterprise can be obtained from dimensional modeling at a local departmental level.
Inmon believes in creating a data warehouse on a subject-by-subject area basis. Hence the development of the data warehouse can start with data from the online store. Other subject areas can be added to the data warehouse as their needs arise. Point-of-sale (POS) data can be added later if management decides it is necessary.

Kimball: first the data marts, which are then combined into the data warehouse.

Inmon: first the data warehouse, later the data marts.

What is Datawarehousing Hierarchy?

Hierarchies
Hierarchies are logical structures that use ordered levels as a means of organizing data. A hierarchy can be used to define data aggregation. For example, in a time dimension, a hierarchy might aggregate data from the month level to the quarter level to the year level. A hierarchy can also be used to define a navigational drill path and to establish a family structure.

Within a hierarchy, each level is logically connected to the levels above and below it. Data values at lower levels aggregate into the data values at higher levels. A dimension can be composed of more than one hierarchy. For example, in the product dimension, there might be two hierarchies–one for product categories and one for product suppliers.

Dimension hierarchies also group levels from general to granular. Query tools use hierarchies to enable you to drill down into your data to view different levels of granularity. This is one of the key benefits of a data warehouse.

When designing hierarchies, you must consider the relationships in business structures. For example, a divisional multilevel sales organization.

Hierarchies impose a family structure on dimension values. For a particular level value, a value at the next higher level is its parent, and values at the next lower level are its children. These familial relationships enable analysts to access data quickly.

Levels
A level represents a position in a hierarchy. For example, a time dimension might have a hierarchy that represents data at the month, quarter, and year levels. Levels range from general to specific, with the root level as the highest or most general level. The levels in a dimension are organized into one or more hierarchies.

Level Relationships: Level relationships specify top-to-bottom ordering of levels from most general (the root) to most specific information. They define the parent-child relationship between the levels in a hierarchy.

Hierarchies are also essential components in enabling more complex rewrites. For example, the database can aggregate existing sales revenue on a quarterly base to a yearly aggregation when the dimensional dependencies between quarter and year are known.
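The month-to-quarter-to-year aggregation described above can be sketched as follows. The revenue figures and function names are invented for the example.

```python
# Hypothetical month-level sales revenue keyed by (year, month).
monthly_revenue = {(2007, 11): 40.0, (2007, 12): 60.0, (2008, 1): 30.0, (2008, 4): 70.0}

def to_quarter(year, month):
    """Parent of a month in the month -> quarter -> year hierarchy."""
    return (year, (month - 1) // 3 + 1)

def roll_up(values, parent_of):
    """Aggregate child-level values into their parent level."""
    totals = {}
    for key, amount in values.items():
        parent = parent_of(*key)
        totals[parent] = totals.get(parent, 0.0) + amount
    return totals

quarterly_revenue = roll_up(monthly_revenue, to_quarter)
# Yearly totals can be derived from the quarterly ones because
# each quarter has exactly one parent year.
yearly_revenue = roll_up(quarterly_revenue, lambda year, quarter: year)
```

Deriving the yearly figures from the quarterly aggregate, rather than from the month-level data, mirrors the rewrite described above: it only works because the quarter-to-year dependency is known.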

What is a general purpose scheduling tool?

The basic purpose of a scheduling tool in a DW application is to streamline the flow of data from source to target at a specific time or based on some condition.

What is ER Diagram?

The Entity-Relationship (ER) model was originally proposed by Peter Chen in 1976 [Chen76] as a way to unify the network and relational database views.

Simply stated the ER model is a conceptual data model that views the real world as entities and relationships. A basic component of the model is the Entity-Relationship diagram which is used to visually represent data objects.

Since Chen wrote his paper the model has been extended, and today it is commonly used for database design. For the database designer, the utility of the ER model is:

1. It maps well to the relational model. The constructs used in the ER model can easily be transformed into relational tables.
2. It is simple and easy to understand with a minimum of training. Therefore, the model can be used by the database designer to communicate the design to the end user.
3. In addition, the model can be used as a design plan by the database developer to implement a data model in specific database management software.

Which columns go to the fact table and which columns go to the dimension table?

The Primary Key columns of the Tables (Entities) go to the Dimension Tables as Foreign Keys.
The Primary Key columns of the Dimension Tables go to the Fact Tables as Foreign Keys.

What are modeling tools available in the Market?

Here are a number of data modeling tools

Tool Name Company Name
Erwin Computer Associates
Embarcadero Technologies
Rational Rose IBM Corporation
Power Designer Sybase Corporation
Oracle Designer Oracle Corporation

Name some of modeling tools available in the Market?

These tools are used for Data/dimension modeling

1. Oracle Designer
2. Erwin (Entity Relationship for windows)
3. Informatica (Cubes/Dimensions)
4. Embarcadero
5. Power Designer Sybase

How do you load the time dimension?

Time dimensions are usually loaded by a program that loops through all possible dates that may appear in the data. It is not unusual for 100 years to be represented in a time dimension, with one row per day.
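Such a loader can be sketched in a few lines of Python. The column names and the YYYYMMDD integer key are assumptions of this example, not a required layout.

```python
from datetime import date, timedelta

def build_date_dimension(start, end):
    """Return one row per calendar day from start to end, inclusive."""
    rows = []
    day = start
    while day <= end:
        rows.append({
            "date_key": int(day.strftime("%Y%m%d")),  # e.g. 20080112
            "full_date": day,
            "year": day.year,
            "quarter": (day.month - 1) // 3 + 1,
            "month": day.month,
            "day_of_week": day.isoweekday(),  # 1 = Monday ... 7 = Sunday
        })
        day += timedelta(days=1)
    return rows

date_dim = build_date_dimension(date(2008, 1, 1), date(2008, 12, 31))
```

For 2008, a leap year, this yields 366 rows. Covering 100 years gives roughly 36,500 rows, which is small by warehouse standards.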

Explain the advantages of RAID 1, 1/0, and 5? What type of RAID setup would you put your TX logs?

Transaction logs write sequentially and don’t need to be read at all. The ideal is to have each on RAID 1/0 because it has much better write performance than RAID 5.

RAID 1 is also better for TX logs and costs less than 1/0 to implement. It has a tad less reliability and performance is a little worse generally speaking.

RAID 5 is best for data generally because of cost and the fact it provides great read capability.

What is a Snowflake Schema?

In a snowflake schema, each dimension has a primary dimension table, to which one or more additional dimension tables can join. The primary dimension table is the only table that can join to the fact table.

What is real time data-warehousing?

Real-time data warehousing is a combination of two things: 1) real-time activity and 2) data warehousing. Real-time activity is activity that is happening right now. The activity could be anything such as the sale of widgets. Once the activity is complete, there is data about it.

Data warehousing captures business activity data. Real-time data warehousing captures business activity data as it occurs. As soon as the business activity is complete and there is data about it, the completed activity data flows into the data warehouse and becomes available instantly. In other words, real-time data warehousing is a framework for deriving information from data as the data becomes available.

What are slowly changing dimensions?

SCD stands for slowly changing dimensions. Slowly changing dimensions are of three types:

SCD1: only the updated values are maintained. Ex: when a customer address is modified, we update the existing record with the new address.

SCD2: historical and current information are maintained by using
A) Effective Date
B) Versions
C) Flags
or a combination of these.

SCD3: historical and current information are maintained by adding new columns to the target table.

What are Semi-additive and factless facts and in which scenario will you use such kinds of fact tables?

Snapshot facts are semi-additive: they can be summed across some dimensions but not across others (typically not across time). We use semi-additive facts when we maintain aggregated facts.

Ex: Average daily balance

A fact table without numeric fact columns is called a factless fact table.

Ex: Promotion facts

It is used while maintaining the promotion values of a transaction (ex: product samples), because this table doesn’t contain any measures.
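The average daily balance example can be made concrete with a small sketch. The snapshot rows and function names are hypothetical.

```python
# Hypothetical daily balance snapshots: (day, account, balance).
snapshots = [
    ("2008-01-01", "A", 100.0), ("2008-01-01", "B", 50.0),
    ("2008-01-02", "A", 120.0), ("2008-01-02", "B", 50.0),
]

def total_balance_on(day, rows):
    """Balances are additive across accounts for a single day."""
    return sum(b for d, _acct, b in rows if d == day)

def average_daily_balance(account, rows):
    """Across time, balances are averaged rather than summed."""
    balances = [b for _d, acct, b in rows if acct == account]
    return sum(balances) / len(balances)

bank_total_jan1 = total_balance_on("2008-01-01", snapshots)
avg_balance_a = average_daily_balance("A", snapshots)
```

Summing across accounts gives a meaningful total for the day (150.0). Summing account A across days would give 220.0, money that never existed, so the time dimension uses an average (110.0) instead.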

Differences between star and snowflake schemas?

Star schema - all dimensions are linked directly to a fact table.
Snowflake schema - dimensions may be interlinked or may have one-to-many relationships with other tables.

What is a Star Schema?

A star schema is a way of organizing the tables such that we can retrieve results from the database easily and quickly in the warehouse environment. Usually a star schema consists of one or more dimension tables around a fact table, which looks like a star; that is how it got its name.

What is ETL?

ETL stands for extraction, transformation and loading.
ETL tools provide developers with an interface for designing source-to-target mappings, transformations and job control parameters.
- Extraction
Take data from an external source and move it to the warehouse pre-processor database.
- Transformation
The transform data task allows point-to-point generating, modifying and transforming of data.
- Loading
The load data task adds records to a database table in a warehouse.
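The three steps can be sketched as custom ETL code using only the Python standard library. The CSV content, the table name, and the cleaning rules are made up for the illustration; a real job would read from a file or a source database.

```python
import csv
import io
import sqlite3

# Hypothetical source extract; a real job would read a file or a source database.
SOURCE_CSV = "id,name,amount\n1, alice ,10.5\n2,BOB,20\n"

def extract(text):
    """Extraction: read raw rows from the external source."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transformation: trim and standardize names, cast types."""
    return [(int(r["id"]), r["name"].strip().title(), float(r["amount"])) for r in rows]

def load(conn, rows):
    """Loading: add the transformed records to a warehouse table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (id INTEGER, name TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)

warehouse = sqlite3.connect(":memory:")
load(warehouse, transform(extract(SOURCE_CSV)))
loaded = warehouse.execute("SELECT id, name, amount FROM sales ORDER BY id").fetchall()
```

Keeping the three steps as separate functions mirrors how ETL tools separate source, transformation, and target definitions, and lets each step be tested on its own.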

What does level of Granularity of a fact table signify?

Granularity
The first step in designing a fact table is to determine the granularity of the fact table. By granularity, we mean the lowest level of information that will be stored in the fact table. This constitutes two steps:

1. Determine which dimensions will be included.
2. Determine where along the hierarchy of each dimension the information will be kept.
The determining factors usually go back to the requirements.

What is the Difference between OLTP and OLAP?

Main Differences between OLTP and OLAP are:-

1. User and System Orientation

OLTP: customer-oriented, used for transaction and query processing by clerks, clients and IT professionals.

OLAP: market-oriented, used for data analysis by knowledge workers (managers, executives, analysts).

2. Data Contents

OLTP: manages current data, very detail-oriented.

OLAP: manages large amounts of historical data, provides facilities for summarization and aggregation, stores information at different levels of granularity to support decision making process.

3. Database Design

OLTP: adopts an entity relationship(ER) model and an application-oriented database design.

OLAP: adopts star, snowflake or fact constellation model and a subject-oriented database design.

4. View

OLTP: focuses on the current data within an enterprise or department.

OLAP: spans multiple versions of a database schema due to the evolutionary process of an organization; integrates information from many organizational locations and data stores

What are SCD1, SCD2, and SCD3?

SCD stands for slowly changing dimensions.

SCD1: only the updated values are maintained.

Ex: when a customer address is modified, we update the existing record with the new address.

SCD2: historical and current information are maintained by using

A) Effective Date
B) Versions
C) Flags

or a combination of these.

SCD3: historical and current information are maintained by adding new columns to the target table.

What are Aggregate tables?

An aggregate table contains a summary of existing warehouse data, grouped to certain levels of the dimensions. Retrieving the required data from the actual table, which may have millions of records, takes more time and also affects server performance. To avoid this, we can aggregate the table to the required level and use the aggregate instead. These tables reduce the load on the database server, improve query performance, and return results very quickly.
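A minimal sketch with Python's built-in sqlite3 module, pre-summarizing a detail table to month level. The table names fact_sales and month_agg are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE fact_sales (sale_date TEXT, product TEXT, amount REAL);
    INSERT INTO fact_sales VALUES
        ('2008-01-12', 'widget', 10.0),
        ('2008-01-20', 'widget', 15.0),
        ('2008-02-03', 'widget', 5.0);
    -- Aggregate table: the detail rows pre-summarized to month level.
    CREATE TABLE month_agg AS
        SELECT substr(sale_date, 1, 7) AS month, product, SUM(amount) AS amount
        FROM fact_sales
        GROUP BY substr(sale_date, 1, 7), product;
""")

month_rows = conn.execute(
    "SELECT month, product, amount FROM month_agg ORDER BY month").fetchall()
```

Reports at month level or above can then read month_agg (two rows here) instead of scanning every detail row. The aggregate has to be rebuilt or updated whenever new detail data is loaded.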

What is Dimensional Modeling? Why is it important?

Dimensional Modeling is a design concept used by many data warehouse designers to build their datawarehouse. In this design model all the data is stored in two types of tables - Facts table and Dimension table. Fact table contains the facts/measurements of the business and the dimension table contains the context of measurements i.e., the dimensions on which the facts are calculated.

Why is Data Modeling Important?

Data modeling is probably the most labor intensive and time consuming part of the development process. Why bother especially if you are pressed for time? A common response by practitioners who write on the subject is that you should no more build a database without a model than you should build a house without blueprints.
The goal of the data model is to make sure that all data objects required by the database are completely and accurately represented. Because the data model uses easily understood notations and natural language, it can be reviewed and verified as correct by the end-users.
The data model is also detailed enough to be used by the database developers to use as a “blueprint” for building the physical database. The information contained in the data model will be used to define the relational tables, primary and foreign keys, stored procedures, and triggers. A poorly designed database will require more time in the long-term. Without careful planning you may create a database that omits data required to create critical reports, produces results that are incorrect or inconsistent, and is unable to accommodate changes in the user’s requirements.

What is data mining?

Data mining is a process of extracting hidden trends within a datawarehouse. For example an insurance datawarehouse can be used to mine data for the most high risk people to insure in a certain geographical area.

What’s a Datawarehouse?

A Data Warehouse is a repository of integrated information, available for queries and analysis. Data and information are extracted from heterogeneous sources as they are generated… This makes it much easier and more efficient to run queries over data that originally came from different sources. Typical relational databases are designed for on-line transactional processing (OLTP) and do not meet the requirements for effective on-line analytical processing (OLAP). As a result, data warehouses are designed differently than traditional relational databases.

What is ODS?

1. ODS means Operational Data Store.
2. A collection of operational or base data that is extracted from operational databases and standardized, cleansed, consolidated, transformed, and loaded into an enterprise data architecture. An ODS is used to support data mining of operational data, or as the store for base data that is summarized for a data warehouse. The ODS may also be used to audit the data warehouse to assure that summarized and derived data is calculated properly. The ODS may further become the enterprise shared operational database, allowing operational systems that are being reengineered to use the ODS as their operational database.

What is a dimension table?

A dimension table is a collection of hierarchies and categories along which the user can drill down and drill up. It contains only the textual attributes.

What is a lookup table?

A lookup table is the one which is used when updating a warehouse. When the lookup is placed on the target table (fact table / warehouse) based upon the primary key of the target, it just updates the table by allowing only new records or updated records based on the lookup condition.

Why should you put your data warehouse on a different system than your OLTP system?

Answer1:
An OLTP system is basically “data oriented” (ER model) and not “subject oriented” (dimensional model). That is why we design a separate system that will hold a subject-oriented OLAP system. Moreover, a complex query fired on an OLTP system causes heavy overhead on the OLTP server, which directly affects the day-to-day business.

Answer2:
The loading of a warehouse will likely consume a lot of machine resources. Additionally, users may create queries or reports that are very resource intensive because of the potentially large amount of data available. Such loads and resource needs will conflict with the needs of the OLTP systems for resources and will negatively impact those production systems.
