
Sunday, April 8, 2012


Datawarehouse Basics


Data Warehousing Interview Questions and Answers:

What is Data Warehousing?

A data warehouse is the main repository of an organization’s historical data, its corporate memory. It contains the raw material for management’s decision support system. The critical factor leading to the use of a data warehouse is that a data analyst can perform complex queries and analysis, such as data mining, on the information without slowing down the operational systems (Ref: Wikipedia). A data warehouse is a collection of data designed to support management decision making. Data warehouses contain a wide variety of data that present a coherent picture of business conditions at a single point in time. It is a repository of integrated information, available for queries and analysis.

Data Warehouse: A subject-oriented, integrated, time-variant, non-updatable (non-volatile)
collection of data used in support of management decision-making processes.
  1. Subject-oriented: organized around major subjects, e.g. customers, patients, students, products.
  2. Integrated: consistent naming conventions, formats, encoding structures; from multiple data sources.
  3. Time-variant: can study trends and changes.
  4. Non-updatable (non-volatile): read-only, periodically refreshed.
Datawarehousing Concepts

What are fundamental stages of Data Warehousing?

Offline Operational Databases – Data warehouses in this initial stage are developed by simply copying the database of an operational system to an off-line server where the processing load of reporting does not impact the operational system’s performance.

Offline Data Warehouse – Data warehouses in this stage of evolution are updated on a regular time cycle (usually daily, weekly or monthly) from the operational systems and the data is stored in an integrated reporting-oriented data structure.

Real Time Data Warehouse – Data warehouses at this stage are updated on a transaction or event basis, every time an operational system performs a transaction (e.g. an order, a delivery or a booking).

Integrated Data Warehouse – Data warehouses at this stage are used to generate activity or transactions that are passed back into the operational systems for use in the daily activity of the organization.

What is Dimensional Modeling?

The dimensional data model involves two types of tables and differs from third normal form design. It uses a fact table, which contains the measurements of the business, and dimension tables, which contain the context (the dimensions of the measurement).

http://www.dcs.bbk.ac.uk/~mark/download/star.pdf
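
As a minimal sketch of this idea (the table and column names are illustrative, not from any particular system), the following Python snippet builds a tiny star schema in an in-memory SQLite database:

    import sqlite3

    conn = sqlite3.connect(":memory:")

    # Dimension table: the descriptive context of the measurements.
    conn.execute("""
        CREATE TABLE dim_product (
            product_key  INTEGER PRIMARY KEY,  -- surrogate key
            product_name TEXT,
            category     TEXT
        )""")

    # Fact table: measurements plus foreign keys to the dimensions.
    conn.execute("""
        CREATE TABLE fact_sales (
            product_key  INTEGER REFERENCES dim_product(product_key),
            date_key     INTEGER,             -- would reference a date dimension
            sales_amount REAL,                -- additive measure
            quantity     INTEGER              -- additive measure
        )""")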

What is Fact Table?

A fact table contains the measurements of business processes; it also contains the foreign keys to the dimension tables. For example, if your business process is “paper production” then “average production of paper by one machine” or “weekly production of paper” would be considered measurements of the business process.

Measure Types 
  1. Additive - Measures that can be added across any dimensions. 
  2. Non Additive - Measures that cannot be added across any dimension.
  3. Semi Additive - Measures that can be added across some dimensions.
A fact table might contain either detail level facts or facts that have been aggregated (fact tables that contain aggregated facts are often instead called summary tables).
Special care must be taken when handling ratios and percentages. One good design rule[1] is to never store percentages or ratios in fact tables but to calculate these only in the data access tool. Thus store only the numerator and denominator in the fact table; these can then be aggregated, and the aggregated stored values can be used for calculating the ratio or percentage in the data access tool.
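
As a small pandas sketch of that rule (the column names are illustrative): the fact rows carry the numerator and denominator, and the ratio is computed only after aggregation:

    import pandas as pd

    facts = pd.DataFrame({
        "region":  ["East", "East", "West"],
        "returns": [2, 1, 4],     # numerator
        "orders":  [50, 40, 80],  # denominator
    })

    # Aggregate the components first, then derive the ratio.
    agg = facts.groupby("region")[["returns", "orders"]].sum()
    agg["return_rate"] = agg["returns"] / agg["orders"]
    print(agg)

Summing or averaging a pre-stored per-row ratio would weight every row equally and generally give a different, incorrect result.
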
In the real world, it is possible to have a fact table that contains no measures or facts. These tables are called "factless fact tables", or "junction tables".
The "Factless fact tables" can for example be used for modeling many-to-many relationships or capture events.

Types of fact tables
There are basically three fundamental measurement events, which characterize all fact tables.[2]
  • Transactional
      A transaction fact table or transaction fact entity records one row per transaction.
A transactional table is the most basic and fundamental. The grain associated with a transactional fact table is usually specified as "one row per line in a transaction", e.g., every line on a receipt. Typically a transactional fact table holds data of the most detailed level, causing it to have a great number of dimensions associated with it.
  • Periodic snapshots
A periodic fact table or periodic fact entity stores one row for a group of transactions that happen over a period of time.
The periodic snapshot, as the name implies, takes a "picture of the moment", where the moment could be any defined period of time, e.g. a performance summary of a salesman over the previous month. A periodic snapshot table is dependent on the transactional table, as it needs the detailed data held in the transactional fact table in order to deliver the chosen performance output.
  • Accumulating snapshots
An accumulating fact table or accumulating fact entity stores one row for the entire lifetime of an event. An example of an accumulating fact table or entity records the lifetime of a credit card application from the time it is sent to the time it is accepted.
This type of fact table is used to show the activity of a process that has a well-defined beginning and end, e.g., the processing of an order. An order moves through specific steps until it is fully processed. As steps towards fulfilling the order are completed, the associated row in the fact table is updated. An accumulating snapshot table often has multiple date columns, each representing a milestone in the process. Therefore, it's important to have an entry in the associated date dimension that represents an unknown date, as many of the milestone dates are unknown at the time of the creation of the row.
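
A hedged sketch of that update pattern (the row layout and the -1 “unknown date” key are illustrative assumptions):

    # Surrogate key -1 stands in for the "unknown date" row of the date dimension.
    UNKNOWN_DATE_KEY = -1

    order_row = {
        "order_key": 1001,
        "placed_date_key": 20120401,
        "shipped_date_key": UNKNOWN_DATE_KEY,    # milestone not yet reached
        "delivered_date_key": UNKNOWN_DATE_KEY,  # milestone not yet reached
    }

    def complete_milestone(row, milestone, date_key):
        """As each step of the process completes, the existing fact row is
        updated in place rather than a new row being inserted."""
        row[milestone] = date_key
        return row

    complete_milestone(order_row, "shipped_date_key", 20120404)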

Temporal snapshots
By applying temporal database theory and modelling techniques, the Temporal Snapshot Fact Table [3] makes it possible to have the equivalent of daily snapshots without actually storing daily snapshots. It introduces the concept of time intervals into the fact table, saving a great deal of space and optimizing performance while still giving the end user the logical equivalent of the “picture of the moment” they are interested in.

References

  1. Kimball & Ross – The Data Warehouse Toolkit, 2nd Ed. (Wiley, 2002)
  2. Kimball, Ralph (2008). The Data Warehouse Lifecycle Toolkit, 2nd edition. Wiley. ISBN 978-0-470-14977-5.
  3. Mauri, Davide. "Temporal Snapshot Fact Table"

The facts inside the fact table could be of several different types

1) Additive facts

-Additive facts are facts that can be summed up through all of the dimensions in the fact table.
-They are facts that can be added across all of the dimensions in the fact table, and are the most common type of fact.
-They are also called fully additive facts. They are identified here because these facts are used across several dimensions for summation purposes.
-It is important to understand that since dimensional modeling involves hierarchies in dimensions, aggregation of information over different members in the hierarchy is a key element in the usefulness of the model.
-Since aggregation is an additive process, it is good to have additive facts.

2) Semi-additive facts

-Semi-additive facts are facts that can be summed up for some of the dimensions in the fact table, but not the others.
-These are facts that can be added across some dimensions but not all.
-They are also sometimes referred to as partially-additive facts.

3) Non-additive facts

-Non-additive facts are facts that cannot be summed up for any of the dimensions present in the fact table.
-Non-additive facts cannot be added for any of the dimensions; that is, they cannot be logically added between records or fact rows.
-Such facts are usually the result of ratios or other mathematical calculations. The only calculation that can be made on such a fact is to get a count of the number of rows containing it.

Based on the above classifications, there are two types of fact tables:

A)    Cumulative: This type of fact table describes what has happened over a period of time. For example, this fact table may describe the total sales by product by store by day. The facts for this type of fact table are mostly additive facts. The first example presented here is a cumulative fact table.

B)    Snapshot: This type of fact table describes the state of things in a particular instance of time, and usually includes more semi-additive and non-additive facts. The second example presented here is a snapshot fact table.

4) Derived facts
-A derived fact is shown as a kind of fact, alongside base facts.
-A derived fact is created by an inference or a mathematical calculation from terms, facts, other derivations, or even action assertions.
-A base fact is a fact that is a given in the world and is remembered (stored) in the system.
-Derived facts are created by performing a mathematical calculation on a number of other facts, and are sometimes referred to as calculated facts.
-Derived facts may or may not be stored inside the fact table.
-A base fact is generally one that can be picked up from the source system without any derivation; a derived fact is a data field derived from base facts.

5) Textual facts
-A textual fact consists of one or more characters (codes). Textual facts should be strictly avoided in the fact table. Textual codes such as flags and indicators should be stored in dimension tables so they can be included in queries.
-Textual facts are non-additive, but can be used for counting.


6) Pseudo fact
-A pseudo fact, when summed, gives no valid result. Pseudo facts typically result when you design event-based fact tables.
-In SQL Server 2005, you cannot perform a non-logged load directly into a partition. However, you can load into a separate table that we will call the pseudo-partition. Under certain conditions, you can switch the pseudo-partition into the partitioned table as a metadata operation that occurs extremely quickly.

7) Factless fact
-A fact table with only foreign keys and no facts is called a factless fact table.
-It is a table similar to a fact table except that it has no measures; it holds only the links to the dimensions.

-Factless fact tables enable you to track events; indeed, they exist for recording events.

In the real world, it is possible to have a fact table that contains no measures or facts. These tables are called “factless fact tables”.
E.g.: a fact table which has only a product key and a date key is a factless fact table. There are no measures in this table, but you can still get the number of products sold over a period of time.
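
A small sketch of that example (illustrative schema): even with no measure columns, COUNT(*) over the foreign keys answers “how many products were sold per day”:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE factless_sales (product_key INTEGER, date_key INTEGER)")
    conn.executemany(
        "INSERT INTO factless_sales VALUES (?, ?)",
        [(1, 20120401), (2, 20120401), (1, 20120402)],
    )

    # No measures, yet the event count is still derivable.
    for row in conn.execute(
        "SELECT date_key, COUNT(*) FROM factless_sales GROUP BY date_key"
    ):
        print(row)  # (20120401, 2) then (20120402, 1)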

Steps in designing a fact table.
  1. Identify a business process for analysis (like sales).
  2. Identify measures or facts (sales dollar), by asking questions like 'What number of XX are relevant for the business process?', replacing the XX with various options that make sense within the context of the business.
  3. Identify dimensions for facts (product dimension, location dimension, time dimension, organization dimension), by asking questions that make sense within the context of the business, like 'Analyse by XX', where XX is replaced with the subject to test.
  4. List the columns that describe each dimension (region name, branch name, business unit name).
  5. Determine the lowest level (granularity) of summary in a fact table (e.g. sales dollars).
An alternative approach is the four-step design process described in Kimball.[1]

Types of Fact Tables
Different types of facts inside fact
Kimball DWT 1
Fact Tables
Fact table core concepts
Different types of dimensions and facts in data warehouse
FC DM fact tables
Fact Table
What are different kind of fact tables
Wiki fact table
https://community.informatica.com/thread/34717
Incremental loading for fact tables
DWH Dimensional Modeling

What is Dimension table?

A dimension table contains the textual attributes of the measurements stored in the fact tables. A dimension table is a collection of hierarchies, categories and logic that a user can use to traverse hierarchy nodes.

What are the Different methods of loading Dimension tables?

There are two different ways to load data in dimension tables.

Conventional (Slow) :

All the constraints and keys are validated against the data before it is loaded; this way data integrity is maintained.

Direct (Fast) :

All the constraints and keys are disabled before the data is loaded. Once the data is loaded, it is validated against all the constraints and keys. If data is found invalid or dirty, it is not included in the index and all future processes are skipped on this data.

Types of Dimensions:

Outriggers
An outrigger is a dimension table or entity that is joined to other dimension tables in a star schema. Outriggers are used when a dimension table is snowflaked.
Outriggers are tables or entities that are shared by more than one dimension.
A table or entity that is included in a hierarchy but is not directly related to the fact table is known as an outrigger. Outriggers are often used when a dimension table or entity is referenced by another dimension. The primary key of an outrigger is referenced by the foreign key of a dimension table or entity.

Slowly Changing Dimensions:
Attributes of a dimension may undergo changes over time. Whether the history of changes to a particular attribute should be preserved in the data warehouse depends on the business requirement. Such an attribute is called a slowly changing attribute, and a dimension containing such an attribute is called a Slowly Changing Dimension.

Rapidly Changing Dimensions:
A dimension attribute that changes frequently is a Rapidly Changing Attribute. If you don’t need to track the changes, the Rapidly Changing Attribute is no problem, but if you do need to track the changes, using a standard Slowly Changing Dimension technique can result in a huge inflation of the size of the dimension. One solution is to move the attribute to its own dimension, with a separate foreign key in the fact table. This new dimension is called a Rapidly Changing Dimension.

Junk Dimensions:
A junk dimension is a single table with a combination of different and unrelated attributes to avoid having a large number of foreign keys in the fact table. Junk dimensions are often created to manage the foreign keys created by Rapidly Changing Dimensions.
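
As an illustrative sketch (the flag names are hypothetical), a junk dimension is often seeded with the Cartesian product of the low-cardinality flags it absorbs:

    from itertools import product

    # Low-cardinality, unrelated flags pulled out of the fact table.
    payment_types = ["cash", "card"]
    gift_wrap = [True, False]
    rush_order = [True, False]

    # One row per combination; the fact table stores a single junk_key
    # instead of three separate foreign keys.
    junk_dimension = [
        {"junk_key": key, "payment_type": p, "gift_wrap": g, "rush_order": r}
        for key, (p, g, r) in enumerate(
            product(payment_types, gift_wrap, rush_order), start=1
        )
    ]
    # 2 * 2 * 2 = 8 rows cover every combination the fact table can reference.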

Inferred Dimensions:
While loading fact records, a dimension record may not yet be ready. One solution is to generate a surrogate key with Null for all the other attributes. This should technically be called an inferred member, but is often called an inferred dimension.
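
A minimal sketch of handling an inferred member during a fact load (the structures and names are assumptions for illustration): if the natural key is not yet in the dimension, insert a placeholder row and flag it so a later dimension load can complete it:

    dimension = {}       # natural key -> dimension row
    next_surrogate = 1

    def lookup_or_infer(natural_key):
        """Return the surrogate key, creating an inferred member if needed."""
        global next_surrogate
        if natural_key not in dimension:
            dimension[natural_key] = {
                "surrogate_key": next_surrogate,
                "natural_key": natural_key,
                "name": None,           # unknown until the dimension source arrives
                "inferred_flag": True,  # lets the ETL overwrite this row later
            }
            next_surrogate += 1
        return dimension[natural_key]["surrogate_key"]

    fact_row = {"customer_key": lookup_or_infer("CUST-042"), "amount": 19.99}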

Conformed Dimensions:
A Dimension that is used in multiple locations is called a conformed dimension. A conformed dimension may be used with multiple fact tables in a single database, or across multiple data marts or data warehouses.

Degenerate Dimensions:
A degenerate dimension is when the dimension attribute is stored as part of fact table, and not in a separate dimension table. These are essentially dimension keys for which there are no other attributes. In a data warehouse, these are often used as the result of a drill through query to analyze the source of an aggregated number in a report. You can use these values to trace back to transactions in the OLTP system.

Role Playing Dimensions:
A role-playing dimension is one where the same dimension key — along with its associated attributes — can be joined to more than one foreign key in the fact table. For example, a fact table may include foreign keys for both Ship Date and Delivery Date. But the same date dimension attributes apply to each foreign key, so you can join the same dimension table to both foreign keys. Here the date dimension is taking multiple roles to map ship date as well as delivery date, and hence the name of Role Playing dimension.
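
A hedged SQLite sketch of this (illustrative names): the same date dimension is joined twice, once per role, using table aliases:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, calendar_date TEXT);
        CREATE TABLE fact_orders (
            order_id INTEGER,
            ship_date_key INTEGER REFERENCES dim_date(date_key),
            delivery_date_key INTEGER REFERENCES dim_date(date_key)
        );
        INSERT INTO dim_date VALUES (1, '2012-04-01'), (2, '2012-04-03');
        INSERT INTO fact_orders VALUES (500, 1, 2);
    """)

    row = conn.execute("""
        SELECT o.order_id,
               ship.calendar_date     AS ship_date,
               delivery.calendar_date AS delivery_date
        FROM fact_orders o
        JOIN dim_date AS ship     ON o.ship_date_key     = ship.date_key
        JOIN dim_date AS delivery ON o.delivery_date_key = delivery.date_key
    """).fetchone()
    print(row)  # (500, '2012-04-01', '2012-04-03')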

Shrunken Dimensions:
A shrunken dimension is a subset of another dimension. For example, the Orders fact table may include a foreign key for Product, but the Target fact table may include a foreign key only for ProductCategory, which is in the Product table, but much less granular. Creating a smaller dimension table, with ProductCategory as its primary key, is one way of dealing with this situation of heterogeneous grain. If the Product dimension is snowflaked, there is probably already a separate table for ProductCategory, which can serve as the Shrunken Dimension.

Static Dimensions:
Static dimensions are not extracted from the original data source, but are created within the context of the data warehouse. A static dimension can be loaded manually — for example with Status codes — or it can be generated by a procedure, such as a Date or Time dimension.

FC DM Fact tables
Different types of dimensions and facts in data warehouse
Design tip 152 slowly changing dimension types 0 4 5 6 7

Ralph introduced the concept of slowly changing dimension (SCD) attributes in 1996. Dimensional modelers, in conjunction with the business’s data governance representatives, must specify the data warehouse’s response to operational attribute value changes. Most Kimball readers are familiar with the core SCD approaches: type 1 (overwrite), type 2 (add a row), and type 3 (add a column). Since legibility is a key component of the Kimball mantra, we sometimes wish Ralph had given these techniques more descriptive names, such as “overwrite” instead of “type 1.” But at this point, the SCD type numbers are part of our industry’s vernacular.
We have written about more advanced SCD patterns, such as the 2005 article entitled “Slowly Changing Dimensions are Not Always as Easy as 1, 2, 3.” However, we’ve not consistently named the more advanced and hybrid techniques. With the third edition of The Data Warehouse Toolkit (Wiley, 2013), we’ve decided to assign “type numbers” to several techniques that have been described, but not precisely labeled in the past. Our hope is that more specific technique names will facilitate clearer communication between DW/BI team members.
Type 0: Retain Original
With type 0, the dimension attribute value never changes, so facts are always grouped by this original value. Type 0 is appropriate for any attribute labeled “original,” such as a customer’s original credit score, or any durable identifiers. Type 0 also applies to most date dimension attributes.
Type 4: Add Mini-Dimension
The type 4 technique is used when a group of dimension attributes are split off into a separate mini-dimension. This approach is useful when dimension attribute values are relatively volatile. Frequently-used attributes in multi-million row dimension tables are also mini-dimension design candidates, even if they don’t change frequently. A surrogate key is assigned to each unique profile or combination of attribute values in the mini-dimension. The surrogate keys of both the base dimension and mini-dimension profile are captured as foreign keys in the fact table.

The following type 5, 6, and 7 techniques are hybrids that combine the basics to support the common requirement to both accurately preserve historical attribute values, plus report historical facts according to current attribute values. The hybrid approaches provide more analytic flexibility, albeit with greater complexity.
Type 5: Add Mini-Dimension and Type 1 Outrigger          
The type 5 technique builds on the type 4 mini-dimension by embedding a “current profile” mini-dimension key in the base dimension that’s overwritten as a type 1 attribute. This approach, called type 5 because 4 + 1 equals 5, allows the currently-assigned mini-dimension attribute values to be accessed along with the base dimension’s others without linking through a fact table. Logically, we typically represent the base dimension and current mini-dimension profile outrigger as a single table in the presentation layer. The outrigger attributes should have distinct column names, like “Current Income Level,” to differentiate them from attributes in the mini-dimension linked to the fact table. The ETL team must update/overwrite the type 1 mini-dimension reference whenever the current mini-dimension changes over time. If the outrigger approach does not deliver satisfactory query performance, then the mini-dimension attributes could be physically embedded (and updated) in the base dimension.

Type 6: Add Type 1 Attributes to Type 2 Dimension
Type 6 builds on the type 2 technique by also embedding current attributes in the dimension so that fact rows can be filtered or grouped by either the type 2 value in effect when the measurement occurred or the attribute’s current value. The type 6 moniker was suggested by an HP engineer in 2000 because it’s a type 2 row with a type 3 column that’s overwritten as a type 1; both 2 + 3 + 1 and 2 x 3 x 1 equal 6. With this approach, the current attributes are updated on all prior type 2 rows associated with a particular durable key, as illustrated by the following sample rows:
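
For example (these sample rows are illustrative, with hypothetical attribute names):

    Surrogate Key  Durable Key  Name  Historical Segment (type 2)  Current Segment (type 1)  Row Effective  Row Expiration
    1001           123          Ann   Silver                       Gold                      2012-01-01     2012-06-30
    1002           123          Ann   Gold                         Gold                      2012-07-01     9999-12-31

Both rows for durable key 123 carry the same overwritten current value, while the historical column preserves the value in effect when each row was current.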

Type 7: Dual Type 1 and Type 2 Dimensions
With type 7, the fact table contains dual foreign keys for a given dimension: a surrogate key linked to the dimension table where type 2 attributes are tracked, plus the dimension’s durable supernatural key linked to the current row in the type 2 dimension to present current attribute values.
Type 7 delivers the same functionality as type 6, but it’s accomplished via dual keys instead of physically overwriting the current attributes with type 6. Like the other hybrid approaches, the current dimension attributes should be distinctively labeled to minimize confusion.
Finally, here’s a figure from theThe Data Warehouse Toolkit, Third Edition that highlights the implications of each slowly changing dimension technique on the analysis of fact table performance metrics. As we’ve warned in the past, there’s more to consider than simply 1, 2 and 3!
Unlike most OLTP systems, a major objective of a data warehouse is to track history. So, accounting for change is one of the analyst’s most important responsibilities. A sales force region reassignment is a good example of a business change that may require you to alter the dimensional data warehouse. We’ll discuss how to apply the right technique to account for the change historically. Hang on to your hats — this is not an easy topic.
Data warehouse design presumes that facts, such as customer orders or product shipments, will accumulate quickly. But the supporting dimension attributes of the facts, such as customer name or product size, are comparatively static. Still, most dimensions are subject to change, however slow. When dimensional modelers think about changing a dimension attribute, the three elementary approaches immediately come to mind: slowly changing dimension (SCD) types 1, 2 and 3.
As a quick review… Type 1 is most appropriate when processing corrections; this technique won’t preserve historically accurate associations. The changed attribute is simply updated (overwritten) to reflect the most current value. With a type 2 change, a new row with a new surrogate primary key is inserted into the dimension table to capture changes. Both the prior and new rows contain as attributes the natural key (or durable identifier), the most-recent-row flag and the row effective and expiration dates. With type 3, another attribute is added to the existing dimension row to support analysis based on either the new or prior attribute value. This is the least commonly needed technique.
These three fundamental techniques are adequate for most situations. However, what happens when you need variations that build on these basics to serve more analytically mature data warehouse users? Business folks sometimes want to preserve the historically accurate dimension attribute associated with a fact (such as at the time of sale or claim), but maintain the option to roll up historical facts based on current dimension characteristics. That’s when you need hybrid variations of the three main types. We’ll lead you through the hybrids in this column.

The Mini Dimension with “Current” Overwrite

When you need historical tracking but are faced with semi-rapid changes in a large dimension, pure type 2 tracking is inappropriate. If you use a mini dimension, you can isolate volatile dimension attributes in a separate table rather than track changes in the primary dimension table directly. The mini-dimension grain is one row per “profile,” or combination of attributes, while the grain of the primary dimension might be one row per customer. The number of rows in the primary dimension may be in the millions, but the number of mini-dimension rows should be a fraction of that. You capture the evolving relationship between the mini dimension and primary dimension in a fact table. When a business event (transaction or periodic snapshot) spawns a fact row, the row has one foreign key for the primary dimension and another for the mini-dimension profile in effect at the time of the event.
Profile changes sometimes occur outside of a business event, for example when a customer’s geographic profile is updated without a sales transaction. If the business requires accurate point-in-time profiling, a supplemental factless fact table with effective and expiration dates can capture every relationship change between the primary and profile dimensions. One more embellishment with this technique is to add a “current profile” key to the primary dimension. This is a type 1 attribute, overwritten with every profile change, that’s useful for analysts who want current profile counts regardless of fact table metrics or want to roll up historical facts based on the current profile. You’d logically represent the primary dimension and profile outrigger as a single table to the presentation layer if doing so doesn’t harm performance. To minimize user confusion and error, the current attributes should have column names that distinguish them from the mini-dimension attributes. For example, the labels should indicate whether a customer’s marketing segment designation is a current one or an obsolete one that was effective when the fact occurred — such as “historical marital status at time of event” in the profile mini dimension and “current marital status” in the primary customer dimension.

Type 2 with “Current” Overwrite

Another variation for tracking unpredictable changes while supporting rollup of historical facts to current dimension attributes is a hybrid of type 1 and type 2. In this scenario, you capture a type 2 attribute change by adding a row to the primary dimension table. In addition, you have a “current” attribute on each row that you overwrite (type 1) for the current and all previous type 2 rows. You retain the historical attribute in its own original column. When a change occurs, the most current dimension row has the same value in the uniquely labeled current and historical (“as was” or “at time of event”) columns.
You can expand this technique to cover not just the historical and current attribute values but also a fixed, end-of-period value as another type 1 column. Although it seems similar, the end-of-period attribute may be unique from either the historical or current perspective. Say a customer’s segment changed on Jan. 5 and the business wants to create a report on Jan. 10 to analyze last period’s data based on the customer’s Dec. 31 designation. You could probably derive the right information from the row effective and expiration dates, but providing the end-of-period value as an attribute simplifies the query. If this query occurs frequently, it’s better to have the work done once, during the ETL process, rather than every time the query runs. You can apply the same logic to other fixed characteristics, such as the customer’s original segment, which never changes. Instead of having the historical and current attributes reside in the same physical table, the current attributes could sit in an outrigger table joined to the dimension natural key. The same natural key, such as Customer ID, likely appears on multiple type 2 dimension rows with unique surrogate keys. The outrigger contains just one row of current data for each natural key in the dimension table; the attributes are overwritten whenever change occurs. To promote ease of use, have the core dimension and outrigger of current values appear to the user as one — unless this hurts application performance.

Type 2 with Natural Keys in the Fact Table

If you have a million-row dimension table with many attributes requiring historical and current tracking, the last technique we described becomes overly burdensome. In this situation, consider including the dimension natural key as a fact table foreign key, in addition to the surrogate key for type 2 tracking. This technique gives you essentially two dimension tables associated with the facts, but for good reason. The type 2 dimension has historically accurate attributes for filtering or grouping based on the effective values when the fact table was loaded. The dimension natural key joins to a table with just the current type 1 values. Again, the column labels in this table should be prefaced with “current” to reduce the risk of user confusion. You use these dimension attributes to summarize or filter facts based on the current profile, regardless of the values in effect when the fact row was loaded. Of course, if the natural key is unwieldy or ever reassigned, then you should use a durable surrogate reference key instead.
This approach delivers the same functionality as the type 2 with “current” overwrite technique we discussed earlier; that technique spawns more attribute columns in a single dimension table, while this approach relies on two foreign keys in the fact table.
While it’s uncommon, we’re sometimes asked to roll up historical facts based on any specific point-in-time profile, in addition to reporting them by the attribute values when the fact measurement occurred or by the current dimension attribute values. For example, perhaps the business wants to report three years of historical metrics based on the attributes or hierarchy in effect on Dec. 1 of last year. In this case, you can use the dual dimension keys in the fact table to your advantage. You first filter on the type 2 dimension table’s row effective and expiration dates to locate the attributes in effect at the desired point in time. With this constraint, a single row for each natural or durable surrogate key in the type 2 dimension has been identified. You can then join to the natural/durable surrogate dimension key in the fact table to roll up any facts based on the point-in-time attribute values. It’s as if you’re defining the meaning of “current” on the fly. Obviously, you must filter on the type 2 row dates or you’ll have multiple type 2 rows for each natural key, but that’s fundamental in the business’s requirement to report any history on any specified point-in-time attributes. Finally, only unveil this capability to a limited, highly analytic audience. This embellishment is not for the faint of heart.

Series of Type 3 Attributes

Say you have a dimension attribute that changes with a predictable rhythm, such as annually, and the business needs to summarize facts based on any historical value of the attribute (not just the historically accurate and current, as we’ve primarily been discussing). For example, imagine the product line is recategorized at the start of every fiscal year and the business wants to look at multiple years of historical product sales based on the category assignment for the current year or any prior year.
This situation is best handled with a series of type 3 dimension attributes. On every dimension row, have a “current” category attribute that can be overwritten, as well as attributes for each annual designation, such as “2004 category” and “2003 category.” You can then group historical facts based on any annual categorization.
This seemingly straightforward technique isn’t appropriate for the unpredictable changes we described earlier. Customer attributes evolve uniquely. You can’t add a series of type 3 attributes to track the prior attribute values (“prior-1,” “prior-2” and so on) for unpredictable changes, because each attribute would be associated with a unique point in time for nearly every row in the dimension table.

Balance Power against Ease of Use

Before using hybrid techniques to support sophisticated change tracking, remember to maintain the equilibrium between flexibility and complexity. Users’ questions and answer sets will vary depending on which dimension attributes are used for constraining or grouping. Given the potential for error or misinterpretation, hide the complexity (and associated capabilities) from infrequent users.


Hierarchies

A hierarchy is a many-to-one relationship between members of a table or between tables. A hierarchy basically consists of different levels, each corresponding to a dimension attribute.
In other words, a hierarchy is a specification of levels that represents relationships between different attributes within a hierarchy. For example, one possible hierarchy in the date dimension is Year > Quarter > Month > Day.
There are four major types of hierarchies that are supported by the workbench:

http://www.data-e-education.com/DDM/Data_Marts.php#.Ve__BWOAhb4


What is OLTP?

OLTP is an abbreviation of On-Line Transaction Processing. An OLTP system is an application that modifies data the instant it receives it and has a large number of concurrent users.

What is OLAP?

OLAP is an abbreviation of On-Line Analytical Processing. An OLAP system is an application that collects, manages, processes and presents multidimensional data for analysis and management purposes.

What is the difference between OLTP and OLAP?

Data Source

OLTP: Operational data; the OLTP system is the original source of the data.

OLAP: Consolidated data coming from various sources.

Process Goal

OLTP: A snapshot of ongoing business processes; performs the fundamental business tasks.

OLAP: Multi-dimensional views of business activities, for planning and decision making.

Queries and Process Scripts

OLTP: Simple, quick-running queries run by users.

OLAP: Complex, long-running queries run by the system to update the aggregated data.

Database Design

OLTP: Normalized, smaller database. Speed is not an issue because the database is smaller, so normalization does not degrade performance. OLTP adopts the entity-relationship (ER) model and an application-oriented database design.

OLAP: De-normalized, larger database. Speed is an issue because of the larger database, and de-normalizing improves performance since there are fewer tables to scan while performing tasks. OLAP adopts the star, snowflake or fact constellation model and a subject-oriented database design.

Describe the foreign key columns in fact tables and dimension tables.

Foreign keys of dimension tables are primary keys of entity tables.

Foreign keys of fact tables are primary keys of dimension tables.

What is Data Mining?

Data Mining is the process of analyzing data from different perspectives and summarizing it into useful information.

What is the difference between view and materialized view?

A view takes the output of a query and makes it appear like a virtual table and it can be used in place of tables.

A materialized view provides indirect access to table data by storing the results of a query in a separate schema object.

What is ER Diagram?

Entity Relationship Diagrams are a major data modelling tool and will help organize the data in your project into entities and define the relationships between the entities. This process has proved to enable the analyst to produce a good database structure so that the data can be stored and retrieved in a most efficient manner.

An entity-relationship (ER) diagram is a specialized graphic that illustrates the interrelationships between entities in a database. It is a type of diagram used in data modeling for relational databases. These diagrams show the structure of each table and the links between tables.

What is ODS?

ODS is an abbreviation of Operational Data Store. It is a database structure that is a repository for near real-time operational data rather than long-term trend data. The ODS may further become the enterprise’s shared operational database, allowing operational systems that are being re-engineered to use the ODS as their operational databases.

What is ETL?

ETL is an abbreviation of extract, transform, and load. ETL is software that enables businesses to consolidate their disparate data while moving it from place to place; it doesn’t really matter that the data is in different forms or formats. The data can come from any source. ETL is powerful enough to handle such data disparities. First, the extract function reads data from a specified source database and extracts a desired subset of data. Next, the transform function works with the acquired data – using rules or lookup tables, or creating combinations with other data – to convert it to the desired state. Finally, the load function is used to write the resulting data to a target database.

What is VLDB?

VLDB is an abbreviation of Very Large DataBase. A one-terabyte database would normally be considered a VLDB. Typically, these are decision support systems or transaction processing applications serving large numbers of users.

Is the OLTP database design optimal for a data warehouse?

No. OLTP database tables are normalized, which adds additional time for queries to return results. Additionally, an OLTP database is smaller and does not contain data over a long period (many years), which needs to be analyzed. An OLTP system is basically an ER model, not a dimensional model. If a complex query is executed on an OLTP system, it may cause a heavy overhead on the OLTP server that affects the normal business processes.

If de-normalization improves data warehouse performance, why is the fact table in normal form?

Foreign keys of fact tables are primary keys of dimension tables. Since the fact table consists of columns that are primary keys of other tables, the fact table is itself in normal form.

What are Lookup tables?

A lookup table is a table placed against the target table based upon the primary key of the target; it updates the table by allowing through only modified (new or updated) records, based on the lookup condition.

What are Aggregate tables?

An aggregate table contains a summary of the existing warehouse data, grouped to certain levels of dimensions. It is easier to retrieve data from an aggregate table than to visit the original table, which may have millions of records. Aggregate tables reduce the load on the database server, increase query performance and return results quickly.
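
As a small pandas sketch (names illustrative), an aggregate table is just the base fact table grouped to a coarser grain and stored once:

    import pandas as pd

    # Base fact table at transaction grain.
    fact_sales = pd.DataFrame({
        "product": ["pen", "pen", "book", "book"],
        "month":   ["2012-03", "2012-04", "2012-03", "2012-03"],
        "amount":  [5.0, 7.5, 20.0, 12.5],
    })

    # Aggregate table at product-by-month grain; queries read this small
    # table instead of scanning every transaction row.
    agg_sales_monthly = (
        fact_sales.groupby(["product", "month"], as_index=False)["amount"].sum()
    )
    print(agg_sales_monthly)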

What is real time data-warehousing?

Data warehousing captures business activity data. Real-time data warehousing captures business activity data as it occurs. As soon as the business activity is complete and there is data about it, the completed activity data flows into the data warehouse and becomes available instantly.

What are conformed dimensions?

Conformed dimensions mean the exact same thing with every possible fact table to which they are joined. They are common to the cubes.

What is conformed fact?

Conformed facts are facts (measures) that are defined with the same meaning and calculation across multiple data marts and fact tables, so they can be combined and compared accordingly.

How do you load the time dimension?

Time dimensions are usually loaded by a program that loops through all possible dates that may appear in the data. 100 years may be represented in a time dimension, with one row per day.
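
A minimal sketch of such a loader (the attribute set is illustrative):

    from datetime import date, timedelta

    def build_date_dimension(start: date, end: date):
        """Loop through every possible date, emitting one dimension row per day."""
        rows, current = [], start
        while current <= end:
            rows.append({
                "date_key": int(current.strftime("%Y%m%d")),
                "calendar_date": current.isoformat(),
                "year": current.year,
                "quarter": (current.month - 1) // 3 + 1,
                "month": current.month,
                "day_of_week": current.strftime("%A"),
            })
            current += timedelta(days=1)
        return rows

    # One row per day; even a 100-year span is only about 36,500 rows.
    dim_date = build_date_dimension(date(2012, 1, 1), date(2012, 12, 31))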

What is a level of Granularity of a fact table?

Level of granularity means the level of detail that you put into the fact table in a data warehouse: what detail you are willing to record for each transactional fact.

What are non-additive facts?

Non-additive facts are facts that cannot be summed up for any of the dimensions present in the fact table. However, they are not considered useless; if there are changes in the dimensions, the same facts can be useful.

What is factless facts table?

A fact table that does not contain numeric fact columns is called a factless fact table. Attendance is an example.

What are slowly changing dimensions (SCD)?

SCD is an abbreviation of Slowly Changing Dimensions. SCD applies to cases where the attribute for a record varies over time.

There are three different types of SCD.

1) SCD1: The new record replaces the original record. Only one record exists in the database – the current data.

2) SCD2: A new record is added into the customer dimension table. Two or more records exist in the database – the current data and the previous history data.

3) SCD3: The original record is modified to include the new data. One record exists in the database – the new information is attached to the old information in the same row.
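
A hedged sketch of the SCD2 case (the row layout is an assumption for illustration): a change expires the current row and inserts a new one rather than overwriting:

    from datetime import date

    dim_customer = [
        {"surrogate_key": 1, "customer_id": "C1", "city": "Pune",
         "effective_date": date(2010, 1, 1), "expiry_date": None, "current": True},
    ]

    def apply_scd2(rows, customer_id, new_value, change_date):
        """Expire the current row and add a new row carrying the new value."""
        for row in rows:
            if row["customer_id"] == customer_id and row["current"]:
                row["expiry_date"], row["current"] = change_date, False
        rows.append({
            "surrogate_key": max(r["surrogate_key"] for r in rows) + 1,
            "customer_id": customer_id, "city": new_value,
            "effective_date": change_date, "expiry_date": None, "current": True,
        })

    apply_scd2(dim_customer, "C1", "Mumbai", date(2012, 4, 8))
    # Two rows now exist: the expired history row and the new current row.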

What is hybrid slowly changing dimension?

Hybrid SCDs are a combination of both SCD1 and SCD2. It may happen that in a table some columns are important and we need to track changes for them, i.e. capture their historical data, whereas for other columns, even if the data changes, we don’t care.

What is BUS Schema?

A bus schema is composed of a master suite of conformed dimensions and standardized definitions of facts.

What is a Star Schema?

A star schema is a way of organizing the tables such that results can be retrieved from the database quickly in the warehouse environment. The fact table sits at the center, joined to all the dimensions surrounding it, forming a star-like shape.

What is a Snowflake Schema?

In a snowflake schema, each dimension has a primary dimension table, to which one or more additional dimension tables can join. The primary dimension table is the only table that can join to the fact table.

Differences between star and snowflake schema?

Star schema – A single fact table with N dimensions; all dimensions are linked directly to the fact table. This schema is de-normalized, resulting in simple joins and less complex queries as well as faster results.

Snowflake schema – Dimensions with extended (normalized) dimension tables are known as a snowflake schema; dimensions may be interlinked or have one-to-many relationships with other tables. This schema is normalized, resulting in complex joins and very complex queries as well as slower results.

What is Difference between ER Modeling and Dimensional Modeling?

ER modeling is used for normalizing the OLTP database design. Dimensional modeling is used for de-normalizing the ROLAP/MOLAP design.

What is degenerate dimension table?

If a table contains values which are neither dimensions nor measures, they are called degenerate dimensions.

Why is Data Modeling Important?

Data modeling is probably the most labor intensive and time consuming part of the development process. The goal of the data model is to make sure that all data objects required by the database are completely and accurately represented. Because the data model uses easily understood notations and natural language, it can be reviewed and verified as correct by the end-users.

In computer science, data modeling is the process of creating a data model by applying a data model theory to create a data model instance. A data model theory is a formal data model description. When data modelling, we are structuring and organizing data. These data structures are then typically implemented in a database management system. In addition to defining and organizing the data, data modeling will impose (implicitly or explicitly) constraints or limitations on the data placed within the structure.

Managing large quantities of structured and unstructured data is a primary function of information systems. Data models describe structured data for storage in data management systems such as relational databases. They typically do not describe unstructured data, such as word processing documents, email messages, pictures, digital audio, and video.

What is Surrogate Key?

A surrogate key is a substitution for the natural primary key. It is just a unique identifier or number for each row that can be used as the primary key of the table. The only requirement for a surrogate primary key is that it is unique for each row in the table. It is useful because the natural primary key can change, which makes updates more difficult. Surrogate keys are always integer or numeric.
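
A tiny sketch of assigning surrogate keys during a dimension load (illustrative structures):

    import itertools

    key_sequence = itertools.count(start=1)  # monotonically increasing integers
    natural_to_surrogate = {}

    def assign_surrogate(natural_key):
        """Map a natural key to a stable integer surrogate, issuing new ones as needed."""
        if natural_key not in natural_to_surrogate:
            natural_to_surrogate[natural_key] = next(key_sequence)
        return natural_to_surrogate[natural_key]

    assign_surrogate("SKU-ABC")  # -> 1
    assign_surrogate("SKU-XYZ")  # -> 2
    assign_surrogate("SKU-ABC")  # -> 1, stable even if the SKU's attributes change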

What is Data Mart?

A data mart (DM) is a specialized version of a data warehouse (DW). Like data warehouses, data marts contain a snapshot of operational data that helps business people to strategize based on analyses of past trends and experiences. The key difference is that the creation of a data mart is predicated on a specific, predefined need for a certain grouping and configuration of select data. A data mart configuration emphasizes easy access to relevant information (Reference: Wiki). Data marts are designed to help managers make strategic decisions about their business.

What is the difference between OLAP and Data Warehouse?

A data warehouse is the place where the data is stored for analysis, whereas OLAP is the process of analyzing the data: managing aggregations and partitioning information into cubes for in-depth visualization.

What is a Cube and Linked Cube with reference to data warehouse?

Cubes are logical representations of multidimensional data. The edges of the cube contain dimension members and the body of the cube contains data values. The linking in linked cubes ensures that the data in the cubes remains consistent.

What is Junk Dimension?

A number of very small dimensions might be lumped together to form a single dimension, a junk dimension; the attributes are not closely related. Grouping random flags and text attributes in a dimension and moving them to a separate sub-dimension is known as a junk dimension.

What is snapshot with reference to data warehouse?

You can disconnect the report from the catalog to which it is attached by saving the report with a snapshot of the data.

What is active data warehousing?

An active data warehouse provides information that enables decision-makers within an organization to manage customer relationships nimbly, efficiently and proactively.

What is the difference between data warehousing and business intelligence?

Data warehousing deals with all aspects of managing the development, implementation and operation of a data warehouse or data mart including meta data management, data acquisition, data cleansing, data transformation, storage management, data distribution, data archiving, operational reporting, analytical reporting, security management, backup/recovery planning, etc.

Business intelligence, on the other hand, is a set of software tools that enable an organization to analyze measurable aspects of their business such as sales performance, profitability, operational efficiency, effectiveness of marketing campaigns, market penetration among certain customer groups, cost trends, anomalies and exceptions, etc. Typically, the term “business intelligence” is used to encompass OLAP, data visualization, data mining and query/reporting tools.

  1. Operational tactical strategic dashboards
  2. The Data Warehouse Bus Architecture
  3. Solutions by Design: Adding a Questionnaire to a Design
  4. Step-by-Step Data Warehousing
  5. Predictive analytics ROI Siegel Prediction
  6. Building Your Data Warehouse
  7. Data Warehousing: Dimension Basics
  8. Data Warehousing: Dimensional Role-Playing
  9. Data Warehousing: Slowly Changing Dimensions
  10. Data Warehousing: Rapidly Changing Monster Dimensions
  11. Data Warehousing: Measuring the Facts
  12. Data Warehousing: Horizontally Partitioning the Fact Table 
  13. Developing data warehouse architecture 
  14. Data warehouse architect 
  15. Why data warehouse 
  16. Overview of OLAP capabilities 
  17. Introduction to data warehousing 
  18. Datawarehouse data model 
  19. DWH tutorials 
  20. DWH Tutorial 1 
  21. DWH tutorial 2 
  22. http://dwhetltool.blogspot.com/2011_05_08_archive.html
  23. http://dwhetltool.blogspot.com/2011_08_07_archive.html
  24. http://dwhetltool.blogspot.com/2011_01_23_archive.html
  25. http://dwhetltool.blogspot.com/2011_01_16_archive.html
  26. Online analytical processing OLAP
  27. OLAP workshop Part 2 understanding OLAP
  28. Evolution of real time datawarehouse


Oracle 1z0-515 data warehousing 11g

Oracle 1z0-515 Data Warehousing 11g Essentials 2/2


See the next post for the rest of the exam questions


QUESTION NO: 35

Which unique method of improving performance is NOT used by the Oracle Exadata Database Machine?

A. Flash to improve query performance

B. Reduces the amount of data required to flow through I/O

C. Increases the I/O using InfiniBand

D. Performs analysis in a special in-memory database

Answer: D

Explanation:
http://techsatwork.com/blog/?p=743

QUESTION NO: 36

You can perform what-if analysis of potential changes with Oracle Warehouse Builder.

A. TRUE

B. FALSE

Answer: A

Explanation:

QUESTION NO: 37

Identify the dimension that appears most often in queries in a data warehouse.

A. Product dimension

B. Time dimension

C. Cost dimension

D. Location dimension

Answer: B

Explanation:
http://philip.greenspun.com/sql/data-warehousing.html

QUESTION NO: 38

Which questions CANNOT be addressed by Oracle Data Mining?

A. Fraud detection

B. Prediction of customer behavior

C. Root cause de

D. Identify factors associated with a business problem

Answer: C

Explanation:

QUESTION NO: 39

Which is NOT an available composite partition in Oracle Database 11g?

A. range-list

B. list-list

C. list-range

D. interval-hash

Answer: D

Explanation:

http://www.oracle-base.com/articles/11g/PartitioningEnhancements_11gR1.php

QUESTION NO: 40

You are looking to create a RAC cluster to deliver high performance for your client's data warehouse. Which statement is true about a configuration with a few large nodes versus a configuration with many smaller nodes?

A. A few large nodes always perform better than many small nodes.

B. A few large nodes always perform worse than many small nodes.

C. It depends on the workload specifics and the effect of a node failure.

D. Performance should be the same with either option.

Answer: C

Explanation:
http://www.oracle.com/us/solutions/datawarehousing/039574.pdf

QUESTION NO: 41

You can use Oracle Data Mining on unstructured data.

A. TRUE

B. FALSE

Answer: A

Explanation:
http://download.oracle.com/docs/cd/B28359_01/datamine.111/b28129.pdf

QUESTION NO: 42

How can you use Oracle Data Mining with Oracle Warehouse builder?

A. To identify records to extract

B. As a standard transform operation

C. To increase write performance

D. To eliminate ETL logging

Answer: B

Explanation:
http://blogs.oracle.com/datamining/

QUESTION NO: 43

Your customer wants to segment their customers' demographic data into those that use and do not use a loyalty card. What would you recommend?

A. Use Oracle OLAP Option.

B. Use Oracle SQL Analytic Functions.

C. Use classification algorithm in Oracle Data Mining.

D. Use non-negative matrix factorization in Oracle Data Mining.

Answer: C

Explanation:

http://download.oracle.com/docs/cd/B28359_01/datamine.111/b28129/intro_concepts.htm

QUESTION NO: 44

One goal of your Information Lifecycle Management strategy using Oracle's ILM capabilities is to reduce the cost of online storage. Identify two database options that would help in enabling such a strategy.

A. RAC and Advanced Compression

B. RAC and Partitioning

C. Partitioning and Advanced Compression

D. RAC One and Advanced Compression

Answer: C

Explanation:

QUESTION NO: 45

Which can be used in a scenario where there are large data loads of a time-sensitive nature into a data warehouse?

A. Direct path loading

B. External tables for loading flat files

C. Partition exchange loading

D. Any of these are valid for certain situations.

Answer: D

Explanation:

http://download.oracle.com/docs/cd/B19306_01/server.102/b14215/ldr_modes.htm#i1008815

QUESTION NO: 46

How can you implement near real time data integration with Oracle Data Integrator?

A. By accessing Change Data Capture records from logs

B. By using Exchange Partition

C. By mining Oracle UNDO segments

D. By reading operating system logs

Answer: A

Explanation:

QUESTION NO: 47

Your customer wants to determine "market baskets." What do you recommend?

A. Use Oracle OLAP Option.

B. Use Oracle SQL Analytic Functions.

C. Use associations algorithm in Oracle Data Mining.

D. Use regression analysis in Oracle Data Mining

Answer: C

Explanation:
http://download.oracle.com/docs/cd/B28359_01/datamine.111/b28129/regress.htm

QUESTION NO: 48

Identify the type of refresh that is NOT supported by materialized views.

A. Deferred

B. Incremental

C. Full

D. Heuristic

Answer: D

Explanation: http://download.oracle.com/docs/cd/B10500_01/server.920/a96520/mv.htm#42135

QUESTION NO: 49

For which task would you NOT use Oracle Data Mining?

A. Predicting customer behavior

B. Associating factors with a business issue

C. Determining associations within a population

D. Reducing the amount of data used in a data warehouse

E. All of the above

Answer: D

Explanation: Oracle Data Mining addresses prediction and association tasks; it is not used to reduce the amount of data stored in a data warehouse.

http://portal.acm.org/citation.cfm?id=1157021

QUESTION NO: 50

What is the estimated maximum speed of data loads for a Quarter Rack with the Exadata Storage

Server?

A. 1 TB/hr

B. 2 TB/hr

C. 4 TB/hr

D. 5 TB/hr

E. It depends on the number of CPUs in the server.

Answer: A

Explanation:
http://techsatwork.com/blog/?p=743

QUESTION NO: 51

Your BI tool (for example, Oracle Business Intelligence Enterprise Edition Plus, Business Objects and Cognos) will be used to query an Oracle database that includes the Oracle OLAP Option. What does the tool generate in submitting queries that might include data stored in relational tables and OLAP cubes?

A. SQL

B. PL/SQL

C. Proprietary API code

D. SQL for relational and proprietary API code for OLAP

Answer: A

Explanation:

QUESTION NO: 52

Data Guard compresses data:

A. Always

B. When using logical standby

C. When using physical standby

D. When catching up after a network failure

Answer: D

Explanation:
http://forums.oracle.com/forums/thread.jspa?threadID=886988

QUESTION NO: 53

What are Oracle Data Integrator templates used for?

A. To model SAP applications

B. To define how to transform data

C. As reports to monitor ETL activity

D. None of these

Answer: B

Explanation: http://www.docstoc.com/docs/24079944/Oracle-Data-Integrator-Architecture

QUESTION NO: 54

Your BI tool (for example, Oracle Business Intelligence Enterprise Edition or Cognos) will be used to query an Oracle database that includes the Oracle OLAP Option. What does the BI tool generate in submitting queries that might include data stored in cubes?

A. SQL

B. PL/SQL

C. Proprietary API code

D. SQL for relational and proprietary API code for OLAP

Answer: C

Explanation:

QUESTION NO: 55

How does compression affect resource utilization?

A. Reduces the amount of CPU and disk utilization

B. Increases the amount of CPU and disk utilization

C. Reduces the amount of disk but increases CPU utilization for loading

D. Increases the amount of disk but reduces CPU utilization for loading

Answer: C

Explanation:
http://publib.boulder.ibm.com/tividd/td/ITSMFD/SC32-9065-00/en_US/HTML/ab5w0007.htm

QUESTION NO: 56

Your customer wants to use Database Resource Manager to help ensure consistent performance based on users and operations. In designing this implementation, which limitation CANNOT be imposed through Database Resource Manager?

A. Specifying the maximum number of concurrent operations for a resource group

B. Limiting resource consumption for a resource group

C. Specifying the amount of parallelism for a resource group

D. Limiting access to particular data for a resource group

Answer: D

Explanation:

QUESTION NO: 57

How many Exadata Storage Server cells are there in a Full Rack Exadata Database Machine configuration that has 8 Database Server nodes?

A. 2

B. 14

C. 16

D. 24

Answer: B

Explanation:

QUESTION NO: 58

Identify the true statement about adaptive parallelism.

A. It is turned on by default.

B. It is turned off by default.

C. You should always leave the default setting.

D. There is no such thing.

Answer: B

Explanation:
http://kerryosborne.oracle-guy.com/category/oracle/exadata/page/3/

QUESTION NO: 59

Identify the statement about Oracle OLAP that is NOT true.

A. Oracle OLAP cubes are stored in the Oracle relational database

B. Oracle OLAP uses standard Oracle database security.

C. Meta data for Oracle OLAP is accessible in an external data dictionary

D. Oracle OLAP can be deployed using RAC.

Answer: A

Explanation:

http://download.oracle.com/docs/cd/B28359_01/olap.111/b28124/whatsnew.htm

According to the answer key, "Oracle OLAP uses standard Oracle database security." is a false statement and "Meta data for Oracle OLAP is accessible in an external data dictionary" is a correct statement. But in the Oracle documentation, http://download.oracle.com/docs/cd/E11882_01/olap.112/e17123/overview.htm#i1011393, it says "With Oracle OLAP, standard Oracle Database security features are used to secure your multidimensional data.".

QUESTION NO: 60

Identify the true statement about a data warehouse.

A. The data warehouse is typically refreshed as often as a transactional system.

B. Data warehouse queries are simpler than OLTP queries.

C. A data warehouse typically contains historical data.

D. Queries against a data warehouse never need summarized information.

Answer: C

Explanation:
http://wiki.oracle.com/page/Oracle+OLAP+Terminology-D

QUESTION NO: 61

Identify the statement about ASM that is NOT true.

A. ASM is easier to manage than file systems.

B. ASM delivers the performance of raw partitions.

C. ASM is an extra cost option for Oracle databases.

D. ASM delivers automatic striping and mirroring.

Answer: C

Explanation:
http://download.oracle.com/docs/cd/B28359_01/server.111/b31107/asmprepare.htm

QUESTION NO: 62

Which two statements are true about the advantages of using a data warehouse?

A. A data warehouse uses fewer database structures, so access to answers is faster and easier.

B. A data warehouse is typically implemented with a different design, making access faster.

C. A data warehouse is optimized for ongoing write activity, making response faster.

D. A data warehouse uses specialized features of the Oracle database, like materialized views and star transformations, making response faster.

Answer: A,D

Explanation:

QUESTION NO: 63

What are two ways in which query performance can be improved with partitioning?

A. Partition pruning

B. Partition optimization

C. Partition compression

D. Partition-wise joins

Answer: A,D

Explanation:

QUESTION NO: 64

Your customer is looking to implement ad-hoc analysis in a data warehouse. Which approach is least likely to be used, assuming that the customer does not want the expense of managing views?

A. Star schema

B. Snowflake schema

C. Third normal form schema

D. OLAP

Answer: C

Explanation:

QUESTION NO: 65

You have analyzed your client's workload and the Materialized Views Advisor in Enterprise Manager recommends that you create some materialized views to improve performance. What should you do in order to most simply implement this change?

A. Rewrite all the queries in the application to identify materialized views

B. Rewrite existing queries. New queries will automatically use the views.

C. Respond positively to the Advisor to create the materialized views.

D. Build virtual views on a third normal form schema.

Answer: C

Explanation:

QUESTION NO: 66

A Full Rack Oracle Exadata Database Machine can contain 100 TB of disk space with high performance. How much user data should you conservatively estimate this configuration can hold?

A. 100 TB

B. 75 TB

C. 50 TB

D. 28 TB

E. 20 TB

Answer: D

Explanation:

QUESTION NO: 67

What would you do to compress data in partitions that are frequently updated in Oracle Database 11g?

A. Use Hybrid Columnar Compression.

B. Use Advanced Compression Option.

C. Use Hybrid Partitions.

D. Avoid compressing any data.

Answer: B

Explanation:

http://blogs.oracle.com/datawarehousing/2010/01/new_whitepaper_on_exadata_hybr.html

QUESTION NO: 68

The Analytic Workspace Manager would be used to generate _______.

A. Materialized views

B. Oracle OLAP Option cubes

C. Oracle Data Mining algorithms

D. Oracle SQL Analytic functions

Answer: B

Explanation:

http://forums.oracle.com/forums/thread.jspa?threadID=616811

QUESTION NO: 69

Which feature of Oracle Warehouse Builder can be used to help ensure data quality?

A. Data extraction

B. Data profiling

C. Logical mapping

D. Exception reporting

Answer: B

Explanation:

QUESTION NO: 70

Flash in the Oracle Exadata Database Machine is of lower latency (and enables faster access to data) than:

A. Memory

B. Memory and locally attached disk

C. Memory and network-attached disk

D. Locally attached and network-attached disk

Answer: D

Explanation:

Reference Links:
Oracle BI By Bakboord My view on Oracle Business Intelligence (Applications)

Multiple Fact Reporting on (Non-)Conforming dimensions
Multiple Fact Reporting on (Non-)Conforming dimensions – Part II

http://obibb.wordpress.com/2010/05/31/multiple-fact-reporting-on-non-conforming-dimensions/
http://obibb.wordpress.com/tag/conformed/

Data Warehousing and Analytics on Unstructured Data / Big Data using Hadoop on Microsoft platform http://siddhumehta.blogspot.in/2011/07/data-warehousing-and-analytics-on.html

Developing a Dimensional Model w/ conformed dimensions and facts
http://blog.tallan.com/2008/09/23/developing-a-dimensional-model-w-conformed-dimensions-and-facts/

Data Warehousing Blog Posts
http://tdwi.org/blogs/blogs-data-warehousing.aspx

Surrogate Keys
http://www.jamesserra.com/archive/2012/01/surrogate-keys/

The Data Warehousing Blog
http://dwhblog.org/blog/

A Real World Experience

Difference between ETL and ELT
http://www.jamesserra.com/archive/2012/01/difference-between-etl-and-elt/

Normalizing Your Database
http://www.jamesserra.com/archive/2012/01/normalizing-your-database/

Filter on a dimension with a fact that doesn't depend on that dimension
https://forums.oracle.com/forums/thread.jspa?messageID=3609020&

filter and non-conforming dimensions
https://forums.oracle.com/forums/thread.jspa?threadID=1075609&tstart=0

Partitioned Fact Tables
http://blog.tallan.com/2008/12/04/partitioned-fact-tables/

SCD via SQL Stored Procedure
http://blog.tallan.com/2008/09/08/scd-via-sql-stored-procedure/


A Warehousing Overview
http://blog.tallan.com/2008/07/31/a-warehousing-overview/
