DATA WAREHOUSE SCHEMA
One of the key questions
to be answered by the database designer is: How can we design a database that
allows unknown queries to be performant? This question encapsulates the
differences between designing for a data warehouse and designing for an operational
system. In a data warehouse one designs to support the business process
rather than specific query requirements. In order to achieve this, the designer
must understand the way in which the information within the data warehouse will
be used.
In general, the queries
directed at a data warehouse tend to ask questions about
some essential fact,
analyzed in different ways. For example reporting on:
-The
average number of light bulbs sold per store over the past month
-The
top ten most viewed cable-TV programs during the past week
-Top
spending customers over the past quarter
-Customers
with average credit card balances more than Rs.10,000 during the past year.
Each of these queries has
one thing in common: they are all based on factual data. The content and
presentation of the results may differ between examples, but the factual and
transactional nature of the underlying data is the same.
sbobet asia indonesia Read More About News, Trends and Ideas that matter most to Entrepreneurs Visit SnapMunk
Table 1: Fact Tables and Attributes
REQUIREMENT
|
FACTS
|
ATTRIBUTES
|
Sales of Light Bulbs
|
EPOS
Transaction.
|
Quantity
Sold
Product
Identifier (SKU)
Store
Identifier
Data
and Time
Revenue
Achieved.
|
Cable Programs
|
Cable
Pay-per-view
Transaction
|
Customer
Identifier
Cable
Channel Watched
Program
Watched
Data
and Time
Duration
Household
Identifier.
|
Customer Spend
|
Loyalty
Card
Transaction
|
Customer
Identifier
Store
Identifier
Transaction
Value
Date
and Time
|
Customer Account
|
Account
Transactions
|
Customer
Identifier
Account
Number
Type
of Transaction
Destination
Account Number
|
Fact data possesses some
characteristics that allow the underlying information in the database to be
structured. Facts are transactions that have occurred at some point in the
past, and are unlikely to change in the future. Facts can be analyzed in
different ways by cross-referencing the facts with different reference
information.
For example, we can look
at sales by store, sales by region, or sales by product. In a data warehouse
facts also tend to have few attributes, because there are no operational data
overheads. For each of the examples described, the attributes of the fact could
be as listed in Table 1 above.
One of the major
technical challenges within the design of a data warehouse is to structure a
solution that will be effective for a reasonable period of time (at least three
to five years). This implies that the data should not have to be restructured when
the business changes or the query profiles change. This is an important point, because
in more traditional applications it is not uncommon to restructure the underlying
data in order to address query performance issues.
Fig : 1 Star Schema
The inherent advantage of
factual transactions is that their content is unlikely to change, regardless of
how it is analyzed. Also, the majority of the data volume of a data warehouse
comes from the factual information. It is therefore possible to treat fact data
as read-only data, and the reference data (used to interpret the fact data) as data
that is liable to change over time. Thus, if reference data needs to change,
the voluminous fact data would not have to be changed or restructured.
Star schemas (so called due to their ‘star like’ appearance) are physical database
structures that store the factual data in the ‘center’, surrounded by the reference
(or dimension) data (see Figure 1 above).
The dimension tables
should be relatively small (typically less than 10 GB in total) in comparison
to the size of the data warehouse, so that restructuring costs are small as long
as the keys to the fact tables are not changed. In star schema arrangements,
the reference information is often denormalized to a single table to speed up
query performance. The redundancy overheads are acceptable, as the sizes
involved are small and even the reference information changes infrequently.
The dimensional
information, represented by the reference data is often organized in form of concept
hierarchies that are used for analysis, and designing data warehouse
aggregations.
A typical concept hierarchy for a retail chain
is depicted in Figure 2 below :
Fig. 2 : Concept Hierarchies
The number of concept
hierarchies that can be defined on any given dimension is by no means
restricted to one. There may be several arbitrary concept hierarchies in use in
any organization to enable data analysis from various angles. Figure 3 below
shows how the ‘time’ dimension in a retail data warehouse (represented
by the day / date reference data) may be organized into several concept
hierarchies or groupings (which are single level concept hierarchies –
‘Easter’ and ‘Summer’ in Figure 3)
Fig. 3 : Multiple Hierarchies
When the concept
hierarchies and groupings are incorporated into a star schema diagram (like Figure 4
below), the appearance resembles a ‘snowflake’. Hence, schemas of this type are
called Snowflake schemas.
0 comments:
Post a Comment