METADATA OF DATA WAREHOUSE
Metadata in a data
warehouse is similar to the data dictionary in the context of a database.
It stores data about data in the data warehouse.
Types of Metadata
Metadata in a data
warehouse fall into three major categories:
-Operational
Metadata
-Extraction and
Transformation Metadata
-End-User Metadata
Operational Metadata: As we know, data for the data warehouse comes from several
operational systems of the enterprise. These source systems contain different
data structures. The data elements selected for the data warehouse have various
field lengths and data types. Selecting data from different source files, and loading
it into the data warehouse, requires splitting of records, combining parts of records
from different source files, and dealing with multiple coding schemes and field
lengths. When information is delivered to the end-users, it is essential to be
able relate back to the original source data sets. Operational metadata contain
all of this information about the operational data sources that allow us to
trace back to the original source.
Vi hjelper deg med alt for din reise på http://www.londontown.no Her finner du alt du trenger
Extraction and
Transformation Metadata: Extraction and
transformation metadata contain data about the extraction of data from the
source systems, namely, the extraction frequencies, extraction methods, and
business rules for the data extraction. Also, this category of metadata
contains information about all the data transformations that take place in the
data staging area.
End-User Metadata. The end-user metadata is the navigational map of the data warehouse.
It enables the end-users to find information from the data warehouse. The
end-user metadata allows the end-users to use their own business terminology and
look for information in those ways in which they normally think of the
business.
Special Significance
Why is metadata
especially important in a data warehouse?
1. First, it acts as the glue that connects all parts of the data
warehouse.
2. Next, it provides information about the contents and structure
to the developers.
3. Finally, it opens the door to the end-users and makes the
contents recognizable in their own terms.
Metadata Requirements
According to Inmon,
a new user approaching a data warehouse wants to know :
·
What tables, attributes,
and keys does the data warehouse contain?
·
From where did each set
of data come?
·
What transformation logic
was applied in loading the data?
·
How has the data changed
over time?
·
What aliases exist, and
how are they related to each other?
·
What are the
cross-references between technical and business terms? (For instance, the field
name XVT-351J presumably meant something to a COBOL programmer in 1965, but
what does it mean to me today?)
·
How often does the data
get reloaded?
·
How much data is there?
This helps end-users to avoid submitting unrealistic queries. Given some means
of determining the size of tables, staff can tell the end users, “You can do
what you like with 15,000 rows, but if it turns out be 15 million rows, back
off and ask for help!”
Metadata requirements of
various classes of users are summarized in Table 1 below:
Table 1: Uses of Metadata
IT Professionals
|
Power Users
|
Casual Users
|
|
Analysis
and Discovery
|
Database
Tables, Columns,
Server
Platforms.
|
Databases,
Tables, Columns
|
List
of Predefined Queries and Reports, Business views.
|
Meaning
of Data
|
Data
structures, Data Definitions, Data Mapping, Cleansing Functions,
Transformation
Rules
|
Business
Terms, Data Definitions, Data Mapping,
Cleansing
Functions,
Transformation
Rules
|
Business
Terms, Data Definitions, Filters, Data Sources, Conversion, Data
Owners
|
Information
Access
|
Program
Code in SQL, 3GL, 4GL, Front-end Applications,
Security
|
Query
Toolsets, Database Access for Complex
Analysis
|
Authorization
Requests, Information Retrieval into Desktop Applications such as Spreadsheets.
|
Metadata Components
Warehouse metadata is not
very different in kind from ordinary database metadata, although it is
versioned in order to permit historical analysis. Prism gives the following breakdown
of warehouse metadata in its Tech Topic, “Metadata in the Data Warehouse:”
Mapping
The mapping information
records how data from operational sources is transformed on its way into the
warehouse. Typical contents are:
·
Identification of source
fields
·
Simple
attribute-to-attribute mapping
·
Attribute conversions
·
Physical characteristic
conversions
·
Encoding/reference table
conversions
·
Naming changes
·
Key changes
·
Defaults
·
Logic to choose from
among multiple sources
·
Algorithmic changes
Extract History
Whenever historical
information is analyzed, meticulous update records have to be kept. The
metadata history is a good place to start any time-based report, because the
analyst has to know when the rules changed in order to apply the right rules to
the right data. If, for example, sales territories were remapped in 2001,
results from before that date may not be directly comparable with more recent
results.
Miscellaneous
·
Aliases can make the
warehouses much more use-friendly by allowing a table to be queried by “Widgets
produced by each factory” rather than “MFSTATS.” Aliases also come in useful
when different departments want to use their own names to refer to the same
underlying data. Obviously, though, aliases can also cause a great deal of
confusion if they are not carefully tracked.
·
Often, parts of the same
data warehouse may be in different stages of development. Status information
can be used to keep track of this: for instance, tables might be classified “in-design,”
“in-test,” ‘inactive,” or “active.”
·
Volumetric information
lets users know how much data they are dealing with, so that they can have some
idea how much their queries will cost in terms of time and computational
resources. Volumetrics could usefully include such information as number of
rows, growth rate, usage characteristics, indexing, and byte specifications.
·
It is also useful to
publish the criteria and time scales for purging old data.
Summarization and Aggregation Algorithms
As discussed above, a
typical data warehouse contains lightly and heavily summarized data, and
aggregations as well as full detailed records. The algorithms for summarizing
(and aggregating) the detail data are obviously of interest to anyone who takes
responsibility for interpreting the meaning of the summaries. This metadata can
also save time by making it easier to decide which level of summarization is
most appropriate for a given purpose.
Relationship Artifacts and History
Data warehouses implement
relationships in a different way form production databases. Metadata pertaining
to related tables, constraints, and cardinality are maintained, together with
text descriptions and ownership records. This information and the history of
changes to it can be useful to analysts.
Ownership / Stewardship
Operational databases are
often owned by particular departments or business groups. In an enterprise data
warehouse, however, all data is stored in a common format and accessible to
all. This makes it necessary to identify the originator of each set of
data, so that inquiries and corrections can be made by the proper group. It is
useful to distinguish between ownership of data in the
operational environment and stewardship in the data
warehouse.
Access Patterns
It is desirable to record
patterns of access to the warehouse in order to optimize and tune performance.
Less frequently used data can be migrated to cheaper storage media, and various
methods can be used to accelerate access to the data that is most in demand.
Most databases do a good job of hiding such physical details, but specialized
performance analysis tools are usually available. Some general-purpose tools,
such as Information Builders’ Site Analyzer, also are available.
Reference Tables / Encoded Data
Reference data is stored
in an external table (see discussion on Star Schema) and contains commonly used
translations of encoded values. The contents of these tables must be stored in
order to guarantee the ability to recover the original unencoded data, together
with effective from and effective to dates.
Data Model-Design Reference
Building a data warehouse
without first constructing a data model is very difficult and frustrating. When
a data model is used, metadata describing the mapping between the data model
and the physical design should be stored. This allows all ambiguities or uncertainties
to be resolved.
From the point of view of
the Query Manager of the data warehouse, the Metadata Repository
can be perceived to have three logical layers: the Information
Navigator, the Business Metadata, and the Technical
Metadata.
Figure-1 : Metadata Repository
Figure-1 above illustrates this concept. The query manager accesses the metadata
through the Information Navigator layer which is the topmost layer of the metadata
repository. The higher layers, in turn, access more detailed metadata components
resident in the lower layers whenever required.
0 comments:
Post a Comment