MODELLING DATA
All organizations need to
store the information and this is done through database. Data models refer to
the conceptual model of the data and the underlying relationships among them.
DBMS abstract some generic structures to represent conceptually every possible
file structure.
Data models can be
classified in two classes viz; Record-based logical Models and Object-based
logical models. Record-Based logical data models can be classified (Sadgopan,
1997) into the following categories:
Hierarchical Models: These are the early data models used in 1970’s. Hierarchical
models capture the intuitive hierarchy of the data elements. The early generation
of large DBMS e.g. IMS belongs to the hierarchical data models. Even today some
large databases are maintained on IMS platform.
Network Models: Since hierarchical models are unable to represent data items
that existing at two different level of hierarchy, network models were
proposed. The notable systems built using this model were ADABAS and DBMS-10 on
DEC-10 machines.
B2Bdata provides world class B2B Phone and email lists for a wide range of industries and regions.
Relational Models: Though network models were quite powerful, they lacked in elegance. The systems
built on this data model were dBase, Xbase and ORACLE. Almost all commercial
systems presently available like Oracle 8i, 9i, 11 etc., SQL Server, MySQL are built
on the relational models. There are 12 rules that are required to be followed
in a relational model.
Data modelling is
achieved in two levels:
1) E-R modelling that
builds the conceptual model of the data.
2) Normalization, which
removes the redundancies.
Object Oriented Logical
Models can be of several types. One of them is Entity Relationship model. The Entity
Relationship (ER) data model is the most common data model which is based
on the perception of the real world.. This model allows us to represent the
relationships among the objects called entities. It uses following three concepts
to represent itself graphically. Although we have discussed them earlier, we revisit
them here once again:
Entity: Any real-world object that has certain properties (attributes of
its own) and this object are uniquely identified by the system on the basis of
these. This is represented by a rectangle with the entity name specified in the
centre. For e.g.: student is an entity.
Attribute: Attribute is the properties of an entity, like a student entity
can have attribute as student name, student roll no., student class etc.
Attributes are placed inside the circles and attached to the entities and
relationships.
Relationship: Relationship specifies the meaningful relation between two
entities. This relationship can also have attributes. A rhombus represents
these with a relation specified in it. For e.g.: if we have student and
course as entities, then we can relate these entities as student enrols for a
course.
Fig-1
Degree of Relationship
(DOR): This specifies the occurrence of entity with
other entity.
One to One (1:1): In this, there can be almost one related occurrence for each entity.
For example, A single manager manages one department and one department can be
managed by a single manager. It is represented as follows.
One to Many (1:N): Here, for one occurrence of the first entity, there may exist many
occurrences of the second entity and for every occurrence of the second entity,
these exists only one occurrence of the first entity e.g.:
Here one manager
supervises many employees and every employee reports to only one manager.
Many to Many (M: N): In this degree of relationship, for one occurrence of the first entity, these may
exists many occurrences of the second entity and also for every occurrences of
second entity, there exists many occurrences of the first entity e.g.: One
employee can work for many projects and many employees can handle one project.
Normalization is another concept in data modelling. Normalization is a process
of converting complex data into simpler form without any loss of information.
The Normalization technique ensures that there is no dependent and duplicate
data when the E-R model has been made. Normalization first of all, converts the
data into tables or relations. These tables are checked for redundancy. Finally
the form is converted to a database definition. Why do we need normalization?
Normalization improves the database design. Whenever the design is modified, we
need to re-organize the data.
Normalization reduces the
need of reorganizing the data. Process of normalization removes all undesirable
consequences that may occur due to inserting, updating and deleting values from
the tables. Normalization reduces the unnecessary repetition of data
(redundancy) that causes the problem with storage and retrieval of data. Poorly
designed databases have several data management difficulties e.g. Consider
a relation.
LIBRARY (MNAME, MADD,
B_TITLE, ISS_DATE, DUE_DATE)
This relation has the
following problems:
1) Here duplicate data
exists because member address will be repeated if a member issues more than one
book.
2) Insertion Problems/
Anomalies: If a member does not issue a book then we cannot record the
address of a member. Here MNAME & B_TITLE are both necessary and hence we
cannot leave any one of them as blank.
3) Updating Problem: If
a member has issued two or more books, then address is necessary in all and if
a member’s address needs to be updated, then all the records needs to be
updated.
4) Delete Anomalies: Here
if we delete the issue details, then address would not be there.
But all the above
problems can be eliminated by normalization. If this single relation is broken
down in two relations:
MEMBER_DETAILS (MNAME,
MADDR)
MEMBER_TRANS (MNAME,
B_TITLE, ISS_DATE, DUE_DATE)
But this requires a join
between these two relations, which is very expensive. Now lets look into several
normal forms with the help of an example:
Table-1
Roll No.
|
Name
|
Subject
|
Mark
|
1
|
Amit
|
English
|
65
|
Hindi
|
72
|
||
Math
|
70
|
||
2
|
Seema
|
English
|
70
|
3
|
Anjali
|
English
|
54
|
Hindi
|
60
|
To make this unnormalized
data into 1 NF (First Normal Form), we have to make each cell containing one
value. And all the repeated information should be removed.
Table-2
Roll No.
|
Name
|
Subject
|
Mark
|
1
|
Amit
|
English
|
65
|
1
|
Amit
|
Hindi
|
72
|
1
|
Amit
|
Math
|
70
|
2
|
Seema
|
English
|
70
|
3
|
Anjali
|
English
|
54
|
3
|
Anjali
|
Hindi
|
60
|
The data is still redundant
in this form. Here combination of two keys (composite key), Roll No. and
Subject, is a primary key (PK, that uniquely identifies a record). But the
attributes of this table depend on the part of the PK. Roll No. and subject determines
marks, Roll No. determines Name, Name has no dependency on the attribute
subject. This may lead to following problems:
Insertion: the name of a subject cannot be recorded until a student gives
any exam.
Updation: Roll No. and name is repeated several times. If name of the
student is to be changed, then at every place the name needs to be updated
otherwise may lead to inconsistencies.
Deletion: if a student has not given any of the three exams, then the name
of that particular student will be deleted.
So, for overcoming these
problems, this first normal form needs to be decomposed and converted into
Second Normal Form (2NF) without any loss of information.
Table-3
Roll No.
|
Name
|
Roll No.
|
Name
|
Subject
|
Mark
|
1
|
Amit
|
1
|
Amit
|
English
|
65
|
2
|
Seema
|
1
|
Amit
|
Hindi
|
72
|
3
|
Anjali
|
1
|
Amit
|
Math
|
70
|
2
|
Seema
|
English
|
70
|
||
3
|
Anjali
|
English
|
54
|
||
3
|
Anjali
|
Hindi
|
60
|
Other major normal forms
are 3NF and 4NF. Third normal form or 3NF is used to prevent loss of
information and dependencies preserving decomposition. Fourth normal form or 4NF is
used to preserve multi-valued dependency, which is essentially a constraint.
There are several other normal forms like project-join NF, domain-key NF.
0 comments:
Post a Comment