Are We There Yet?
Designing
your tables to comply with 3NF (third normal form) is usually
sufficient to ensure good design so, most of the time, you can stop right here.
The higher normal forms address less common data problems.
They are included here so you'll know what they are and what to do about them
if you come across them. These normal forms do get a bit more complicated.
The truth is
there are times when you will want to denormalize your data. That means
you may sometimes want to put the data in two normalized tables back into one
denormalized table. The reasons for doing this are usually associated with
performance, for example, the speed at which queries run. But at least it will
be a conscious decision and represents one of the beauties of normalization.
Properly normalized tables can always be put back together with no loss or gain
of data.
Higher Normal Forms - Boyce-Codd Normal Form:
A table is
in third normal form (3NF) and all determinants are candidate keys.
Boyce-Codd normal form (BCNF) can be thought of as a "new" third
normal form. It was introduced to cover situations that the "old"
third normal form did not address. Keep in mind the mean of a determinant
(determines the value in another field) and candidate keys (qualify for
designation as primary key). This normal form applies to situations where you
have overlapping candidate keys.
If a table has no non-key fields, it is automatically in BCNF.
A Practical
Approach
Look for potential problems in updating existing data (modification
anomaly) and in entering new data (insertion anomaly).
Imagine that you were designing a table for a college to hold information
about courses, students, and teaching assistants. You have the following business
rules.
§ Each course
can have many students.
§ Each student
can take many courses.
§ Each course
can have multiple teaching assistants (TAs).
§ Each TA is
associated with only one course.
§ For each
course, each student has one TA.
Some sample
data:
CourseNum
|
Student
|
TA
|
ENG101
|
Jones
|
Clark
|
ENG101
|
Grayson
|
Chen
|
ENG101
|
Samara
|
Chen
|
MAT350
|
Grayson
|
Powers
|
MAT350
|
Jones
|
O'Shea
|
MAT350
|
Berg
|
Powers
|
COURSES_STUDENTS_TA's
|
To uniquely identify each record, you could choose CourseNum + Student as a
primary key. This would satisfy third normal form also because the combination
of CourseNum and Student determines the value in TA. Another candidate key
would be Student + TA. In this case, you have overlapping candidate keys
(Student is in both). The second choice, however, would not comply with third
normal form because the CourseNum is not determined by the combination of
Student and TA; it only depends on the value in TA (see the business rules).
This is the situation that Boyce-Codd normal form addresses; the combination of
Student + TA could not be considered to be a candidate key.
If you wanted to assign a TA to a course before any students enrolled, you
couldn't because Student is part of the primary key. Also, if the name of a TA
changed, you would have to update it in multiple records.
If you assume you have just these fields, this data would be better stored
in three tables: one with CourseNum and Student, another with Student and TA,
and a third with CourseNum and TA.
*CourseNum
|
*Student
|
ENG101
|
Jones
|
ENG101
|
Grayson
|
ENG101
|
Samara
|
MAT350
|
Grayson
|
MAT350
|
Jones
|
MAT350
|
Berg
|
COURSES
|
*Student
|
*TA
|
Jones
|
Clark
|
Grayson
|
Chen
|
Samara
|
Chen
|
Grayson
|
Powers
|
Jones
|
O'Shea
|
Berg
|
Powers
|
STUDENTS
|
*CourseNum
|
*TA
|
ENG101
|
Clark
|
ENG101
|
Chen
|
MAT350
|
O'Shea
|
MAT350
|
Powers
|
TA's
|
Above, showing tables that comply with BCNF
Fourth Normal Form:
A table is
in Boyce-Codd normal form (BCNF) and there are no multi-valued dependencies.
A multi-valued
dependency occurs when, for each value in field A, there is a set of
values for field B and a set of values for field C but fields B and C are not
related.
A Practical
Approach
Look for repeated or null values in non-key fields.
A multi-valued dependency occurs when the table contains fields that are
not logically related. An often
used example is the following table:
*Movie
|
*Star
|
*Producer
|
Once Upon a Time
|
Julie Garland
|
Alfred Brown
|
Once Upon a Time
|
Mickey Rooney
|
Alfred Brown
|
Once Upon a Time
|
Julie Garland
|
Muriel Humphreys
|
Once Upon a Time
|
Mickey Rooney
|
Muriel Humphreys
|
Moonlight
|
Humphrey Bogart
|
Alfred Brown
|
Moonlight
|
Julie Garland
|
Alfred Brown
|
MOVIES
|
A movie can have more than one star and more than one producer. A star can
be in more than one movie. A producer can produce more than one movie. The
primary key would have to include all three fields and so this table would be
in BCNF. But you have unnecessarily repeated values, with the data maintenance
problems that causes, and you would have trouble with deletion anomalies.
The Star and the Producer really aren't logically related. The Movie
determines the Star and the Movie determines the Producer. The answer is to
have a separate table for each of those logical relationships - one holding
Movie and Star and the other with Movie and Producer, as shown below:
*Movie
|
*Star
|
Once Upon a Time
|
Julie Garland
|
Once Upon a Time
|
Mickey Rooney
|
Moonlight
|
Humphrey Bogart
|
Moonlight
|
Julie Garland
|
STARS
|
*Movie
|
*Producer
|
Once Upon a Time
|
Alfred Brown
|
Once Upon a Time
|
Muriel Humphreys
|
Moonlight
|
Alfred Brown
|
PRODUCERS
|
Above, showing tables that comply with 4NF
Below is another example of a common design error, and it's easily spotted
by all the missing or blank values.
DeptCode
|
ProjectNum
|
ProjectMgrID
|
Equipment
|
PropertyID
|
IS
|
36-272-TC
|
EN1-15
|
CD-ROM
|
657
|
IS
|
|
|
VGA desktop monitor
|
305
|
AC
|
35-152-TC
|
EN1-15
|
|
|
AC
|
|
|
Dot-matrix printer
|
358
|
AC
|
|
|
Calculator with tape
|
239
|
TW
|
30-452-T3
|
EN1-10
|
486 PC
|
275
|
TW
|
30-457-T3
|
EN1-15
|
|
|
TW
|
31-124-T3
|
EN1-15
|
Laser printer
|
109
|
TW
|
31-238-TC
|
EN1-15
|
Hand-held scanner
|
479
|
RI
|
|
|
Fax machine
|
775
|
MK
|
|
|
Laser printer
|
858
|
MK
|
|
|
Answering machine
|
187
|
TW
|
31-241-TC
|
EN1-15
|
Standard 19200 bps modem
|
386
|
SL
|
|
|
486 Laptop PC
|
772
|
SL
|
|
|
Electronic notebook
|
458
|
PROJECTS_EQUIPMENT
|
Above, a table with many null values (note: it also does not comply with
3NF and BCNF)
It's the same problem here because not all of the data is logically
related. As usual, the answer is more tables - one to hold the information on
the equipment assigned to departments (with PropertyID as the primary key) and
another with projects and departments. You'd have to know the business rules to
know whether a project might involve more than one department or manager and be
able to figure out the primary key. Assuming a project can have only one
manager and be associated with only one department, the tables would be as
follows.
*PropertyID
|
Equipment
|
DeptCode
|
657
|
CD-ROM
|
IS
|
305
|
VGA desktop monitor
|
IS
|
358
|
Dot-matrix printer
|
AC
|
239
|
Calculator with tape
|
AC
|
275
|
486 PC
|
TW
|
109
|
Laser printer
|
TW
|
479
|
Hand-held scanner
|
TW
|
775
|
Fax machine
|
RI
|
858
|
Laser printer
|
MK
|
187
|
Answering machine
|
MK
|
386
|
Standard 19200 bps modem
|
TW
|
772
|
486 Laptop PC
|
SL
|
458
|
Electronic notebook
|
SL
|
EQUIPMENT
|
*ProjectNum
|
ProjectMgrID
|
DeptCode
|
36-272-TC
|
EN1-15
|
IS
|
35-152-TC
|
EN1-15
|
AC
|
30-452-T3
|
EN1-10
|
TW
|
30-457-T3
|
EN1-15
|
TW
|
31-124-T3
|
EN1-15
|
TW
|
31-238-TC
|
EN1-15
|
TW
|
31-241-TC
|
EN1-15
|
TW
|
PROJECTS_EQUIPMENT
|
Above, tables that eliminate the null values and comply with 4NF
Fifth Normal Form:
A table is
in fourth normal form (4NF) and there are no cyclic dependencies.
A cyclic
dependency can occur only when you have a multi-field primary key
consisting of three or more fields. For example, let's say your primary key
consists of fields A, B, and C. A cyclic dependency would arise if the values
in those fields were related in pairs of A and B, B and C, and A and C.
Fifth normal
form is also called projection-join normal form. A projection is
a new table holding a subset of fields from an original table. When properly
formed projections are joined, they must result in the same set of data that
was contained in the original table.
A Practical
Approach
Look for the number of records that will have to be added or maintained
Following is some sample data about buyers, the products they buy, and the
companies they buy from on behalf of MegaMall, a large department store.
*Buyer
|
*Product
|
*Company
|
Chris
|
jeans
|
Levi
|
Chris
|
jeans
|
Wrangler
|
Chris
|
shirts
|
Levi
|
Lori
|
jeans
|
Levi
|
BUYING
|
Above, a table with cyclic dependencies
The primary key consists of all three fields. One data maintenance problem
that occurs is that you need to add a record for every buyer who buys a product
for every company that makes that product or they can't buy from them. That may
not appear to be a big deal in this sample of 2 buyers, 2 products, and 2
companies (2 X 2 X 2 = 8 total records). But what if you went to 20 buyers, 50
products, and 100 companies (20 X 50 X 100 = 100,000 potential records)? It
quickly gets out of hand and becomes impossible to maintain.
You might be tempted to solve this by dividing this into the following two
tables.
*Buyer
|
*Product
|
Chris
|
jeans
|
Chris
|
shirts
|
Lori
|
jeans
|
BUYERS
|
*Product
|
*Company
|
jeans
|
Wrangler
|
jeans
|
Levi
|
shirts
|
Levi
|
PRODUCTS
|
However, if you joined the two tables above on the Product field, it would
produce a record not part of the original data set (it would say that Lori buys
jeans from Wrangler). This is
where the projection-join concept comes in.
The correct solution would be three tables:
*Buyer
|
*Product
|
Chris
|
jeans
|
Chris
|
shirts
|
Lori
|
jeans
|
BUYERS
|
*Product
|
*Company
|
jeans
|
Wrangler
|
jeans
|
Levi
|
shirts
|
Levi
|
PRODUCTS
|
*Buyer
|
*Company
|
Chris
|
Levi
|
Chris
|
Wrangler
|
Lori
|
Levi
|
COMPANIES
|
Above, tables that comply with 5NF
When the first two tables are joined by Product and the result joined to
the third table by Buyer and Company, the result is the original set of data.
In our scenario of 20 buyers, 50 products, and 100 companies, you would
have, at most, 1000 records in the Buyers table (20 X 50), 5000 records in the
Products table (50 X 100), and 2000 records in the Companies table (20 X 100).
With a maximum of 8000 records, these tables would be much easier to maintain
than the possible 100,000 records we saw earlier.
No comments:
Post a Comment