Solving Data Problems
Database design theory includes design standards called normal forms. The process of making your data and tables match these standards is called normalizing data or data normalization. By normalizing your data, you eliminate redundant information and organize your table to make it easier to manage the data and make future changes to the table and database structure. This process removes the insertion, deletion, and modification anomalies you may see. In normalizing your data, you usually divide large tables into smaller, easier to maintain tables. You can then use the technique of adding foreign keys to enable connections between the tables.
Data normalization is part of the database design process and is not specific nor unique to any particular RDBMS. There are, in order, first, second, third, Boyce-Codd, fourth, and fifth normal forms. Each normal form represents an increasingly stringent set of rules; that is, each normal form assumes that the requirements of the preceding forms have been met. Many relational database designers feel that, if their tables are in third normal form, most common design problems have been addressed. However, the higher level normal forms can be of use and are included here.
This can be a rather intimidating topic, particularly if you try to understand it by reading some of the more theoretical texts. However, normal forms serve a very practical purpose in that they are designed to eliminate certain problems in table design and, as a result, in the data they contain. The exact wording of definitions of the normal forms varies depending upon the source; a set of definitions is provided in this topic. Our approach to understanding the normal forms will be to examine the problems they address.
First Normal Form
A table is in first normal form (1NF) if there are no repeating groups.
A repeating group is a set of logically related fields or values that occur multiple times in one record.
A Practical Approach
The sample tables below do not comply with first normal form. Look for fields that contain too much data and repeating group of fields.
EMPLOYEES_PROJECTS_TIME
EmployeeID
|
Name
|
Project
|
Time
|
EN1-26
|
Sean O'Brien
|
30-452-T3, 30-457-T3, 32-244-T3
|
0.25, 0.40, 0.30
|
EN1-33
|
Amy Guya
|
30-452-T3, 30-382-TC, 32-244-T3
|
0.05, 0.35, 0.60
|
EN1-35
|
Steven Baranco
|
30-452-T3, 31-238-TC
|
0.15, 0.80
|
EN1-36
|
Elizabeth Roslyn
|
35-152-TC
|
0.90
|
EN1-38
|
Carol Schaaf
|
36-272-TC
|
0.75
|
EN1-40
|
Alexandra Wing
|
31-238-TC, 31-241-TC
|
0.20, 0.70
|
The example above is also related to another design issue, namely, that each field should hold the smallest meaningful value and that there should not be multiple values in a single field.
Why is this table design a problem?
There would be no way to sort by last names nor to know which allocation of time belonged to which project.
EMPLOYEES_PROJECTS_TIME
EmpID
|
Last Name
|
First Name
|
Project1
|
Time1
|
Project2
|
Time2
|
Project3
|
Time3
|
EN1-26
|
O'Brien
|
Sean
|
30-452-T3
|
0.25
|
30-457-T3
|
0.40
|
32-244-T3
|
0.30
|
EN1-33
|
Guya
|
Amy
|
30-452-T3
|
0.05
|
30-382-TC
|
0.35
|
32-244-T3
|
0.60
|
EN1-35
|
Baranco
|
Steven
|
30-452-T3
|
0.15
|
31-238-TC
|
0.80
|
||
EN1-36
|
Roslyn
|
Elizabeth
|
35-152-TC
|
0.90
|
||||
EN1-38
|
Schaaf
|
Carol
|
36-272-TC
|
0.75
|
||||
EN1-40
|
Wing
|
Alexandra
|
31-238-TC
|
0.20
|
31-241-TC
|
0.70
|
A table with repeating groups
of fields.
So why is this one a problem?
If an employee was assigned to a fourth project, you would have to add two new fields to the table. Also, it would be very difficult to total the amount of time devoted to a particular project.
The design problems addressed are very common-particularly among new designers who are accustomed to tracking data in a spreadsheet. Often, when building a spreadsheet, we arrange the data horizontally, laying it out across the spreadsheet. When designing tables, we have to think more vertically. Similar data belongs in the same column or field with a single value in each row.
Designing to meet first normal form
Now we will take the table you saw above and redesign it so it will comply with first normal form.
Look at the repeating groups of data. Identify tables and fields that will hold this data without the repeating groups. Think vertically and remember that similar data belongs in the same field.
Enter the sample data from the table to make sure you don't have repeating groups. If necessary, include foreign key field(s) to connect the tables.
EMPLOYEES
EmployeeID
|
Last Name
|
First Name
|
EN1-26
|
O'Brien
|
Sean
|
EN1-33
|
Guya
|
Amy
|
EN1-35
|
Baranco
|
Steven
|
EN1-36
|
Roslyn
|
Elizabeth
|
EN1-38
|
Schaaf
|
Carol
|
EN1-40
|
Wing
|
Alexandra
|
EMPLOYEES_PROJECTS_TIME
* ProjectNum
|
* EmployeeID
|
Time
|
30-328-TC
|
EN1-33
|
0.35
|
30-452-T3
|
EN1-26
|
0.25
|
30-452-T3
|
EN1-33
|
0.05
|
30-452-T3
|
EN1-35
|
0.15
|
31-238-TC
|
EN1-35
|
0.80
|
30-457-T3
|
EN1-26
|
0.40
|
31-238-TC
|
EN1-40
|
0.20
|
31-241-TC
|
EN1-40
|
0.70
|
32-244-T3
|
EN1-33
|
0.60
|
35-152-TC
|
EN1-36
|
0.90
|
36-272-TC
|
EN1-38
|
0.75
|
* EmployeeID
|
Last Name
|
First Name
|
EN1-26
|
O'Brien
|
Sean
|
EN1-33
|
Guya
|
Amy
|
EN1-35
|
Baranco
|
Steven
|
EN1-36
|
Roslyn
|
Elizabeth
|
EN1-38
|
Schaaf
|
Carol
|
EN1-40
|
Wing
|
Alexandra
|
EMPLOYEES_PROJECTS_TIME
* ProjectNum
|
* EmployeeID
|
Time
|
30-328-TC
|
EN1-33
|
0.35
|
30-452-T3
|
EN1-26
|
0.25
|
30-452-T3
|
EN1-33
|
0.05
|
30-452-T3
|
EN1-35
|
0.15
|
31-238-TC
|
EN1-35
|
0.80
|
30-457-T3
|
EN1-26
|
0.40
|
31-238-TC
|
EN1-40
|
0.20
|
31-241-TC
|
EN1-40
|
0.70
|
32-244-T3
|
EN1-33
|
0.60
|
35-152-TC
|
EN1-36
|
0.90
|
36-272-TC
|
EN1-38
|
0.75
|
If an employee was assigned to an additional project, it would involve merely adding a new record. Also, it would be much easier to search for a particular project number as they are all held in a single column.
Introducing Functional Dependency
Before we go any further, there's a new concept you need to be aware of and that'sfunctional dependency. A functional dependency is a relationship between fields so that the value in Field A determines the value in Field B, and there can be only one value in Field B. In that case, Field B is functionally dependent on Field A. Consider the following sample table:
Airport
|
City
|
National
|
Washington, DC
|
JFK
|
New York
|
LaGuardia
|
New York
|
Logan
|
Boston
|
Dulles
|
Washington, DC
|
Each airport name is unique and each airport can be in only one city. Therefore, City is functionally dependent on Airport. The value in the Airport field determines what the value will be in the City field (making Airport the determinant field) and there can be only one value in the City field. This does not need to work in the reverse. As shown in the table, a city can have more than one airport, so Airport is not functionally dependent on City; the value in City does not necessarily determine what the value in Airport will be.
You will sometimes see a functional dependency written in this format:
Determinant field(s) >> Functionally dependent field
as in:
Airport >> City
That's all for the first article on normal forms. View the articles detailing 2NF (2nd Normal Form) and 3NF (3rd Normal Form) for the next stages in your database development
No comments:
Post a Comment