Tuesday, March 27, 2012

Normalising Your Database - First Normal Form (1NF)


Solving Data Problems

Database design theory includes design standards called normal forms. The process of making your data and tables match these standards is called normalizing data or data normalization. By normalizing your data, you eliminate redundant information and organize your table to make it easier to manage the data and make future changes to the table and database structure. This process removes the insertion, deletion, and modification anomalies you may see. In normalizing your data, you usually divide large tables into smaller, easier to maintain tables. You can then use the technique of adding foreign keys to enable connections between the tables.

Data normalization is part of the database design process and is not specific nor unique to any particular RDBMS. There are, in order, first, second, third, Boyce-Codd, fourth, and fifth normal forms. Each normal form represents an increasingly stringent set of rules; that is, each normal form assumes that the requirements of the preceding forms have been met. Many relational database designers feel that, if their tables are in third normal form, most common design problems have been addressed. However, the higher level normal forms can be of use and are included here.

This can be a rather intimidating topic, particularly if you try to understand it by reading some of the more theoretical texts. However, normal forms serve a very practical purpose in that they are designed to eliminate certain problems in table design and, as a result, in the data they contain. The exact wording of definitions of the normal forms varies depending upon the source; a set of definitions is provided in this topic. Our approach to understanding the normal forms will be to examine the problems they address.

First Normal Form
A table is in first normal form (1NF) if there are no repeating groups.

A repeating group is a set of logically related fields or values that occur multiple times in one record.

A Practical Approach
The sample tables below do not comply with first normal form. Look for fields that contain too much data and repeating group of fields.

EMPLOYEES_PROJECTS_TIME
EmployeeID
Name
Project
Time
EN1-26
Sean O'Brien
30-452-T3, 30-457-T3, 32-244-T3
0.25, 0.40, 0.30
EN1-33
Amy Guya
30-452-T3, 30-382-TC, 32-244-T3
0.05, 0.35, 0.60
EN1-35
Steven Baranco
30-452-T3, 31-238-TC
0.15, 0.80
EN1-36
Elizabeth Roslyn
35-152-TC
0.90
EN1-38
Carol Schaaf
36-272-TC
0.75
EN1-40
Alexandra Wing
31-238-TC, 31-241-TC
0.20, 0.70
A table with fields containing too much data.

The example above is also related to another design issue, namely, that each field should hold the smallest meaningful value and that there should not be multiple values in a single field.
Why is this table design a problem?
There would be no way to sort by last names nor to know which allocation of time belonged to which project.

EMPLOYEES_PROJECTS_TIME
EmpID
Last Name
First Name
Project1
Time1
Project2
Time2
Project3
Time3
EN1-26
O'Brien
Sean
30-452-T3
0.25
30-457-T3
0.40
32-244-T3
0.30
EN1-33
Guya
Amy
30-452-T3
0.05
30-382-TC
0.35
32-244-T3
0.60
EN1-35
Baranco
Steven
30-452-T3
0.15
31-238-TC
0.80
EN1-36
Roslyn
Elizabeth
35-152-TC
0.90
EN1-38
Schaaf
Carol
36-272-TC
0.75
EN1-40
Wing
Alexandra
31-238-TC
0.20
31-241-TC
0.70

A table with repeating groups of fields.


So why is this one a problem?
If an employee was assigned to a fourth project, you would have to add two new fields to the table. Also, it would be very difficult to total the amount of time devoted to a particular project.

The design problems addressed are very common-particularly among new designers who are accustomed to tracking data in a spreadsheet. Often, when building a spreadsheet, we arrange the data horizontally, laying it out across the spreadsheet. When designing tables, we have to think more vertically. Similar data belongs in the same column or field with a single value in each row.

Designing to meet first normal form
Now we will take the table you saw above and redesign it so it will comply with first normal form.
Look at the repeating groups of data. Identify tables and fields that will hold this data without the repeating groups. Think vertically and remember that similar data belongs in the same field.
Enter the sample data from the table to make sure you don't have repeating groups. If necessary, include foreign key field(s) to connect the tables.

EMPLOYEES
EmployeeID
Last Name
First Name
EN1-26
O'Brien
Sean
EN1-33
Guya
Amy
EN1-35
Baranco
Steven
EN1-36
Roslyn
Elizabeth
EN1-38
Schaaf
Carol
EN1-40
Wing
Alexandra

EMPLOYEES_PROJECTS_TIME 
* ProjectNum
* EmployeeID
Time
30-328-TC
EN1-33
0.35
30-452-T3
EN1-26
0.25
30-452-T3
EN1-33
0.05
30-452-T3
EN1-35
0.15
31-238-TC
EN1-35
0.80
30-457-T3
EN1-26
0.40
31-238-TC
EN1-40
0.20
31-241-TC
EN1-40
0.70
32-244-T3
EN1-33
0.60
35-152-TC
EN1-36
0.90
36-272-TC
EN1-38
0.75
Mark the primary key field(s) and foreign keys in each table. Shown below with * indicating the Primary Key.

* EmployeeID
Last Name
First Name
EN1-26
O'Brien
Sean
EN1-33
Guya
Amy
EN1-35
Baranco
Steven
EN1-36
Roslyn
Elizabeth
EN1-38
Schaaf
Carol
EN1-40
Wing
Alexandra
EMPLOYEES_PROJECTS_TIME 
* ProjectNum
* EmployeeID
Time
30-328-TC
EN1-33
0.35
30-452-T3
EN1-26
0.25
30-452-T3
EN1-33
0.05
30-452-T3
EN1-35
0.15
31-238-TC
EN1-35
0.80
30-457-T3
EN1-26
0.40
31-238-TC
EN1-40
0.20
31-241-TC
EN1-40
0.70
32-244-T3
EN1-33
0.60
35-152-TC
EN1-36
0.90
36-272-TC
EN1-38
0.75
If an employee was assigned to an additional project, it would involve merely adding a new record. Also, it would be much easier to search for a particular project number as they are all held in a single column.

Introducing Functional Dependency
Before we go any further, there's a new concept you need to be aware of and that'sfunctional dependency. A functional dependency is a relationship between fields so that the value in Field A determines the value in Field B, and there can be only one value in Field B. In that case, Field B is functionally dependent on Field A. Consider the following sample table:

Airport
City
National
Washington, DC
JFK
New York
LaGuardia
New York
Logan
Boston
Dulles
Washington, DC
Each airport name is unique and each airport can be in only one city. Therefore, City is functionally dependent on Airport. The value in the Airport field determines what the value will be in the City field (making Airport the determinant field) and there can be only one value in the City field. This does not need to work in the reverse. As shown in the table, a city can have more than one airport, so Airport is not functionally dependent on City; the value in City does not necessarily determine what the value in Airport will be.

You will sometimes see a functional dependency written in this format:

Determinant field(s) >> Functionally dependent field
as in:

Airport >> City

That's all for the first article on normal forms. View the articles detailing 2NF (2nd Normal Form) and 3NF (3rd Normal Form) for the next stages in your database development

No comments:

Post a Comment