Tips On Access: Normalising Your Database

Solving Data Problems

Database design theory includes design standards called normal forms. The process of making your data and tables match these standards is called normalizing data or data normalization. By normalizing your data, you eliminate redundant information and organize your table to make it easier to manage the data and make future changes to the table and database structure. This process removes the insertion, deletion, and modification anomalies you may see. In normalizing your data, you usually divide large tables into smaller, easier to maintain tables. You can then use the technique of adding foreign keys to enable connections between the tables.

Data normalization is part of the database design process and is not specific nor unique to any particular RDBMS. There are, in order, first, second, third, Boyce-Codd, fourth, and fifth normal forms. Each normal form represents an increasingly stringent set of rules; that is, each normal form assumes that the requirements of the preceding forms have been met. Many relational database designers feel that, if their tables are in third normal form, most common design problems have been addressed. However, the higher level normal forms can be of use and are included here.

This can be a rather intimidating topic, particularly if you try to understand it by reading some of the more theoretical texts. However, normal forms serve a very practical purpose in that they are designed to eliminate certain problems in table design and, as a result, in the data they contain. The exact wording of definitions of the normal forms varies depending upon the source; a set of definitions is provided in this topic. Our approach to understanding the normal forms will be to examine the problems they address.

First Normal Form
A table is in first normal form (1NF) if there are no repeating groups.

A repeating group is a set of logically related fields or values that occur multiple times in one record.

A Practical Approach
The sample tables below do not comply with first normal form. Look for fields that contain too much data and repeating group of fields.

EMPLOYEES_PROJECTS_TIME

EmployeeID	Name	Project	Time
EN1-26	Sean O'Brien	30-452-T3, 30-457-T3, 32-244-T3	0.25, 0.40, 0.30
EN1-33	Amy Guya	30-452-T3, 30-382-TC, 32-244-T3	0.05, 0.35, 0.60
EN1-35	Steven Baranco	30-452-T3, 31-238-TC	0.15, 0.80
EN1-36	Elizabeth Roslyn	35-152-TC	0.90
EN1-38	Carol Schaaf	36-272-TC	0.75
EN1-40	Alexandra Wing	31-238-TC, 31-241-TC	0.20, 0.70

A table with fields containing too much data.

The example above is also related to another design issue, namely, that each field should hold the smallest meaningful value and that there should not be multiple values in a single field.

Why is this table design a problem?

There would be no way to sort by last names nor to know which allocation of time belonged to which project.

EMPLOYEES_PROJECTS_TIME

EmpID	Last Name	First Name	Project1	Time1	Project2	Time2	Project3	Time3
EN1-26	O'Brien	Sean	30-452-T3	0.25	30-457-T3	0.40	32-244-T3	0.30
EN1-33	Guya	Amy	30-452-T3	0.05	30-382-TC	0.35	32-244-T3	0.60
EN1-35	Baranco	Steven	30-452-T3	0.15	31-238-TC	0.80
EN1-36	Roslyn	Elizabeth	35-152-TC	0.90
EN1-38	Schaaf	Carol	36-272-TC	0.75
EN1-40	Wing	Alexandra	31-238-TC	0.20	31-241-TC	0.70

A table with repeating groups of fields.

So why is this one a problem?

If an employee was assigned to a fourth project, you would have to add two new fields to the table. Also, it would be very difficult to total the amount of time devoted to a particular project.

The design problems addressed are very common-particularly among new designers who are accustomed to tracking data in a spreadsheet. Often, when building a spreadsheet, we arrange the data horizontally, laying it out across the spreadsheet. When designing tables, we have to think more vertically. Similar data belongs in the same column or field with a single value in each row.

Designing to meet first normal form

Now we will take the table you saw above and redesign it so it will comply with first normal form.

Look at the repeating groups of data. Identify tables and fields that will hold this data without the repeating groups. Think vertically and remember that similar data belongs in the same field.

Enter the sample data from the table to make sure you don't have repeating groups. If necessary, include foreign key field(s) to connect the tables.

EMPLOYEES

EmployeeID	Last Name	First Name
EN1-26	O'Brien	Sean
EN1-33	Guya	Amy
EN1-35	Baranco	Steven
EN1-36	Roslyn	Elizabeth
EN1-38	Schaaf	Carol
EN1-40	Wing	Alexandra

EMPLOYEES_PROJECTS_TIME

* ProjectNum	* EmployeeID	Time
30-328-TC	EN1-33	0.35
30-452-T3	EN1-26	0.25
30-452-T3	EN1-33	0.05
30-452-T3	EN1-35	0.15
31-238-TC	EN1-35	0.80
30-457-T3	EN1-26	0.40
31-238-TC	EN1-40	0.20
31-241-TC	EN1-40	0.70
32-244-T3	EN1-33	0.60
35-152-TC	EN1-36	0.90
36-272-TC	EN1-38	0.75

Mark the primary key field(s) and foreign keys in each table. Shown below with * indicating the Primary Key.

* EmployeeID	Last Name	First Name
EN1-26	O'Brien	Sean
EN1-33	Guya	Amy
EN1-35	Baranco	Steven
EN1-36	Roslyn	Elizabeth
EN1-38	Schaaf	Carol
EN1-40	Wing	Alexandra

EMPLOYEES_PROJECTS_TIME

* ProjectNum	* EmployeeID	Time
30-328-TC	EN1-33	0.35
30-452-T3	EN1-26	0.25
30-452-T3	EN1-33	0.05
30-452-T3	EN1-35	0.15
31-238-TC	EN1-35	0.80
30-457-T3	EN1-26	0.40
31-238-TC	EN1-40	0.20
31-241-TC	EN1-40	0.70
32-244-T3	EN1-33	0.60
35-152-TC	EN1-36	0.90
36-272-TC	EN1-38	0.75

If an employee was assigned to an additional project, it would involve merely adding a new record. Also, it would be much easier to search for a particular project number as they are all held in a single column.

Introducing Functional Dependency

Before we go any further, there's a new concept you need to be aware of and that'sfunctional dependency. A functional dependency is a relationship between fields so that the value in Field A determines the value in Field B, and there can be only one value in Field B. In that case, Field B is functionally dependent on Field A. Consider the following sample table:

Airport	City
National	Washington, DC
JFK	New York
LaGuardia	New York
Logan	Boston
Dulles	Washington, DC

Each airport name is unique and each airport can be in only one city. Therefore, City is functionally dependent on Airport. The value in the Airport field determines what the value will be in the City field (making Airport the determinant field) and there can be only one value in the City field. This does not need to work in the reverse. As shown in the table, a city can have more than one airport, so Airport is not functionally dependent on City; the value in City does not necessarily determine what the value in Airport will be.

You will sometimes see a functional dependency written in this format:

Determinant field(s) >> Functionally dependent field

as in:

Airport >> City

That's all for the first article on normal forms. View the articles detailing 2NF (2nd Normal Form) and 3NF (3rd Normal Form) for the next stages in your database development

Tips On Access

Tuesday, March 27, 2012

Normalising Your Database - First Normal Form (1NF)

No comments:

Post a Comment

About Me