The Relational Database Design Process:
Before you build the tables and other objects that will make up your
system, it is important to take time to design it. A good design is the
keystone to creating a system that does what you want it to do effectively,
accurately and efficiently.
The BASIC steps in designing a database system
§ To determine
the purpose of your system
§ To determine
the tables that you need in your system
§ To determine
the fields that you need in the tables
§ To identify
fields with unique values
§ To determine
the relationships between tables
§ To refine the design
§ To add data
(populate tables) and create other system objects
NOTE: Some
of the listed steps (determining tables, data fields and relationships) may
cross and be repeated a few times when designing a relational database.
Building a database is a
process of examining the data that is necessary and useful for an application,
then breaking it down into a relatively simple row and column format.
To determine
the purpose of your system you need to know what
information you want from the database (detailed scenario). From that, you can
determine what subjects you need to store facts about (the tables) and what
facts you need to store about each subject (the data fields).
So, make sure that you do answer the following questions:
1. What type of
data should the system keep track off?
2. What would
the user want to know about the data?
3. What would
the user want to do to the data?
The first
step in creating a database is creating a plan that serves
both as a guide to be used when implementing the database and
as a functional specification for the database after it has
been implemented. The nature and complexity of a database application, as well
as the process of planning it, can vary greatly. In the first case, the
database design may be little more than a few notes on some scratch paper. In
the latter case, the design may be a formal document with hundreds of pages
that contain every possible detail about the database.
NOTE: Modeling
the structure on paper before opening computer and starting coding is highly
recommendable. Planning may seem time-consuming up front, but not planning is
twice as time-consuming later.
To determine
the tables can be the trickiest step in the database design process. That is
because the results that you want from the database (e.g. the reports that you
want to print, the forms that you want to use, the questions that you want
answered) don't necessarily provide clues about the structure of the tables
that can produce them. In fact, it may be better to sketch and rework your
design on paper first.
When you design your tables, divide up pieces of information by keeping
following fundamental design principles in mind:
1. Each piece
of information is stored in only one table
2. A table
should not contain duplicate information
3. Each table
should contain information about only one subject
4. Information
should not be duplicated between tables
Columns contain the attributes of the entities.
To identify
fields with unique values (define table primary key) - In
order to uniquely identify each individual record in a table and to be able to
relate information stored in separate tables. By having a different primary key
in each record you can tell two records apart - the goal of setting primary
keys is to ensure each records uniqueness. This is calledentity integrity in
the database management.
The primary key types:
1. Single-field primary
keys (AutoNumber or User-Defined type)
2.
Miltiple-field primary
keys
Notes:
§ The power of
a relational database system comes from its ability to quickly find and bring
together (related) information stored in separate tables by using queries,
forms and generating reports. In order to do this, each table should include a
field or set of fields to uniquely identify each record stored in the table.
This information is called theprimary key of the table. Once you
specify a primary key for the table, to ensure uniqueness, the system will
prevent any duplicated or null values from being entered in the primary key
fields.
§ To be able
to set relationships between tables, you must establish a link between fields
that contain common and related information. The link field in another table is
known as a foreign key data field. A relationship is established
by linking these key fields between tables - the primary key in the 'primary'
table and a foreign key in the 'related' table.
So, every
table must have a primary key - one or more data fields whose
contents are unique to each record. When linking tables you link the primary
key field from one (primary or 'parent') table to a field in another (related
or 'child') table that has the same name, structure and data type. By matching
the values from the primary key to the foreign key in both tables, you can
relate two records.
Tables store data about
entities.
The process
of designing a relational database includes making sure that a table contains
only data directly related to the primary key, that each data field contains
only one item of data, and that redundant (duplicated) data is eliminated. The
task of the database designer is to structure the data in a way that eliminates
unnecessary duplication and provides a rapid search path to all necessary information.
This process of specifying and defining tables, keys, columns and relationships
in order to create an efficient database is called normalisation.
Normalisation is part of
successful database design. Without normalisation, database systems can be inaccurate,
slow and inefficient and they might not produce the data you expect.
We use the
normalisation process to design efficient and functional databases. By
normalising, we store data where it logically and uniquely belongs.
Normalisation process involves a few steps and each step is called a
form. Forms range from the first normal form (1NF) to fifth normal form
(5NF). There is also one higher level, called domain key normal form (DK/NF).
However, we will cover the first 3 forms.
When normalising a database you should achieve four goals:
1. Arranging
data into logical groups such that each group describes a part of the whole
2. Minimizing
the amount of duplicated data stored in a database
3. Building a
database in which you can access and manipulate the data quickly and
efficiently without compromising the integrity of the data storage
4. Organising
the data such that, when you modify it, you make the changes in only one place
Normalisation
is a complex process with many specific rules and different intensity levels.
In its full definition, normalisation is the process of discarding
repeating groups, minimizing redundancy, eliminating composite keys for partial
dependency and separating non-key attributes.
In simple
terms, the rules for normalisation can be summed up in a single phrase: "Each
attribute (column) must be a fact about the key, the whole key and nothing but
the key". Said another way, each table should describe only one type
of entity (information).
A Properly normalised design allows you to:
§ Use storage space efficiently
§ Eliminate redundant data
§ Reduce or eliminate inconsistent
data
§ Ease the database maintenance burden
Relational
database theorists have divided normalisation into several rules called normal
forms:
§ Un-normalised
data = repeating groups, inconsistent data, delete and insert anomalies.
§ First Normal Form (no
repeating groups) = each cell of a table must contain a
single value, and the table must not contain repeating groups.
§ Second Normal Form (each
column must depend on the entire primary key) = must
have met all of the database requirements for the 1st form, and data,
which does not directly depend on the table's primary key must be moved into
another table.
§ Third Normal Form (each
column must depend directly on the primary key) = must
have met all database requirements for both 1st and 2nd forms, and
all fields that can be derived from data contained in the other fields and
tables must be removed.
NOTE: You
must be able to reconstruct the original flat view of the data. If
you violate this rule, you will have defeated the purpose of normalising the
database.
To determine
the relationships between tables - Now that you've
defined your information into tables and identified primary key fields, you
need a way to tell the system how to bring related information back together in
meaningful ways. To do this, you define relationships between tables.
Relationship is an association between common fields (columns) in two tables. A
relationship works by matching data in key fields. In most cases, these matching
fields are the primary key from one table, which provides a unique identifier
for each record, and a foreign key in the other table. The kind of relationship
that the system creates depends on how the related fields are defined. When you
physically join two tables by connecting fields with related information, you
create a relationship that is recognized by the system (Access). The specified
relationship is important. It tells the system how to find and display
information from fields in two or more tables. The program needs to know
whether to look for only one record in a table or to look for several records
on the basis of the relationship.
There are 3
relationship types:
1. One-to-one
(1:1) - each record in Table A can have only one matching record in Table
B, and each record in Table B can be related to only one record in Table A.
This type of relationship is not frequently used in database systems, but it
can be very useful way to link two tables together. However, the information
related in this way could be in one table. You might use a one-to-one
relationship to divide a table with many fields in order to isolate part of a
table for security reasons, or to store information that applies only to a
subset of the main table, or for efficient use of space. A one-to-one
relationship is created if both of the related fields are primary keys or have
unique indexes.
2. One-to-many
(1:M) - is the most common type of relationship and it is used to relate
one record from the 'primary' table with many records in the 'related' table.
In a one-to-many relationship, a record ('parent') in Table A can have many
matching records ('children') in Table B, but a record ('child') in Table B has
only one matching record ('parent') in Table A. This kind of relationship is
created if only one of the related fields is a primary key or has a unique
index.
3. Many-to-many
(M:M) - is used to relate many records in the table A with many records in
the table B. A record ('parent') in Table A can have many matching records
('children') in Table B, and a record ('child') in Table B can have many
matching records ('parents') in Table A. It is the hardest relationship to
understand and it is not correct. By breaking it into two one-to-many
relationships and creating a new (junction/link) table to stand between the two
existing tables will enable correct and appropriate relationship setting. A
many-to-many relationship is really two one-to-many
relationships with a junction/link table. NOTE: Link table
usually has the composite primary key that consists of the foreign keys from
both tables A and B.
When tables
are linked (joined) together, one table is usually called 'parent' or 'primary'table
('one end' in the 1:M relationship and 'one end' (primarily created
table) in the 1:1 relationship) and another table is called 'child' or 'related' table
('many end' in the 1:M relationship and 'one end' (subsequently
created table) in the 1:1 relationship). This is known as a parent-child
relationship between tables. Records in a primary table cannot be
modified or deleted if there are related records in the 'child' table - there
will not be an orphan (related) record without a parent (primary) record. Also,
a new record cannot be added to the related table if there is no associated
record in the primary table.
In addition
to specifying relationships between two tables in a database, you also set upreferential
integrity rules that help in maintaining a degree of accuracy between
tables. Setting referential integrity rules would prevent unwanted and
accidental deletions and modifications of the 'parent' records that relate to
records in the 'child' table. This type of problem could be catastrophic for
any system. The referential integrity rules keep the relationships between
tables intact and unbroken in a relational database management system -
referential integrity prohibits you from changing existing data in ways that
invalidate and harm the links between tables.
NOTE:
Referential integrity operates strictly on the basis of the tables' key fields.
It checks each time a key field, whether primary or foreign, is added, changed
or deleted. If any of these listed actions creates an invalid relationship
between two tables, it is said to violate referential integrity.
Referential integrity is a system of rules that Microsoft Access uses to ensure
that relationships between records in related tables are valid, and that you
don't accidentally delete or incorrectly change related data.
Data integrity:
§ Ensures the
quality of data within a database
§ Is about the
actual values that are stored and used in an application's data structures
§ Ensures that
an application exert deliberate control on every process that uses your data to
ensure the continued correctness of the information
§ Is applied
through the careful implementation of several key concepts such as normalizing
data, defining business rules, providing referential integrity and validating
the data
There are 4 types of the data integrity:
1. Entity
Integrity ensures that each row (record) is a unique instance in a particular
table by enforcing the integrity of the primary key or the identifier column(s)
of a table (e.g. ID, Reference Code, etc).
2. Domain
Integrity ensures validity of entries (data input) for a column through the
data type, the data format and the range of possible values (e.g. date, time,
age, etc.).
3. Referential
Integrity preserves the defined relationships between tables when records are
added, modified or deleted by ensuring that the key values are consistent
across tables; such consistency requires that there are no references to
non-existent values and if a key value changes, all references to it change
consistently through database, otherwise a key value cannot be changed.
4. User-Defined
Integrity enables specific (required) business rule(s) to be defined and
established in order to provide correct and consistent control of an
application's data access (e.g. who can have permissions to modify data, how
generated reports should look like, which data can be modified, etc.).
To refine
the design - After designing the tables, fields, and relationships you need the
time to study the design and detect any flaws that might remain. It is easier
to change your database design now, rather than after you have populated the
tables with data or created the relevant screens.
NOTE: Use
Microsoft Access to create your tables. Specify relationships between the
tables. Enter a few records of data in each table. See if you can use the
database to get the answers you want. Create rough drafts of your forms and
reports and see if they show the data you expect. Look for unnecessary duplications of
data and eliminate them.
To enter
data and create other system objects - When
you are satisfied that the table structures meet the design goals described
here, then it's time to go ahead and add all your existing data to the tables.
You can then create any queries, forms, reports, macros, and modules that you
may want.
No comments:
Post a Comment