Tuesday, March 27, 2012

Database Design - Basic Steps


The Relational Database Design Process:
Before you build the tables and other objects that will make up your system, it is important to take time to design it. A good design is the keystone to creating a system that does what you want it to do effectively, accurately and efficiently.

The BASIC steps in designing a database system
§  To determine the purpose of your system
§  To determine the tables that you need in your system
§  To determine the fields that you need in the tables
§  To identify fields with unique values
§  To determine the relationships between tables
§  To refine the design
§  To add data (populate tables) and create other system objects
NOTE: Some of the listed steps (determining tables, data fields and relationships) may cross and be repeated a few times when designing a relational database.
Building a database is a process of examining the data that is necessary and useful for an application, then breaking it down into a relatively simple row and column format.
To determine the purpose of your system you need to know what information you want from the database (detailed scenario). From that, you can determine what subjects you need to store facts about (the tables) and what facts you need to store about each subject (the data fields).
So, make sure that you do answer the following questions:
1.     What type of data should the system keep track off?
2.     What would the user want to know about the data?
3.     What would the user want to do to the data?
The first step in creating a database is creating a plan that serves both as a guide to be used when implementing the database and as a functional specification for the database after it has been implemented. The nature and complexity of a database application, as well as the process of planning it, can vary greatly. In the first case, the database design may be little more than a few notes on some scratch paper. In the latter case, the design may be a formal document with hundreds of pages that contain every possible detail about the database.
NOTE: Modeling the structure on paper before opening computer and starting coding is highly recommendable. Planning may seem time-consuming up front, but not planning is twice as time-consuming later.
To determine the tables can be the trickiest step in the database design process. That is because the results that you want from the database (e.g. the reports that you want to print, the forms that you want to use, the questions that you want answered) don't necessarily provide clues about the structure of the tables that can produce them. In fact, it may be better to sketch and rework your design on paper first.
When you design your tables, divide up pieces of information by keeping following fundamental design principles in mind:
1.     Each piece of information is stored in only one table
2.     A table should not contain duplicate information
3.     Each table should contain information about only one subject
4.     Information should not be duplicated between tables


Columns contain the attributes of the entities.


To identify fields with unique values (define table primary key) - In order to uniquely identify each individual record in a table and to be able to relate information stored in separate tables. By having a different primary key in each record you can tell two records apart - the goal of setting primary keys is to ensure each records uniqueness. This is calledentity integrity in the database management.
The primary key types:
1.     Single-field primary keys (AutoNumber or User-Defined type)
2.     Miltiple-field primary keys
Notes:
§  The power of a relational database system comes from its ability to quickly find and bring together (related) information stored in separate tables by using queries, forms and generating reports. In order to do this, each table should include a field or set of fields to uniquely identify each record stored in the table. This information is called theprimary key of the table. Once you specify a primary key for the table, to ensure uniqueness, the system will prevent any duplicated or null values from being entered in the primary key fields.
§  To be able to set relationships between tables, you must establish a link between fields that contain common and related information. The link field in another table is known as a foreign key data field. A relationship is established by linking these key fields between tables - the primary key in the 'primary' table and a foreign key in the 'related' table.
So, every table must have a primary key - one or more data fields whose contents are unique to each record. When linking tables you link the primary key field from one (primary or 'parent') table to a field in another (related or 'child') table that has the same name, structure and data type. By matching the values from the primary key to the foreign key in both tables, you can relate two records.
Tables store data about entities.
The process of designing a relational database includes making sure that a table contains only data directly related to the primary key, that each data field contains only one item of data, and that redundant (duplicated) data is eliminated. The task of the database designer is to structure the data in a way that eliminates unnecessary duplication and provides a rapid search path to all necessary information. This process of specifying and defining tables, keys, columns and relationships in order to create an efficient database is called normalisation.
Normalisation is part of successful database design. Without normalisation, database systems can be inaccurate, slow and inefficient and they might not produce the data you expect.
We use the normalisation process to design efficient and functional databases. By normalising, we store data where it logically and uniquely belongs. Normalisation process involves a few steps and each step is called a form. Forms range from the first normal form (1NF) to fifth normal form (5NF). There is also one higher level, called domain key normal form (DK/NF). However, we will cover the first 3 forms.

When normalising a database you should achieve four goals:
1.     Arranging data into logical groups such that each group describes a part of the whole
2.     Minimizing the amount of duplicated data stored in a database
3.     Building a database in which you can access and manipulate the data quickly and efficiently without compromising the integrity of the data storage
4.     Organising the data such that, when you modify it, you make the changes in only one place
Normalisation is a complex process with many specific rules and different intensity levels. In its full definition, normalisation is the process of discarding repeating groups, minimizing redundancy, eliminating composite keys for partial dependency and separating non-key attributes.
In simple terms, the rules for normalisation can be summed up in a single phrase: "Each attribute (column) must be a fact about the key, the whole key and nothing but the key". Said another way, each table should describe only one type of entity (information).
A Properly normalised design allows you to:
§  Use storage space efficiently
§  Eliminate redundant data
§  Reduce or eliminate inconsistent data
§  Ease the database maintenance burden
Relational database theorists have divided normalisation into several rules called normal forms:
§  Un-normalised data = repeating groups, inconsistent data, delete and insert anomalies.
§  First Normal Form (no repeating groups) = each cell of a table must contain a single value, and the table must not contain repeating groups.
§  Second Normal Form (each column must depend on the entire primary key) = must have met all of the database requirements for the 1st form, and data, which does not directly depend on the table's primary key must be moved into another table.
§  Third Normal Form (each column must depend directly on the primary key) = must have met all database requirements for both 1st and 2nd forms, and all fields that can be derived from data contained in the other fields and tables must be removed.
NOTE: You must be able to reconstruct the original flat view of the data. If you violate this rule, you will have defeated the purpose of normalising the database.
To determine the relationships between tables - Now that you've defined your information into tables and identified primary key fields, you need a way to tell the system how to bring related information back together in meaningful ways. To do this, you define relationships between tables. Relationship is an association between common fields (columns) in two tables. A relationship works by matching data in key fields. In most cases, these matching fields are the primary key from one table, which provides a unique identifier for each record, and a foreign key in the other table. The kind of relationship that the system creates depends on how the related fields are defined. When you physically join two tables by connecting fields with related information, you create a relationship that is recognized by the system (Access). The specified relationship is important. It tells the system how to find and display information from fields in two or more tables. The program needs to know whether to look for only one record in a table or to look for several records on the basis of the relationship.
There are 3 relationship types:
1.     One-to-one (1:1) - each record in Table A can have only one matching record in Table B, and each record in Table B can be related to only one record in Table A. This type of relationship is not frequently used in database systems, but it can be very useful way to link two tables together. However, the information related in this way could be in one table. You might use a one-to-one relationship to divide a table with many fields in order to isolate part of a table for security reasons, or to store information that applies only to a subset of the main table, or for efficient use of space. A one-to-one relationship is created if both of the related fields are primary keys or have unique indexes.
2.     One-to-many (1:M) - is the most common type of relationship and it is used to relate one record from the 'primary' table with many records in the 'related' table. In a one-to-many relationship, a record ('parent') in Table A can have many matching records ('children') in Table B, but a record ('child') in Table B has only one matching record ('parent') in Table A. This kind of relationship is created if only one of the related fields is a primary key or has a unique index.
3.     Many-to-many (M:M) - is used to relate many records in the table A with many records in the table B. A record ('parent') in Table A can have many matching records ('children') in Table B, and a record ('child') in Table B can have many matching records ('parents') in Table A. It is the hardest relationship to understand and it is not correct. By breaking it into two one-to-many relationships and creating a new (junction/link) table to stand between the two existing tables will enable correct and appropriate relationship setting. A many-to-many relationship is really two one-to-many relationships with a junction/link table. NOTE: Link table usually has the composite primary key that consists of the foreign keys from both tables A and B.
When tables are linked (joined) together, one table is usually called 'parent' or 'primary'table ('one end' in the 1:M relationship and 'one end' (primarily created table) in the 1:1 relationship) and another table is called 'child' or 'related' table ('many end' in the 1:M relationship and 'one end' (subsequently created table) in the 1:1 relationship). This is known as a parent-child relationship between tables. Records in a primary table cannot be modified or deleted if there are related records in the 'child' table - there will not be an orphan (related) record without a parent (primary) record. Also, a new record cannot be added to the related table if there is no associated record in the primary table.
In addition to specifying relationships between two tables in a database, you also set upreferential integrity rules that help in maintaining a degree of accuracy between tables. Setting referential integrity rules would prevent unwanted and accidental deletions and modifications of the 'parent' records that relate to records in the 'child' table. This type of problem could be catastrophic for any system. The referential integrity rules keep the relationships between tables intact and unbroken in a relational database management system - referential integrity prohibits you from changing existing data in ways that invalidate and harm the links between tables.
NOTE: Referential integrity operates strictly on the basis of the tables' key fields. It checks each time a key field, whether primary or foreign, is added, changed or deleted. If any of these listed actions creates an invalid relationship between two tables, it is said to violate referential integrity. Referential integrity is a system of rules that Microsoft Access uses to ensure that relationships between records in related tables are valid, and that you don't accidentally delete or incorrectly change related data.

Data integrity:
§  Ensures the quality of data within a database
§  Is about the actual values that are stored and used in an application's data structures
§  Ensures that an application exert deliberate control on every process that uses your data to ensure the continued correctness of the information
§  Is applied through the careful implementation of several key concepts such as normalizing data, defining business rules, providing referential integrity and validating the data

There are 4 types of the data integrity:
1.     Entity Integrity ensures that each row (record) is a unique instance in a particular table by enforcing the integrity of the primary key or the identifier column(s) of a table (e.g. ID, Reference Code, etc).
2.     Domain Integrity ensures validity of entries (data input) for a column through the data type, the data format and the range of possible values (e.g. date, time, age, etc.).
3.     Referential Integrity preserves the defined relationships between tables when records are added, modified or deleted by ensuring that the key values are consistent across tables; such consistency requires that there are no references to non-existent values and if a key value changes, all references to it change consistently through database, otherwise a key value cannot be changed.
4.     User-Defined Integrity enables specific (required) business rule(s) to be defined and established in order to provide correct and consistent control of an application's data access (e.g. who can have permissions to modify data, how generated reports should look like, which data can be modified, etc.).
To refine the design - After designing the tables, fields, and relationships you need the time to study the design and detect any flaws that might remain. It is easier to change your database design now, rather than after you have populated the tables with data or created the relevant screens.
NOTE: Use Microsoft Access to create your tables. Specify relationships between the tables. Enter a few records of data in each table. See if you can use the database to get the answers you want. Create rough drafts of your forms and reports and see if they show the data you expect. Look for unnecessary duplications of data and eliminate them.
To enter data and create other system objects - When you are satisfied that the table structures meet the design goals described here, then it's time to go ahead and add all your existing data to the tables. You can then create any queries, forms, reports, macros, and modules that you may want.

No comments:

Post a Comment