Tuesday, March 27, 2012

Simple Database Design Tips


Simple Relational Database Design Tips:
If your companies data is its lifeblood, then the database design can be the most important part of an application.
A great many subjects have been written on this topic, and entire college degrees have been built around it. However, as has been said time and time again, there’s no teacher like experience.
I'll get the show started by listing five favourite tips and giving a brief explanation of the reasoning behind each one. If you have a general design tip (try not to be specific to a particular system) that you'd like to share, post it in using the Contact Webmaster e-mail address. Where possible, explain the reasoning behind your tip by including an example or anecdote.
§  Use meaningful field names
If you have ever inherited a database from another programmer you may well know what I mean. Some like to name fields using the name of the on-screen control that displayed the data from that field. That's all well and good, but unfortunately, they also liked to name other controls using some strange convention that combined Hungarian notation with the order in which they have added the controls to the screen: cbo1, txt2, cmd3, lbl4 and so on.

Unless you are using a system that restricts you to short field names, make them as descriptive as possible-within reason, of course. It is however possible to go overboard with this; Customer_Address_Street_Line_1 is very descriptive and meaningful, but no one would want to have to type it more than once.
§  Do your homework
Not only should you research your business needs when designing a new database, you should check out the existing system, as well. Few database projects are built from scratch; there is almost always an existing system (maybe not computerised) that the organization is using to fulfil its needs.

Obviously, the existing system is not perfect; otherwise, you wouldn't be building a new one. But by studying it, you may discover nuances that you would otherwise have missed had you ignored it. If nothing else, examining the existing system is usually good for a laugh or two!
§  Take the user's keys away
When deciding which field or fields to use as keys in a table, always consider the fields that users will be editing. It's usually a bad idea to choose a user-editable field as a key. Doing so forces you to take one of these two actions:

Restrict the user from editing the field after the record's creation. If you do so, you may discover that your application isn't flexible enough when business requirements suddenly change, and users need to edit that un-editable field.
What happens when a user makes a mistake in data entry and doesn't notice until the record is saved?
Delete and re-create?
What if the record isn't re-creatable; suppose the customer left?
Provide some way of detecting and correcting key collisions. Usually, this can be done with some effort, but it is expensive in terms of performance. Also, a key correction may wind up being possible only from outside the data layer, forcing you to break the isolation between your data and business/user interface layers.
The underlying reasoning here is this: Make your design fit the user, don't make the user fit the design.
§  Don't over-normalise
For those unfamiliar with the term, normalisation helps eliminate the redundancy of data in a database by ensuring that all fields in a table are atomic. There are several forms of normalisation, but the Third Normal Form (3NF) is generally regarded as providing the best compromise between performance, extensibility, and data integrity.
Briefly, 3NF states that:
§  » Each value in a table is to be represented once and only once.
§  » Each row in a table should be uniquely identifiable. (It should have a unique key.)
§  » No non-key information that relies upon another key should be stored in the table.

Databases in 3NF are characterised by a group of tables storing related data that is joined together through keys. For example, a 3NF database for storing customers and their related orders would likely have two tables: Customer and Order. The Order table would not contain any information about an order's related customer. Instead, it would store the key that identifies the row containing the customer's information in the Customer table.
Advanced levels of normalisation exist, but is "more normal" necessarily better?
Not always. In fact, for some projects, even 3NF may introduce too much complexity into the database to be worth the rewards.
§  Build in the adaptation factor
Not much is worse than discovering, or being informed, that your "finished" database is missing a field for a crucial piece of information.

So, unless table size is a serious concern, always build in a factor to allow for new fields, even if this means just adding a couple of extra text fields to the end.
§  Submitted By Sandra Greer - 03/12/02
For any DBMS, avoid using a number unless it has to be a number (and you have to do maths on it). Half the time, the users will subsequently want to load in another dataset that uses alphanumerics, or hyphens, or spaces.

That's why the Real Key is the autonumber Access puts on your records, and the other user "key" is for searching.
§  Always design the database on paper first - Courtesy of Michael J. Hernandez.
This one tip will probably save you an enormous amount of time. Designing the database on paper first guarantees that you will keep focused on the task of designing the database as effectively and efficiently as possible. You want to make certain that your database will have only a minimum amount of redundant data and that you've eliminated as much duplicate data as possible. Making changes to the structure is also a much simpler task if done on paper.

You'd be surprised how fast an eraser works. Many people make the mistake of designing a database with their particular RDBMS in mind and will design tables in a manner they hope will fit certain forms or reports they envision themselves using. In most cases, designing a table in this manner will result in poorly constructed tables. Common side effects of this type of construction include difficulty in sorting the data and searching for specific values. In many cases, the tables will have to be redesigned using proper database design techniques.
§  Make absolutely certain that each table only represents one subject - Courtesy of Michael J. Hernandez.
A subject can be one of two things: and object or an event. In this case, and object represents something tangible - such as a person, place, or thing - while an event represents something that occurs at a specific point in time. Both have characteristics that can be stored as data and processed later as information.

When a table describes more than one subject, you will certainly have unnecessary duplicate data and possibly redundant data as well. Ensuring that a table represents a single subject guarantees that you will avoid these problem.
§  Every table should have a Primary Key - Courtesy of Michael J. Hernandez.
This is an important tip for two very good reasons. First, a Primary Key uniquely identifies each record in a given table and helps to ensure against redundant data. This is the mechanism used to refer to a particular record from other tables in the database. Second, it is also the instrument used to establish a relationship between a pair of tables. This will be of particular importance when you want to retrieve data from multiple tables in a query.
Here are some guidelines for establishing a Primary Key:
§  » It's value must be unique
§  » It can never be null
§  » It cannot be a multi-part field
§  » It should comprise the minimum number of fields to guarantee uniqueness
§  » It is not optional in whole or in part
§  » It must directly identify each value of the remaining fields in a given record of a table

In order for a field to become a Primary Key for its parent table, it must pass each of these guidelines. Failure to pass even one will disqualify it as a possible Primary Key. If you make certain you have established a valid Primary Key, you will greatly reduce the possibility of encountering problems later when you begin to work with table relationships.
§  Invest the time to implement data integrity - Courtesy of Michael J. Hernandez.
I can't over-emphasize the importance of this tip. Many of the problems you'll encounter with inaccurate or erroneous information will be a direct result of poor data integrity. While it sometimes seems like a waste of time to pay so much attention to the many details involved in establishing data integrity, it will actually save you an enormous amount of time in the long run - you won't have to continually go back to fix things. An interesting fact is that the very people who "…just don't have the time." to establish proper data integrity are the ones who usually spend a large amount of time fixing their improperly designed databases. In many cases they will spend up to three times the amount of time it would have taken to design the database properly in the first place!
So don't do it over - do it right!


No comments:

Post a Comment