Designing Good Relational Databases:
Databases have a reputation for being difficult to construct and hard to maintain. The power of modern database software makes it possible to create a database with a few mouse-clicks. The databases created this way, however, are typically the databases that are hard to maintain and difficult to work with because they are designed poorly. Modern software makes it easy to construct a database, but doesn't help much with the design aspect of database creation.
Database design has nothing to do with using computers. It has everything to do with research and planning. The design process should be completely independent of software choices. The basic elements of the design process are:
Defining the problem or objective
Researching the current database
Designing the data structures
Constructing database relationships
Implementing rules and constraints
Creating database views and reports
Implementing the design
Notice that implementing the database design in software is the final step. All of the preceding steps are completely independent of any software or other implementation concerns.
Defining the problem or objective.
The most important step in database design is the first one: defining the problem the database will address or the objective of the database. It is important however, to draw a distinction between:
How the database will be used and
What information needs to be stored in it.
The first step of database design is to clearly delineate the nature of the data that needs to be stored, not the questions that will be asked to turn that data into information.
This may sound a little contradictory at first, since the purpose of a database is to provide the appropriate information to answer questions. However, the problem with designing databases to answer specific or targeted questions is that invariably questions are left out, change over time, or even become superseded by other questions. Once this happens, a database designed solely to answer the original questions becomes useless. In contrast, if the database is designed by collecting all of the information that an individual or organization uses to address a particular problem or objective, the information to answer any question involving that problem or objective can theoretically be addressed.
Researching the current database.
In most database design situations, there is some sort of database already in existence. That database may be Post-it notes, paper order forms, a spreadsheet of sales data, a word processor file of names and addresses, or a full-fledged digital database (possibly in an outdated software package or older legacy system). Regardless of its format, it provides one essential piece of information: the data that the organization currently finds useful. This is an excellent starting point for determining the essential data structure of the database. The existing database information can also provide the nucleus for the content of the new database.
Designing the data structures.
A database is essentially a collection of data tables, so the next step in the design process is to identify and describe those data structures. Each table in a database should represent some distinct subject or physical object, so it seems reasonable to simply analyse the subjects or physical objects relevant to the purpose of the database, then arrive at a list of tables.
This can work successfully, but it's a much better to objectively analyse the actual fields that you have identified as essential in your research and see what logical groupings arise. In many cases, structures that seemed distinct are really reflections of the same underlying subject. In other cases, the complete opposite is true. And to complicate matters, organizations can use the same terms to describe data that they use or collect in fundamentally different ways.
Once the tables have been determined and fields have been assigned to each, the next step is to develop the specifications for each field. The perfect field should be atomic: It should be unique in all tables in the database (unless it is used as a key) and contain a single value, and it should not be possible to break it into smaller components. This is also an appropriate time to start thinking about the kind of data that goes in each field. This information should be fairly clear from the research phase of the project, but sometimes questions remain. Some advance planning can be done to make it easier to implement the database in the software at a later time, such as identifying the type of fields and examining (or re-examining) existing data that you've collected to make sure that the data always fits the model you are constructing. It's much easier and cheaper to fix that now than wait until the database is being rolled out!
Constructing database relationships.
Once the data structures are in place, the next step is to establish the relationships between the databases. First you must ensure that each table has a unique key that can identify the individual records in each table. Any field in the database that contains unique values is an acceptable field to use as a key. However, it is a much better practice to add an arbitrary field to each table that contains a meaningless, but unique value. This value is typically an integer that is assigned to each record as it is entered and never again repeated. This ensures that each entered record will have a unique key.
Implementing rules and constraints.
In this step, the fields in the database are still fairly amorphous. Defining the fields as text or numeric and getting a rough feel for the types of data that the client needs to store has narrowed them down, but there is room for further refinement. Rules and constraints typically lead to cleaner data entry and thus better information when using the data. Business rules and constraints limit the format that data can take or the ways that data tables can be related to other data tables.
Some of these constraints are imposed by the nature of the data itself; social security numbers are always in the same nine-digit format. This type of constraint is normally implemented to make sure that data is complete and accurate. In other cases, the situation itself explicitly constrains the data. The possible values for the data are usually checked against a list or the choice of values is otherwise constrained. This type of constraint is usually easy to implement and easy to change.
Creating database views and reports.
Now that the data design is essentially complete, the penultimate step is to create the specifications that help turn the data into useful information in the form of a report or view of the data. Views are simply collections of the data available in the database combined and made accessible in one place. It could be as simple as a subset of an existing data table or as complicated as a collection of multiple tables joined on particular set of criteria. Reports on the other hand, are typically snapshots of the database at a particular point in time.
Implementing the design in software.
All of the work to this point has been accomplished without explicitly worrying about the details of the program being used to produce the database. In fact, the design should only exist as diagrams and notes on paper. This is especially important at a later point when you or someone else need to update the database or port it to another package. Now it's time to boot the computer and get started.
No comments:
Post a Comment