Please see my other Database Development articles.
Introduction
Before I define the process of designing a database, or “normalization,” I think some background is essential.
In most implementations of SQL Server databases, the system they are designed to support are generally defined in two broad categories: Online Transactional Processing (OLTP) and Online Analytical Processing (OLAP).
OLAP systems are designed solely for use with Business Intelligence (BI) applications using either Snowflake or Hierarchical schemas, but they are highly denormalized. The primary purpose of the data is for historical analysis – analyzing trends over long periods of time. Therefore, data rarely changes and recent changes are largely irrelevant to an OLAP system (what historic trends could have occurred in the last two days?).
Conversely, OLTP systems are highly normalized for use by web sites and software applications designed to make many concurrent changes to data in real time – the system’s data is alwayschanging.
Now that we understand that an OLTP system’s data is always changing, two very clear business requirements become clear.
First, all data must be available for creation, modification, or deletion and these operations must be supported at the most fundamental level of the data – instead of only needing to change a person’s “name,” the requirement would be to change both the “first” and “last” names.
Also, these operations must be optimized so that the database can support thousands of them concurrently, without degrading the performance of the system for other users, which brings us to the topic of “normalization.”
Normalizing a database design involves using formal methods to separate the data into multiple, related tables with a greater number of narrow tables (fewer columns).
In relational database design theory, normalization rules identifies certain attributes that must be present or absent:
- A table should have a unique identifier – this avoids duplicate records.
- A table should store data for only a single type of entity – no mixing data for different purposes (Books & Publishers)
- A table should avoid nullable columns. “NULL” represents the absence of a value (not ‘0’ or blank) so what is purpose of storing nothing?
- A table should not have repeating values or columns. For example, storing “color1,” “color2,” and so on to represent a person’s favorite colors. Instead, store their favorites in a separate table that is referenced by the primary table.
The following represents a basic checklist to normalizing data:
- Gather requirements for the data and how it is to be used (current systems, users, documentation, etc).
- Make a list of candidate fields.
- Identify any repeating data, separate from other data.
- Group logically-related data.
- Identify multipart fields and break into separate fields (“name” becomes “first” and “last” names).
- Qualify each field:
What type of data will it is (numeric, character, etc.)?
Maximum length?
Create a finite list of possible values, if possible. - Identify and separate repeating groups of fields (colors).
- Identify data within logically-similar data that would naturally distinguish each record from each other (SSN might distinguish a person). This is a candidate Primary Key (PK).
- Identify relationships between fields.
Entity Relationships
Relationships help prevent redundant data. There are three types of relationships: one-to-one, one-to-many, many-to-many.
- One-to-One: A row in table A can have only one matching row in table B and vice versa. This type of relationship is not very common and such data is usually located in one table.
- One-to-Many: A row in table A can have many rows in table B, but a row in table B only has one matching row in table A. This is the most common type of relationship.
Ex: Publishers & Titles tables, each Publisher has many titles, but each title comes from only one publisher. Publishers (1) – Titles (many). - Many-to-Many: A row in table A can have many matching rows in table (B) and vice versa.
Ex: junction table, whose PK consists of foreign keys from both table A and table B. Books and Authors table have many-to-many relationships (if books have more than one author).