1 1 12 Normalization

Visual LANSA Logical Modeler

1.1.12 Normalization

In 1972, Ted Codd from IBM developed the relational model and identified normal forms which were called first, second, and third. When normalizing a database, you are ensuring that it has been broken into its normal forms. Normalization is a common technique for analyzing relational data structures. It is important to understand the basic concept of normalization when data modeling. Normalization applies to physical database models.

The benefits of having a properly normalized data model and database design are that it:

  • Reduces data redundancy.
  • Produces a database structure that is easily understood by users and system developers.
  • Produces a data model that is flexible in the face of changing business requirements.
  • Produces a database structure that encourages structured, and therefore simpler, program development.

The process of normalization involves three main steps. These are:

First Normal Form

Remove repeating groups. Remove from an entity those elements which maintain the same value between occurrences (or records) while the rest of the elements change.

Second Normal Form

Remove elements which are only partially dependent on the key of the entity.

Third Normal Form

Remove elements which are dependent on (i.e. are identified by) a key other than that of the entity.

 

The result of the normalization process should be that each entity contains only those elements which are properly identified by the keys of that entity.

For example, consider the information which might appear on an order form. It could start with an entity which contains all the data items such as in this example:

Un-normalized Form

ORDER

Order Number, Customer Number, Customer Name, Customer Address, Product Number, Product Name, Quantity Ordered

 

 

First Normal Form

First remove Repeating Groups. The customer details will be the same for each product ordered (i.e. for each line of the order). They are separated from the ORDER information which will be different on each line.

ORDER

Order Number, Customer Number, Customer Name, Customer Address

 

The fields in bold are the unique or candidate key fields.

 

ORDERED PRODUCT

Order Number, Product Number, Product Name, Quantity Ordered

 

 

Second Normal Form

Next remove attributes only partially dependent on primary key (or part of compound key). The product name depends only on Product Number (not on Order Number) and so it is removed.

ORDER

Order Number, Customer Number, Customer Name, Customer Address

 

The fields in bold are the unique or candidate key fields.

ORDERED PRODUCT

Order Number, Product Number, Quantity Ordered

PRODUCT

Product Number, Product Name

 

 

Third Normal Form

Finally, remove attributes dependent on a key other than the primary (or compound key). The customer details depend only on the customer number and so are removed.

ORDER

Order Number, Customer Number

 

The fields in bold are the unique or candidate key fields.

ORDERED PRODUCT

Order Number, Product Number, Quantity Ordered

PRODUCT

Product Number, Product Name

CUSTOMER

Customer Number, Customer Name, Customer Address