Tutorial

Database normalization in MySQL: Four quick and easy steps

Database normalization was introduced as a procedure by Edgar Frank Codd, a computer scientist at IBM in his paper, A Relational Model of Data for Large Shared Data Banks in 1970. Database normalization is a process by which an existing schema is modified to bring its component tables into compliance through a series of progressive normal forms.


Tutorial index: 


It focuses on ridding developers and their projects of the ‘spreadsheet syndrome’. Spreadsheet syndrome refers to the tendency of developers to squeeze as much information as possible into as few tables as possible.

Earlier, due to the notions of spreadsheets and how data was managed in them, developers continued to design MySQL databases with the same mind frame. Today, this method is not considered a smart way to design MySQL databases since tables, when designed with the spreadsheet syndrome, call for constant redesigning for every small change to the database.

<< Back to Index

Benefits of database normalization in MySQL 

Reduced usage of storage space by intelligently categorizing data is one of the many benefits database normalization lends to MySQL. It aids in better, faster, stronger searches as it entails fewer entities to scan in comparison with the earlier searches based on mixed entities. Data integrity is improved through database normalization as it splits all the data into individual entities yet building strong linkages with the related data.

As Mike Hillyer, a technical writer in MySQL AB (now Oracle Corporation), explains: “The goal of database normalization is to ensure that every non-key column in every table is directly dependent on the key: the whole key and nothing but the key. And with this goal, come the benefits in the form of reduced redundancies, fewer anomalies, and improved efficiencies.”

<< Back to Index

Normalize your data easily 

The following example will illustrate how database normalization helps achieve a good design in MySQL. The table below presents data that needs to be captured in the database.

Title Author Bio ISBN Subject Pages Publisher
Beginning MySQL Database Design and Optimization

Chad Russell,

Jon Stephens

Chad Russell is a programmer and system administrator who owns his own internet hosting company. Jon Stephens is a member of the MySQL AB documentation team. 90593324 MySQL Database Design 520 Apress

In the example shown above, a lot of storage space will be wasted if any one criterion (author or publisher) is considered as the identification key. Database normalization, thus, is essential. It is a step by step process that cannot be carried out haphazardly. The following steps will help in attaining database normalization in MySQL.

<< Back to Index

Step 1: Create first normal form (1NF) 

The database normalization process involves getting data to conform to progressive normal forms, and a higher level of database normalization cannot be achieved unless the previous levels have been satisfied. First normal form is the basic level of database normalization.

For 1NF, ensure that the values in each column of a table are atomic; which means they are unique, containing no sets of values. In our case, Author and Subject do not comply.

One method for bringing a table into 1NF is to separate the entities contained in the table into separate tables. In our case this would result in Book, Author, Subject, and Publisher tables.

Book’s table:

ISBN

Title

Pages

1590593324

Beginning MySQL Database Design and Optimization

520

Author’s table:                                                                                              << Back to Index

Author_ID

First Name

Last Name

1

Chad

Russell

2

Jon

Stephens

3

Mike

Hilyer

<< Back to Index

Subject’s table:

Subject _ID

Last_name

1

Russell

2

Stephens

Publisher’s table:

     Publisher_ID

Name

Address

City

State

Zip

1

Apress

2   580, Ninth street, station 219

Berkeley

California

94710

<< Back to Index

Step 2: Define relationships 

Three types of relations can be established:

  • One-to-(Zero or)-one (Example: marriage)
  • One-to-(Zero or)-many (Example: kids)
  • Many-to-many (Example: facebook)

The book’s table may have many to many relations with the Author’s table.

Author’s table may have many books and a book may have more than one author.

The Book’s table may have many to many relations with the Subject table.

The books may fit in many subjects and the subjects may have many books.

Many-to-many relations have to be presented by “link” tables

Book Author table:

ISBN

Subject_ID

1590593324

1

1590593324

2

Book_Subject table:

ISBN

Subject_ID

1590593324

1

1590593324

2

  • << Back to Index
  • One-to-many in our example will be Books to Publisher. Each book has only one Publisher but one Publisher may have many books.

    We can achieve ‘one-to-many’ relationships with a foreign key. A foreign key is a mechanism in database management systems (DBMS) that defines relations and creates constraints between data segments. It is not possible to review what is not related to the specific book. It is not possible to have a book without an author or publisher.

    When deleting a publisher, all the related books may need to be deleted along with the reviews of those books. The authors would not be needed to be deleted.

    The foreign key is introduced in the table that represents the “many”, pointing to the primary key on the “one” table. Since the “Book” table represents the many portion of the one-to-many relationship, the primary key value of the Publisher as in a Publisher_ID column as a foreign key is added.

    ISBN

    Title

    Pages

    Publisher_ID

    1590593324

    Beginning MySQL Database Design and Optimization

    520

    1

    << Back to Index

    Step 3: Make second normal form (2NF) 

    Second normal form (2NF) cuts down the tautological/superfluous data in a table by selecting it, putting it in new tables and by establishing relations amongst them. In database normalization, 2NF is about the relations between the composite key columns and non-key columns. That means the non-key columns have to depend on the whole composite key.

    Here, the primary key is composite to eliminate the possibility of the same person writing more than one review of the book. Reviewer URL is dependent only on the Reviewer ID which is only part of the whole composite key.

    This table does not comply with the 2NF:

    ISBN

    Reviewer ID

    Summary

    Reviewer_URL

    1590593324

    3

    A great book!

    http://www.openwin.org

  • << Back to Index

    Step 4: Third Normal Form (3NF) 

    This requires that all columns depend directly on the primary key. Tables violate the 3NF when one column depends on another column which in turn depends on the primary key. (A transitive dependency)

    In the publisher table, the City and State are actually dependent on the zip code not the Publisher_ID

    Publisher_ID

    Name

    Address

    City

    State

    Zip

    1

    Apress

    2580, Ninth street, station 219

    Berkeley

    California

    94710

    To comply with 3NF we have to move these outside the publisher’s table:

    Zip

    City

    State

    94710

    Berkeley

    California

    Through the process of database normalization we bring our schema's tables into conformance with progressive normal forms. As a result the tables each represent a single entity (a book, an author, a subject, etc) and we benefit from decreased redundancy, fewer anomalies and improved efficiency.

    About the author: Ronen Baram is a MySQL sales consultant at Oracle for Australia / New Zealand markets. As part of his job, he meets with clients all over Australia, New Zealand as well as the rest of APAC, mainly India. Baram has 15 years of IT experience and has a special interest in Linux OS.

    (This tutorial is based on a presentation given by Ronen Baram at Oracle Develop Conference held in Hyderabad this year. Compiled by Sharon D’Souza)

  • << Back to Index

This was first published in May 2011