Note: To demonstrate the correctness and stability conferred by a sound theoretical foundation, relative to the industry's fad-driven "cookbook" practices, I am re-publishing as "Oldies But Goodies" material from the old DBDebunk.com (2000-06), so that you can judge for yourself how well my arguments hold up and whether the industry has progressed beyond the misconceptions those arguments were intended to dispel. I may revise, break into parts, and/or add comments and/or references.
This is a continuation of an email exchange with readers in response to my post "Normalization and Performance: Never the Twain Shall Meet", started in Part 1.
On Normalization, Performance and Database Correctness
(originally posted 03/22/2001)
“I read your article on Normalization and database speed. The rules of normalization are used to provide a guide when designing a logical approach to managing information. If that information management plan (IMP) is to be implemented using a database, then the ultimate "test" of the IMP is its performance at the physical level. The IMP architecture is commonly expressed using an Entity Relationship Diagram, or ERD. Sure, CPU, RAM, and RAID all play an important role in database performance. But when a DBA changes the way tables, columns, or indexes are structured, he is changing the IMP and the corresponding ERD. Now there are two sides to this story.

- The Pro De-Normalization group viewpoint: If the above change results in a "faster" database, then de-normalization has been "proven" to be necessary.

- The Pro Normalization group viewpoint: A change is only valid if it maintains the integrity and flexibility that are inherent in a fully normalized database. For example, violation of the second rule, "Eliminate redundant data", can result in corrupt data when data held in two or more locations is not properly updated, changed, or inserted in all locations.

When the second rule is violated, data corruption must be safeguarded against. The safeguard must be implemented at the database level, because that is the only way corrupt data can be prevented regardless of its source (i.e., data loaded from a flat file, DBAs and users logged onto SQL*Plus changing information, other application software that also has access to the database, and, last but not least, application developers who are not aware of the duplicate data stored in more than one location). I usually implement the safeguard using database triggers, so that when a record is changed in one table a trigger makes the corresponding change in the other table. I have found that most "performance gains" achieved by de-normalization are eradicated by the trigger needed to safeguard the integrity of the data. The bottom line is that de-normalization can often increase database speed, but this is most often achieved at the expense of data integrity.”
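To make the trigger-based safeguard the reader describes concrete, here is a minimal sketch in Oracle SQL (the reader mentions SQL*Plus). The customers/orders schema and the duplicated customer_name column are hypothetical, invented purely for illustration; they are not from the exchange:

```sql
-- Hypothetical de-normalized schema: ORDERS redundantly stores the
-- customer's name, which properly belongs only in CUSTOMERS.
CREATE TABLE customers (
  customer_id   NUMBER PRIMARY KEY,
  customer_name VARCHAR2(100) NOT NULL
);

CREATE TABLE orders (
  order_id      NUMBER PRIMARY KEY,
  customer_id   NUMBER NOT NULL REFERENCES customers (customer_id),
  customer_name VARCHAR2(100)  -- redundant copy introduced by de-normalization
);

-- Trigger that propagates name changes to the redundant copies.
CREATE OR REPLACE TRIGGER sync_customer_name
AFTER UPDATE OF customer_name ON customers
FOR EACH ROW
BEGIN
  UPDATE orders
     SET customer_name = :NEW.customer_name
   WHERE customer_id   = :NEW.customer_id;
END;
/
```

Every update of customer_name now also pays for an update of ORDERS, which is precisely the overhead the reader says erases the supposed "performance gains" of the redundancy.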