Sunday, April 28, 2013

Tables, Full Normalization and Business Rules



REVISED: 10/16/16
 

Often somebody produces a table and asks if it is fully normalized (in 5NF) and, if not, in what normal form it is. This is an indication of poor grasp of data fundamentals.

Consider ASSIGNMENTS:
 EMP# ENAME    PROJECT       DEPT#
===================================
 100  Spenser  Sys Support   E21
 100  Spenser  Comp Svcs     E21
 100  Spenser  Supp Svcs     E21
 160  Pianka   Info Center   D11
 310  Setright Documentation D11
 310  Setright Mfg Systems   D11
 150  Adamson  Info Center   D11
-----------------------------------
First, a normal form is a property of a relation, not a table (a R-table is only a "visual shorthand" for a relation -- a special kind of table that visualizes a relation on some physical medium (e.g., paper) -- and the two should not be confused.

Second, the normal form of a relation is determined from attribute dependencies. Formally, a relation is fully normalized (in 5NF) if and only if the only dependencies that hold in it are functional dependencies (FD) of the non-key attributes on the key (i.e., there is exactly one value of each non-key attribute for every key value, but not vice-versa). Since a key represents an entity identifier, this condition exists only when, informally, a relation represents entities of a single type (why?) Is this true for the relation pictured by ASSIGNMENTS?

The fact is that whether a relation represents a single type of entity -- and, therefore, is fully normalized -- cannot be ascertained from sheer visual inspection of the table picturing it. It requires knowledge of what the underlying relation means, namely the type(s) of entity specified by the business rules in the corresponding conceptual model that the relation represents.

For example, if the rules:

  • R1: Every employee is identified by an employee number.
  • R2: Every employee has an employee name.
  • R3: Every employee works in a department.
  • R4: Every project assignment is identified by an employee number and a project name.
model two types of entity:
  • Employees: {emp. number} --> {employee name, department number}
  • Project assignments: {employee number, project}
then the relation represents both and, consequently, is not in 5NF. In fact, without a well defined and complete conceptual specification of entity types, you can't even tell whether relations have keys (if they do not, they are not relation) and, if they do, what the key is.

Database design adhering to the Principle of Full Normalization (POFN) do not "bundle" entity types and produce 5NF relations, obviating the need for further normalization. For the advantages of full normalization and the drawbacks of "denormalization for performance" illusion, see paper #2 and the recently published DBDEBUNK GUIDE TO MISCONCEPTIONS ABOUT DATA FUNDAMENTALS.

Explicit further normalization is necessary only to "repair" poorly designed non-5NF relations by replacing them 5NF projections. For example, ASSIGNMENTS with EMPLOYEES and PROJ_ASSIGNS pictured by R-tables:

EMP# ENAME    DEPT#
====================
 100  Spenser  E21
 160  Pianka   D11
 310  Setright D11
 150  Adamson  D11
--------------------

 EMP# PROJECT
=================  
 100  Sys Support
 100  Comp Svcs
 100  Supp Svcs
 160  Info Center
 310  Mfg Systems
 310  Comp Svcs
 150  Info Center
------------------
each representing a single entity type. The repair is possible because the following holds:
ASSIGNMENTS{EMP#,ENAME, DEPT#} JOIN ASSIGNMENTS{EMP#,PROJECT} = PROJ_ASSIGNMENTS
where the left-hand side is a join of two projections of ASSIGNMENTS (i.e., no information is lost).




No comments:

Post a Comment

View My Stats