Showing posts with label 1NF. Show all posts
Showing posts with label 1NF. Show all posts

Friday, October 13, 2023

EVERYBODY THINK THEY KNOW FIRST NORMAL FORM, BUT NOBODY DOES



“I have read this article in an effort to boost my academic knowledge on data modeling a bit and still have no idea what this academic author wanted to say. Apparently First Normal Form (1NF) doesn't get enough respect and then proceeds to talk about Non-First Normal Form (NFNF). But what about First Normal Form (1NF) damnit.”

By sheer chance this was posted on LinkedIn just after I published my new paper The First Normal Form: A Definitive Guide.

PRACTICAL DATABASE FOUNDATIONS

FIRST NORMAL FORM

A DEFINITIVE GUIDE

(September 2023)

Fabian Pascal

 

Table of Contents

 Introduction

1.      The Normal Form

2.      The First Normal Form

3.      Domain Decomposability & Atomicity

4.      1NF & Tables

5.      SQL & 1NF

5.1.     Repeating Groups & Repeated Attributes

5.2.   Information Principle & SQL
 

Friday, December 2, 2022

NOBODY UNDERSTANDS FURTHER NORMALIZATION 4 (sms)



Note: In "Setting Matters Straight" posts I debunk online pronouncements that involve fundamentals which I first post on LinkedIn. The purpose is to induce practitioners to test their foundation knowledge against our debunking, where we explain what is correct and what is fallacious. For in-depth treatments check out the POSTS and our PAPERS, LINKS and BOOKS (or organize one of our on-site/online SEMINARS, which can be customized to specific needs). Questions and comments are welcome here and on LinkedIn.

In Part 3 we set the matter straight about normalization to 1NF. In this part we do it wit respect to further normalization to 5NF. Non-1NF relations (i.e., with relation-valued attributes) are no longer part of industry practice, so we focus on 2NF-5NF violations. The term further normalization originates with Codd, who initially thought 1NF was sufficient and 2NF-5NF were discovered later (hence, further = beyond 1NF). The industry lumps both under normalization, but the two are distinct (e.g., only further normalization involves redundancy).

What's right/wrong with the following?

“So, what is this theory of normal forms? It deals with the mathematical construct of relations (which are a little bit different from relational database tables). First, second, and third normal forms are the basic normal forms in database normalization. Normalization in relational databases is a design process that minimizes data redundancy and avoids update anomalies. Basically, you want each piece of information to be stored exactly once; if the information changes, you only have to update it in one place. The normalization process consists of modifying the design through different stages, going from an unnormalized set of relations (tables), to the first normal form, then to the second normal form, and then to the third normal form.”
--Vertabelo.com

Thursday, November 3, 2022

NOBODY UNDERSTANDS NORMALIZATION 3 (sms)



09/19/23: For the latest on 1NF see: FIRST NORMAL FORM - A DEFINITIVE GUIDE

 

Note: In "Setting Matters Straight" posts I debunk online pronouncements that involve fundamentals which I first post on LinkedIn. The purpose is to induce practitioners to test their foundation knowledge against our debunking, where we explain what is correct and what is fallacious. For in-depth treatments check out the POSTS and our PAPERS, LINKS and BOOKS (or organize one of our on-site/online SEMINARS, which can be customized to specific needs). Questions and comments are welcome here and on LinkedIn.

(Continued from Part 2)

In this part we set matters straight about first normal form (1NF)

What's right/wrong about this database picture?

“A relation is in first normal form (1NF) if (and only if):
  • Each attribute contains only one value.
  • All attribute values are atomic, which means they can’t be broken down into anything smaller.
In practice, 1NF means that you should not have lists or other composite structures as attribute values. Below is an example of a relation that does not satisfy 1NF criteria:This relation is not in 1NF because the courses attribute has multiple values.

STUDENT-COURSES
==================================================
 STUDENT          COURSES
-=======================--------------------------
 Jane Smith       Databases, Mathematics
 John Lipinsky    English Literature, Databases
 Dave Beyer       English Literature, Mathematics
--------------------------------------------------

To transform this relation to the first normal form, we should store each course subject as a single value, so that each student-course assignment is a separate tuple.”

--Vertabelo.com

Sunday, October 23, 2022

NOBODY UNDERSTANDS NORMALIZATION 2 (sms)



Note: In "Setting Matters Straight" posts I debunk online pronouncements that involve fundamentals which I first post on LinkedIn. The purpose is to induce practitioners to test their foundation knowledge against our debunking, where we explain what is correct and what is fallacious. For in-depth treatments check out the POSTS and our PAPERS, LINKS and BOOKS (or organize one of our on-site/online SEMINARS, which can be customized to specific needs). Questions and comments are welcome here and on LinkedIn.

(Continued from Part 1)

What's right/wrong about this database picture?

“So, what is this theory of normal forms? It deals with the mathematical construct of relations (which are a little bit different from relational database tables). The normalization process consists of modifying the design through different stages, going from an unnormalized set of relations (tables), to the first normal form, then to the second normal form, and then to the third normal form.”
--Vertabelo.com

Misconceptions

  • All database relations are, mathematically, relations, but not all mathematical relations are database relations.
  • The tabular structure play practically no role in RDM.
  • In practice there is no normalization (to 1NF) and there should not be further normalization (to 5NF).
  • Further normalization does not go from 2NF sequentially through 3NF and 4NF to 5NF.

Saturday, October 8, 2022

NOBODY UNDERSTANDS NORMALIZATION 1 (sms)



Note: In "Setting Matters Straight" posts I debunk online pronouncements that involve fundamentals which I first post on LinkedIn. The purpose is to induce practitioners to test their foundation knowledge against our debunking, where we explain what is correct and what is fallacious. For in-depth treatments check out the POSTS and our PAPERS, LINKS and BOOKS (or organize one of our on-site/online SEMINARS, which can be customized to specific needs). Questions and comments are welcome here and on LinkedIn.

What's right/wrong with this database picture?

“Normalization in relational databases is a design process that minimizes data redundancy and avoids update anomalies. Basically, you want each piece of information to be stored exactly once; if the information changes, you only have to update it in one place. The theory of normal forms gives rigorous meaning to these informal concepts. There are many normal forms. In this article, we’ll review the most basic:
First normal form (1NF)
Second normal form (2NF)
Third normal form (3NF)
There are normal forms higher than 3NF, but in practice you usually normalize your database to the third normal form or to the Boyce-Codd normal form, which we won’t cover here.”

                                                                     --Vertabelo.com

Monday, September 12, 2022

DATABASE DESIGN: THE STATE OF KNOWLEDGE IN THE INDUSTRY



Can you identify all the fallacies and misconceptions in the following online exchange? What is the elephant in the room?
Q: “I have done data normalization on dummy data and would like to know if I did it correctly. If it is done correctly, I would also like to ask two things below, because it is about 3NF.

1NF: This table should be 1NF. 

2NF: I selected composite key (userID and Doors) as they represent minimal candidate key and got three tables applying FD rule.

 

3NF: Applying the rule of transitive dependency on 1st table in 2NF, I got out 4 tables (showing only first two, because the last two remain unchanged).

Questions: Is this database normalisation correct? If not could you point me where I did mistake? If answer on first question is True: Should the last table in 3NF be transformed into two tables, given it is not in correct Third normal form. Two non-key atributes have FD keycode -> accessGroup.”

Sunday, August 28, 2022

NOBODY UNDERSTANDS DATABASE DESIGN 1 (sms)



Note: In "Setting Matters Straight" posts I debunk online pronouncements that involve fundamentals which I first post on LinkedIn. The purpose is to induce practitioners to test their foundation knowledge against our debunking, where we explain what is correct and what is fallacious. For in-depth treatments check out the POSTS and our PAPERS, LINKS and BOOKS (or organize one of our on-site/online SEMINARS, which can be customized to specific needs). Questions and comments are welcome here and on LinkedIn.

In a previous SMS post I debunked an attempt to express something important about database practice that was handicapped by lack of foundation knowledge. Here is another example.

“This Codd guy might have been onto something. Unfortunately, normalization is usually taught in a somewhat backwards, overly technical way. If you start with concepts, connections between them and details about them, you usually are already at a fairly high normal form without going through any formal normalization steps.”
--LinkedIn.com

Saturday, August 20, 2022

DATABASE RELATIONS, DATABASE DESIGN & CORRECTNESS (sms)



Note: In "Setting Matters Straight" posts I debunk online Q&As that involve fundamentals which I first post on LinkedIn. The purpose is to induce practitioners to test their foundation knowledge against our debunking, where we explain what is correct and what is fallacious. For in-depth treatments check out the POSTS and our PAPERS, LINKS and BOOKS (or organize one of our on-site/online SEMINARS, which can be customized to specific needs). Questions and comments are welcome here and on LinkedIn.

“...The relational model organizes data through relations (aka tables). You then normalize it in one of six forms. By normalizing data you:
- Reduce redundancy
- Ensure consistency
- Optimize for atomic inserts, updates and deletes
The biggest drawback ... are keys that let you join different tables across multiple systems.”
                                                                      --LinkedIn.com

Sunday, June 26, 2022

REPEATING GROUPS AND 1NF (t&n)



09/19/23: For the latest on this subject see: FIRST NORMAL FORM - A DEFINITIVE GUIDE

 

“A commonly used example of a table that is not in 2-NF is one with repeated attributes (i.e. child1, child2, child3). However, after examining the definition of 2NF in your book PRACTICAL ISSUES IN DATABASE MANAGEMENT, it seems to me that tables such as these do in fact satisfy 2NF. Am I missing something?” --Reader

Monday, May 2, 2022

RELATION PROLIFERATION (sms)



Note: "Setting Matters Straight" is a new format: I post on LinkedIn an online Q&A involving data fundamentals that I subsequently debunk in a post here. This is to encourage readers to test their foundation knowledge against our debunking here, where we confirm what is correct and correct what is fallacious. For in-depth treatments check out the POSTS and our PAPERS, LINKS and BOOKS (or organize one of our on-site/online SEMINARS, which can be customized to specific needs). Questions and comments are welcome here and on LinkedIn.

Q: “How do I avoid too many relations in databases?”

A: “You don’t. Every relation is there to store meaningful data, hopefully you do not define database relations for data that are not to be stored in your database.”

A: “By following proper design principles. Normalization, standard data patterns, and progressing from logical to physical always. Never denormalize (or avoid normalizing in the first place) because performance never trumps accuracy. It really doesn't matter how fast you get the wrong answer.”
--Quora.com

Saturday, January 1, 2022

SCHEMA & PERFORMANCE: NEVER THE TWINE SHALL MEET



One of the core objectives of this site (and my work) has been to demonstrate that there will not be progress in data management as long as the industry and trade media require and promote exclusively (mainly tool) experience in the absence of foundation knowledge. I have published and analyzed ample evidence that relational language and terminology are used without grasping what it actually means -- a good way to gauge lack of foundation knowledge.

Recently I posted a four part series titled "Nobody Understands the Relational Model" showing that even a practitioner steeped in the RDM does not really understand it. Consider now a practitioner's mistake at the beginning of career -- "a bad database schema and what it did to system performance" -- which, he claims, belatedly taught him a lesson. Hhhhmmm, did it, really?

Thursday, November 11, 2021

Nobody Understands the Relational Model: Semantics, Relational Closure and Database Correctness Part 2



 with David McGoveran 

(Title inspired by Richard Feynman)

In Part 1 we explained that all database relations are, mathematically, relations, but not all relations are database relations, which are in both 1NF and 5NF and we agreed with a statement in a LinkedIn discussion ending as follows: "Update anomalies are not as big of a problem as an algebra where relations aren't closed under join". Unfortunately, update anomalies, closure, and how relational operators were defined are all interrelated and represent an even "bigger problem". Update anomalies are not "bugs", let alone irrelevant, but actually a reflection of  that much bigger problem.

In this second part we delve into that problem.

Wednesday, October 27, 2021

Nobody Understands the Relational Model: Semantics, Relational Closure and Database Correctness Part 1



 with David McGoveran 

(Title inspired by Richard Feynman)

“As currently defined, relational algebra produces anomalies when applied to non-5NF relations. Since an algebra cannot have anomalies, they should have raised a red flag that RA was not defined quite right, especially defining "relation" as a 1NF table and claiming algebraic closure because 1NF was preserved. Being restricted to tabular representation as the "language" for relationships is like being restricted to arithmetic when doing higher mathematics like differential calculus -- you need more expressive power, not less! Defining RA operations in terms of table manipulations aided initial learning and implementations by making data management look simple and VISUAL. Unfortunately, it was never grasped how much was missing, let alone how much more "intelligent" the RA and the RDBMS needed to be made to fix the problems. And I can see that those oversights were, in part, probably due to having to spend so much time correcting the ignorance in the industry."
--David McGoveran
I recently posted the following Fundamental Truth of the Week on LinkedIn, together with links to more detailed discussions of 1NF and 5NF (see References):
“According to conventional understanding of the RDM (such as it is) [and I don't mean SQL], a relation is in at least first normal form (1NF) -- it has only attributes drawn from simple domains (i.e., no "nested relations") -- the formal way of saying that a relation represents at the logical level an entity group from the conceptual level that has only individual entities -- no groups thereof -- as members. 1NF is required for decidability of the data sublanguage.

However, correctness, namely (1) system-guaranteed logical validity (i.e., query results follow provably from the database) and (2) by-design semantic consistency (of query results with the conceptual model) requires that relations are in both 1NF and fifth normal form (5NF). Formally, the only dependencies that hold in a 5NF relation are functional dependencies of non-key attributes on the PK -- for each PK value there is exactly one value of every corresponding non-key attribute value. This is the formal way of saying that a relation represents facts about a group of entities of a single type.

Therefore we now contend that database relations are BY DEFINITION in both 1NF and 5NF, otherwise all bets are off.”
It triggered a discussion that raised some fundamental issues for which an online exchange is too limiting. This post offers further clarifications, including comments by David McGoveran, on whose interpretation of the RDM (LOGIC FOR SERIOUS DATABASE FOLKS, forthcoming) I rely on. The portions of my interlocutor in the discussion are in quotes.

Sunday, September 19, 2021

TYFK: Calculated Attributes -- Redundancy, Full Normalization and Relational Theory



Note: Each "Test Your Foundation Knowledge" post presents one or more misconceptions about data fundamentals. To test your knowledge, first try to detect them, then proceed to read our debunking, reflecting the current understanding of the RDM, distinct from whatever has passed for it in the industry to date. If there isn't a match, you can review references -- reflecting the current understanding of the RDM, distinct from whatever has passed for it in the industry to date -- which explain and correct the misconceptions. You can acquire further knowledge by checking out our POSTS, BOOKS, PAPERS, LINKS (or, better, organize one of our on-site SEMINARS, which can be customized to specific needs).

“If you have shopping cart, you probably have some field "TOTAL" somewhere that stores the final amount due for the customer. It so happens that such a thing violates relational theory...”

“Having a "TOTAL" field in your "order" table *might* violate relational theory, but if you make it so that only a trigger can update it based on what's in your "order_item" table, then I think it's fine. You still get data integrity and that is what matters.”

“I still fail to see what you mean by the "calculated TOTALS field" (attribute, really) violates the Relational Model.”

“The result of having the field ... is what is called a DELETE ANOMALY.”

“Most denormalizing means adding columns to tables that provide values you would otherwise have to calculate as needed.”

“There are four practical problems with a fully normalized database, three of which I have listed before. I will list them all here for completeness:
* No calculated values. Calculated values are a fact of life for all applications, but a normalized database lacks them. The burden of providing calculated values must be taken up by somebody somehow. Denormalization is one approach to this, though there are others.
--Database Programmer blog

“...I'm now working with IT to normalize part of the database to remove calculated fields...:
`lineitems`.`extended total` = `lineitems`.`units` * `biditems`.`price`.
`jobs`.`jobvalue` = the sum of related `lineitems`.`extended total` records
`orders`.`ordervalue` = the sum of related `jobs`.`jobvalue` records.”
--mySQL.com

Do calculated attributes (not fields!) violate relational theory and must be "normalized" out of them? Determining that requires foundation knowledge that is scarce in the industry, which has a poor and outdated understanding of the RDM.

Tuesday, August 31, 2021

TYFK: Normalized, Fully Normalized, Non-Normalized, Denormalized -- Clearing the Mess



Note: Each "Test Your Foundation Knowledge" post presents one or more misconceptions about data fundamentals. To test your knowledge, first try to detect them, then proceed to read our debunking, reflecting the current understanding of the RDM, distinct from whatever has passed for it in the industry to date. If there isn't a match, you can review references -- reflecting the current understanding of the RDM, distinct from whatever has passed for it in the industry to date -- which explain and correct the misconceptions. You can acquire further knowledge by checking out our POSTS, BOOKS, PAPERS, LINKS (or, better, organize one of our on-site SEMINARS, which can be customized to specific needs).

“A non-normalized database is a disorganized one, where nobody has bothered to work out where the facts should be stored. It is like a stack of paper files that has been tossed down the stairs. We are not interested in non-normalized databases.

A normalized database has been organized so that each fact is stored in exactly one place (2nf and greater) and no more than one fact is stored in each place (1nf). In a normalized database there is a place for everything and everything is in its place.

A denormalized database is a normalized database that has had redundancies deliberately re-introduced for some practical gain. Most denormalizing means adding columns to tables that provide values you would otherwise have to calculate as needed. Values are copied from table to table, calculations are made within a row, and totals, averages and other aggregrations are made between child and parent tables.”
--database-programmer.blogspot.com

Thursday, June 10, 2021

RE-WRITE



See: https://www.dbdebunk.com/2023/08/entities-properties-and-codds-sleight.html

Monday, May 10, 2021

TYFK: What Domains Are and Are Not



Note: Each "Test Your Foundation Knowledge" post presents one or more misconceptions about data fundamentals. To test your knowledge, first try to detect them, then proceed to read our debunking, reflecting the current understanding of the RDM, distinct from whatever has passed for it in the industry to date. If there isn't a match, you can review references -- reflecting the current understanding of the RDM, distinct from whatever has passed for it in the industry to date -- which explain and correct the misconceptions. You can acquire further knowledge by checking out our POSTS, BOOKS, PAPERS, LINKS (or, better, organize one of our on-site SEMINARS, which can be customized to specific needs).

“...a Data Domain refers to all the valid values which a data element (column) may contain. The rule for determining the domain boundary may be as simple as a data type with a list of possible values. For example, a database table that has information about people, with one record per person, might have an "age" column. This gender column might be declared as a SMALLINT data type, and allowed to have a value between 0 and 120. The data domain for the age column is hence 0 - 120. In a normalized data model, the reference domain is typically specified in a reference table. Following the previous example, the age reference table could have exactly 120 records, one per allowed value. Reference tables are formally related to other tables in a database by the use of foreign keys. A better way would be to enforce the data domain through a check constraint. For example, the age column would require positive numeric values between 0 and 120. I have found that the best way to figure out all of your data domains and constraints is to spend some time designing and normalizing all of your tables.”
--Quora.com

Misconceptions

  • There are no tables and, thus, no columns in relational databases;
  • Domains are not (programming) data types;
  • It is not the data model that is normalized;
  • A referenced relation does not reference domains;
  • A SQL CHECK constraint is not "better enforcement" of a referential constraint;
  • Constraints are not determined BY logical design;
  • Logical database design does not involve explicit normalization (to 1NF) or further normalization to 5NF.

Fundamentals

  • Relational databases consist of relations with attributes defined on domains; tables with columns visualize relations with attributes, but play no part in the RDM.
  • A relational domain represents a real world property and is a database object under DBMS control and, thus, is distinct from a programming data type which is an application object under programmer control that may not represent anything in the real world.
  • 1NF (normalization) and 5NF (full normalization) are properties of relations (which comprise logical models), not of the data model (i.e., the RDM).
  • An attribute which is a foreign key in a referencing relation references a primary key which is an attribute  in a referenced relation.
  • A constraint can be expressed in syntactically different ways by a data sublanguage. The CHECK constraint is a syntactic alternative in SQL to declare referential constraints.
  • Database relations are semantically constrained to be consistent with (i.e., represent faithfully) the corresponding conceptual model. Properties and properties in context (i.e., of specific entity types) are identified during conceptual modeling. Domain and attribute constraints respectively are specified during logical design to ensure consistency with the properties and properties in context they represent in the database.
  • Database design that adheres the three principles mandated by the RDM produces 1NF and 5NF databases that do not require explicit normalization and further normalization.

Note: The difference between relational domains and programming data types are specified in Codd's RM/V2 book. SQL tables are not relations and SQL data types are not relational domains.


Recommended reading

Domains: The Database Glue

Understanding Domains and Attributes

The Interpretation and Representation of Database Relations





Friday, February 12, 2021

TYFK: What Is a Relational Database and Why Is It Important?



Note: Each "Test Your Foundation Knowledge" post presents one or more misconceptions about data fundamentals. To test your knowledge, first try to detect them, then proceed to read our debunking, which is based on the current understanding of the RDM, distinct from whatever has passed for it in the industry to date. If there isn't a match, you can acquire the knowledge by checking out our POSTS, BOOKS, PAPERS, LINKS (or, better, organize one of our on-site SEMINARS, which can be customized to specific needs).

“The most popular data model in DBMS is the Relational Model. It is more scientific a model than others. This model is based on first-order predicate logic and defines a table as an n-ary relation. The main highlights of this model are:
  • Data is stored in tables called relations.
  • Relations can be normalized.
  • In normalized relations, values saved are atomic values.
  • Each row in a relation contains a unique value.
  • Each column in a relation contains values from a same [sic] domain.”
--What is a relational database and why is it important, Quora.com

Monday, February 1, 2021

Normalization -- Will They Ever Learn?



“To Normalize or not to Normalize? that really isn't a question. few things to consider:
Normalization is supposed to protect from data anomalies, but not prevent us from using data encapsulation is the magic trick that allows you to do what you want without breaking rules.what are your experiences with normalization?”
                                                                --LinkedIn


This is a question that at this time need -- and should -- not be asked anymore, and the fact that it still is is one confirmation -- among many -- that there is no progress in data management. According to the current understanding of the RDM:

  • Database relations are both normalized (in 1NF) and fully normalized (in 5NF) by definition, otherwise they are not relations and the relational algebra (RA) does not work;
  • Adherence to three database design principles produces 1NF and 5NF relational databases;
  • Consequently, there should not be such a thing as "doing" normalization (to 1NF) and further normalization (to 5NF) except to repair databases that are non-relational due to failure to adhere to the principles.


Note: The three design principles are fundamental to SST/FOPL foundation of the RDM, but were never understood even by relational proponents. I do not know what encapsulation has to do with this.

Saturday, October 17, 2020

Understanding Codd's 12 Rules for RDBMS



In response to an online publication of a book appendix regurgitating Codd's 12 famous rules (some of which were, typically, incorrect[1]) I posted earlier a clarification of the rules. This is a revision thereof for better consistency with the new understanding of the RDM based on McGoveran's re-interpretation, extension and formalization[2] of Codd's work.

View My Stats