DATABASE DEBUNKINGS: PK

Showing posts with label PK. Show all posts

Thursday, May 2, 2024

SMS: PRIMARY KEYS & INDEXES

I am working on entirely new papers (not re-writes) in the PRACTICAL DATABASE FOUNDATIONS series. I have already published two:

THE FIRST NORMAL FORM - A DEFINITIVE GUIDE
PRIMARY KEYS - A NEW UNDERSTANDING

available for ordering from the PAPERS page, and two more:

RELATIONAL DATABASE DOMAINS: A DEFINITIVE GUIDE
DATABASE RELATIONS: A DEFINITIVE GUIDE

are in progress and forthcoming, respectively.

In the process I am coming across industry common and entrenched "pearls" that I am using for my "Setting Matters Straight" (SMS) and "To Laugh or Cry" (TLC) posts on Linkedin. I do those posts to enable the few thinking database professionals left realize how scarce foundation knowledge is, and to illustrate fallacies that abound in the industry, of which they are unaware, and which the papers are intended to dispel.

Time permitting, I may expose and dispell some of those fallacies, treated in more depth in the papers, such that those thinking professionals can test their knowledge and decide whether the papers are a worthy educational investment.

Here's one:

“There seams to be some confusion between what a Primary Key is, and what an Index is and how they are used. The Primary Key is a logical object. By that I mean that is simply defines a set of properties on one column or a set of columns to require that the columns which make up the primary key are unique and that none of them are null. Because they are unique and not null, these values (or value if your primary key is a single column) can then be used to identify a single row in the table every time. In most if not all database platforms the Primary Key will have an index created on it. An index on the other hand doesn’t define niqueness. An index is used to more quickly find rows in the table based on the values which are part of the index. When you create an index within the database, you are creating a physical object which is being saved to disk.”

Can you identify the fallacies before you proceed?

ON RELATIONAL KEYS (& DOMAINS) (t&n)

Follow @DBDebunk Follow @ThePostWest

Note: "Then & Now" (t&n) is a new version of what used to be the "Oldies but Goodies" (OBG) series. To demonstrate the superiority of a sound theoretical foundation relative to the industry's fad-driven "cookbook" practices, as well as the disregarded evolution/progress of RDM, I am re-visiting my old debunkings, bringing them up to the current state of knowledge. This will enable you to judge how well arguments have held up and realize the increasing gap between industry stagnation -- and scientific progress.

Then

ON KEYS (originally published September 2002)

“I am an application developer who is competent in SQL and have designed some small-scale schemas that have gotten the job done. I came to your website seeking enlightment on the issue of whether it is better to use strings to represent attributes (I believe these are called "speaking keys"), or to use integers and key them to a list of strings. I see that you are avidly pro-normalization, but I'm not even sure if this is a normalization issue. Here is a quote from a web site that captures my gut feeling. I hope you can comment or point me to an article that presents your view of the matter.

Many popular database tools make it very easy to create some sort of arbitrary numeric sequence for your primary keys. Be careful not to use these to excess. If you have are designing a music catalog system, it is definitely not useful to create a lot of arbitrary codes for music classification. There is unlikely to be much benefit in designating "1" to stand in for "Jazz", 2 for "Rock", 3 for "Classical" and 4 for "Latin Jazz" in an application. Why not just use the word "Jazz" for "Jazz" and use "Rock" to stand for "Rock"? It is certainly easier to remember. It also means that you won't have to lookup the value of 1 to see that it stands for "Jazz".”

RELATIONSHIPS AND THE RDM V2 Part 3: SEMANTIC CONSTRAINTS

Follow @DBDebunk Follow @ThePostWest

Note: This is a multipart re-write of a previous series that, when completed, is intended to replace it.

In Part 1 we documented the differences between mathematical and database relations (see table in Part 1). We attributed the fallacy that the RDM can express only one type of relationship -- between relations using FKs -- to the industry being unaware of the adaptation of math relations for database management. We intimated that some of the additional features of database relations express relationships other than between relations.

In Part 2 we identified the intra-group c-relationships (and the corresponding within-relation l-relationships) in our approach to conceptual modeling:

Properties-entities relationships

- general dependencies

Properties Relationships
Entities Relationships

- entity uniqueness
- functional dependencies (FD)
- entity supertype-subtypes relationships

and used a simple conceptual model (CM) of six entity groups to illustrate them:

Customers (cID, cname, FICO, discount)
Products (pID, pname, price)
Salesmen (sID, sname, sales, salary, commission)
Orders (oID, pID, cID, sID, date, amount)
Order Items (oID, iID, pID, quantity)

Database design is the use of a data model (DM) (here, RDM) to formalize conceptual models (CM) -- including c-relationships -- as logical models (LM) for database representation, so it must be able to convert the business rules (BR) that express those relationships in specialized natural language at the conceptual level to formal constraints in a FOPL-based data sublanguage at the logical level.

Our intention is to demonstrate that the RDM can express all c-these relationships, but we face a difficulty.

RELATIONSHIPS & THE RDM V2 PART 2: INTRA-GROUP RELATIONSHIPS

Follow @DBDebunk Follow @ThePostWest

Note: This is a multi-part complete re-write of a previous series which, when completed, is intended to replace it.

In Part 1 we attributed the fallacy that the RDM can express only one type of relationship -- between relations, using FKs -- to practitioners being unaware of the adaptation of math relations to database management and missing the additional features of database relations. We documented the differences in features between math and database relations (see the table in Part 1) and intimated that some of the additional features express relationships other than those between relations using FKs (which we leave out in this discussion).

In this Part 2 we identify the c-relationships and use a simple conceptual model (CM) of six entity groups:

Customers (cID, cname, FICO, discount)
Products (pID, pname, price)
Salesmen (sID, sname, sales, salary, commission)
Orders (oID, pID, cID, sID, date, amount)
Order Items (oID, iID, pID, quantity)

to illustrate them (to recall, we prepend 'relationship' with c- and l- when we use the term at the conceptual and logical levels, respectively).

NOBODY UNDERSTANDS FURTHER NORMALIZATION 4 (sms)

Follow @DBDebunk Follow @ThePostWest

Note: In "Setting Matters Straight" posts I debunk online pronouncements that involve fundamentals which I first post on LinkedIn. The purpose is to induce practitioners to test their foundation knowledge against our debunking, where we explain what is correct and what is fallacious. For in-depth treatments check out the POSTS and our PAPERS, LINKS and BOOKS (or organize one of our on-site/online SEMINARS, which can be customized to specific needs). Questions and comments are welcome here and on LinkedIn.

In Part 3 we set the matter straight about normalization to 1NF. In this part we do it wit respect to further normalization to 5NF. Non-1NF relations (i.e., with relation-valued attributes) are no longer part of industry practice, so we focus on 2NF-5NF violations. The term further normalization originates with Codd, who initially thought 1NF was sufficient and 2NF-5NF were discovered later (hence, further = beyond 1NF). The industry lumps both under normalization, but the two are distinct (e.g., only further normalization involves redundancy).

What's right/wrong with the following?

“So, what is this theory of normal forms? It deals with the mathematical construct of relations (which are a little bit different from relational database tables). First, second, and third normal forms are the basic normal forms in database normalization. Normalization in relational databases is a design process that minimizes data redundancy and avoids update anomalies. Basically, you want each piece of information to be stored exactly once; if the information changes, you only have to update it in one place. The normalization process consists of modifying the design through different stages, going from an unnormalized set of relations (tables), to the first normal form, then to the second normal form, and then to the third normal form.”
--Vertabelo.com

NOBODY UNDERSTANDS NORMALIZATION 1 (sms)

Follow @DBDebunk Follow @ThePostWest

What's right/wrong with this database picture?

“Normalization in relational databases is a design process that minimizes data redundancy and avoids update anomalies. Basically, you want each piece of information to be stored exactly once; if the information changes, you only have to update it in one place. The theory of normal forms gives rigorous meaning to these informal concepts. There are many normal forms. In this article, we’ll review the most basic:
First normal form (1NF)
Second normal form (2NF)
Third normal form (3NF)
There are normal forms higher than 3NF, but in practice you usually normalize your database to the third normal form or to the Boyce-Codd normal form, which we won’t cover here.”

--Vertabelo.com

DATABASE DESIGN: THE STATE OF KNOWLEDGE IN THE INDUSTRY

Follow @DBDebunk Follow @ThePostWest

Can you identify all the fallacies and misconceptions in the following online exchange? What is the elephant in the room?

Q: “I have done data normalization on dummy data and would like to know if I did it correctly. If it is done correctly, I would also like to ask two things below, because it is about 3NF.

1NF: This table should be 1NF.

2NF: I selected composite key (userID and Doors) as they represent minimal candidate key and got three tables applying FD rule.

3NF: Applying the rule of transitive dependency on 1st table in 2NF, I got out 4 tables (showing only first two, because the last two remain unchanged).

Questions: Is this database normalisation correct? If not could you point me where I did mistake? If answer on first question is True: Should the last table in 3NF be transformed into two tables, given it is not in correct Third normal form. Two non-key atributes have FD keycode -> accessGroup.”

DATABASE RELATIONS, DATABASE DESIGN & CORRECTNESS (sms)

Follow @DBDebunk Follow @ThePostWest

Note: In "Setting Matters Straight" posts I debunk online Q&As that involve fundamentals which I first post on LinkedIn. The purpose is to induce practitioners to test their foundation knowledge against our debunking, where we explain what is correct and what is fallacious. For in-depth treatments check out the POSTS and our PAPERS, LINKS and BOOKS (or organize one of our on-site/online SEMINARS, which can be customized to specific needs). Questions and comments are welcome here and on LinkedIn.

“...The relational model organizes data through relations (aka tables). You then normalize it in one of six forms. By normalizing data you:
- Reduce redundancy
- Ensure consistency
- Optimize for atomic inserts, updates and deletes
The biggest drawback ... are keys that let you join different tables across multiple systems.”

--LinkedIn.com

RELATIONS, DATABASE RELATIONS AND TABLES (sms)

Follow @DBDebunk Follow @ThePostWest

Q: “What is a relation in database?”

A: “Relational databases were so named in 1970 by computer scientist E. F. Codd because the tables are themselves relations, which is a mathematical term. What makes a relation (aka a table) a relation? Basically:
A relation has a heading, which names a finite set of columns.
Columns are defined by their name and their type.
A relation has a finite set of tuples (aka rows), and every tuple has the same set of columns (i.e. same name and type) as those named in the heading.
Being finite sets, both the set of columns in the heading and the set of tuples in the relation have no duplicates and no inherent order.
See? There’s nothing about relationships between tables in the definition of a relation. You could have a relational database that contains just one relation. If there’s any relationship described in a relation, it’s actually the relationship between the columns within a relation. That is, the value "Pittsburgh" goes with the value "Steelers" on a given row because the relation is defined as "pro football teams by city" and therefore there’s a linkage between some values in the set of football teams and the set of city names.” --Quora.com

KEYS & INDEXES (sms)

Follow @DBDebunk Follow @ThePostWest

Note: "Setting Matters Straight" is a new format: I post on LinkedIn an online Q&A involving data fundamentals to encourage readers to test their foundation knowledge, which they can then compare with our debunking here, where we confirm what is correct and correct what is fallacious (with clarifications, wherever necessary). For in-depth treatment check out the POSTS and our PAPERS, LINKS and BOOKS (or organize one of our on-site/online SEMINARS, which can be customized to specific needs).

Q: “What is the difference between a primary key, a unique key, and an index in databases?”

A: “Unique key is a field (or fields) with a set of unique values; the uniqueness is usually enforced with UNIQUE constraint. There might be one or more per table. Every PRIMARY key is always a unique key; there should be only one per table. It uniquely identifies record, and is used to enforce integrity - entity integrity, and, in tandem with FOREIGN key, referential integrity. Index is a data structure to facilitate records search. It might be created on PRIMARY key (best practice), unique key or any other field or combination thereof in the table. The limit on how many indices a table might have is defined in RDBMS implementation. An index might - or might not - speed up some queries.”

A: “The primary key is inherently indexed and unique and is the cross reference to related tables. Often the best primary key is an auto number integer as any value entered by humans is subject to error or delay that can be challenging to manage in the user interface ... whereas an auto number is assigned immediately upfront and eliminates any possible record conflict in tables during multi user entries. A unique key is somewhat of an informal definition. My view is that it is a definition of a field that is not being used as the primary key, but is unique unlike i.e. Last Name -- for instance a social security number field. So it is not the primary key as it is not the field/value being used to cross reference to related tables but it is unique in the table.”

A: “A primary key is a unique, non null value which can identify every tuple (row in the table) uniquely. A unique key/column/constraint ensures that no two rows contain the same value (almost the same as primary key). Unless specified explicitly for the column configuration, a NULL is a valid value for column with unique constraint. A index can be thought of as the appendix at the end of the book. The information is sorted in specific order so that look up is easy and it points to the location that is being searched for.”

--Quora.com

RELATIONSHIPS: UNIQUENESS & ATTRIBUTE CONSTRAINTS (tyfk)

Follow @DBDebunk Follow @ThePostWest

Note: Each "Test Your Foundation Knowledge" post presents one or more misconceptions about data fundamentals. To test your knowledge, first try to detect them, then proceed to read our debunking, reflecting the current understanding of the RDM, distinct from whatever has passed for it in the industry to date. If there isn't a match, you can review references -- reflecting the current understanding of the RDM, distinct from whatever has passed for it in the industry to date -- which explain and correct the misconceptions. You can acquire further knowledge by checking out our POSTS, BOOKS, PAPERS, LINKS (or, better, organize one of our on-site SEMINARS, which can be customized to specific needs).

“A unique constraint is a type of column restriction within a table, which dictates that all values in that column must be unique [and] allows null values ... a null is the complete absence of a value (not a zero or space). Thus, it is not possible to say that the value in that null field is not unique, as nothing is stored in that field.”
--Techopedia

This is one of my recent "What's Wrong with this database picture" posts on LinkedIn.

Misconceptions

In the RDM a uniqueness constraint:

Should not be viewed solely as a "column restriction within a table'.
Does NOT allow SQL "NULLs" (not "NULL values"), which have nothing to do with storage.

NO UNDERSTANDING WITHOUT FOUNDATION KNOWLEDGE PART 6: DEBUNKING AN ONLINE EXCHANGE 5 (obg)

Follow @DBDebunk Follow @ThePostWest

Note: To demonstrate the correctness and stability offered by a sound theoretical foundation (relative to the industry's fad-driven "cookbook" practices), I am re-publishing as "Oldies But Goodies" material from the old (2000-06) DBDebunk.com, so that you can judge for yourself how well my arguments hold up and whether the industry has progressed beyond the misconceptions those arguments were intended to dispel. I may revise, break into parts, and/or add comments and/or references, which I enclose in square brackets).

A 2001 review of my third book triggered an exchange on SlashDot. This six-part series comprises my debunking at the time of both the review and the exchange in the chronological (slightly out of the) order of the original publication.

Part 1: Clarifications on a Review of My Book Part 1 @DBDebunk.com

Part 2: Slashing a SlashDot Exchange Part 1 @DBAzine.com

Part 3: Slashing a SlashDot Exchange Part 2 @DBAzine.com

Part 4: Slashing a SlashDot Exchange Part 3 @DBAzine.com

Part 5: Slashing a SlashDot Exchange Part 4 @DBAzine.com

Part 6: Clarifications on a Review of My Book Part 2 @DBDebunk.com

CLARIFICATIONS ON A DISCUSSION OF MY BOOK PART 2

(originally posted 2/21/01)

In Part 1 debunked a review of my book @Slashdot.Org. In parts 2-5 I tackled the discussion generated there by the review. In this last part I focus on the discussion of data hierarchies covered in chapter 7 of my book [the in-vogue re-emergent graph fad].

“Chapter 7 discusses data hierarchies and trees. In a nutshell: there are no trees in SQL. The author is distressed by this. Given that a foreign key is basically a pointer, you can store trees in databases, it might not be pretty and there may not be easy way to read them and it might not be a good thing to do - but if you feel the need then get right in there. Of course I could be totally wrong about this.”

Confusing keys with pointers is one of the major errors many practitioners make ]. One intentional core advantage of the RDM is precisely that it prohibits pointers -- both physical and, as in object-orientation, logical. Exposing pointers to users has caused many unnecessary problems and complications, but offered no benefit (Don't Mix Pointers and Relations and Don't Mix Pointers and Relations - Please! in Date's RELATIONAL DATABASE WRITINGS 1994-1997). There is an easy way to demonstrate that relational keys are not, like object IDs (OID), pointers, but values: they represent uniquely identifying names/attributes of rel world entities. Pointers are system-generated internals and have no real world counterpart. The desirability of a data model that produces logical models that are faithful representations of the real world, without adding artifacts of their own. Indeed, as Date points out in Why The Object Model' is Not a Data Model in his above-mentioned book, the fact that "in the object world all the references to objects are by means of their corresponding OIDs explains why -- as is well known -- OO systems typically provide (a) two different equality comparison operators, equal OID vs. equal value and (b) two different assignment operators, assign OID vs. assign value. Note the added complication -- what is the benefit?

NO UNDERSTANDING WITHOUT FOUNDATION KNOWLEDGE PART 5: DEBUNKING AN ONLINE EXCHANGE 4 (obg)

Follow @DBDebunk Follow @ThePostWest

Note: To demonstrate the correctness and stability due to a sound theoretical foundation relative to the industry's fad-driven "cookbook" practices, I am re-publishing as "Oldies But Goodies" material from the old DBDebunk.com (2000-06), Judge for yourself how well my arguments hold up and whether the industry has progressed beyond the misconceptions those arguments were intended to dispel. I may revise, break into parts, and/or add comments and/or references. You can acquire foundation knowledge by checking out our POSTS, BOOKS, PAPERS, LINKS (or, even better, organize one of our on-site SEMINARS, which can be customized to specific needs).

Part 1: Clarifications on a Review of My Book Part 1 @DBDebunk.com

Part 2: Slashing a SlashDot Exchange Part 1 @DBAzine.com

Part 3: Slashing a SlashDot Exchange Part 2 @DBAzine.com

Part 4: Slashing a SlashDot Exchange Part 3 @DBAzine.com

Part 5: Slashing a SlashDot Exchange Part 4 @DBAzine.com

Part 6: Clarifications on a Review of My Book Part 2 @DBDebunk.com

Slashing a Slashdot Exchange - Part 1

(first published @DBAzine.com in 2001)

I was recently contacted by a reporter for an interview. When I expressed my disappointment with the trade media’s tendency to regurgitate vendor marketing claims instead of assessing them, he admitted "that is what happens about 98 percent of the time", but added "There are some outlets with a good piece from time to time that deal with serious architecture issues", mentioning SlashDot as one of them.

There is, of course, a Catch 22 here: to judge the seriousness of such outlets, foundation and substantive knowledge is necessary in the first place. And, alas, reporters possess even less of it than vendors and users (see, for example, The Ignorance Mechanism, On Trade Media’s "Balance"),
without which sources may appear serious even when they are nothing of the sort. As luck would have it, I ran into a good opportunity to prove this point for SlashDot. It so happened that shortly after my exchange with the journalist, Database Debunkings experienced a sudden ten-fold increase in traffic. Now, [given that my target audience is thinking practitioners,] were my material to suddenly become "hot", I would worry as to where I did go wrong. But the odds for that are rather slim and, fortunately, there was no need for concern: an email from a reader informed me that "there recently was an article posted to SlashDot.org which refers to Dbdebunk.com and Mr. Pascal/Date" and "There [were] some 443 comments to that posting." Such volume is practically always indicative of heat (hot air, to be more precise), rather than light. Ah, well, I thought, yet another source of weekly quotes (as if one was needed).

NO UNDERSTANDING WITHOUT FOUNDATION KNOWLEDGE PART 2: DEBUNKING A BOOK REVIEW (obg)

Follow @DBDebunk Follow @ThePostWest

Part 1: Clarifications on a Review of My Book Part 1 @DBDebunk.com

Part 2: Slashing a SlashDot Exchange Part 1 @DBAzine.com

Part 3: Slashing a SlashDot Exchange Part 2 @DBAzine.com

Part 4: Slashing a SlashDot Exchange Part 3 @DBAzine.com

Part 5: Slashing a SlashDot Exchange Part 4 @DBAzine.com

Part 6: Clarifications on a Review of My Book Part 2 @DBDebunk.com

Clarifications of a Review of My Book Part 1

(originally posted 1/14/2001)

“Many of us ... do not think that harmony is the great goal, or unity or peacefulness, [and] actually quite like hard questions for their own sake, and enjoy ... the life of the mind. To the question of how to live, the answer is "by disagreement.” --Christopher Hitchens

Let me say, first and foremost, that as the subtitle of the book -- A REFERENCE FOR THE THINKING PRACTITIONER -- indicates, it is targeted at the minority of practitioners who think clearly, independently and critically. It should not be a surprise, then, that those not belonging to that (alas, very small) target audience don't see its practical value. As I said so many times, if my work gained mass appeal, I would wonder what I was doing wrong. This is the sad reality, whether we like it or not. In fact, to be consistent I will go one step further: I don't assume that positive reviews are any better than negative ones -- they are frequently grounded in as faulty reasoning and/or ignorance as the critiques.

Let me also make clear that I do not place all of the blame on the individual database practitioners or users. Rather, problems are rooted in a systemic, much more profound societal and business culture that fails to instill and encourage foundation knowledge and independent, critical thinking, which not only does not reward, but actually punishes such. This is true to a degree in all societies, of course, but in the US the problem is much more acute (there can hardly be a better demonstration of the horrendous implications of this than how the election was covered, perceived and accepted by most of the press and public) [I wrote this prior to the last two elections -- I leave it to the reader to judge the steepness of the subsequent regress.]

TYFK: What Domains Are and Are Not

Follow @DBDebunk Follow @ThePostWest

“...a Data Domain refers to all the valid values which a data element (column) may contain. The rule for determining the domain boundary may be as simple as a data type with a list of possible values. For example, a database table that has information about people, with one record per person, might have an "age" column. This gender column might be declared as a SMALLINT data type, and allowed to have a value between 0 and 120. The data domain for the age column is hence 0 - 120. In a normalized data model, the reference domain is typically specified in a reference table. Following the previous example, the age reference table could have exactly 120 records, one per allowed value. Reference tables are formally related to other tables in a database by the use of foreign keys. A better way would be to enforce the data domain through a check constraint. For example, the age column would require positive numeric values between 0 and 120. I have found that the best way to figure out all of your data domains and constraints is to spend some time designing and normalizing all of your tables.”
--Quora.com

Misconceptions

There are no tables and, thus, no columns in relational databases;
Domains are not (programming) data types;
It is not the data model that is normalized;
A referenced relation does not reference domains;
A SQL CHECK constraint is not "better enforcement" of a referential constraint;
Constraints are not determined BY logical design;
Logical database design does not involve explicit normalization (to 1NF) or further normalization to 5NF.

Fundamentals

Relational databases consist of relations with attributes defined on domains; tables with columns visualize relations with attributes, but play no part in the RDM.
A relational domain represents a real world property and is a database object under DBMS control and, thus, is distinct from a programming data type which is an application object under programmer control that may not represent anything in the real world.
1NF (normalization) and 5NF (full normalization) are properties of relations (which comprise logical models), not of the data model (i.e., the RDM).
An attribute which is a foreign key in a referencing relation references a primary key which is an attribute in a referenced relation.
A constraint can be expressed in syntactically different ways by a data sublanguage. The CHECK constraint is a syntactic alternative in SQL to declare referential constraints.
Database relations are semantically constrained to be consistent with (i.e., represent faithfully) the corresponding conceptual model. Properties and properties in context (i.e., of specific entity types) are identified during conceptual modeling. Domain and attribute constraints respectively are specified during logical design to ensure consistency with the properties and properties in context they represent in the database.
Database design that adheres the three principles mandated by the RDM produces 1NF and 5NF databases that do not require explicit normalization and further normalization.

Note: The difference between relational domains and programming data types are specified in Codd's RM/V2 book. SQL tables are not relations and SQL data types are not relational domains.

Recommended reading

Domains: The Database Glue

Understanding Domains and Attributes

The Interpretation and Representation of Database Relations

Sunday, April 11, 2021

TYFK: Relations, Tables, and Semantic Consistency

Follow @DBDebunk Follow @ThePostWest

“A relation, or table, in a relational database ... must have a set of columns or attributes, and it must have a set of rows to contain the data. A tuple (or row) can be a duplicate. In practice, a database might actually contain duplicate rows, but there should be practices in place to avoid this, such as the use of unique primary keys (next up). Given that a tuple cannot be a duplicate, it follows that a relation must contain at least one attribute (or column) that identifies each tuple (or row) uniquely. This is usually the primary key. This primary key cannot be duplicated.”

Misconceptions

A relation is not a table and, thus, has neither columns, nor rows (certainly not fields);
"Duplicate tuples" is a contradiction in terms -- a table with duplicate rows does not visualize a relation (i.e., is not a R-table) -- and a database with duplicated data is not relational;
Unlike a mathematical relation, there is no such thing as a database relation without a PK, which would be semantically inconsistent (i.e. it would not be a faithful representation of group of entities, which are distinguishable in the real world);

Fundamentals

A database relation:

- is a relationship among domains (sets of values) -- a subset of their cross-product -- a set of tuples (sets of values drawn from the domains) or, in other words, a set of sets from sets;
- has attributes, which are representations (1:1 mappings) of the domains in the relation;
- is semantically constrained to be consistent with (i.e., represent faithfully) the entity group in the conceptual model represented by the database), including by a PK constraint:

. domains represent properties;

. relations represent entity groups;

. attributes represent entity properties;
. tuples represent (facts about -- properties of) entities;
. some constraints represent relationships among properties, entities and groups (some properties are relationships);

Duplication would violate the RDM and mean semantic inconsistency with (inaccurate representation of) the group.
A R-table visualizes a relation on some physical medium -- it plays no part in the relational model.

Further Reading

What Relations Really Are and Why They Are Important

Understanding Relations series

What Is a Relational Database

Friday, March 12, 2021

TYFK: Tables Can Visualize -- But ARE NOT -- Relations!

Follow @DBDebunk Follow @ThePostWest

“A relation, or table, in a relational database ... must have a set of columns or attributes, and it must have a set of rows to contain the data. A tuple (or row) can be a duplicate. In practice, a database might actually contain duplicate rows, but there should be practices in place to avoid this, such as the use of unique primary keys (next up). Given that a tuple cannot be a duplicate, it follows that a relation must contain at least one attribute (or column) that identifies each tuple (or row) uniquely. This is usually the primary key. This primary key cannot be duplicated.”

Misconceptions

A relation is not a table and, thus, has neither fields, nor columns (which are not attributes) and rows;
"Duplicate tuples" is a contradiction in terms -- a table with duplicate rows does not visualize a relation (i.e., is not a R-table) -- and a database with duplicated data is not relational;
Without PKs a relation is not semantically consistent (i.e. it does not faithfully represent group of entities, which are distinguishable in the real world);

Fundamentals

A database relation

- is a relationship among domains (sets of values) -- a subset of their cross-product -- a set of tuples (sets of values drawn from the domains) or, in other words, a set of sets from sets;
- has attributes, which are representations (1:1 mappings) of the domains in the relation;
- is semantically constrained to be consistent with (i.e., represent faithfully) the entity group in the conceptual model it represents in the database.

A R-table visualizes a relation on some physical medium -- it plays no part in the relational model.

Absence of PKs is semantic inconsistency with (inaccurate representation of) reality and a violation of the RDM.

Further Reading

What Relations Really Are and Why They Are Important

Understanding Relations series

What Is a Relational Database

Duplicates: Stating the Same Fact More Than Once Does Not Make it Truer, Only Redundant

Saturday, January 23, 2021

"Codd was wrong" and "You're teaching the gospel" Betray Lack of Foundation Knowledge

Follow @DBDebunk Follow @ThePostWest

Note: I have documented and debunked these misconceptions so many times that I will no longer reference them -- the reader motivated to gain genuine understanding should use the (1) blog labels (2) Blogger search (3) POSTS page to locate the relevant posts.

I have long claimed that a core problem in the industry is the vast majority of practitioners who use relational terminology, do not know/understand what it means, yet are convinced they do -- the less the understanding, the greater the convinction. A recent LinkedIn exchange provided -- as if it were needed -- yet another example. It was triggered by my comment:

“How many know today that a relation is by definition in 5NF, otherwise it's not a relation, the relational algebra has "anomalies" and all bets are off? IMO, none! If you need to "do" normalization, you did not design correctly, which means you don't understand the RDM.”

that prompted the following reaction:

“Is that really true? You construct a table and fill it full of garbage. It may not even be in 1NF, but is it not still a "relation" of columns, even if it's not a relation of rows or attributes? Codd had no real conception of syntax as separate from semantics, I don't think relational theory has a clear position on this. This is where Kimball and dimensional systems differ from Codd's relational, it made some effort (not a lot) to distinguish syntactic and semantic elements.”
--Joshua Stern

TYFK: Relations, Tables, Domains and Normalization

Follow @DBDebunk Follow @ThePostWest

Each "Test Your Foundation Knowledge" post presents one or more misconceptions about data fundamentals. To test your knowledge, first try to detect them, then proceed to read our debunking, which is based on the current understanding of the RDM, distinct from whatever has passed for it in the industry to date. If there isn't a match, you can acquire the knowledge by checking out our POSTS, BOOKS, PAPERS, LINKS (or, better, organize one of our on-site SEMINARS, which can be customized to specific needs).

“The most popular data model in DBMS is the Relational Model. It is more scientific a model than others. This model is based on first-order predicate logic and defines a table as an n-ary relation. The main highlights of this model are:

Data is stored in tables called relations.

Relations can be normalized. In normalized relations, values saved are atomic values.

Each row in a relation contains a unique value.

Each column in a relation contains values from a same domain.”

OBG: Data Independence and "Physical Denormalization"

Follow @DBDebunk Follow @ThePostWest

Note: I am re-publishing some of the articles and reader exchanges from the old DBDebunk (2000-06). How well do they hold up -- have industry knowledge and practice progressed? Judge for yourself and appreciate the difference between a sound foundation and the fad-driven cookbook approach.

January 2, 2001

"... one of the "4 great lies" is "I denormalize for performance." You state that normalization is a logical concept and, since performance is a physical concept, denormalization for performance reasons is impossible (i.e., it doesn't make sense). What term would you use to describe changing the physical database design to be different from the logical design to enhance performance? Because normalization is a logical concept, you imply that this is not called denormalization."

POSTS

Thursday, May 2, 2024

Monday, July 31, 2023

Then

Sunday, April 30, 2023

Sunday, April 16, 2023

Friday, December 2, 2022

Saturday, October 8, 2022

Monday, September 12, 2022

Saturday, August 20, 2022

Sunday, July 3, 2022

Friday, March 25, 2022

Sunday, March 6, 2022

Misconceptions

Saturday, February 19, 2022

CLARIFICATIONS ON A DISCUSSION OF MY BOOK PART 2

Sunday, February 13, 2022

Slashing a Slashdot Exchange - Part 1

Friday, December 17, 2021

Clarifications of a Review of My Book Part 1

Monday, May 10, 2021

Misconceptions

Fundamentals

Sunday, April 11, 2021

Misconceptions

Fundamentals

Friday, March 12, 2021

Misconceptions

Fundamentals

Saturday, January 23, 2021

Thursday, August 20, 2020

Monday, July 20, 2020