DATABASE DEBUNKINGS: NULL

Showing posts with label NULL. Show all posts

Thursday, May 2, 2024

SMS: PRIMARY KEYS & INDEXES

I am working on entirely new papers (not re-writes) in the PRACTICAL DATABASE FOUNDATIONS series. I have already published two:

THE FIRST NORMAL FORM - A DEFINITIVE GUIDE
PRIMARY KEYS - A NEW UNDERSTANDING

available for ordering from the PAPERS page, and two more:

RELATIONAL DATABASE DOMAINS: A DEFINITIVE GUIDE
DATABASE RELATIONS: A DEFINITIVE GUIDE

are in progress and forthcoming, respectively.

In the process I am coming across industry common and entrenched "pearls" that I am using for my "Setting Matters Straight" (SMS) and "To Laugh or Cry" (TLC) posts on Linkedin. I do those posts to enable the few thinking database professionals left realize how scarce foundation knowledge is, and to illustrate fallacies that abound in the industry, of which they are unaware, and which the papers are intended to dispel.

Time permitting, I may expose and dispell some of those fallacies, treated in more depth in the papers, such that those thinking professionals can test their knowledge and decide whether the papers are a worthy educational investment.

Here's one:

“There seams to be some confusion between what a Primary Key is, and what an Index is and how they are used. The Primary Key is a logical object. By that I mean that is simply defines a set of properties on one column or a set of columns to require that the columns which make up the primary key are unique and that none of them are null. Because they are unique and not null, these values (or value if your primary key is a single column) can then be used to identify a single row in the table every time. In most if not all database platforms the Primary Key will have an index created on it. An index on the other hand doesn’t define niqueness. An index is used to more quickly find rows in the table based on the values which are part of the index. When you create an index within the database, you are creating a physical object which is being saved to disk.”

Can you identify the fallacies before you proceed?

NULL & THE "2ND ADDRESS LINE" (sms)

Follow @DBDebunk Follow @ThePostWest

Note: In "Setting Matters Straight" posts I debunk online pronouncements that involve fundamentals which I first post on LinkedIn. The purpose is to induce practitioners to test their foundation knowledge against our debunking, where we explain what is correct and what is fallacious. For in-depth treatments check out the POSTS and our PAPERS, LINKS and BOOKS (or organize one of our on-site/online SEMINARS, which can be customized to specific needs). Questions and comments are welcome here and on LinkedIn.

Q: “Does anyone have an alternative solution to the issue where multiple nulls are allowed in the same column for a unique index containing that column? I have a number of such nullable columns and unique index combinations in my database schema and I'm trying to avoid having to create additional generated as columns for each case.”

A: “null<>null so the problem's built into unique columns, perhaps the columns need to become non-nullable?”

“Some attributes in a tuple are not meant to be part of any relationship so people are safe to populate such relationshipless attributes with a Null when needed. Just an example, how about the famous MailAddressSecondLine attribute describing the optional second line of a mail address? You can populate it if needed or you can leave it with no value a.k.a. Null if you have nothing to put in there.”

“So, when YOU create an address table, do you have a "second address" line or do you normalize it? Enquiring minds want to know.”

“You are asking how I would implement optional parts of an address within the limitations of a fixed set of numbered address line attributes? If an address has fewer lines than there are attributes then I'd populate the extra lines with zero-length strings. Null wouldn't be appropriate because all the parts of the address are known and present and using null would give undesirable results. For example if you compare all the parts of an address in a join or subquery you ought to expect two identical addresses to be returned as equivalent, but that wouldn't work if you start adding nulls into addresses.”

We have discussed extensively the problems with and solution to the treatment of missing data in relational databases:

PREDICATE LOGIC, SEMANTICS AND RDM (sms)

Follow @DBDebunk Follow @ThePostWest

“As I have said many times, if the original relational model had been based on predicate logic and also the semantics and rules of definitions we'd all be better off now. It wasn't. Full stop.”
--Ronald Ross, LinkedIn.com

Assessing such arguments normally requires clarification of what exactly is meant by "the relational model". Ross does refer specifically to the "original" -- which we take to mean that introduced by Codd in 1969-70 -- but given the massive misuse and abuse in the industry, perceptions of it may well be corrupted (Nobody Understands the Relational Model Semantics, Relational Closure and Database Correctness). Moreover, there are many predicate logic (PL) systems and many ways of categorizing them (1st vs n-th order being only one way) -- we assume Ross means RDM is based on none.

MISSING DATA: RDM VS SQL -- A REAL WORLD COMPARISON (t&n)

Follow @DBDebunk Follow @ThePostWest

Note: "Then & Now" (t&n) is a new version of what used to be the "Oldies but Goodies" (OBG) series. To demonstrate the superiority of a sound theoretical foundation relative to the industry's fad-driven "cookbook" practices, as well as the disregarded evolution/progress of RDM, I am re-visiting my old debunkings, bringing them up to the current state of knowledge. This will enable you to judge how well arguments have held up and realize the increasing gap between industry stagnation -- and scientific progress.

(This a revised version of an earlier post with clarity improvements).

Q: “What would you suggest for a datetime field where the value is not known and should therefore be not-applicable?”
A: ”NULL sounds good to me.”

While searching through records I came across an old consulting project involving the migration of a neo-natal research database from Focus -- an old hierarchic DBMS -- used to record extensive details about hundreds of monthly births at a university hospital for more than 20 years. The person who had designed the Focus database was the only one who knew and understood its complexity sufficiently to maintain it. Each time a researcher needed some subset of data to analyze, he would extract it and serve it upon request. Aside from the inefficiency of the process, the person was retiring at a time when hierarchic DBMSs reached the end of their usefulness, Focus experts were already few and expensive and "relational" (read: SQL) was the dominant fad.

MISSING DATA AND MULTI-RELATION QUERY RESULTS (t&n)

Follow @DBDebunk Follow @ThePostWest

Note: "Then & Now" (T&N) is a new version of what used to be the "Oldies but Goodies" (OBG) series. To demonstrate the superiority of a sound theoretical foundation relative to the industry's fad-driven "cookbook" practices, as well as the evolution/progress of RDM, I am re-visiting my 2000-06 debunkings, bringing them up to my with my knowledge and understanding of today. This will enable you to judge how well my arguments have held up and appreciate the increasing gap between scientific progress and the industry’s stagnation, if not outright regress.

On NULLs and Multi-Table Relvars

(first published 04/05/02)

"I had a question about the missing-values suggestion in PRACTICAL ISSUES IN DATABASE MANAGEMENT, page 234. You write:

"Table operations would have to be modified to yield results with as many tables as there are types of propositions with only known values."

How would this be represented in a language like Tutorial D, where relvars are required to be strongly typed? One possible idea is to make use of type inheritance. Suppose I had a domain of tuple values {x,a,b,c} (all integers, say) where x is not allowed to be missing but a, b, and c are allowed to be missing. Suppose we extended the domains of a, b, and c with an "imaginary" special value that we will never represent, which I will show for diagram purposes only as '?'. Then the domain can be split into parts:

XABC {x,a,b,c} possrep: {X: int, A: int, B: int, C: int}
XAB {x,a,b,'?'} possrep: {X: int, A: int, B: int}
XAC {x,a,'?',c} possrep: {X: int, A: int, C: int}
XBC {x,'?',b,c} possrep: {X: int, B: int, C: int}
XA {x,a,'?','?'} possrep: {X: int, A: int}
XB {x,'?',b,'?'} possrep: {X: int, B: int}
XC {x,'?','?',c} possrep: {X: int, C: int}
X {x,’?','?','?'} possrep: {X: int}

Using Mr. Date's specialization by constraint idea, we can inherit all the subtuple types from the main tuple type. Updates could make a tuple change type. A relation of relations of XABC type could be used to return results of a query. Each relation within the relation would contain one subtype.

However, the exponential explosion of possible subtypes would be very difficult to handle, practically speaking. As you admit in your book, a real DBMS might have to handle thousands of small subtables. This cannot be passed off as an "implementation detail" since table operations "yield results" at the user presentation level. No matter how efficient the underlying system might be, this seems unacceptable. Perhaps we have to fall back on default values after all."

RELATIONSHIPS: UNIQUENESS & ATTRIBUTE CONSTRAINTS (tyfk)

Follow @DBDebunk Follow @ThePostWest

Note: Each "Test Your Foundation Knowledge" post presents one or more misconceptions about data fundamentals. To test your knowledge, first try to detect them, then proceed to read our debunking, reflecting the current understanding of the RDM, distinct from whatever has passed for it in the industry to date. If there isn't a match, you can review references -- reflecting the current understanding of the RDM, distinct from whatever has passed for it in the industry to date -- which explain and correct the misconceptions. You can acquire further knowledge by checking out our POSTS, BOOKS, PAPERS, LINKS (or, better, organize one of our on-site SEMINARS, which can be customized to specific needs).

“A unique constraint is a type of column restriction within a table, which dictates that all values in that column must be unique [and] allows null values ... a null is the complete absence of a value (not a zero or space). Thus, it is not possible to say that the value in that null field is not unique, as nothing is stored in that field.”
--Techopedia

This is one of my recent "What's Wrong with this database picture" posts on LinkedIn.

Misconceptions

In the RDM a uniqueness constraint:

Should not be viewed solely as a "column restriction within a table'.
Does NOT allow SQL "NULLs" (not "NULL values"), which have nothing to do with storage.

NO UNDERSTANDING WITHOUT FOUNDATION KNOWLEDGE PART 6: DEBUNKING AN ONLINE EXCHANGE 5 (obg)

Follow @DBDebunk Follow @ThePostWest

Note: To demonstrate the correctness and stability offered by a sound theoretical foundation (relative to the industry's fad-driven "cookbook" practices), I am re-publishing as "Oldies But Goodies" material from the old (2000-06) DBDebunk.com, so that you can judge for yourself how well my arguments hold up and whether the industry has progressed beyond the misconceptions those arguments were intended to dispel. I may revise, break into parts, and/or add comments and/or references, which I enclose in square brackets).

A 2001 review of my third book triggered an exchange on SlashDot. This six-part series comprises my debunking at the time of both the review and the exchange in the chronological (slightly out of the) order of the original publication.

Part 1: Clarifications on a Review of My Book Part 1 @DBDebunk.com

Part 2: Slashing a SlashDot Exchange Part 1 @DBAzine.com

Part 3: Slashing a SlashDot Exchange Part 2 @DBAzine.com

Part 4: Slashing a SlashDot Exchange Part 3 @DBAzine.com

Part 5: Slashing a SlashDot Exchange Part 4 @DBAzine.com

Part 6: Clarifications on a Review of My Book Part 2 @DBDebunk.com

CLARIFICATIONS ON A DISCUSSION OF MY BOOK PART 2

(originally posted 2/21/01)

In Part 1 debunked a review of my book @Slashdot.Org. In parts 2-5 I tackled the discussion generated there by the review. In this last part I focus on the discussion of data hierarchies covered in chapter 7 of my book [the in-vogue re-emergent graph fad].

“Chapter 7 discusses data hierarchies and trees. In a nutshell: there are no trees in SQL. The author is distressed by this. Given that a foreign key is basically a pointer, you can store trees in databases, it might not be pretty and there may not be easy way to read them and it might not be a good thing to do - but if you feel the need then get right in there. Of course I could be totally wrong about this.”

Confusing keys with pointers is one of the major errors many practitioners make ]. One intentional core advantage of the RDM is precisely that it prohibits pointers -- both physical and, as in object-orientation, logical. Exposing pointers to users has caused many unnecessary problems and complications, but offered no benefit (Don't Mix Pointers and Relations and Don't Mix Pointers and Relations - Please! in Date's RELATIONAL DATABASE WRITINGS 1994-1997). There is an easy way to demonstrate that relational keys are not, like object IDs (OID), pointers, but values: they represent uniquely identifying names/attributes of rel world entities. Pointers are system-generated internals and have no real world counterpart. The desirability of a data model that produces logical models that are faithful representations of the real world, without adding artifacts of their own. Indeed, as Date points out in Why The Object Model' is Not a Data Model in his above-mentioned book, the fact that "in the object world all the references to objects are by means of their corresponding OIDs explains why -- as is well known -- OO systems typically provide (a) two different equality comparison operators, equal OID vs. equal value and (b) two different assignment operators, assign OID vs. assign value. Note the added complication -- what is the benefit?

NO UNDERSTANDING WITHOUT FOUNDATION KNOWLEDGE PART 5: DEBUNKING AN ONLINE EXCHANGE 4 (obg)

Follow @DBDebunk Follow @ThePostWest

Part 1: Clarifications on a Review of My Book Part 1 @DBDebunk.com

Part 2: Slashing a SlashDot Exchange Part 1 @DBAzine.com

Part 3: Slashing a SlashDot Exchange Part 2 @DBAzine.com

Part 4: Slashing a SlashDot Exchange Part 3 @DBAzine.com

Part 5: Slashing a SlashDot Exchange Part 4 @DBAzine.com

Part 6: Clarifications on a Review of My Book Part 2 @DBDebunk.com

Slashing a Slashdot Exchange - Part 1

(first published @DBAzine.com in 2001)

I was recently contacted by a reporter for an interview. When I expressed my disappointment with the trade media’s tendency to regurgitate vendor marketing claims instead of assessing them, he admitted "that is what happens about 98 percent of the time", but added "There are some outlets with a good piece from time to time that deal with serious architecture issues", mentioning SlashDot as one of them.

There is, of course, a Catch 22 here: to judge the seriousness of such outlets, foundation and substantive knowledge is necessary in the first place. And, alas, reporters possess even less of it than vendors and users (see, for example, The Ignorance Mechanism, On Trade Media’s "Balance"),
without which sources may appear serious even when they are nothing of the sort. As luck would have it, I ran into a good opportunity to prove this point for SlashDot. It so happened that shortly after my exchange with the journalist, Database Debunkings experienced a sudden ten-fold increase in traffic. Now, [given that my target audience is thinking practitioners,] were my material to suddenly become "hot", I would worry as to where I did go wrong. But the odds for that are rather slim and, fortunately, there was no need for concern: an email from a reader informed me that "there recently was an article posted to SlashDot.org which refers to Dbdebunk.com and Mr. Pascal/Date" and "There [were] some 443 comments to that posting." Such volume is practically always indicative of heat (hot air, to be more precise), rather than light. Ah, well, I thought, yet another source of weekly quotes (as if one was needed).

READ MY LIPS: IF THERE'S NULLs, IT'S NOT RELATIONAL

Follow @DBDebunk Follow @ThePostWest

“Let's say I want to store a list of movies that are stored on iTunes. For simplicity, we'll just store a few fields so that the film Avatar has these values:
ID: 354112018
Name: Avatar
Year: 2009
Synopsis: "From Academy Award®-winning director James Cameron comes Avatar, the story..."
However, sometimes the Synopsis is missing...and sometimes the Year is missing. Without giving it a second thought, I would probably create one table to store those four fields, something like this:
ID (INT)
Name (VARCHAR)
Year (INT NULL)
Synopsis (VARCHAR NULL)
Is there any advantage in 'further normalizing' the database so that, for example, I don't store any null values, such as:
Title
TitleID
Name

TitleSynopsis
TitleID
Synopsis

TitleYear
TitleID
Year
To me it seems like doing this would potentially create hundreds of extra tables (on a large database) and make inserts a nightmare -- I suppose a View could be created to flatten out the results so it's queryable, but even though I feel like it would require so much overhead. So is there any reason in the above case to normalize to remove nulls, or in general, what would be the case to do so, if there ever is one?” --StackOverflow.com

Fallacies

That we see this in 2022 is testament to abysmal ignorance of fundamentals in the industry. Let's enumerate the fallacies:

NO UNDERSTANDING WITHOUT FOUNDATION KNOWLEDGE PART 2: DEBUNKING AN ONLINE EXCHANGE 1 (obg)

Follow @DBDebunk Follow @ThePostWest

Note: To demonstrate the soundness and stability conferred by a sound theoretical foundation (relative to the industry's fad-driven "cookbook" practices), I am re-publishing as "Oldies But Goodies" material from the old (2000-06) DBDebunk.com, so that you can judge for yourself how well my arguments hold up and whether the industry has progressed beyond the misconceptions those arguments were intended to dispel. In re-publishing I may revise, break into or merge parts and/or add comments and/or references that I enclose in square brackets).

Part 1: Clarifications on a Review of My Book Part 1 @DBDebunk.com

Part 2: Slashing a SlashDot Exchange Part 1 @DBAzine.com

Part 3: Slashing a SlashDot Exchange Part 2 @DBAzine.com

Part 4: Slashing a SlashDot Exchange Part 3 @DBAzine.com

Part 5: Slashing a SlashDot Exchange Part 4 @DBAzine.com

Part 6: Clarifications on a Review of My Book Part 2 @DBDebunk.com

OBG: Missing Data -- Many-valued Logics and NULL Part 2

Follow @DBDebunk Follow @ThePostWest

Note: To demonstrate the correctness and stability due to a sound theoretical foundation relative to the industry's fad-driven "cookbook" practices, I am re-publishing as "Oldies But Goodies" material from the old DBDebunk.com (2000-06), so that you can judge for yourself how well my arguments hold up and whether the industry has progressed beyond the misconceptions those arguments were intended to dispel. I may revise, break into parts, and/or add comments and/or references.

In response to a LinkedIn thread where interest was expressed in a "NULL research" project, we re-publish as "Oldies But Goodies" a series of old exchanges with readers reacting to our The Final NULL in the Coffin: A Relational Solution to Missing Data: in Parts 1, 2, 3 we re-published a past exchange between myself and Hugh Darwen with a reader on the pros and cons of our solution vs. Hugh Darwen's "horizontal decomposition" NULLless solution, and in Part 4 my debunking of reactions to an article of mine on the subject which exhibit the confusion inevitably evoked by NULL.

We complete this series with an exchange with a reader who claimed that our criticism applies to SQL's specific implementations of NULL, not to the NULL concept itself. I and David McGoveran explained that the implementation flaws in SQL derive directly from the deeper problems endemic to many-valued logics.

OBG: Missing Data -- Many-valued Logics and NULL Part 1

Follow @DBDebunk Follow @ThePostWest

Note: To demonstrate the correctness and stability of a sound theoretical foundation relative to the industry's fad-driven "cookbook" practices, I am re-publishing as "Oldies But Goodies" material from the old DBDebunk.com (2000-06), so that you can judge for yourself how well my arguments hold up and whether the industry has progressed beyond the misconceptions those arguments were intended to dispel. I may break long pieces into multiple posts, revise, and/or add comments and references.

In response to a LinkedIn exchange we continue the series about missing data, NULL and the RDM. In Parts 1,2 and 3 we re-published a past exchange between myself and Hugh Darwen on the pros and cons of our relational solution to missing data vs. Hugh's "horizontal decomposition".

Here we re-publish my debunking of reactions to an article of mine exhibiting the common confusions evoked by NULL.

OBG: Missing Data -- "Horizontal Decomposition" Part 2

Follow @DBDebunk Follow @ThePostWest

Note: To demonstrate the correctness and stability of a sound foundation relative to the industry's fad-driven "cookbook" practices, I am re-publishing "Oldies But Goodies" material from the old DBDebunk.com (2000-06), so that you can judge for yourself how well my arguments hold up and whether the industry has progressed beyond the misconceptions those arguments were intended to dispel. I may break long pieces into multiple posts, revise, and/or add comments and references.

In Part 1 we re-published a reader's response to "horizontal decomposition" -- Hugh Darwen's How to Handle Missing Information without Using NULLs -- in comparison to our The Final NULL in the Coffin: A Relational Solution to Missing Data). Here's Hugh's response.

OBG: Missing Data -- "Horizontal Decomposition" Part 1

Follow @DBDebunk Follow @ThePostWest

Note: To demonstrate the correctness and stability of a sound foundation relative to the industry's fad-driven "cookbook" practices, I am re-publishing as "Oldies But Goodies" material from the old DBDebunk.com (2000-06), so that you can judge for yourself how well my arguments hold up and whether the industry has progressed beyond the misconceptions those arguments were intended to dispel. I may break long pieces into multiple posts, revise, and/or add comments and references.

“I'm excited to share a data.world research partnership with Prof Leonid Libkin and Paolo Guagliardo from The University of Edinburgh. Our goal is to understand how NULL values are used in the real word to bridge theory and practice. Please help us by participating in a survey.”

Thus a recent announcement on LinkedIn, which triggered reactions in praise of this "much needed effort".

Sigh! SQL's NULL is a blunder unworthy of research. The commonly used "NULL value" is a contradiction in terms, indicating that industry surveys are not a path to enlightening. The real issue is, of course, missing data, which is governed by long studied and well understood logic[1,2,3,4], though apparently not in the industry and today's academia.

In 2004 we published The Final NULL in the Coffin: A Relational Solution to Missing Data (a paper revised since) that we believe is theoretically sound and, importantly, consistent with McGoveran's work re-interpreting, extending and formalizing Codd's RDM[5]. At the time it generated a series of exchanges with readers, which were posted at the old DBDebunk (2000-2006). In light of the above they warrant re-production.

I start with the first, split in three parts: In this Part 1 a reader's reaction to both our solution and Hugh Darwen's "horizontal decomposition" alternative, How to Handle Missing Information without Using NULLs; Hugh's reply is in Part 2 and mine -- re-written to bring up to date with current state of knowledge and for clarity -- is in Part 3.

Note: In a later book Darwen dedicated a chapter to a "multi-relation" approach which seems an allusion to our solution.

Understanding Codd's 12 Rules for RDBMS

Follow @DBDebunk Follow @ThePostWest

In response to an online publication of a book appendix regurgitating Codd's 12 famous rules (some of which were, typically, incorrect[1]) I posted earlier a clarification of the rules. This is a revision thereof for better consistency with the new understanding of the RDM based on McGoveran's re-interpretation, extension and formalization[2] of Codd's work.

No Such Thing As "Current Relational Data Models"

Follow @DBDebunk Follow @ThePostWest

“... the concept of a state group is indeed a missing modeling concept in relational/current data models...”

Thus in a LinkedIn exchange. I don't know what a "state group" is, but I spent almost six decades debunking the misuses of data model in general and the abuses of the RDM in particular and I smell them from miles away. While the time when lack of foundation knowledge shocked me is long gone, practitioners' total unawareness of and indifference to it, and poor reasoning in a field founded on logic never ceases to amaze me.

What exactly are "relational/current data models"?

Don't Conflate/Confuse Primary Keys, PK Constraints, and Indexes

Follow @DBDebunk Follow @ThePostWest

“What is the difference between an index and a key? How are they related?”

“There seams to be some confusion between what a Primary Key is, and what an Index is and how they are used. The Primary Key is a logical object. By that I mean that is simply defines a set of properties on one column or a set of columns to require that the columns which make up the primary key are unique and that none of them are null. Because they are unique and not null, these values (or value if your primary key is a single column) can then be used to identify a single row in the table every time. In most if not all database platforms the Primary Key will have an index created on it. An index on the other hand doesn’t define uniqueness. An index is used to more quickly find rows in the table based on the values which are part of the index. When you create an index within the database, you are creating a physical object which is being saved to disk.”

“A primary key by default creates a clustered index. A unique constraint/key by default creates a non-clustered index.”

“An index is a (logically) ordered list of rows. For example, an index on LastName means all values are already sorted in LastName order. Usually index rows contain far fewer columns in them than the table itself (except the clustered index, which is the table). A key is a column or columns that defines the order of an index. For example, on an index ordered by (LastName,FirstName), then LastName and FirstName are the keys. Btw, a primary key is a physical object, not a logical one. The db engine needs physical rows in order to insure unique values in the index.”

--Difference between an index and a key?, SQLTeam.com

I have recently published a paper[1], and posted a multipart series[2] on relational keys. In the latter I stated as follows:

"As a relational feature, keys can only be properly understood within the formal foundation of the RDM, which is simple set theory (SST) expressible in first order predicate logic (FOPL) adapted and applied to database management. Yet that is precisely what is ignored and dismissed in the industry -- including by the authors of SQL[3]."

I have also written extensively on widespread logical-physical confusion (LPC)[4], recently specifically in the key-index context[5]. The replies above are examples -- if any more were needed -- that validate my repeated claim of lack of foundation knowledge in the industry -- can you tell what's wrong with, and what's correct in, them?

N-ary vs. Binary Relations: Anatomy of a Compound Misconception

Follow @DBDebunk Follow @ThePostWest

Note: This is a re-write of an older post, to bring it into line with the McGoveran formalization and interpretation [1] of Codd's true RDM, which includes refinements, corrections, and extensions of his own.

“Codd has based his model on n-ary relations and that is the key mistake he has made; that leads to complex structure (absolutely not necessary) and situations where there are no values known and as a consequence the need of the concept we know too well -> the null pointers; binary relations (smallest possible) are sufficient to express any predicate/sententional formula and there is no possibility to have something like null; if a value is unknown then we do not know it thus it is not a fact for us thus it is not in our database.”

We suggest you test your foundation knowledge by figuring what's wrong with this picture before you proceed.

A New Understanding of Keys Part 2: Kinds of Keys

Follow @DBDebunk Follow @ThePostWest

Note: This the second of three re-writes of older posts to bring them in line with McGoveran's formalization and interpretation[1] of Codd's true RDM. They are short extracts from a completely rewritten paper #4 in the PRACTICAL DATABASE FOUNDATIONS series[2] that provides a new perspective on relational keys, distinct from the conventional wisdom of the last five decades.

(Continued from Part 1)

"Many data and information modelers talk about all kinds of keys (or identifiers. I'll forego the distinction for now). I hear them talk about primary keys, alternate keys, surrogate keys, technical keys, functional keys, intelligent keys, business keys (for a Data Vault), human keys, natural keys, artificial keys, composite keys, warehouse keys or Dimensional Keys (or Data Warehousing) and whatnot. Then a debate rises on the use (and misuse) of all these keys ... The foremost question we should actually ask ourselves: can we formally disambiguate kinds of keys (at all)? Of all kinds of key, the primary key and the surrogate key gained the most discussion."

"If we take a look at the relational model we only see of one or more attributes that are unique for each tuple in a relation -- no other formal distinction is possible. When we talk about different kinds of keys we base our nomenclature on properties and behavior of the candidate keys. We formally do not have a primary key, it is a choice we make and as such we might treat this key slightly different from all other available keys in a relation. The discussion around primary keys stems more from SQL NULL problems, foreign key constraints and implementing surrogate keys."

--Martijn Evers,dm-unseen.blogspot.com

I've deplored the misuse and abuse of terminology due a general lack of foundation knowledge in the industry [3] for longer than I care to remember, and keys are not an exception. If "the discussion around primary keys stems more from SQL NULL problems, foreign key constraints and implementing surrogate keys", then there is no understanding of relational keys whatsoever: whatever it is, a data structure that contains NULLs is not a relation, one reason for which SQL tables are not relations, SQL databases are not relational and SQL DBMSs are not RDBMSs (for a relational solution to missing data without NULLs see[4]).

We sure can disambiguate, but the key (pun intended) to keys is that they are a relational feature and, thus, can only be properly understood within the dual theoretical foundation of the RDM, which is an adaptation and application of simple set theory (SST) expressible in first order predicate logic (FOPL) to database management. Thus, their "nomenclature on properties and behavior" should reflect what from the real world they represent, and what function they fulfill in the RDM. Which is precisely what the industry disregards.

"NULL Value" is a Contradiction in Terms

Follow @DBDebunk Follow @ThePostWest

There is nothing wrong with Hugo Kornelis' picture of SQL NULL in NULL: The database's black hole. In fact, I recommend the series of which it is one part. It's SQL's picture of how to treat missing data that's wrong.

"Let’s first take a look at what NULL is supposed to be. Here is the definition of NULL from the SQL-2003 standard: null value--A special value that is used to indicate the absence of any data value."

While the absence of a value may be represented by some value at the implementation level, I strongly recommend users not think of NULL as any kind of value at the model level. The problems with NULL stem precisely from the fact that it is not a value, but rather a marker for the absence of a value. NULL value is a contradiction in terms that distracts from the problems.

POSTS

Thursday, May 2, 2024

Tuesday, July 25, 2023

Monday, June 19, 2023

Saturday, April 8, 2023

Wednesday, July 13, 2022

On NULLs and Multi-Table Relvars

Sunday, March 6, 2022

Misconceptions

Saturday, February 19, 2022

CLARIFICATIONS ON A DISCUSSION OF MY BOOK PART 2

Sunday, February 13, 2022

Slashing a Slashdot Exchange - Part 1

Friday, January 21, 2022

Fallacies

Saturday, January 8, 2022

Saturday, January 16, 2021

Saturday, January 9, 2021

Friday, November 27, 2020

Sunday, November 22, 2020

Saturday, October 17, 2020

Thursday, May 28, 2020

Tuesday, September 18, 2018

Wednesday, July 4, 2018

Sunday, April 22, 2018

Saturday, April 1, 2017