DATABASE DEBUNKINGS

Thursday, December 24, 2020

Happy Holiday Season

Follow @DBDebunk Follow @ThePostWest

To my readers and their families: stay safe!

Thursday, December 3, 2020

OBG: Missing Data -- "Horizontal Decomposition" Part 3

Follow @DBDebunk Follow @ThePostWest

Note: To demonstrate correctness and stability of a sound foundation relative to the industry's fad-driven "cookbook" practices, I am re-publishing "oldies but goodies" from the old DBDebunk (2000-06), so that you can judge for yourself how well my then arguments hold up and whether the industry has progressed beyond the misconceptions those arguments were intended to dispel. I may break long pieces into multiple posts, revise, and/or add comments and references.

In Part 1 we re-published a reader's comments on "horizontal decomposition" -- Hugh Darwen's proposal on How to Handle Missing Information without Using NULLs relative to our The Final NULL in the Coffin: A Relational Solution to Missing Data; In Part 2 we re-published Darwen's response. Here's my reply revised for consistency with the current state of knowledge.

OBG: Missing Data -- "Horizontal Decomposition" Part 2

Follow @DBDebunk Follow @ThePostWest

Note: To demonstrate the correctness and stability of a sound foundation relative to the industry's fad-driven "cookbook" practices, I am re-publishing "Oldies But Goodies" material from the old DBDebunk.com (2000-06), so that you can judge for yourself how well my arguments hold up and whether the industry has progressed beyond the misconceptions those arguments were intended to dispel. I may break long pieces into multiple posts, revise, and/or add comments and references.

In Part 1 we re-published a reader's response to "horizontal decomposition" -- Hugh Darwen's How to Handle Missing Information without Using NULLs -- in comparison to our The Final NULL in the Coffin: A Relational Solution to Missing Data). Here's Hugh's response.

OBG: Missing Data -- "Horizontal Decomposition" Part 1

Follow @DBDebunk Follow @ThePostWest

Note: To demonstrate the correctness and stability of a sound foundation relative to the industry's fad-driven "cookbook" practices, I am re-publishing as "Oldies But Goodies" material from the old DBDebunk.com (2000-06), so that you can judge for yourself how well my arguments hold up and whether the industry has progressed beyond the misconceptions those arguments were intended to dispel. I may break long pieces into multiple posts, revise, and/or add comments and references.

“I'm excited to share a data.world research partnership with Prof Leonid Libkin and Paolo Guagliardo from The University of Edinburgh. Our goal is to understand how NULL values are used in the real word to bridge theory and practice. Please help us by participating in a survey.”

Thus a recent announcement on LinkedIn, which triggered reactions in praise of this "much needed effort".

Sigh! SQL's NULL is a blunder unworthy of research. The commonly used "NULL value" is a contradiction in terms, indicating that industry surveys are not a path to enlightening. The real issue is, of course, missing data, which is governed by long studied and well understood logic[1,2,3,4], though apparently not in the industry and today's academia.

In 2004 we published The Final NULL in the Coffin: A Relational Solution to Missing Data (a paper revised since) that we believe is theoretically sound and, importantly, consistent with McGoveran's work re-interpreting, extending and formalizing Codd's RDM[5]. At the time it generated a series of exchanges with readers, which were posted at the old DBDebunk (2000-2006). In light of the above they warrant re-production.

I start with the first, split in three parts: In this Part 1 a reader's reaction to both our solution and Hugh Darwen's "horizontal decomposition" alternative, How to Handle Missing Information without Using NULLs; Hugh's reply is in Part 2 and mine -- re-written to bring up to date with current state of knowledge and for clarity -- is in Part 3.

Note: In a later book Darwen dedicated a chapter to a "multi-relation" approach which seems an allusion to our solution.

OBG: Database Design and Guaranteed Correctness Part 1

Follow @DBDebunk Follow @ThePostWest

Note: To demonstrate the correctness and stability of database designs provided by a sound foundation relative to the industry's fad-driven "cookbook" practices, I am re-publishing as "Oldies But Goodies" post from the old DBDebunk.com (2000-06), so that you can judge for yourself how well my arguments hold up and whether the industry has progressed beyond the misconceptions they were intended to dispel (I may break long pieces into multiple posts, and add comments and references).

Understanding Codd's 12 Rules for RDBMS

Follow @DBDebunk Follow @ThePostWest

In response to an online publication of a book appendix regurgitating Codd's 12 famous rules (some of which were, typically, incorrect[1]) I posted earlier a clarification of the rules. This is a revision thereof for better consistency with the new understanding of the RDM based on McGoveran's re-interpretation, extension and formalization[2] of Codd's work.

TYFK: Nothing to Do With Relational

Follow @DBDebunk Follow @ThePostWest

Each "Test Your Foundation Knowledge" post presents one or more misconceptions about data fundamentals. To test your knowledge, first try to detect them, then proceed to read our debunking, which is based on the current understanding of the RDM, distinct from whatever has passed for it in the industry to date. If there isn't a match, you can acquire the knowledge by checking out our POSTS, BOOKS, PAPERS, LINKS (or, better, organize one of our on-site SEMINARS, which can be customized to specific needs).

“The relational calculus is good in describing sets. But it´s bad at describing relations between data in different sets. Explicit identities (primary keys) need to be introduced and normalization is needed to avoid update inconsistencies due to duplication of data. To say it somewhat bluntly: The problem with the relational calculus and RDBMS etc. is the focus on data. It´s seems to be so important to store the data, that connecting the data moves to the background. That might be close to how we store filled in paper forms. But it´s so unlike how the mind works. There is no data stored in your brain. If you look at the fridge in your kitchen, there is no tiny fridge created in your brain so you can take the memory of your fridge with you, when you leave your kitchen.” --Weblogs.asp.net

The lack of foundation knowledge exposed by the above paragraph is so complete that its claims are practically upside down and backwards.

Fundamentals

As we have demonstrated, in mathematical set theory a relation (set) is a subset of a cross-product of domains (sets). In other words, it is a set that is a relationship among sets. Being abstract (i.e., having no real world meaning), the values of mathematical relations can be arbitrary.

The RDM is an application of simple set theory expressible in first order predicate logic (SST/FOPL) to database management: a relational database represents a conceptual model of some reality, namely (facts about) a multigroup in the real world -- a collection of related entity groups -- each database relation representing one such group; a database is also a set of related relations. The values in database relations (i.e., the data) are, thus, not arbitrary, but must be consistent with the conceptual model: relations and the database as a whole are semantically constrainted to be so consistent: (1) individual properties of entities and (2) collective properties of (a) groups (i.e., relationships among entities within groups), and (b) the multigroup (i.e., relationships among groups).

A primary key (PK) represents names given in the real world to entities of a given type, and the corresponding PK constraint (uniqueness) enforces consistency of a relation with the distinguishability of those entities in the real world, the facts about which it represents. These are not RDM artifacts, but rather part of the adaptation of SST/FOPL to database management.

For the primary advantage of the RDM -- guaranteed correctness of query results (i.e., inferences made from the database) -- to materialize, logical database design must adhere to three core principles which, jointly, imply fully normalized relations (5NF). In fact, in RDM relations are in 5NF by definition, otherwise they are not relations -- relational algebra (RA) operations lose information and all bets are off.

The RA is the manipulative component of the RDM -- a collection of primitive and derived set operations on relations that describe relationships among relations. For example, the join operation r1 JOIN r2 describes a relationship between r1 and r2 relation, the result itself a relation. Note that since every result of a RA operation on even one relation is always a relation and still describes a relationship -- between the "input" and "output" relations.

A data model -- and, industry claims notwithstanding, the only one satisfying Codd's definition that has been formalized is the RDM -- is by nature focused on data. However, the RDM supports physical independence (PI) and, thus, not concerned with how data is physically stored and accessed. The notion of "files stored in paper form" is an example of the common and entrenched logical-physical confusion (LPC) due to failure to understand the distinction between a logical relation and its tabular visualization on a physical medium, induced/reinforced by the industry's "direct image" implementation of SQL DBMSs.

Conclusion

We rephrase the above paragraph as follows:

“The relational algebra describes relationships among relations (sets). Primary keys are one of the adaptations of the SST/FOPL for database management: a PK constraint -- uniqueness -- represents formally in the database a within-group relationship among all its entities.

Mandatory adherence to three core design principles jointly imply full normalization, which is necessary to guarantees correctness of query results. True RDBMSs:

Implement the RA for logical data retrieval independent of how the data is physically stored and accessed. SQL DBMSs notwithstanding, vendors are free to store data whichever way they want as long as they don't expose it to users in applications.
Enforce relational constraints that are formal database representations of relationships in the conceptual model represented by the database.”

The "brain" stuff is sheer nonsense.

POSTS

Thursday, December 24, 2020

Happy Holiday Season

Thursday, December 3, 2020

OBG: Missing Data -- "Horizontal Decomposition" Part 3

Friday, November 27, 2020

OBG: Missing Data -- "Horizontal Decomposition" Part 2

Sunday, November 22, 2020

OBG: Missing Data -- "Horizontal Decomposition" Part 1

Thursday, October 29, 2020

OBG: Database Design and Guaranteed Correctness Part 1

Saturday, October 17, 2020

Understanding Codd's 12 Rules for RDBMS

Saturday, September 26, 2020

TYFK: Nothing to Do With Relational

Fundamentals

Conclusion