In Part 1 we showed (yet again) how even those with their heart in the right (relational) place can't help being affected by the common and entrenched industry misconceptions, in this case about relationships, relations and tables. More often than not authors exhibit the very misconceptions they try to debunk.
We left the author distinguishing sets (with unordered, unique elements) from tables (lists of ordered, possibly duplicate rows).
------------------------------------------------------------------------------------------------------------------
SUPPORT THIS SITE
DBDebunk was maintained and kept free with the proceeds from my @AllAnalitics
column. The site was discontinued in 2018. The content here is not available
anywhere else, so if you deem it useful, particularly if you are a regular
reader, please help upkeep it by purchasing publications, or donating. On-site
seminars and consulting are available.Thank you.
LATEST UPDATES
-05/09/21 Re-posted the FUNDAMENTALS
page, the content of which had
mysteriously disappeared.
-03/15/21: Pruned the POSTS page
-12/26/20: Added “Mathematics, machine learning and Wittgenstein to LINKS page
LATEST PUBLICATIONS (order from PAPERS and BOOKS pages)
- 08/19 Logical Symmetric Access, Data Sub-language, Kinds of Relations,
Database Redundancy and Consistency, paper #2 in the new UNDERSTANDING THE
REAL RDM series.
- 02/18 The Key to Relational Keys: A New Understanding, a new edition
of paper #4 in the PRACTICAL DATABASE FOUNDATIONS series.
- 04/17 Interpretation and Representation of Database Relations, paper
#1 in the new UNDERSTANDING THE REAL RDM series.
- 10/16 THE DBDEBUNK GUIDE TO MISCONCEPTIONS ABOUT DATA FUNDAMENTALS, my
latest book (reviewed by Craig Mullins, Todd Everett, Toon Koppelaars, Davide
Mauri).
USING THIS SITE
- To work around Blogger limitations, the labels are mostly abbreviations or
acronyms of the terms listed on the FUNDAMENTALS
page. For detailed
instructions on how to understand and use the labels in conjunction with the
that page, see the ABOUT
page. The 2017 and 2016 posts,
including earlier posts rewritten in 2017 were relabeled accordingly. As other
older posts are rewritten, they will also be relabeled. For all other older
posts use Blogger search.
- The links to my columns there no longer work. I moved only the 2017 columns
to dbdebunk, within which only links to sources external to AllAnalytics may
work or not.
SOCIAL MEDIA
I deleted my Facebook account. You can follow me:
- @DBDdebunk on Twitter: will link to new posts to this site, as well as
To Laugh or Cry? and What's Wrong with This Picture? posts, and my exchanges on
LinkedIn.
- The PostWest blog for monthly samples of global Antisemitism – the
only universally acceptable hatred left – as the (traditional) response to the
existential crisis of decadence and decline of Western civilization
(including the US).
- @ThePostWest on Twitter where I comment on global
#Antisemitism/#AntiZionism and the Arab-Israeli conflict.
------------------------------------------------------------------------------------------------------------------
Logical Validity and Semantic Consistency
“The most important consequence of the fact that we are dealing with sets is that we have the entire set theory at our disposal. If we have sets, then we can have subsets, unions, intersections and — wait for it— Cartesian products!”IOW, the importance is access to set operations with which sets are manipulable mathematically to derive other sets (just like numbers are by arithmetics). One of those operations, Cartesian product (Cp), when applied to two sets produces a set, a subset of which is a relation. If a database consists of relations, a RDBMS can manipulate them mathematically and produce relations as results system-guaranteed to be logically valid.
But a mathematical relation is an abstract set -- it represents nothing in the real world:
- Is defined on "arbitrary" (i.e., mathematically set) nameless domains domains/attributes identifiable by order;
- Has tuples that are unique by mathematical definition: Cp tuples are unique combinations of the values of all its attributes).
Database relations are not abstract, they represent entity groups and must be constrained (restricted) for semantic consistency -- faithfulness to the conceptual model represented by the database, without loss of mathematical properties and logical validity of results. The RDM is an adaptation of the mathematical theory of relations for database management. PKs and insignificant order are only two of of several restrictions necessary.
- Domains are named and constrained to represent properties;
- Attributes are named and constrained to represent properties in specific group contexts;
- Tuples are PK-constrained to represent (facts about) entities that are distinguishable in the real world;
- Relations are constrained
- individually to represent within-group relationships between properties and entities; and,
- collectively to represent cross-group relationships.
Note: We usually do not represent in databases all the properties of entities that jointly render them unique, for which reason and more convenient identification we assign entities unique names represented by PKs.
The relational algebra/calculus are also adaptations of mathematical set operations to database management. For example, join is a combination of primitive Cp, selection and projection that is useful for querying databases and is defined to work correctly with relations that represent groups of entities of a single type (i.e., are by definition in 5NF).
System-guaranteed logical validity and by-design semantic consistency jointly comprise correctness of inferences from databases (i.e., query results) that the RDM confers, together with decidability of data sublanguage, physical and logical independence and more.
We prefer the term R-table for a table that visualizes a relation, a data body of which (relations don't have headers) obey all semantic constraints on the relation. Note that the constraints are not visible in a R-table.
Missing Data and NULLs
“Now, what about the NULL value that is present in relational databases? The NULL which means a lack of value can be thought of as an implicit member of every domain, unless you have a NOT NULL restriction. We won’t talk about NULL in the next examples for the sake of simplicity and decency.”A lack of value is represented by a NULL ... value? This language common in the industry reveals the confusion about missing data. Relational databases do not have NULLs, SQL databases do, and they should not be confused either.
Note: In fact, Date did propose a missing data scheme whereby default values were part of every domain, but he renounced it. For a proposed relational scheme without NULLs see The Last NULL in the Coffin.
Relational and Computational Completeness
“Most people who think in terms of tables think about it imperatively ... But is it the right way to think about that problem? We’re not using the relational model to our advantage. We need to think declaratively instead of imperatively. We need to explain what we want, not how to get it. But it’s hard to think declaratively if we ignore all of the relational model and set theory because we don’t really have good words to explain the result that we want to get. We don’t have the right language.”Let me rephrase this: R-tables per se are not the main inducer of imperative (i.e., procedural) thinking, programming is. It also contributes to the failure to understand and appreciate the important difference between a relationally complete data sublanguage and a computationally complete programming language.
“In my opinions there are three main subjects that need to be known to understand relational databases:
- set theory
- predicate calculus
- relational algebra
Without it you will never truly appreciate the relational model.”
Could not agree more, but while necessary, they are not sufficient. What is also crucial to understand, but practitioners do not, is that the RDM is applied theory -- an adaptation of the theory to database practice. And understanding of neither the theory, nor its adaptations are achieved via experience with industry tools -- certainly not SQL, let alone non-SQL DBMSs -- but foundational education, which practitioners have long stopped undergoing. Academic courses have become industry tool trainers and even when the rare one covers the theory it does not convey the adaptations and their advantages.
Here's an excellent example.
No comments:
Post a Comment