1. What's wrong with this picture?
"Things get more complex when NULLable columns are used in expressions and predicates. In a procedural language, this wouldn’t have been a problem--if a procedural program fails to find the information it needs, it enters a conditional branch to handle this situation, as defined by the programmer. In a declarative, set-based language such as SQL, this was not possible. The alternatives were either to have the SQL developer add conditional expressions for each nullable column in a query to handle missing data, or to define a decent default behavior in SQL for missing data so that developers only have to write explicit conditional expressions if they need to override the default behavior." Hugo Kornelis, NULL - The database's black hole.
(Nothing wrong with Hugo's picture--in fact, I highly recommend the series this quote comes from--only with SQL's picture of the relational treatment of missing data.)
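To make the "default behavior" in Hugo's quote concrete, here is a minimal sketch using Python's standard sqlite3 module. The emp table, its nullable bonus column, and the three rows are hypothetical, invented purely for illustration, and SQLite merely stands in for a SQL DBMS (other products may differ in details).

```python
import sqlite3

# Hypothetical example data: 'bonus' is a nullable column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (name TEXT PRIMARY KEY, bonus INTEGER)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?)",
    [("ann", 100), ("bob", 250), ("cho", None)],  # None is stored as NULL
)

def names(sql):
    """Run a query and return the first column of every row as a list."""
    return [row[0] for row in conn.execute(sql)]

# SQL's default: NULL <> 100 evaluates to unknown, so cho's row is not returned.
default_rows = names("SELECT name FROM emp WHERE bonus <> 100 ORDER BY name")

# Explicit override: the developer decides that a missing bonus counts as 0.
override_rows = names(
    "SELECT name FROM emp WHERE COALESCE(bonus, 0) <> 100 ORDER BY name"
)

# Aggregates apply yet another default: SUM skips NULLs.
total = conn.execute("SELECT SUM(bonus) FROM emp").fetchone()[0]

print(default_rows, override_rows, total)
```

The first query silently drops the row with the missing bonus, because the predicate evaluates to unknown rather than true. The COALESCE version is the kind of explicit conditional expression the quote says developers must write to override that default.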
The data management field cannot and will not progress without educated and informed users. Recently I announced UNDERSTANDING THE REAL RDM, a new series of papers that will:
- Offer the data practitioner an accessible, informal preview of David's work.
- Contrast it with the current common interpretation that emerged after EFC's passing, and demonstrate the practical implications of the differences.
(This is a rewrite of a 12/10/16 post, to bring it in line with McGoveran's interpretation of Codd's RDM.)
Here's what's wrong with last week's picture, namely:
"A quick-and-dirty definition for a relational database might be: a system
whose users view data as a collection of tables related to each other
through common data values.
The whole basis for the relational model follows this train of thought: data is stored in
tables, which are composed of rows and columns. Tables of independent
data can be linked, or related, to one another if they each have columns
of data that represent the same data value, called keys. This concept
is so common as to seem trivial; however, it was not so long ago that
achieving and programming a system capable of sustaining the relational
model was considered a longshot with limited usefulness.
If a vendor’s database product didn’t meet Codd’s 12 item litmus tests,
then it was not a member of the club ... these rules determine whether
the database engine itself can be considered truly “relational”. These
rules were constructed to support a data model that would ensure the
ACID properties of transactions and also eliminate a variety of data
manipulation anomalies that frequently occurred on non-relational
database platforms (and **still do**)." --Kevin Kline, SQLBlog.com
You've probably heard the frequent argument that relational databases
(which, unfortunately, in practice, means SQL ones) do not serve the
performance, flexibility, and temporalization needs of analytical
applications satisfactorily. Indeed, Anchor, Data Vault, and Dimensional
Modeling techniques are promoted as solutions to the "problems" attributed to
normalized databases. All this is rooted in certain fundamental
misconceptions that can be costly for business intelligence, analytics,
and data science.