Follow @DBDebunk
Follow @ThePostWest
See: Physical Independence Part 1: Don't Mix Model with Implementation
Monday, April 17, 2017
Saturday, April 15, 2017
This Week
Database Truth of the Week
"... systems of operations on data are most effective when they are formalisms, in which semantic considerations are unimportant until the formalism is applied to some specific application. In this way, database processing can join the ranks of successful mathematical abstractions. Differential equations, for instance, can be applied to situations ranging from orbit calculations to the quantum mechanics of the atom. The semantics of each application is unique to that application, but the formalism of differential equations is common. The power of the formalism lies in its abstraction from issues of meaning." --T. H. Merrett, Extending the Relational Algebra to Capture Less Meaning
Saturday, April 1, 2017
"NULL Value" is a Contradiction in Terms
There is nothing wrong with Hugo Kornelis' picture of SQL NULL in "NULL: The database's black hole". In fact, I recommend the series of which it is one part. It's SQL's picture of how to treat missing data that's wrong.
"Let’s first take a look at what NULL is supposed to be. Here is the definition of NULL from the SQL-2003 standard: null value--A special value that is used to indicate the absence of any data value."
While the absence of a value may be represented by some value at the implementation level, I strongly recommend that users not think of NULL as any kind of value at the model level. The problems with NULL stem precisely from the fact that it is not a value, but rather a marker for the absence of a value. "NULL value" is a contradiction in terms that distracts from those problems.
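A minimal sketch of why NULL does not behave like a value, assuming SQLite (through Python's standard sqlite3 module) as a stand-in for an SQL DBMS; other products may differ in detail. Any genuine value is equal to itself, but comparing NULL with NULL does not evaluate to true. The helper name `scalar` is mine, introduced only for illustration.

```python
import sqlite3

# In-memory SQLite database, used here only as a convenient SQL engine.
conn = sqlite3.connect(":memory:")

def scalar(sql):
    """Hypothetical helper: run a query and return its single result."""
    return conn.execute(sql).fetchone()[0]

# A genuine value is equal to itself: SQLite reports true as 1.
one_equals_one = scalar("SELECT 1 = 1")

# NULL compared with NULL is neither true nor false: the result is
# itself NULL, which sqlite3 hands back to Python as None.
null_equals_null = scalar("SELECT NULL = NULL")

# Because ordinary comparison cannot detect the marker, SQL needs a
# separate operator, IS NULL, to test for it.
null_is_null = scalar("SELECT NULL IS NULL")

print(one_equals_one, null_equals_null, null_is_null)
```

The need for a dedicated `IS NULL` test is itself a hint that NULL is a marker handled outside the normal comparison of values.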
Sunday, March 26, 2017
This Week
1. What's wrong with this picture?
"Things get more complex when NULLable columns are used in expressions and predicates. In a procedural language, this wouldn’t have been a problem--if a procedural program fails to find the information it needs, it enters a conditional branch to handle this situation, as defined by the programmer. In a declarative, set-based language such as SQL, this was not possible. The alternatives were either to have the SQL developer add conditional expressions for each nullable column in a query to handle missing data, or to define a decent default behavior in SQL for missing data so that developers only have to write explicit conditional expressions if they need to override the default behavior." Hugo Kornelis, NULL - The database's black hole.
(There is nothing wrong with Hugo's picture; in fact, I highly recommend the series of which the source of this quote is one part. What is wrong is SQL's picture of the relational treatment of missing data.)
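The "default behavior" the quote refers to can be sketched as follows, again assuming SQLite via Python's sqlite3 module as a stand-in for an SQL DBMS. The table `emp`, its columns, and its rows are hypothetical, made up purely for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table: Bob's bonus is missing, so it is marked NULL.
conn.execute("CREATE TABLE emp (name TEXT, bonus INTEGER)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?)",
    [("Ann", 100), ("Bob", None), ("Cy", 0)],
)

def names(where):
    """Return the names of the rows satisfying the given WHERE clause."""
    sql = f"SELECT name FROM emp WHERE {where} ORDER BY name"
    return [row[0] for row in conn.execute(sql)]

# Predicates: a row is returned only when the condition is true, so Bob
# appears in neither a result nor its apparent complement.
equal_100 = names("bonus = 100")
not_equal_100 = names("bonus <> 100")

# Expressions: arithmetic involving NULL yields NULL (None in Python).
bob_plus_ten = conn.execute(
    "SELECT bonus + 10 FROM emp WHERE name = 'Bob'"
).fetchone()[0]

# Aggregates: SUM and COUNT(column) skip NULLs, COUNT(*) does not.
total, counted, all_rows = conn.execute(
    "SELECT SUM(bonus), COUNT(bonus), COUNT(*) FROM emp"
).fetchone()

# Overriding the default requires an explicit conditional expression,
# here COALESCE, which substitutes 0 for the missing bonus.
overridden = names("COALESCE(bonus, 0) <> 100")

print(equal_100, not_equal_100, bob_plus_ten, total, counted, all_rows, overridden)
```

Note that the override treats a missing bonus as a bonus of zero, which is an assertion about the real world that the data does not support; that is the kind of decision SQL leaves to each developer.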
Sunday, March 19, 2017
New Paper: The Interpretation and Representation of Database Relations
The data management field cannot and will not progress without educated and informed users. Recently I announced UNDERSTANDING THE REAL RDM, a new series of papers that will
- Offer to the data practitioner an accessible informal preview of David's work.
- Contrast it with the current common interpretation that emerged after EFC's passing, and demonstrate the practical implications of the differences.
Saturday, March 11, 2017
What Is a True Relational System (and What It Is Not)
(This is a rewrite of a 12/10/16 post, to bring it in line with McGoveran's interpretation of Codd's RDM.)
Here's what's wrong with last week's picture, namely:
"A quick-and-dirty definition for a relational database might be: a system whose users view data as a collection of tables related to each other through common data values.
The whole basis for the relational model follows this train of thought: data is stored in tables, which are composed of rows and columns. Tables of independent data can be linked, or related, to one another if they each have columns of data that represent the same data value, called keys. This concept is so common as to seem trivial; however, it was not so long ago that achieving and programming a system capable of sustaining the relational model was considered a longshot with limited usefulness.
If a vendor’s database product didn’t meet Codd’s 12 item litmus tests, then it was not a member of the club ... these rules determine whether the database engine itself can be considered truly “relational”. These rules were constructed to support a data model that would ensure the ACID properties of transactions and also eliminate a variety of data manipulation anomalies that frequently occurred on non-relational database platforms (and **still do**)." --Kevin Kline, SQLBlog.com
Thursday, March 2, 2017
The Trouble with Data Warehouse Analytics
You've probably heard the frequent argument that relational databases (which, unfortunately, in practice, means SQL ones) do not serve the performance, flexibility, and temporalization needs of analytical applications satisfactorily. Indeed, Anchor, Data Vault, and Dimensional Modeling techniques are promoted as solutions to the "problems" due to normalized databases. All this is rooted in certain fundamental misconceptions that can be costly for business intelligence, analytics, and data science.