DATABASE DEBUNKINGS

Sunday, March 19, 2017

New Paper: The Interpretation and Representation of Database Relations

The data management field cannot and will not progress without educated and informed users. Recently I announced UNDERSTANDING THE REAL RDM, a new series of papers that will:

Offer the data practitioner an accessible, informal preview of David McGoveran's work.
Contrast it with the current common interpretation that emerged after EFC's passing, and demonstrate the practical implications of the differences.

What Is a True Relational System (and What It Is Not)

(This is a rewrite of a 12/10/16 post, to bring it in line with McGoveran's interpretation of Codd's RDM.)

Here's what's wrong with last week's picture, namely:

"A quick-and-dirty definition for a relational database might be: a system whose users view data as a collection of tables related to each other through common data values.

The whole basis for the relational model follows this train of thought: data is stored in tables, which are composed of rows and columns. Tables of independent data can be linked, or related, to one another if they each have columns of data that represent the same data value, called keys. This concept is so common as to seem trivial; however, it was not so long ago that achieving and programming a system capable of sustaining the relational model was considered a longshot with limited usefulness.

If a vendor’s database product didn’t meet Codd’s 12 item litmus tests, then it was not a member of the club ... these rules determine whether the database engine itself can be considered truly “relational”. These rules were constructed to support a data model that would ensure the ACID properties of transactions and also eliminate a variety of data manipulation anomalies that frequently occurred on non-relational database platforms (and **still do**)." --Kevin Kline, SQLBlog.com

The Trouble with Data Warehouse Analytics

You've probably heard the frequent argument that relational databases (which, unfortunately, in practice means SQL ones) do not satisfactorily serve the performance, flexibility, and temporalization needs of analytical applications. Indeed, Anchor, Data Vault, and Dimensional Modeling techniques are promoted as solutions to the "problems" attributed to normalized databases. All this is rooted in certain fundamental misconceptions that can be costly for business intelligence, analytics, and data science.

This Week

1. What's wrong with this picture

"A quick-and-dirty definition for a relational database might be: a system whose users view data as a collection of tables related to each other through common data values. The whole basis for the relational model follows this train of thought: data is stored in tables, which are composed of rows and columns. Tables of independent data can be linked, or related, to one another if they each have columns of data that represent the same data value, called keys. This concept is so common as to seem trivial; however, it was not so long ago that achieving and programming a system capable of sustaining the relational model was considered a longshot with limited usefulness." --Kevin Kline, SQLBlog.com

Simple Domains and Value Atomicity

09/19/23: For the latest on this subject see: FIRST NORMAL FORM - A DEFINITIVE GUIDE

11/09/22: Revised

Here's what's wrong with last week's picture, namely:

Q: "I'm currently trying to design a database and I'm not too sure about the best way to approach a dynamically sized array field of one of my objects. My first thought is to use a column in my object to store an array of integers. However the more I read, the more I think this isn't the best option. Concrete example wise, I have a player object that stores 0 to many items, which are represented by an integer. What is the best way to represent this?"

A: "If a collection of values is atomic, store them together. Meaning, if you always care about the entire group, if you never search for nested values and never sort by nested values, then they should be stored together as a single field value. If not, they should be stored in a separate table, each value bring a row, each assigned the parent ID (foreign key) of a record on the other table that "owns" them as a group. For more info, search on the term "database normalization".

Some databases, support an array as a data type. For example, Postgres allows you to define a column as a one-dimension array, or even a two dimension array. If your database does not support array as a type of column definition, transform you data collection into an XML or JSON support if your database your database supports that type. For example, Postgres has basic support for storing, retrieving, and non-indexed searching of XML using XPath. And Postgres offers excellent industry-leading support for JSON as a data type including indexed support on nested values. Going this XML/JSON route can be an exception to the normalization rules I mentioned above." --StackOverflow.com
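To make the "separate table" option in the quoted answer concrete, here is a minimal sketch using Python's standard sqlite3 module. The table and column names (player, player_item, player_id, item_id) and the sample rows are hypothetical, and the composite key assumes a player cannot hold the same item twice, which the question does not state. It only illustrates the design the answer describes, one row per item referencing the owning player; it is not a claim about what the data actually means.

```python
import sqlite3


def build_demo():
    """Create an in-memory database with one row per (player, item) pair."""
    conn = sqlite3.connect(":memory:")
    # SQLite enforces foreign keys only when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE player (
            player_id INTEGER PRIMARY KEY,
            name      TEXT NOT NULL
        );
        -- Assumption: a player holds a given item at most once.
        CREATE TABLE player_item (
            player_id INTEGER NOT NULL REFERENCES player (player_id),
            item_id   INTEGER NOT NULL,
            PRIMARY KEY (player_id, item_id)
        );
        """
    )
    # Hypothetical sample data: player 1 has three items, player 2 has none.
    conn.executemany("INSERT INTO player VALUES (?, ?)", [(1, "ann"), (2, "bob")])
    conn.executemany(
        "INSERT INTO player_item VALUES (?, ?)", [(1, 10), (1, 20), (1, 30)]
    )
    conn.commit()
    return conn


def items_of(conn, player_id):
    """Return the item ids held by one player, in ascending order."""
    rows = conn.execute(
        "SELECT item_id FROM player_item WHERE player_id = ? ORDER BY item_id",
        (player_id,),
    )
    return [row[0] for row in rows]


def owners_of(conn, item_id):
    """Return the ids of players holding a given item, in ascending order."""
    rows = conn.execute(
        "SELECT player_id FROM player_item WHERE item_id = ? ORDER BY player_id",
        (item_id,),
    )
    return [row[0] for row in rows]
```

With this layout, "which items does a player have" and "which players have an item" are both ordinary queries over the same rows, and a player with zero items is simply a player with no matching rows.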

A focus on physical implementation ("dynamically sized array field") without the well-defined conceptual and logical features it is supposed to represent ("a player object" is hardly enough), and a confusion of levels of representation (a real world object does not "store" anything), are always red flags: indications of a poor grasp of foundation knowledge. So let's introduce some.

This Week

1. What's wrong with this picture

"If a collection of values is atomic, store them together. Meaning, if you always care about the entire group, if you never search for nested values and never sort by nested values, then they should be stored together as a single field value. If not, they should be stored in a separate table, each value bring a row, each assigned the parent ID (foreign key) of a record on the other table that "owns" them as a group. For more info, search on the term "database normalization".

Some databases, support an array as a data type. For example, Postgres allows you to define a column as a one-dimension array, or even a two dimension array. If your database does not support array as a type of column definition, transform you data collection into an XML or JSON support if your database your database supports that type. For example, Postgres has basic support for storing, retrieving, and non-indexed searching of XML using XPath. And Postgres offers excellent industry-leading support for JSON as a data type including indexed support on nested values. Going this XML/JSON route can be an exception to the normalization rules I mentioned above." --Response to the Quote of the Week listed next, StackOverflow.com

Re-Write

See Meaning Criteria and Entity Supertype-Subtypes

POSTS

Sunday, March 19, 2017

New Paper: The Interpretation and Representation of Database Relations

Saturday, March 11, 2017

What Is a True Relational System (and What It Is Not)

Thursday, March 2, 2017

The Trouble with Data Warehouse Analytics

Saturday, February 25, 2017

This Week

1. What's wrong with this picture

Sunday, February 19, 2017

Simple Domains and Value Atomicity

Sunday, February 12, 2017

This Week

1. What's wrong with this picture

Sunday, February 5, 2017

Re-Write