DATABASE DEBUNKINGS

Saturday, September 29, 2018

Understanding Conceptual vs. Data Modeling Part 2: E/RM Models Reality, RDM Models Data


Re-write 10/17/18
Revised 11/1/18

In Part 1 we explained that when the RDM and the E/RM were introduced, the distinct conceptual-logical-physical levels of representation had not yet emerged, and a data model had not yet been formally defined. In 1980, however, Codd defined a formal data model as a combination of (1) data structures, (2) integrity constraints, and (3) operators on the structures[1], and the three levels of representation were distinguished later. Given a conceptual level distinct from the logical, do the RDM and the E/RM satisfy the definition -- are they data models in today's terms?

Recall from Part 1 that the RDM has all three components and is defined in purely logical terms, so it is a data model. The E/RM definition, by contrast, intermingles conceptual and logical terminology, and is therefore inconsistent with two distinct levels. Moreover, as a data model the E/RM is incomplete:

“The E/RM is not a data model as formally defined by Codd: no explicit structural component except sets classified in various ways, no explicit manipulative component except implied set operations, and very limited integrity (keys).”

--David McGoveran

Contrary to claims, Date does not exactly say that the E/RM is a data model:

“[It] is not even clear that the E/R "model" is truly a data model at all, at least in the sense in which we have been using that term in this book so far (i.e., as a formal system involving structural, integrity, and manipulative aspects). Certainly the term "E/R modeling" is usually taken to mean the process of deciding the structure (only) of the database, although [it does deal with] certain integrity aspects also, mostly having to do with keys ... However, a charitable reading of [Chen's original E/RM paper] would suggest that the E/R model is indeed a data model, but one that is essentially just a thin layer on top of the relational model (it is certainly not a candidate for replacing the relational model, as some have suggested).”[2]

Note that even if, charitably, the E/RM is considered a data model, it is not on a par with the RDM.
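
To make Codd's three components concrete, here is a toy sketch in Python. It is an illustration only, not the RDM and not code from the post: the names `make_row`, `satisfies_key`, `restrict` and `project`, and the `employees` data, are all hypothetical, and domains, the other constraint types and the remaining operators are omitted.

```python
from typing import Callable, FrozenSet, Tuple

# (1) Structure: a tuple is a set of (attribute, value) pairs and a
#     relation is a set of tuples, so neither has any inherent order.
Row = FrozenSet[Tuple[str, object]]
Relation = FrozenSet[Row]


def make_row(**attrs) -> Row:
    """Build one tuple from attribute=value pairs (hypothetical helper)."""
    return frozenset(attrs.items())


def satisfies_key(rel: Relation, key: Tuple[str, ...]) -> bool:
    """(2) Integrity: no two tuples agree on all the key attributes."""
    key_values = [tuple(dict(row)[a] for a in key) for row in rel]
    return len(key_values) == len(set(key_values))


def restrict(rel: Relation, pred: Callable[[dict], bool]) -> Relation:
    """(3) Operator: keep the tuples for which the predicate holds."""
    return frozenset(row for row in rel if pred(dict(row)))


def project(rel: Relation, attrs: Tuple[str, ...]) -> Relation:
    """(3) Operator: keep only the named attributes; duplicates collapse."""
    return frozenset(
        frozenset((a, dict(row)[a]) for a in attrs) for row in rel
    )


# Illustrative data only.
employees: Relation = frozenset({
    make_row(emp_no=1, dept="D1"),
    make_row(emp_no=2, dept="D1"),
    make_row(emp_no=3, dept="D2"),
})

key_ok = satisfies_key(employees, ("emp_no",))       # emp_no identifies tuples
dept_is_key = satisfies_key(employees, ("dept",))    # dept does not
d1_staff = restrict(employees, lambda r: r["dept"] == "D1")
depts = project(employees, ("dept",))
```

All three components are explicit here, and operators applied to relations yield relations. That is the sense in which McGoveran calls the E/RM incomplete: it names no explicit manipulative component.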

Don't Conflate/Confuse Primary Keys, PK Constraints, and Indexes


“What is the difference between an index and a key? How are they related?”

“There seams to be some confusion between what a Primary Key is, and what an Index is and how they are used. The Primary Key is a logical object. By that I mean that is simply defines a set of properties on one column or a set of columns to require that the columns which make up the primary key are unique and that none of them are null. Because they are unique and not null, these values (or value if your primary key is a single column) can then be used to identify a single row in the table every time. In most if not all database platforms the Primary Key will have an index created on it. An index on the other hand doesn’t define uniqueness. An index is used to more quickly find rows in the table based on the values which are part of the index. When you create an index within the database, you are creating a physical object which is being saved to disk.”

“A primary key by default creates a clustered index. A unique constraint/key by default creates a non-clustered index.”

“An index is a (logically) ordered list of rows. For example, an index on LastName means all values are already sorted in LastName order. Usually index rows contain far fewer columns in them than the table itself (except the clustered index, which is the table). A key is a column or columns that defines the order of an index. For example, on an index ordered by (LastName,FirstName), then LastName and FirstName are the keys. Btw, a primary key is a physical object, not a logical one. The db engine needs physical rows in order to insure unique values in the index.”

--Difference between an index and a key?, SQLTeam.com

I have recently published a paper[1], and posted a multipart series[2] on relational keys. In the latter I stated as follows:

"As a relational feature, keys can only be properly understood within the formal foundation of the RDM, which is simple set theory (SST) expressible in first order predicate logic (FOPL) adapted and applied to database management. Yet that is precisely what is ignored and dismissed in the industry -- including by the authors of SQL[3]."

I have also written extensively on the widespread logical-physical confusion (LPC)[4], recently in the key-index context specifically[5]. The replies above are further examples -- if any more were needed -- that validate my repeated claim that the industry lacks foundation knowledge. Can you tell what's wrong with them, and what's correct?
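
One way to observe the logical-physical distinction is to run a small experiment. The sketch below uses Python's standard sqlite3 module; the `person` table, its data and the index name are hypothetical. How SQLite implements the constraint internally is beside the point. What matters is what a user can observe: the key constraint determines which rows are allowed, while creating or dropping an index changes no query result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE person ("
    " person_id INTEGER PRIMARY KEY,"   # key constraint: a logical rule
    " last_name TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO person VALUES (?, ?)",
    [(1, "Codd"), (2, "Date"), (3, "Codd")],
)

# The constraint rejects a second row with the same key value.
try:
    conn.execute("INSERT INTO person VALUES (1, 'Chen')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

query = "SELECT person_id FROM person WHERE last_name = 'Codd'"

# Same query before an index exists, while it exists, and after it is dropped.
without_index = sorted(conn.execute(query).fetchall())
conn.execute("CREATE INDEX person_last_name_idx ON person (last_name)")
with_index = sorted(conn.execute(query).fetchall())
conn.execute("DROP INDEX person_last_name_idx")
after_drop = sorted(conn.execute(query).fetchall())
```

The index may change how fast the rows are found, but never which rows are returned. The experiment does not settle every claim in the replies quoted above, but it is a useful check to keep in mind when assessing them.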

RE-WRITE


See: https://www.dbdebunk.com/2018/09/designation-property-and-assertion.html

Wednesday, August 29, 2018

DISTINCT and ORDER BY Are Not Relational


“One of the things that confuse SQL users all the time is how DISTINCT and ORDER BY are related in a SQL query ... most people quickly understand:


SELECT DISTINCT length
FROM film

[that] returns results in an arbitrary order, because the database can (and might apply hashing rather than ordering to remove duplicates) ... Most people also understand:


SELECT length
FROM film
ORDER BY length

[that] will give us duplicates, but in order ... And, of course, we can combine the two:


SELECT DISTINCT length
FROM film
ORDER BY length

[But if] somewhat intuitively, we may want to order the lengths differently, e.g. by title:


SELECT DISTINCT length
FROM film
ORDER BY title

[m]ost databases [sic] fail this query with an exception like Oracle’s:

ORA-01791: not a SELECTed expression

At first sight ... this

SELECT length
FROM film
ORDER BY title

works after all ... So, how are these different? We have to rewind and check out the logical order of SQL operations (as opposed to the syntactic order). And always remember, this is the logical order, not the actual order executed by the optimiser.”

--How SQL DISTINCT and ORDER BY are Related, Jooq.org
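
The rejected query is not merely disallowed; it is ambiguous. Once duplicates are removed, one length may correspond to several titles, so "order by title" does not determine a single order. The sketch below (Python's standard sqlite3 module, with a hypothetical three-row `film` table) makes the ambiguity explicit by choosing either the first or the last title for each length. `GROUP BY` with `MIN`/`MAX` is one way to state that choice, not the rewrite proposed in the quoted article.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE film (title TEXT PRIMARY KEY, length INTEGER NOT NULL)"
)
# Two films share length 90; their titles sort on either side of 'Mike'.
conn.executemany(
    "INSERT INTO film VALUES (?, ?)",
    [("Alpha", 90), ("Mike", 120), ("Zulu", 90)],
)


def distinct_lengths(order_expr: str) -> list:
    """Distinct lengths, ordered by an aggregate over each length's titles."""
    sql = f"SELECT length FROM film GROUP BY length ORDER BY {order_expr}"
    return [length for (length,) in conn.execute(sql)]


by_first_title = distinct_lengths("MIN(title)")  # 90 sorts as 'Alpha'
by_last_title = distinct_lengths("MAX(title)")   # 90 sorts as 'Zulu'
```

With this data `by_first_title` is `[90, 120]` and `by_last_title` is `[120, 90]`: two equally defensible answers to the same question, which is why a DBMS that rejects the query is behaving sensibly.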

Order Is For Society, Not Databases


8/18/18: I have re-written this post for a better explanation. If you read it prior to the revision, you should re-read it.

“I learned that there is no concept of order in terms of tuples (e.g. rows) in a table, but according to wikipedia "a tuple is an ordered list of elements". Does that mean that attributes do have an order? If yes why would they be treated differently, couldn't one add another column to a table (which is why the tuples don't have order)? [OTOH], "In this notation, attribute–value pairs may appear in any order." Does this mean attributes have no order?”

--Do the “columns” in a table in a RMDB have order?

“Is it possible to reorder rows in SQL database? For example, how can I swap the order of 2nd row and 3rd row's values? The order of the row is important to me since i need to display the value according to the order [and] 'Order by' won't work for me. For example, I put a list of bookmarks in database. I want to display based on the result I get from query. (not in alphabet order). Just when they are inserted. But user may re-arrange the position of the bookmark (in any way he/she wants). So I can't use 'order by'. An example is how the bookmark display in the bookmark in firefox. User can switch position easily. How can I mention that in DB?”

--How can I reorder rows in sql database

While some data professionals may know that the rows and columns of "database tables" are "unordered", few know what that means or understand why. This is due to two related misconceptions among the many common ones[1] rooted in the lack of foundation knowledge in the industry: that relational databases consist of tables[2], and logical-physical confusion (LPC)[3]. Both obscure understanding of the RDM and its practical implications, which is reflected in the answers to the above questions. Instead of debunking those answers, this post fills the gap in knowledge so that you can debunk them yourself -- try it before and after you read it.
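
As a hint for the bookmark question: if an order is meaningful to users, it is information, and it can be represented like any other information, as attribute values. The following is a minimal sketch, assuming a hypothetical `bookmark` table with a `position` column and using Python's standard sqlite3 module. It is one possible design, not the solution given in the post.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE bookmark ("
    " url TEXT PRIMARY KEY,"
    " position INTEGER NOT NULL)"   # the user's chosen order, stored as data
)
conn.executemany(
    "INSERT INTO bookmark VALUES (?, ?)",
    [("a.example", 1), ("b.example", 2), ("c.example", 3)],
)


def ordered_urls() -> list:
    """Present the bookmarks in the order recorded in the data."""
    rows = conn.execute("SELECT url FROM bookmark ORDER BY position")
    return [url for (url,) in rows]


before = ordered_urls()

# "Swapping the 2nd and 3rd rows" is an update of values, not a physical move.
conn.execute(
    "UPDATE bookmark"
    " SET position = CASE position WHEN 2 THEN 3 WHEN 3 THEN 2 END"
    " WHERE position IN (2, 3)"
)

after = ordered_urls()
```

Nothing here depends on where or in what sequence the rows are stored. The order exists only in the `position` values and is applied when the result is presented.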

No Such Thing As "Primary Key Tuning"


“The choice of good InnoDB primary keys is a critical performance tuning decision. This post will guide you through the steps of choosing the best primary key depending on your workload ... You would be surprised how many times I had to explain the importance of primary keys and how many debates I had around the topic as often people have preconceived ideas that translate into doing things a certain way without further thinking.”

--Yves Trudeau, Principal architect, Percona.com

I will be labeled "pedantic" and a "purist" for saying this, but there is no such thing as "PK tuning". "Choice of good PK for performance" reflects logical-physical confusion (LPC), which, in turn, is rooted in lack of foundation knowledge, and failure to grasp the RDM -- not just by database practitioners, but even by DBMS designers.

Lenin, Trotsky, Data Management, and the Tyranny of Knowledge and Reason -- Version 2


Fourteen years ago I published an editorial with this title in response to a Slashdot.com exchange triggered by one of my articles at DBAzine.com. The objective was to illustrate some of the damage done to the IT industry by the collapse of education, which reflects a trend in society as a whole. Things have only gotten worse since then, and for quite a while I considered a re-write. A recent LinkedIn exchange provided the impetus, so here it is.

POSTS

Saturday, September 29, 2018

Understanding Conceptual vs. Data Modeling Part 2: E/RM Models Reality, RDM Models Data

Tuesday, September 18, 2018

Don't Conflate/Confuse Primary Keys, PK Constraints, and Indexes

Tuesday, September 11, 2018

RE-WRITE

Wednesday, August 29, 2018

DISTINCT and ORDER BY Are Not Relational

Wednesday, August 15, 2018

Order Is For Society, Not Databases

Sunday, August 5, 2018

No Such Thing As "Primary Key Tuning"

Monday, July 30, 2018

Lenin, Trotsky, Data Management, and the Tyranny of Knowledge and Reason -- Version 2