Wednesday, August 29, 2018

DISTINCT and ORDER BY Are Not Relational




“One of the things that confuse SQL users all the time is how DISTINCT and ORDER BY are related in a SQL query ... most people quickly understand:
   
SELECT DISTINCT length
FROM film

[that] returns results in an arbitrary order, because the database can (and might apply hashing rather than ordering to remove duplicates) ... Most people also understand:
   
SELECT length
FROM film
ORDER BY length

[that] will give us duplicates, but in order ... And, of course, we can combine the two:
   
SELECT DISTINCT length
FROM film
ORDER BY length

[But if] somewhat intuitively, we may want to order the lengths differently, e.g. by title:
   
SELECT DISTINCT length
FROM film
ORDER BY title

[m]ost databases [sic] fail this query with an exception like Oracle’s:

ORA-01791: not a SELECTed expression

At first sight ... this

SELECT length
FROM film
ORDER BY title

works after all ... So, how are these different? We have to rewind and check out the logical order of SQL operations (as opposed to the syntactic order). And always remember, this is the logical order, not the actual order executed by the optimiser.”
--How SQL DISTINCT and ORDER BY are Related, Jooq.org

Wednesday, August 15, 2018

Order Is For Society, Not Databases




8/18/18: I have re-written this post for a better explanation. If you read it prior to the revision, you should re-read it.
 
“I learned that there is no concept of order in terms of tuples (e.g. rows) in a table, but according to wikipedia "a tuple is an ordered list of elements". Does that mean that attributes do have an order? If yes why would they be treated differently, couldn't one add another column to a table (which is why the tuples don't have order)? [OTOH], "In this notation, attribute–value pairs may appear in any order." Does this mean attributes have no order?”
--Do the “columns” in a table in a RMDB have order?
“Is it possible to reorder rows in SQL database? For example, how can I swap the order of 2nd row and 3rd row's values? The order of the row is important to me since i need to display the value according to the order [and] 'Order by' won't work for me. For example, I put a list of bookmarks in database. I want to display based on the result I get from query. (not in alphabet order). Just when they are inserted. But user may re-arrange the position of the bookmark (in any way he/she wants). So I can't use 'order by'. An example is how the bookmark display in the bookmark in firefox. User can switch position easily. How can I mention that in DB?”
--How can I reorder rows in sql database

While some data professionals may know that rows and columns of "database tables" are "unordered", few of them know what that means, and understand why. This is due to two, not unrelated, of the many common misconceptions[1] rooted in the lack of foundation knowledge in the industry, namely that relational databases consist of tables[2], and logical-physical confusion (LPC)[3]. They obscure understanding of the RDM and its practical implications, which is reflected in the answers to the above questions. Instead of debunking them, this post fills the gap in knowledge such that you can debunk them yourself -- try it before and after you read it.



Sunday, August 5, 2018

No Such Thing As "Primary Key Tuning"




“The choice of good InnoDB primary keys is a critical performance tuning decision. This post will guide you through the steps of choosing the best primary key depending on your workload ... You would be surprised how many times I had to explain the importance of primary keys and how many debates I had around the topic as often people have preconceived ideas that translate into doing things a certain way without further thinking.”
--Yves Trudeau, Principal architect, Percona.com

I will be labeled "pedantic" and a "purist" for saying this, but there is no such thing as "PK tuning". "Choice of good PK for performance" reflects logical-physical confusion (LPC), which, in turn, is rooted in lack of foundation knowledge, and failure to grasp the RDM -- not just by database practitioners, but even by DBMS designers.

Monday, July 30, 2018

Lenin, Trotsky, Data Management, and the Tyranny of Knowledge and Reason -- Version 2




14 years ago I published an editorial with this title in response to a Slashdot.com exchange triggered by one of my articles at DBAzine.com. The objective was to illustrate some of the damage done by the collapse of education to the IT industry, reflecting a trend in the society  as a whole. Things have only gotten worse since then, and for quite a while I considered a re-write. A recent Linkedin exchange provided the impetus, so here it is.

Monday, July 23, 2018

Industry Practice Is No Substitute for Foundation Knowledge




“A short time ago a colleague asked me where he could find a "Databases 101" guide for the non-technical professional. As it turns out, the internet is littered with information, and mis-information, regarding data and databases. This makes it difficult for someone entering the field of data and databases to understand the bigger picture. He wanted something to help make sense of data and databases, specifically relational versus NoSQL. So, that’s what I decided to work on. This is the start of my Databases 101 guide for the non-database professional; something to help anyone understand why the word “database” is an overloaded term. We use the term "database" to describe a great many things ... I use the terms database and database engine interchangeably in this article.”
                            --Databases 101, ThomasLaRock.com

To “understand the bigger picture”, and “make sense of data and databases” requires foundation knowledge. In its absence, efforts to educate defeat their own purpose, because they are grounded in the very misinformation they should be correcting. This is the mechanism by which misconceptions[1] are perpetuated and reinforced.

Sunday, July 15, 2018

Understanding Relations Part 3: Debunking Conventional Wisdom




(See Part 1 and Part 2)

“A common term used in database design is a "relational database" -- but a database relation is not the same thing and does not imply, as its name suggests, a relationship between tables. Rather, a database relation simply refers to an individual table in a relational database. In a relational database, the table is a relation because it stores the relation between data in its column-row format. The columns are the table's attributes, while the rows represent the data records. A single row is known as a tuple to database designers.”
“A relation, or table, in a relational database has certain properties.”

“First off, its name must be unique in the database, i.e. a database cannot contain multiple tables of the same name.”

“Next ... as with the table names, no attributes can have the same name.”

“Next, no tuple (or row) can be a duplicate. In practice, a database might actually contain duplicate rows, but there should be practices in place to avoid this, such as the use of unique primary keys (next up). Given that a tuple cannot be a duplicate, it follows that a relation must contain at least one attribute (or column) that identifies each tuple (or row) uniquely. This is usually the primary key. This primary key cannot be duplicated. This means that no tuple can have the same unique, primary key. The key cannot have a NULL value, which simply means that the value must be known.”

“Further, each cell, or field, must contain a single value. For example, you cannot enter something like "Tom Smith" and expect the database to understand that you have a first and last name; rather, the database will understand that the value of that cell is exactly what has been entered.”

“Finally, all attributes—or columns—must be of the same domain, meaning that they must have the same data type. You cannot mix a string and a number in a single cell.”

“All these properties, or constraints, serve to ensure data integrity, important to maintain the accuracy of data.”
  --Definition of Database Relation

It is easy to discern when explanations of relational features are not grounded in the formal foundations of the RDM[1], but in industry practices. Here are some further clarifications and corrections.
 

Wednesday, July 4, 2018

N-ary vs. Binary Relations: Anatomy of a Compound Misconception




Note: This is a re-write of an older post, to bring it into line with the McGoveran formalization and interpretation [1] of Codd's true RDM, which includes refinements, corrections, and extensions of his own.
“Codd has based his model on n-ary relations and that is the key mistake he has made; that leads to complex structure (absolutely not necessary) and situations where there are no values known and as a consequence the need of the concept we know too well -> the null pointers; binary relations (smallest possible) are sufficient to express any predicate/sententional formula and there is no possibility to have something like null; if a value is unknown then we do not know it thus it is not a fact for us thus it is not in our database.”
We suggest you test your foundation knowledge by figuring what's wrong with this picture before you proceed.
View My Stats