Friday, September 22, 2017

This Week



1. Database Truth of the Week

“If the data sub-language ... has the power of second order predicate logic (SOPL), expressions are possible that cannot be evaluated (for example, self-referencing expressions) and the formal language is then undecidable, an algorithm to implement a declarative query language is impossible and all hope of physical independence is lost." --David McGoveran


2. What's Wrong With This Database Picture?

"Our terminology is broken beyond repair. [Let me] point out some problems with Date's use of terminology, specifically in two cases.
"type" = "domain": I fully understand why one might equate "type" and "domain", but ... in today's programming practice, "type" and "domain" are quite different. The word "type" is largely tied to system-level (or "physical"-level) definitions of data, while a "domain" is thought of as an abstract set of acceptable values.

"class" != "relvar": In simple terms, the word "class" applies to a collection of values allowed by a predicate, regardless of whether such a collection could actually exist. Every set has a corresponding class, although a class may have no corresponding set ... in mathematical logic, a "relation" is a "class" (and trivially also a "set"), which contributes to confusion.
In modern programming parlance "class" is generally distinguished from "type" only in that "type" refers to "primitive" (system-defined) data definitions while "class" refers to higher-level (user-defined) data definitions. This distinction is almost arbitrary, and in some contexts, "type" and "class" are actually synonymous."

Sunday, September 17, 2017

Database Management: No Progress Without Data Fundamentals



I have recently -- yet again -- been accused in a LinkedIn exchange  of "gibberish without any evidence" and of claiming that "nobody know what they're doing" with databases. I will leave it to readers to judge whether (1) five decades worth of writings and teaching is "no evidence" and (2) my comments in the exchange are gibberish. Here I would like to dare anybody to find claims to that effect in any of my pronouncements. What I did, do and will say is that most data professionals do not know and understand data and relational fundamentals -- an incontrovertible fact proved not just by me[1], but also by others[2,3] and that this inhibits real progress in database management. 

As I wrote two weeks ago:
"The RDM put database management on a formal, scientific foot. Consequently, tool experience and relational terminology are insufficient -- foundation knowledge is necessary. Unfortunately, most data professionals do not possess it, in part because they have been misled by the industry and in part because few go through an education -- as distinct from training -- program that teaches the RDM and teaches it correctly. Consequently, even those with the heart in the right place defend the RDM without a full understanding, their views distorted by what passes for it (stay tuned for a debunking of such a recent example)."
I will now fulfill the promise by debunking just such a "heart-in-the-right-place" defense of the RDM. 

Sunday, September 10, 2017

This Week



1. Database Truth of the Week

“A network is a directed acyclic graph (the "direction" of the transitive relationship) and, thus, amenable to transitive closure (TC). In the Relational Data Model (RDM) that usually means the smallest set that includes all the members that satisfy the transitive relationship in question (for the count of each object type the closure is computed and the count ignores level). While the Relational Data Model (RDM) can handle an important subset of graph theory via special graph domain operators and extensions to the original relational operators, which could be made efficient, it is a very difficult problem. Certain computations on finite sets such as TC are not in general computable in a language based on first order predicate logic (FOPL) that is declarative, decidable and supports physical independence (PI) -- a core relational objective. They require a computationally complete language (CCL) that is imperative and recursive.
A ‘TC function’ can be implemented using a host CCL that returns its result in the form of a relation; then a symbol (i.e., pure syntax) of type relation can be defined in relational algebra that references/invokes that function. From within the algebra it appears to be just a relation and is up to the user to understand what the value of the returned relation means --i.e., that it represents the TC. That understanding/interpretation is outside the algebra and passed to users only via documentation (e.g., some meta-language).” --David McGoveran


2. What's Wrong With This Database Picture?

"I don’t like talking about the relational theory of data. It is absolutely fundamental to any deep understanding of data, but most practitioners get along fine without it. It’s more the implementers of database management systems (DBMSs) who need to understand relational theory, so teaching relational theory to ordinary practitioners is a bit like tormenting people with irrelevant theory before you let them get on with the business at hand. Moreover, some of those who understand relational theory use their knowledge to beat other people over the head with it. I don’t want to be associated with that high-handed approach to this important theory.

But I’ve been goaded. Google made me do it. My attention was drawn to a video put out by some folks at Google, Data Modeling for BigQuery. The video is fine for the most part, but it makes some misstatements about relational theory that just drive me crazy. They repeat commonly accepted misconceptions about relational databases—misconceptions that, unfortunately, have driven some of the “advances” we’ve seen of late in the realm of database technology. There have definitely been some true advances, but some new technology is merely different without being better.
If you’re a practitioner, designing, implementing, and using databases, whether SQL or NoSQL, this won’t matter much to you, although it never hurts to learn a little more about the theory of data. However, if you are a programmer who might be the one who builds the next NoSQL mega-star that will replace decades-old technology, you need to know this, because this knowledge will enable you to blind-side every established DBMS vendor, whether SQL or NoSQL." --Ted Hills, Understand Relational to Understand the Secrets of Data

Friday, September 1, 2017

Don't Confuse/Conflate Database Consistency with Truth



Disregard for foundation knowledge and failure to learn from past mistakes by even data professionals deemed experts inhibit progress in data management and bring back problems already resolved that should be of foremost concern to data analysts. Consider the following:

"Above all else, we count on databases to reflect the truth consistently, or at least to reflect the table data perfectly. The database cannot be blamed when an application (or the end users of an application) place inaccurate data in its tables, but a database must accurately report the data it holds. Therefore, bugs are not all created equal; there are bugs, and there are wrong-rows bugs, bugs that silently misrepresent the data that the tables hold. Even the craziest, most obscure corner case that potentially misrepresents your data should rightly bring a loud chorus: "The emperor has no clothes!" We depend on the database, above all, not to lie."

Sunday, August 27, 2017

Object Orientation, Relational Database Design, Logical Validity and Semantic Correctness



Note: This is a 8/24/17 rewrite of a 5/20/13 post to bring it in line with McGoveran's formal exposition of Codd's RDM [1] and its correct interpretation.

08/25/17: I have added formal definitions of logical validity and semantic correctness. 
09/01/17: Minor revisions. 
09/02/17: Added references.
03/15/18: Minor revisions.


Here's what's wrong with last week's picture, namely:
"In my experience, using an object model in both the application layer and in the database layer results in an inefficient system. This are my personal design goals:
- Use a relational data model for storage
- Design the database tables using relational rules including 3rd normal form
- Tables should mirror logical objects, but any object may encompass multiple tables
- Application objects, whether you are using an OO language or a traditional language using structured programming techniques should parallel application needs which most closely correspond to individual SQL statements than to tables or "objects". --LinkedIn.com

Sunday, August 20, 2017

This Week



1. Database Truth of the Week

“... [one] limitation imposed by set semantics is the inability to express the concept of a computer variable to which values can be destructively assigned (or "updated") ... variables can be expressed in logic, but they cannot be expressed in elementary set theory, or first order predicate logic (FOPL) -- the foundations of the RDM. Other, more expressively powerful systems are required. Unfortunately, such powerful formal systems do violence to the RDM and its intent.” --David McGoveran


2. What's Wrong With This Database Picture?

"In my experience, using an object model in both the application layer and in the database layer results in an inefficient system. This are my personal design goals:
- Use a relational data model for storage.
- Design the database tables using relational rules including 3rd normal form
- Tables should mirror logical objects but any object may encompass multiple tables
- Application objects, whether you are using an OO language or a traditionallanguage using structured programming techniques should parallel application needs which most closely correspond to individual SQL statements than to tables or "objects". --LinkedIn.com

Sunday, August 13, 2017

Relational Fidelity, Cursors and ORDER BY



Here's what's wrong with last database picture, namely:
"In a book I am reading (QUERYING SQL SERVER 2012) the author talks about theory of how databases work. He mentions relations, attributes and tuples etc. He frequently stresses the fact that some aspect of T-SQL is not relational. Like in the following excerpt:
"T-SQL also supports an object called a cursor that is defined based on a result of a query, and that allows fetching rows one at a time in a specified order. You might care about returning the result of a query in a specific order for presentation purposes or if the caller needs to consume the result in that manner through some cursor mechanism that fetches the rows one at a time. But remember that such processing isn’t relational. If you need to process the query result in a relational manner--for example, define a table expression like a view based on the query--the result will need to be relational. Also, sorting data can add cost to the query processing. If you don’t care about the order in which the result rows are returned, you can avoid this unnecessary cost by not adding an ORDER BY clause."
I would like to know, since every implementation of SQL pretty much has an ORDER BY clause which makes it non-relational, why does it even matter that (the set after ORDER BY is used) its not relational anymore since its like that everywhere? I can understand if he said it was non-standard, for example using != instead of <> for inequality because that affects portability etc., but I do not understand why something is better being relational. Please enlighten." --stackoverflow.com

View My Stats