Sunday, September 10, 2017

This Week



1. Database Truth of the Week

“A network is a directed acyclic graph (the "direction" of the transitive relationship) and, thus, amenable to transitive closure (TC). In the Relational Data Model (RDM) that usually means the smallest set that includes all the members that satisfy the transitive relationship in question (for the count of each object type the closure is computed and the count ignores level). While the Relational Data Model (RDM) can handle an important subset of graph theory via special graph domain operators and extensions to the original relational operators, which could be made efficient, it is a very difficult problem. Certain computations on finite sets such as TC are not in general computable in a language based on first order predicate logic (FOPL) that is declarative, decidable and supports physical independence (PI) -- a core relational objective. They require a computationally complete language (CCL) that is imperative and recursive.
A ‘TC function’ can be implemented using a host CCL that returns its result in the form of a relation; then a symbol (i.e., pure syntax) of type relation can be defined in relational algebra that references/invokes that function. From within the algebra it appears to be just a relation and is up to the user to understand what the value of the returned relation means --i.e., that it represents the TC. That understanding/interpretation is outside the algebra and passed to users only via documentation (e.g., some meta-language).” --David McGoveran


2. What's Wrong With This Database Picture?

"I don’t like talking about the relational theory of data. It is absolutely fundamental to any deep understanding of data, but most practitioners get along fine without it. It’s more the implementers of database management systems (DBMSs) who need to understand relational theory, so teaching relational theory to ordinary practitioners is a bit like tormenting people with irrelevant theory before you let them get on with the business at hand. Moreover, some of those who understand relational theory use their knowledge to beat other people over the head with it. I don’t want to be associated with that high-handed approach to this important theory.

But I’ve been goaded. Google made me do it. My attention was drawn to a video put out by some folks at Google, Data Modeling for BigQuery. The video is fine for the most part, but it makes some misstatements about relational theory that just drive me crazy. They repeat commonly accepted misconceptions about relational databases—misconceptions that, unfortunately, have driven some of the “advances” we’ve seen of late in the realm of database technology. There have definitely been some true advances, but some new technology is merely different without being better.
If you’re a practitioner, designing, implementing, and using databases, whether SQL or NoSQL, this won’t matter much to you, although it never hurts to learn a little more about the theory of data. However, if you are a programmer who might be the one who builds the next NoSQL mega-star that will replace decades-old technology, you need to know this, because this knowledge will enable you to blind-side every established DBMS vendor, whether SQL or NoSQL." --Ted Hills, Understand Relational to Understand the Secrets of Data

Friday, September 1, 2017

Don't Confuse/Conflate Database Consistency with Truth



Disregard for foundation knowledge and failure to learn from past mistakes by even data professionals deemed experts inhibit progress in data management and bring back problems already resolved that should be of foremost concern to data analysts. Consider the following:

"Above all else, we count on databases to reflect the truth consistently, or at least to reflect the table data perfectly. The database cannot be blamed when an application (or the end users of an application) place inaccurate data in its tables, but a database must accurately report the data it holds. Therefore, bugs are not all created equal; there are bugs, and there are wrong-rows bugs, bugs that silently misrepresent the data that the tables hold. Even the craziest, most obscure corner case that potentially misrepresents your data should rightly bring a loud chorus: "The emperor has no clothes!" We depend on the database, above all, not to lie."

Sunday, August 27, 2017

Object Orientation, Relational Database Design, Logical Validity and Semantic Correctness



Note: This is a 8/24/17 rewrite of a 5/20/13 post to bring it in line with McGoveran's formal exposition of Codd's RDM [1] and its correct interpretation.

08/25/17: I have added formal definitions of logical validity and semantic correctness. 
09/01/17: Minor revisions. 
09/02/17: Added references.
03/15/18: Minor revisions.


Here's what's wrong with last week's picture, namely:
"In my experience, using an object model in both the application layer and in the database layer results in an inefficient system. This are my personal design goals:
- Use a relational data model for storage
- Design the database tables using relational rules including 3rd normal form
- Tables should mirror logical objects, but any object may encompass multiple tables
- Application objects, whether you are using an OO language or a traditional language using structured programming techniques should parallel application needs which most closely correspond to individual SQL statements than to tables or "objects". --LinkedIn.com

Sunday, August 20, 2017

This Week



1. Database Truth of the Week

“... [one] limitation imposed by set semantics is the inability to express the concept of a computer variable to which values can be destructively assigned (or "updated") ... variables can be expressed in logic, but they cannot be expressed in elementary set theory, or first order predicate logic (FOPL) -- the foundations of the RDM. Other, more expressively powerful systems are required. Unfortunately, such powerful formal systems do violence to the RDM and its intent.” --David McGoveran


2. What's Wrong With This Database Picture?

"In my experience, using an object model in both the application layer and in the database layer results in an inefficient system. This are my personal design goals:
- Use a relational data model for storage.
- Design the database tables using relational rules including 3rd normal form
- Tables should mirror logical objects but any object may encompass multiple tables
- Application objects, whether you are using an OO language or a traditionallanguage using structured programming techniques should parallel application needs which most closely correspond to individual SQL statements than to tables or "objects". --LinkedIn.com

Sunday, August 13, 2017

Relational Fidelity, Cursors and ORDER BY



Here's what's wrong with last database picture, namely:
"In a book I am reading (QUERYING SQL SERVER 2012) the author talks about theory of how databases work. He mentions relations, attributes and tuples etc. He frequently stresses the fact that some aspect of T-SQL is not relational. Like in the following excerpt:
"T-SQL also supports an object called a cursor that is defined based on a result of a query, and that allows fetching rows one at a time in a specified order. You might care about returning the result of a query in a specific order for presentation purposes or if the caller needs to consume the result in that manner through some cursor mechanism that fetches the rows one at a time. But remember that such processing isn’t relational. If you need to process the query result in a relational manner--for example, define a table expression like a view based on the query--the result will need to be relational. Also, sorting data can add cost to the query processing. If you don’t care about the order in which the result rows are returned, you can avoid this unnecessary cost by not adding an ORDER BY clause."
I would like to know, since every implementation of SQL pretty much has an ORDER BY clause which makes it non-relational, why does it even matter that (the set after ORDER BY is used) its not relational anymore since its like that everywhere? I can understand if he said it was non-standard, for example using != instead of <> for inequality because that affects portability etc., but I do not understand why something is better being relational. Please enlighten." --stackoverflow.com

Saturday, August 5, 2017

This Week



1. Database Truth of the Week

"Semantic correctness: every interpretation of the symbols (meaning assignment and truth value assignment) that makes the axioms true, makes the theorems true. When we extend a logical data model with semantics (specific to the subject matter and its "business" rules) via constraints, those constraints become axioms that must be true." --David McGoveran

Tuesday, August 1, 2017

Structure, Integrity, Manipulation: How to Compare Data Models




The IT industry operates like the fashion industry: every few years -- and the number keeps getting smaller -- a "new" data technology pops up, with vendors, the trade media and various "experts" all stepping over each other to claim that it'll "revolutionize your business" and unless you jump on the bandwagon, you'll be "left behind." But time and again these prove to be fads lacking a sound foundation. Huge resources are invested in migrations from fad to fad, rather than in productive work (Don't believe the hype about Hadoop usage, Basta, Big Data It's Time to Say Arrivederci). Remember?
"Hadoop seems to take over relational database, as Hbase can store even unstructured data whereas relational data warehouse limits to structured data ... handles traditional structured data just fine, albeit in a different way than a RDBMS ... EDW vendors [will] incorporate Hadoop framework into their core architectures to enable advanced and high performance analytics."

View My Stats