DATABASE DEBUNKINGS

Sunday, December 17, 2017

This Week

1. Database truth of the week

"Within the database field, it is common to refer to three “level” of description: conceptual, logical, and physical. Both the logical level and the physical level are formal systems. By contrast, the conceptual level is typically an informal system and refers to the subject of the database.

The conceptual language is a subject language, in the terminology of formal systems. The conceptual level identifies the concepts to be formally represented by the logical and physical levels, and how users think and talk about those concepts. This level corresponds only informally to the so-called “conceptual schema” of earlier approaches to information management, which emphasized the capture of conceptual information using various techniques including diagrams and documentation having various degrees of formality, but not forming a strictly formal system themselves." -- David McGoveran

2. What's wrong with this database picture?

I re-wrote two older debunkings to bring them in line with the McGoveran formalization and interpretation of Codd's true RDM. Re-reads are recommended.

"Can you have 2 tables, VIEWS and DOWNLOADS, with identical structure in a good DB schema (item_id, user_id, time). Some of the records will be identical but their meaning will be different depending on which table they are in. The "views" table is updated any time a user views an item for the first time. The "downloads" table is updated any time a user downloads an item for the first time. Both of the tables can exist without the other ..."

"I have a database for a school ... [with] are numerous tables obviously but consider these:

CONTACT - all contacts (students, faculty) has fields such as LAST, FIRST, MI, ADDR, CITY, STATE, ZIP, EMAIL;
FACULTY - hire info, login/password for electronic timesheet login, foreign key to CONTACT;
STUDENT - medical comments, current grade, foreign key to CONTACT.
Do you think it is a good idea to have a single table hold such info? Or, would you have had the tables FACULTY and STUDENT store LAST, FIRST, ADDR and other fields? ..."

Conventional Wisdom and True Relational Features

Follow @DBDebunk Follow @ThePostWest

Here's what's wrong with last week's picture, namely:

"Per Date’s AN INTRODUCTION TO DATABASE SYSTEMS, Date & Darwen’s DATABASES, TYPES, AND THE RELATIONAL MODEL, and related references, the features of a relational database are values, types, attributes, tuples, relations, relation-valued variables, operators, and constraints.

A type is a set of values and related operators.

An attribute is a name, value, type triple.

A tuple is a set of attributes.

A relation is a set of tuples with a given heading.

A relation-valued variable (known as a relvar) is a persistent variable whose time-varying value is a relation." --Dave Voorhis, Computer scientist; lead developer of Rel, a true relational database system, Quora.com

This is more or less the conventional wisdom, which is nothing like the true RDM envisioned by Codd [1].

DBMS for Analytics: Risky Business Without Foundation Knowledge (Part 1)

Follow @DBDebunk Follow @ThePostWest

Note: This was originally a post at AllAnalytics, which is no longer exists, so some links to other posts there no longer work, but I left them in to alert the reader that I have written on those specific subjects. Other links work.

A new study finding that "non-relational database management systems now comprising 70% of analytics data sources" attributes their popularity to "superiority" over RDBMSs in satisfying analytics needs. There are good reasons to be skeptical of such findings, but even if this one were true, the arguments advanced in support of the claim are rooted in the usual misconceptions due to poor foundation knowledge debunked by this blog. Let's see.

This Week

Follow @DBDebunk Follow @ThePostWest

1. Database Truth of the Week

"A formal system is a systematic way of representing something. We call that something the subject and often refer to it as the “subject system”, although it might not be “systematic” in any sense at all. By contrast we call the formal system the object system -- a system of abstract “objects”.

Representing a subject with a formal system allows us to reason about it without getting trapped in ambiguities, or circular arguments. The formal system becomes a theory about that portion of the subject that has been represented and when that portion is faithfully represented by the theory, we say that portion is a model of the theory.There are at least three distinct uses of languages necessary to use or apply any formal system that correspond to the three distinct ways in which we need to discuss and use formal systems. It is standard to refer to these three uses by different names and as if they were distinct languages: subject, object and meta-language." --David McGoveran

Ed. Note: In the database context, the subject language expresses the conceptual model; the object language expresses the logical model; the data model is the meta-language that expresses the relationship between object and subject.

2. What's Wrong With This Database Picture?

"Per Date’s AN INTRODUCTION TO DATABASE SYSTEMS, Date & Darwen’s DATABASES, TYPES AND THE RELATIONAL MODEL and related references, the features of a relational database are values, types, attributes, tuples, relations, relation-valued variables, operators, and constraints.

A type is a set of values and related operators.

An attribute is a name, value, type triple.

A tuple is a set of attributes.

A relation is a set of tuples with a given heading.

A relation-valued variable (known as a relvar) is a persistent variable whose time-varying value is a relation." --Dave Voorhis (Computer scientist; lead developer of Rel, a true relational database system), --What are the features of a relational database?, Quora.com

What Relations Really Are and Why They Are Important

Follow @DBDebunk Follow @ThePostWest

Note: Some of the References have been re-written to bring them into line with the McGoveran formalization and interpretation [1] of Codd's real RDM -- re-reading is recommended.

Here's what's wrong with the picture of two weeks ago, namely:

"In SQL RDBMSes (such as MS SQL Server and Oracle] tables are permently stored relations, where the column names defined in the data dictionary form the "heading" and the rows are the "tuples" of the relation."

"A relation can be represented by a table in database. A relation in the context of modeling a problem will include the fields and possibly the identification of fields which have relationships with other relations..."

"Put simply, a "relation" is a table, the heading being the definition of the structure and the rows being the data."

"In simple English: relation is data in tabular format with fixed number of columns and data type of each column. This can be a table, a view, a result of a subquery or a function etc."

"A relation is a table, which is a set of data. A table is the result of a query."
--What is a relation in database terminology?, StackOverflow.com

This Week

Follow @DBDebunk Follow @ThePostWest

1. Database Truth of the Week

"Codd's original 'normal form' in 1969-70 is not equivalent to the current 1NF. The former and 'normalization' were tied to Codd's first definition of join in 1969. Multiple normal forms (1NF-5NF) and 'further normalization' are a consequence of Codd's re-definition of join in 1970 as the one of today. Once 1NF and further normalization to at least 2NF have been introduced, 'the one normal form' makes no longer sense. Thus, there is no way to answer "what is the difference between the normal form and 1NF" without taking into account the definition of join, and -- if defined as we now do -- no way to understand the former, except to say that it was to the 1969-70 join what the 5NF is to the current join.That is one reason I personally strongly believe that while relations are at least in 1NF -- a relation that is not, cannot be represented formally as a relation, nor do the formal operators of the relational algebra work correctly otherwise -- even this is insufficient and formal relations must be in 5NF. Otherwise put, there is no such thing as a non-5NF database relation." --David McGoveran

2. What's Wrong With This Database Picture?

"A relation is a data structure which consists of a heading and an unordered set of tuples which share the same type", according to Wikipedia on 'Relation (database)'."

"In SQL RDBMSes (such as MS SQL Server and Oracle] tables are permanently stored relations, where the column names defined in the data dictionary form the "heading" and the rows are the "tuples" of the relation."

"A relation can be represented by a table in database. A relation in the context of modeling a problem will include the fields and possibly the identification of fields which have relationships with other relations..."

"Put simply, a "relation" is a table, the heading being the definition of the structure and the rows being the data."

"In simple English: relation is data in tabular format with fixed number of columns and data type of each column. This can be a table, a view, a result of a subquery or a function etc."

"A relation is a table, which is a set of data. A table is the result of a query." --What is a relation in database terminology?, StackOverflow.com

Understanding Conceptual vs. Data Modeling Part 1: Data Model - The RDM Is, the E/RM Isn't

Follow @DBDebunk Follow @ThePostWest

Re-write 10/16/18

“E/RM is a data model -- So says Date, Chen, etc. So says the majority of current industry experts ... With very strong references to Codd (who he worked with), Date elegantly explains the differences between RM and E/RM -- but clearly believes both are data models (even allowing for the charitable comment). If we take a RDB as the ultimate target implementation of data, and an E/RM (or extended) can correctly design all the artifacts that are implemented, this means it is modeling the data. Granted, an E/RM does not explicitly model some of the non-structural aspects of the original Codd definition.”

“Out of interest, is there a common Relational Modeling tool, that is not also an E/RM tool and models the full Codd definition? There are also several other methods of modeling data -- E/RM is more a mechanism to represent the data. If E/RMs are used by IT professionals across the world to direct the design and build of the majority of applications guided by standard methodologies, is the view of this argument that these were all build wrongly? Regardless of success? Is the inferred conclusion that only the RM models data, and ERM, [or] any other techniques do not? [If so] that is a little limiting.”

Objects, Properties, and Ontological Commitment

We are culturally and linguistically conditioned to conceptualize the world as objects with properties. Objects in a universe thereof that share common properties are of the same type and form a class, distinguishing them from objects that are not and do not. Applying a class definition to the universe selects out the group of objects of that type from the universe.

Philosophical ontology is the study of being, existence, reality, as well as the basic categories of being, and their relationships -- what entities exist or may be said to exist, and how they may be grouped, related, and subdivided according to similarities and differences.

Note: 'Object' is used in the general, not OO sense. Ontology, as used herein, should not be confused with "computer science ontology", whereby the term ontology was usurped, and is understood by programmers as meaning a conceptual graph of directed semantic relationships among objects (and only sometimes among object types).

Conceptual modeling (1) identifies types of objects of interest, and (2) formulates business rules (BR) that specify their properties and relationships and, as such, makes an ontological commitment. Any approach to conceptual modeling must consider the ontological commitment upon which it is based, which has major implications for the data model used to formalize conceptual models as logical models for computable database representation -- it must be consistent with that commitment.

Unfortunately, due to lack of foundation knowledge in the industry[1], practitioners -- both vendors and users -- are largely unaware of, and oblivious to ontological underpinning and their implications for database technology and practice, one reason why they not only stagnated, but regressed in the last five decades. In this multipart series we explain the important distinction between conceptual, and data modeling (aka logical database design), which requires a formal data model. The E/RM is not, and while it can be used for conceptual modeling of reality, not data, we outline a new conceptual modeling approach that makes a different ontological commitment and requires adjustments to the RDM, both necessary for genuine progress.

POSTS

Sunday, December 17, 2017

This Week

1. Database truth of the week

2. What's wrong with this database picture?

Sunday, December 10, 2017

Conventional Wisdom and True Relational Features

Sunday, December 3, 2017

DBMS for Analytics: Risky Business Without Foundation Knowledge (Part 1)

This Week

1. Database Truth of the Week

2. What's Wrong With This Database Picture?

Sunday, November 26, 2017

What Relations Really Are and Why They Are Important

Sunday, November 19, 2017

This Week

1. Database Truth of the Week

2. What's Wrong With This Database Picture?

Wednesday, November 8, 2017

Understanding Conceptual vs. Data Modeling Part 1: Data Model - The RDM Is, the E/RM Isn't

Objects, Properties, and Ontological Commitment