Sunday, October 29, 2017

Database Design: What It Is and Isn't



Revised 10/31/17.

Note: Posts starting with this one will be consistent with the TERMINOLOGY page. Fundamental terms -- the grasp of which is necessary for data management practice -- will be boldened. When you encounter one you don't understand, better find out what it means, chances are it's being misused or abused. Once the page is finalized, labels and, time permitting, old posts may also be revised accordingly. 

Reference [9] is an important rewrite and is recommended pre-requisite
for this post that you should read first.

Here's what's wrong with the picture of three weeks ago, namely:
"The term database design can be used to describe many different parts of the design of an overall database system. Principally, and most correctly, it can be thought of as the logical design of the base data structures used to store the data. In the relational model these are the tables and views. In an object database the entities and relationships map directly to object classes and named relationships. However, the term database design could also be used to apply to the overall process of designing, not just the base data structures, but also the forms and queries used as part of the overall database application within the database management system(DBMS).

The process of doing database design generally consists of a number of steps which will be carried out by the database designer. Usually, the designer must:

  • Determine the data to be stored in the database.
  • Determine the relationships between the different data elements.
  • Superimpose a logical structure upon the data on the basis of these relationships.
Within the relational model the final step above can generally be broken down into two further steps, that of determining the grouping of information within the system, generally determining what are the basic objects about which information is being stored, and then determining the relationships between these groups of information, or objects." --Halil Lacevic, What is a Relational Database?, Quora.com
Many problems in database practice are due to failure to grasp what a data model is and the important distinctions between DBMS functions on the one hand and application functions on the other.

The three design steps above are vague, somewhat confused and obscure more than enlighten. They do not reflect the fact that database design is formalization of a conceptual model of reality as relations constrained to be consistent with the business rules the model consists of. 

Sunday, October 22, 2017

This Week



1. Database Truth of the Week

"The original normal form and the later First Normal Form (1) are distinct. In the early 1969 RDM there was only "the normal form" of relations [a term Codd borrowed from FOPL]. It was based on the initial version of the join operation, which was different than today's join. Had 1NF and further normalization to at least 2NF had been introduced then, the normal form would have made no sense, as there would have been then multiple normal forms, which make sense only with the post-1970 join definition currently in use. Thus, there is no way to answer "what is the difference between the original normal form and 1NF?" without taking into account the definition of join, and -- if defined as we now do -- no way to understand the original normal form, except to say that in the context of the original join definition it would correspond to today's Fifth Normal Form (5NF). This is why a relation is really in 5NF by definition, not in 1NF as per current understanding." --David McGoveran



2. What's Wrong With This Database Picture?

"The term database design can be used to describe many different parts of the design of an overall database system. Principally, and most correctly, it can be thought of as the logical design of the base data structures used to store the data. In the relational model these are the tables and views. In an object database the entities and relationships map directly to object classes and named relationships. However, the term database design could also be used to apply to the overall process of designing, not just the base data structures, but also the forms and queries used as part of the overall database application within the database management system(DBMS).

The process of doing database design generally consists of a number of steps which will be carried out by the database designer. Usually, the designer must:

  • Determine the data to be stored in the database.
  • Determine the relationships between the different data elements.
  • Superimpose a logical structure upon the data on the basis of these relationships.
Within the relational model the final step above can generally be broken down into two further steps, that of determining the grouping of information within the system, generally determining what are the basic objects about which information is being stored, and then determining the relationships between these groups of information, or objects." --Halil Lacevic, What is a Relational Database?, Quora.com

Monday, October 9, 2017

This Week




1. Database Truth of the Week

“A DBMS using the RDM for all its functionality would be very limited. The RDM only requires that the declarative data sub-language employed by users for data manipulation -- has power not more expressive than first order predicate logic (FOPL), which implies acceptance of certain limitations on what users can do directly in the language, in return for
Language declarativity and decidability;
Semantic correctness and system-guaranteed logical validity;
Physical and logical independence;
Simplicity.”
                                                  --David McGoveran


2. What's Wrong With This Database Picture?

"The term database design can be used to describe many different parts of the design of an overall database system. Principally, and most correctly, it can be thought of as the logical design of the base data structures used to store the data. In the relational model these are the tables and views. In an object database the entities and relationships map directly to object classes and named relationships. However, the term database design could also be used to apply to the overall process of designing, not just the base data structures, but also the forms and queries used as part of the overall database application within the database management system(DBMS).

The process of doing database design generally consists of a number of steps which will be carried out by the database designer. Usually, the designer must:

  • Determine the data to be stored in the database.
  • Determine the relationships between the different data elements.
  • Superimpose a logical structure upon the data on the basis of these relationships.
Within the relational model the final step above can generally be broken down into two further steps, that of determining the grouping of information within the system, generally determining what are the basic objects about which information is being stored, and then determining the relationships between these groups of information, or objects." 
                             --Halil Lacevic, What is a Relational Database?, Quora.com

Monday, October 2, 2017

Understanding the Division of Labor between Analytics Applications and DBMS



I am coming across, on the one hand, instructions on how to do "analytics with SQL" and, on the other, tools purporting to enable "analytics without SQL." They are an umpteenth iteration of essentially similar ideas during my 30-plus years in data management and reflect common and entrenched fundamental misconceptions that I have documented and analyzed the costly consequences of in my writings and teachings. They will keep repeating, inhibiting genuine progress, as long as data fundamentals are ignored or dismissed. One of the least understood is the distinction between DBMS and application functions.

Friday, September 22, 2017

This Week



1. Database Truth of the Week

“If the data sub-language ... has the power of second order predicate logic (SOPL), expressions are possible that cannot be evaluated (for example, self-referencing expressions) and the formal language is then undecidable, an algorithm to implement a declarative query language is impossible and all hope of physical independence is lost." --David McGoveran


2. What's Wrong With This Database Picture?

"Our terminology is broken beyond repair. [Let me] point out some problems with Date's use of terminology, specifically in two cases.
"type" = "domain": I fully understand why one might equate "type" and "domain", but ... in today's programming practice, "type" and "domain" are quite different. The word "type" is largely tied to system-level (or "physical"-level) definitions of data, while a "domain" is thought of as an abstract set of acceptable values.

"class" != "relvar": In simple terms, the word "class" applies to a collection of values allowed by a predicate, regardless of whether such a collection could actually exist. Every set has a corresponding class, although a class may have no corresponding set ... in mathematical logic, a "relation" is a "class" (and trivially also a "set"), which contributes to confusion.
In modern programming parlance "class" is generally distinguished from "type" only in that "type" refers to "primitive" (system-defined) data definitions while "class" refers to higher-level (user-defined) data definitions. This distinction is almost arbitrary, and in some contexts, "type" and "class" are actually synonymous."

Sunday, September 17, 2017

Database Management: No Progress Without Data Fundamentals



I have recently -- yet again -- been accused in a LinkedIn exchange  of "gibberish without any evidence" and of claiming that "nobody know what they're doing" with databases. I will leave it to readers to judge whether (1) five decades worth of writings and teaching is "no evidence" and (2) my comments in the exchange are gibberish. Here I would like to dare anybody to find claims to that effect in any of my pronouncements. What I did, do and will say is that most data professionals do not know and understand data and relational fundamentals -- an incontrovertible fact proved not just by me[1], but also by others[2,3] and that this inhibits real progress in database management. 

As I wrote two weeks ago:
"The RDM put database management on a formal, scientific foot. Consequently, tool experience and relational terminology are insufficient -- foundation knowledge is necessary. Unfortunately, most data professionals do not possess it, in part because they have been misled by the industry and in part because few go through an education -- as distinct from training -- program that teaches the RDM and teaches it correctly. Consequently, even those with the heart in the right place defend the RDM without a full understanding, their views distorted by what passes for it (stay tuned for a debunking of such a recent example)."
I will now fulfill the promise by debunking just such a "heart-in-the-right-place" defense of the RDM. 

View My Stats