Follow @DBDebunk
Follow @ThePostWest
Revised 6/26/19.
In
Part 1 and Part 2 we
explained that when the RDM (1969-70) and the E/RM (1976) were
introduced, there was no distinction between a conceptual and a logical
level -- the conceptual-logical-physical distinction of levels of
representation emerged in mid 80s. Only in 1980 did Codd specify three
components of a formal data model -- structure, integrity, manipulation.
While the RDM satisfies the specification, the E/RM does not: it is a
conceptual modeling approach, weaknesses of which have been elaborated
elsewhere[1]. In Part 3 we presented a common example of conceptual-logical conflation (CLC),
and corresponding confusion of types of model (conceptual, logical,
physical, and data).
As promised, here we outline a new
conceptual modeling approach derived by David McGoveran from his work
formalizing Codd's RDM. It makes an ontological commitment different
from that by conventional modeling, which requires revision and
extension of the RDM -- an objective of David's effort.
Saturday, November 3, 2018
Sunday, October 28, 2018
Understanding Conceptual vs. Data Modeling Part 3: Don't Conflate Reality and Data
Follow @DBDebunk
Follow @ThePostWest
In Part 1 and Part 2 we explained that between 1975-81, when the E/RM and RDM were introduced, there was no distinction between an informal conceptual and a formal logical level. In 1980, however, Codd defined a formal data model and in the later 80s the conceptual-logical-physical levels of representation emerged. If applied to the two models:
Consider the question "does data modeling slow down an application development process?". I will set aside the notion of "speeding up" application development by skipping altogether "data modeling" (whichever way it is meant), and focus on the response.
In Part 1 and Part 2 we explained that between 1975-81, when the E/RM and RDM were introduced, there was no distinction between an informal conceptual and a formal logical level. In 1980, however, Codd defined a formal data model and in the later 80s the conceptual-logical-physical levels of representation emerged. If applied to the two models:
- Only the RDM satisfies the definition;
- The E/RM can be used at the conceptual level to model reality, the latter can be used to model data at the logical level (i.e., formalize conceptual models as logical models for database representation).
Consider the question "does data modeling slow down an application development process?". I will set aside the notion of "speeding up" application development by skipping altogether "data modeling" (whichever way it is meant), and focus on the response.
Thursday, October 18, 2018
Understanding Conceptual vs. Data Modeling Parts 1-4
Follow @DBDebunk
Follow @ThePostWest
As parts of a new multipart series, I have rewritten two previous posts that merit a new read.
The other two parts are new posts:
As parts of a new multipart series, I have rewritten two previous posts that merit a new read.
- Understanding Conceptual vs. Data Modeling Part 1: The E/RM and the RDM.
- Understanding Conceptual vs. Data Modeling Part 2: The E/RM Models Reality, the RDM Models Data.
The other two parts are new posts:
- Understanding Conceptual vs. Data Modeling Part 3: Don't Conflate Reality and Data.
- Understanding Conceptual vs. Data Modeling Part 4: Property-Entity Modeling.
Saturday, September 29, 2018
Understanding Conceptual vs.Data Modeling Part 2: E/RM Models Reality, RDM Models Data
Follow @DBDebunk
Follow @ThePostWest
Re-write 10/17/18
Revised 11/1/18
In Part 1 we explained that when the RDM and the E/RM were introduced, the distinct conceptual-logical-physical levels of representation had not yet emerged, and a data model had not yet been formally defined. But in 1980 Codd defined a formal data model as a combination of (1) data structures, (2) integrity constraints, and (3) operators on the structures[1], and later on the three-fold trinity of levels came into being. Given a conceptual level distinct from the logical, do the RDM and the E/RM satisfy the definition -- are they data models in today's terms?
Recall from Part 1 that the RDM has all three components and is defined in purely logical terms, so it is a data model. But the E/RM definition intermingles conceptual and logical terminology, and therefore is not consistent with two distinct levels. Moreover, as a data model E/RM is incomplete:
Re-write 10/17/18
Revised 11/1/18
In Part 1 we explained that when the RDM and the E/RM were introduced, the distinct conceptual-logical-physical levels of representation had not yet emerged, and a data model had not yet been formally defined. But in 1980 Codd defined a formal data model as a combination of (1) data structures, (2) integrity constraints, and (3) operators on the structures[1], and later on the three-fold trinity of levels came into being. Given a conceptual level distinct from the logical, do the RDM and the E/RM satisfy the definition -- are they data models in today's terms?
Recall from Part 1 that the RDM has all three components and is defined in purely logical terms, so it is a data model. But the E/RM definition intermingles conceptual and logical terminology, and therefore is not consistent with two distinct levels. Moreover, as a data model E/RM is incomplete:
“The E/RM is not a data model as formally defined by Codd: no explicit structural component except sets classified in various ways, no explicit manipulative component except implied set operations, and very limited integrity (keys).”
Contrary to claims, Date does not exactly say that the E/RM is a data model:--David McGoveran
“[It] is not even clear that the E/R "model" is truly a data model at all, at least in the sense in which we have been using that term in this book so far (i.e., as a formal system involving structural, integrity, and manipulative aspects). Certainly the term "E/R modeling" is usually taken to mean the process of deciding the structure (only) of the database, although [it does deal with] certain integrity aspects also, mostly having to do with keys ... However, a charitable reading of [Chen's original E/RM paper] would suggest that the E/R model is indeed a data model, but one that is essentially just a thin layer on top of the relational model (it is certainly not a candidate for replacing the relational model, as some have suggested).”[2]Note that even if, charitably, the E/RM is considered a data model, it is not up to the RDM.
Tuesday, September 18, 2018
Don't Conflate/Confuse Primary Keys, PK Constraints, and Indexes
Follow @DBDebunk
Follow @ThePostWest
“What is the difference between an index and a key? How are they related?”I have recently published a paper[1], and posted a multipart series[2] on relational keys. In the latter I stated as follows:
“There seams to be some confusion between what a Primary Key is, and what an Index is and how they are used. The Primary Key is a logical object. By that I mean that is simply defines a set of properties on one column or a set of columns to require that the columns which make up the primary key are unique and that none of them are null. Because they are unique and not null, these values (or value if your primary key is a single column) can then be used to identify a single row in the table every time. In most if not all database platforms the Primary Key will have an index created on it. An index on the other hand doesn’t define uniqueness. An index is used to more quickly find rows in the table based on the values which are part of the index. When you create an index within the database, you are creating a physical object which is being saved to disk.”
“A primary key by default creates a clustered index. A unique constraint/key by default creates a non-clustered index.”
“An index is a (logically) ordered list of rows. For example, an index on LastName means all values are already sorted in LastName order. Usually index rows contain far fewer columns in them than the table itself (except the clustered index, which is the table). A key is a column or columns that defines the order of an index. For example, on an index ordered by (LastName,FirstName), then LastName and FirstName are the keys. Btw, a primary key is a physical object, not a logical one. The db engine needs physical rows in order to insure unique values in the index.”
--Difference between an index and a key?, SQLTeam.com
"As a relational feature, keys can only be properly understood within the formal foundation of the RDM, which is simple set theory (SST) expressible in first order predicate logic (FOPL) adapted and applied to database management. Yet that is precisely what is ignored and dismissed in the industry -- including by the authors of SQL[3]."I have also written extensively on widespread logical-physical confusion (LPC)[4], recently specifically in the key-index context[5]. The replies above are examples -- if any more were needed -- that validate my repeated claim of lack of foundation knowledge in the industry -- can you tell what's wrong with, and what's correct in, them?
Tuesday, September 11, 2018
RE-WRITE
Follow @DBDebunk
Follow @ThePostWest
See: https://www.dbdebunk.com/2018/09/designation-property-and-assertion.html
See: https://www.dbdebunk.com/2018/09/designation-property-and-assertion.html
Wednesday, August 29, 2018
DISTINCT and ORDER BY Are Not Relational
Follow @DBDebunk
Follow @ThePostWest
“One of the things that confuse SQL users all the time is how DISTINCT and ORDER BY are related in a SQL query ... most people quickly understand:
SELECT DISTINCT length
FROM film
[that] returns results in an arbitrary order, because the database can (and might apply hashing rather than ordering to remove duplicates) ... Most people also understand:
SELECT length
FROM film
ORDER BY length
[that] will give us duplicates, but in order ... And, of course, we can combine the two:
SELECT DISTINCT length
FROM film
ORDER BY length
[But if] somewhat intuitively, we may want to order the lengths differently, e.g. by title:
SELECT DISTINCT length
FROM film
ORDER BY title
[m]ost databases [sic] fail this query with an exception like Oracle’s:
ORA-01791: not a SELECTed expression
At first sight ... this
SELECT length
FROM film
ORDER BY title
works after all ... So, how are these different? We have to rewind and check out the logical order of SQL operations (as opposed to the syntactic order). And always remember, this is the logical order, not the actual order executed by the optimiser.”
--How SQL DISTINCT and ORDER BY are Related, Jooq.org
Subscribe to:
Posts (Atom)