Monday, May 8, 2023


Note: Reader mail (rm) posts are my exchanges with readers that raise fundamental issues. I may improve language for clarity and amplify with notes for the benefit of readers.

I've written more than once about Chen's E/RM (see references). Recently a reader emailed me a reaction to one of these writings, offering the E/RM as evidence that properties are not necessary in conceptual modeling: everything could be expressed with just entities and relationships. That is, of course, incorrect. I was working at the time on my latest series of posts on Relationships and the RDM, which was going to address that issue too, and I intended to refer him to it. But knowing that David McGoveran knew Chen in the 1980s, I asked him whether he had any comments. He thought that something broader and more forceful regarding Chen's work was in order and suggested some text. After some discussion we decided that I would post my reply (ON PROPERTIES IN CONCEPTUAL MODELING) first and publish his reply to the reader later, which I do below.

Sunday, March 19, 2023



“Your post Understanding Conceptual vs. Data Modeling Part 1: Data Model - The RDM Is, the E/RM Isn't is well done. However, concepts and relationships can be perceived and modeled without formulating or specifying properties. Chen did that in his ER diagrams. And informally, everyone does it as a mental model every day. I suppose anyone can define conceptual modeling however they wish to.  But at its minimum and most abstract, which is what conceptual modelling is usually understood to be, it can be done without formulating or specifying properties.” --GR

Friday, November 1, 2019

Comments on a Stonebraker Article

These comments were prompted by a LinkedIn post referencing Michael Stonebraker's "Those Who Forget the Past Are Doomed to Repeat It" -- something I often reiterate myself -- where he argues:
“Over the past decade, there have been a number of DBMSs introduced (typically labeled as NoSQL) which utilize a network or hierarchical data model. MongoDB and Cassandra come immediately to mind as examples. Some such systems support networks through the concepts of "links" and some support hierarchical data using a nested data model often utilizing JSON. In my opinion, these systems have not internalized lessons from history.
“At the SIGFIDET (now SIGMOD) annual conference in 1974, there was a "Great Debate" over the merits of the relational model versus the network and hierarchical models ... Basically, the argument was about which model [relational or network] was a better fit for structured data (as opposed to documents, e-mails, etc.) and boiled down to two questions:

Question 1: Are high-level data sublanguages a good idea?
Question 2: Are tables the best data structure or should one use a network or hierarchy?”

“The last 45 years have definitely affirmed Codd’s position on both issues ... The conclusion from the 1970s was that the relational model provides superior data independence, compared to the network and hierarchical [graph] models. Forty-five years later, this conclusion is still true. If you want to insulate yourself from the changes that business conditions dictate, use a relational DBMS. If you want the successor to the successor to your job to thank you for your wise decision, use a relational model.”
I couldn't agree more, having repeatedly argued this myself. But he misses some old aspects that the industry has failed to recognize, ignored, or dismissed[1], as well as some important new aspects that follow from a new understanding of Codd's work[2].

Friday, June 14, 2019

Normalization and Further Normalization Part 3: Understanding Database Design

Note: This is a re-write of two older posts, to bring them into line with McGoveran's formalization, re-interpretation, and extension[1] of Codd's RDM.

In Part 1 we explained that for a database to be relational, database design must adhere to three core principles, in which case it consists of relations that are by definition in both 1NF and 5NF. In Part 2 we showed that whether tables visualize relations (i.e., are R-tables) can be determined only with reference to the conceptual model that the database designer intended the database to represent (not to what users might think it represents). This is obscured by the common and entrenched confusion/conflation of levels of representation and, consequently, of types of model -- conceptual, logical, physical, and data model -- that we have so often debunked[2].

Saturday, May 11, 2019

Understanding Data Modeling Part 5: Conclusions

In Part 1 we presented some foundation knowledge with which to debunk misconceptions lurking in the industry's "data modeling" mess that Friesendal has tried to catalog, and argued that this knowledge can help overcome that mess. In Part 2 we applied it to the first two industry "data models" considered by Friesendal -- the E/RM and RDM. In Part 3 we applied it to OO/UML and the (as yet formally undefined) GDM, and in Part 4 to Fact Modeling (FM).

Here we apply it to Friesendal's conclusions.

Saturday, April 20, 2019

Understanding Data Modeling Part 2: "E/RM" and "RDM"

In Part 1 we presented some foundation knowledge with which to debunk misconceptions lurking in the industry's modeling mess that Friesendal has tried to map. We now apply it to the various industry "data models" considered by Friesendal, and to his understanding of them, starting in this part with the first two -- the E/RM and RDM.

"Entity-Relationship Model"

“One of the first formal attempts at a framework for Data Modeling was the Entity-Relationship data model paradigm proposed [in 1976] by Peter Chen. Notice that in the original Chen-style, the attributes are somewhat independent and the relationships between entities are named and carry cardinalities ("how many" participants in each end of the relationship) ... Attributes are related to their "owner" entity in what other people called "functional dependencies".”
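The functional dependencies mentioned in the quote have a precise definition. A dependency X → Y holds in a relation if any two tuples that agree on the attributes X also agree on Y. The minimal Python sketch below checks that definition directly. The list-of-dicts representation, the function name `fd_holds`, and the sample data are hypothetical illustrations, not anything taken from Chen or Friesendal.

```python
from typing import Dict, Iterable, Mapping, Sequence, Tuple


def fd_holds(rows: Iterable[Mapping[str, object]],
             lhs: Sequence[str], rhs: Sequence[str]) -> bool:
    """Return True if the functional dependency lhs -> rhs holds in rows."""
    seen: Dict[Tuple[object, ...], Tuple[object, ...]] = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        # A repeated determinant value must always map to the same dependent value.
        if seen.setdefault(key, val) != val:
            return False
    return True


# Hypothetical data: each emp_id determines one name, but dept does not.
employees = [
    {"emp_id": 1, "name": "Ann", "dept": "Sales"},
    {"emp_id": 2, "name": "Bob", "dept": "Sales"},
]
print(fd_holds(employees, ["emp_id"], ["name"]))  # True
print(fd_holds(employees, ["dept"], ["name"]))    # False
```

Checking against observed rows can only refute a dependency. Whether it *should* hold is a business rule decided at the conceptual level, not something inferred from sample data.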

Sunday, December 2, 2018

What Is a Data Model, and What It Is Not

“The term data model is used in two distinct but closely related senses. Sometimes it refers to an abstract formalization of the objects and relationships found in a particular application domain, for example the customers, products, and orders found in a manufacturing organization. At other times it refers to a set of concepts used in defining such formalizations: for example concepts such as entities, attributes, relations, or tables. So the "data model" of a banking application may be defined using the entity-relationship "data model". This article uses the term in both senses.”
--Data Model, Wikipedia

What a True Data Model Is

Few practitioners realize that Codd invented the Relational Data Model (RDM) as the first exemplar of a data model, a concept that he formalized in 1980 as follows:

Saturday, November 3, 2018

Understanding Conceptual vs. Data Modeling Part 4: Properties-object Modeling

Revised 6/26/19.

In Part 1 and Part 2 we explained that when the RDM (1969-70) and the E/RM (1976) were introduced, there was no distinction between a conceptual and a logical level -- the conceptual-logical-physical distinction of levels of representation emerged in the mid-1980s. Only in 1980 did Codd specify the three components of a formal data model -- structure, integrity, and manipulation. While the RDM satisfies this specification, the E/RM does not: it is a conceptual modeling approach, the weaknesses of which have been elaborated elsewhere[1]. In Part 3 we presented a common example of conceptual-logical conflation (CLC), and the corresponding confusion of types of model (conceptual, logical, physical, and data).

As promised, here we outline a new conceptual modeling approach derived by David McGoveran from his work formalizing Codd's RDM. It makes an ontological commitment different from that of conventional modeling, a difference that requires revision and extension of the RDM -- an objective of David's effort.

Saturday, September 29, 2018

Understanding Conceptual vs. Data Modeling Part 2: E/RM Models Reality, RDM Models Data

Re-write 10/17/18
Revised 11/1/18

In Part 1 we explained that when the RDM and the E/RM were introduced, the distinct conceptual-logical-physical levels of representation had not yet emerged, and a data model had not yet been formally defined. But in 1980 Codd defined a formal data model as a combination of (1) data structures, (2) integrity constraints, and (3) operators on the structures[1], and later the threefold distinction of levels came into being. Given a conceptual level distinct from the logical, do the RDM and the E/RM satisfy the definition -- are they data models in today's terms?
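Here is a deliberately toy Python sketch of Codd's three components, assuming the following:

- Structure: a relation is represented as a heading (an ordered tuple of names here, purely for convenience) plus a set of tuples.
- Integrity: the relation is guarded by a single declared key.
- Manipulation: one restriction operator returns a result that is again a relation.

The `Relation` class and its methods are hypothetical. This illustrates the definition and is not an implementation of the RDM.

```python
from typing import Callable, Dict, FrozenSet, Tuple

Row = Tuple[object, ...]


class Relation:
    """Toy relation: heading and body (structure), key (integrity), restrict (manipulation)."""

    def __init__(self, heading: Tuple[str, ...], body: FrozenSet[Row], key: Tuple[str, ...]):
        self.heading = heading       # structure: attribute names
        self.body = frozenset(body)  # structure: a set of tuples, so no duplicates
        self.key = key
        # Integrity: reject any body in which two tuples share a key value.
        if not self._key_holds():
            raise ValueError(f"key {key} violated")

    def _key_holds(self) -> bool:
        idx = [self.heading.index(a) for a in self.key]
        key_values = [tuple(t[i] for i in idx) for t in self.body]
        return len(key_values) == len(set(key_values))

    def restrict(self, pred: Callable[[Dict[str, object]], bool]) -> "Relation":
        """Manipulation: keep tuples satisfying pred; the result is again a Relation."""
        kept = frozenset(t for t in self.body if pred(dict(zip(self.heading, t))))
        return Relation(self.heading, kept, self.key)


emp = Relation(("emp_id", "dept"), frozenset({(1, "Sales"), (2, "IT")}), ("emp_id",))
sales = emp.restrict(lambda t: t["dept"] == "Sales")
print(sales.body)  # frozenset({(1, 'Sales')})
```

Returning a `Relation` from `restrict` keeps the operator closed over the structure, which is also why the integrity check runs again on every result.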

Recall from Part 1 that the RDM has all three components and is defined in purely logical terms, so it is a data model. But the E/RM definition intermingles conceptual and logical terminology, and therefore is not consistent with two distinct levels. Moreover, as a data model, the E/RM is incomplete:

“The E/RM is not a data model as formally defined by Codd: no explicit structural component except sets classified in various ways, no explicit manipulative component except implied set operations, and very limited integrity (keys).”
--David McGoveran
Contrary to some claims, Date does not actually say that the E/RM is a data model:
“[It] is not even clear that the E/R "model" is truly a data model at all, at least in the sense in which we have been using that term in this book so far (i.e., as a formal system involving structural, integrity, and manipulative aspects). Certainly the term "E/R modeling" is usually taken to mean the process of deciding the structure (only) of the database, although [it does deal with] certain integrity aspects also, mostly having to do with keys ... However, a charitable reading of [Chen's original E/RM paper] would suggest that the E/R model is indeed a data model, but one that is essentially just a thin layer on top of the relational model (it is certainly not a candidate for replacing the relational model, as some have suggested).”[2]
Note that even if, charitably, the E/RM is considered a data model, it does not measure up to the RDM.

Wednesday, November 8, 2017

Understanding Conceptual vs. Data Modeling Part 1: Data Model - The RDM Is, the E/RM Isn't

Re-write 10/16/18
“E/RM is a data model -- So says Date, Chen, etc. So says the majority of current industry experts ... With very strong references to Codd (who he worked with), Date elegantly explains the differences between RM and E/RM -- but clearly believes both are data models (even allowing for the charitable comment). If we take a RDB as the ultimate target implementation of data, and an E/RM (or extended) can correctly design all the artifacts that are implemented, this means it is modeling the data. Granted, an E/RM does not explicitly model some of the non-structural aspects of the original Codd definition.”

“Out of interest, is there a common Relational Modeling tool, that is not also an E/RM tool and models the full Codd definition? There are also several other methods of modeling data -- E/RM is more a mechanism to represent the data. If E/RMs are used by IT professionals across the world to direct the design and build of the majority of applications guided by standard methodologies, is the view of this argument that these were all build wrongly? Regardless of success? Is the inferred conclusion that only the RM models data, and ERM, [or] any other techniques do not? [If so] that is a little limiting.”

Objects, Properties, and Ontological Commitment

We are culturally and linguistically conditioned to conceptualize the world as objects with properties. Objects in a universe that share a set of common properties are of the same type and form a class, distinct from objects that lack those properties. Applying a class definition to the universe selects the objects of that type from it.
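A class definition that selects objects from a universe can be sketched as a set comprehension. In this minimal Python illustration, objects carry a set of named properties, and a class is defined by the properties its members share. The `Obj` type, the `select_class` function, and the property names are hypothetical. Reducing properties to string labels is a simplification for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Obj:
    name: str
    properties: frozenset  # labels standing in for the object's properties


def select_class(universe, defining_properties):
    """Return the objects in the universe that have every defining property."""
    required = frozenset(defining_properties)
    # The class definition acts as a predicate applied to each object.
    return {o for o in universe if required <= o.properties}


universe = {
    Obj("Ann", frozenset({"employee_number", "hire_date"})),
    Obj("Bob", frozenset({"employee_number", "hire_date", "manager"})),
    Obj("Acme", frozenset({"tax_id"})),
}
employees = select_class(universe, {"employee_number", "hire_date"})
print(sorted(o.name for o in employees))  # ['Ann', 'Bob']
```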

Philosophical ontology is the study of being, existence, and reality, as well as of the basic categories of being and their relationships -- what entities exist or may be said to exist, and how they may be grouped, related, and subdivided according to similarities and differences.

Note: 'Object' is used here in the general, not the OO, sense. Ontology, as used herein, should not be confused with "computer science ontology", which usurped the term: programmers understand it to mean a conceptual graph of directed semantic relationships among objects (and only sometimes among object types).

Conceptual modeling (1) identifies types of objects of interest, and (2) formulates business rules (BR) that specify their properties and relationships and, in doing so, makes an ontological commitment. Any approach to conceptual modeling must consider the ontological commitment on which it is based, because that commitment has major implications for the data model used to formalize conceptual models as logical models for computable database representation: the data model must be consistent with it.

Unfortunately, due to a lack of foundation knowledge in the industry[1], practitioners -- both vendors and users -- are largely unaware of ontological underpinnings and their implications for database technology and practice, which is one reason why technology and practice have not only stagnated, but regressed, over the last five decades. In this multipart series we explain the important distinction between conceptual modeling and data modeling (aka logical database design), which requires a formal data model. The E/RM is not a formal data model: it can be used for conceptual modeling of reality, not of data. We also outline a new conceptual modeling approach that makes a different ontological commitment and requires adjustments to the RDM, both necessary for genuine progress.

