Showing posts with label CLC. Show all posts
Showing posts with label CLC. Show all posts

Sunday, March 26, 2023

RELATIONSHIPS AND THE RDM V2 PART 1: RELATIONS & DATABASE RELATIONS



with David McGoveran

 

Note: This a multi-part re-write of a previous series intended, when completed, to replace it. In the meantime you can consult the old version -- there is nothing wrong with it.

No matter how much I demonstrate the absence of foundation knowledge in the IT industry, the mountain of evidence and reasoning is dismissed -- how can everybody be wrong and only Fabian Pascal be right?

Well, consider the following (from a college prof no less):
“... we are not modeling objects/entities/attribute ... at all in the relational model, [but] a bunch of relationships ... hence perhaps Codd was correct in calling it a "relation", a bunch of relationships ... Interesting that most people think of relationships as being the distinguishing characteristic of a relational model and it is not ... [it] has no relationships since Codd decreed that all relationships must be represented by foreign keys, which are exactly the same as "attributes ... What [other] type(s) of relationships can be explicitly and formally defined in a relational data model? Of course there are many other relationships which can be inferred, such as between an attribute and an entity identifier. Please give me a precise reference to where Codd spoke of relationships [differently than i]n his 1985 piece published in ComputerWorld, [where] he said that the only way to represent a relationship (between relations) was through explicitly stored values (i.e., attributes, foreign keys) ... In my personal understanding, a relation is defined as a set of tuples. Then ... "in the relational model every relation represents a relationship". And then a quote from Chen: "each tuple of entities ... is a relationship". If I use the first and the second statements - I can say that a relationship is a set of tuples. The third statement says that a relationship is a tuple. So far, is a relationship a set of an element of a set? (Or may be a set of sets?) ... I argue that there is essentially no difference between relationships between entities of distinct classes and between properties of the same class. They both represent relationships. A property can represent a relationship between entities of distinct classes. If such relationships are represented by foreign keys and the relations representing the classes must be in 1NF, then relational databases can represent only M:1 relationships, a very unnecessary limitation when modeling some reality of interest ... The entity-relationship model is essentially a directed graph model, where relationships are prominent residents. Not so in the relational model (despite the name), where relationships (between relations, mind you) are not visible and in the SQL implementations is reduced to constraints. Relationships are about structure, which is as important as meaning (the semantics of the terms used in the universe being modeled).” --LinkedIn.com

The amount of nonsense squeezed into this rambling is mind boggling. No understanding of the RDM, confusion, abysmal reasoning and misuse of terms -- debunking it in its entirety would be practically impossible (believe me, I tried). Instead, I focus on a critical aspect of the RDM of which there is little grasp in the industry: I convey the fundamentals and leave it to the motivated reader to try their own debunking -- the most effective way I know to learn.

Thursday, March 12, 2020

Muddling Modeling Part 2: An Example




In an old article I used a Hay-Ross exchange to illustrate how disregard for fundamentals and the associated name proliferation -- which underlies the industry's fad-to-fad tradition -- cause confusion that inhibits understanding of conceptual modeling for database design. A recent LinkedIn exchange -- hardly unique -- showed the article to be as relevant today as it was two decades ago, prompting me to bring it up to date.

In Part 1 we reiterated pertinent fundamentals. Here is the re-written article
-- try to apply the fundamentals from Part 1 before you proceed with our debunking.

Sunday, August 25, 2019

Meaning Criteria and Entity Supertype-Subtypes Relationships




Note: This is a re-write of a previous post.
"I have a database for a school ... [with] numerous tables obviously, but consider these:
CONTACT - all contacts (students, faculty): has fields such as LAST, FIRST, ADDR, CITY, STATE, ZIP, EMAIL;
FACULTY - hire info, login/password, foreign key to CONTACT;
STUDENT - medical comments, current grade, foreign key to CONTACT."
"Do you think it is a good idea to have a single table hold such info? Or, would you have had the tables FACULTY and STUDENT store LAST, FIRST, ADDR and other fields? At what point do you denormalize for the sake of being more practical? What would you do when you want to close out one year and start a new year? If you had stand-alone student and faculty tables then you could archive them easily, have a school semester and year attached to them. However, as you go from one year to the next information about a student or faculty may change. Like their address and phone for example. The database model now is not very good because it doesn’t maintain a history. If Student A was in school last year as well but lived somewhere else would you have 2 contact rows? 2 student rows?  Or do you have just one of each and have a change log. Which is best?"
How would somebody who "does not know past, or new requirements, modeling, and database design" and messes with a working database just because "he heard something about (insert your favorite fad here)" figure out correct from bad answers? Particularly if the answers suffer from the same lack of foundation knowledge as the question?

Friday, June 14, 2019

Normalization and Further Normalization Part 3: Understanding Database Design




Note: This is a re-write of two older posts, to bring them into line with McGoveran's formalization, re-interpretation, and extension[1] of Codd's RDM.
 

In Part 1 we explained that for a database to be relational, database design must adhere to three core principles, in which case it consists of relations that are by definition in both 1NF and 5NF. In Part 2 we showed that whether tables visualize relations (i.e., are R-tables) can be determined only with reference to the conceptual model that the database designer intended the database to represent (not what any users might think it does). This is obscured by the common and entrenched confusion/conflation of levels of representation and, consequently, of types of model -- conceptual, logical, physical, and data model -- that we have so often debunked[2].


Sunday, June 2, 2019

Normalization and Further Normalization Part 2: If You Need Them, You're Doing It Wrong




In Part 1 we outlined some fundamentals of database design, namely the distinction between normalization to 1NF, and further normalization (to "full" 5NF), and explained that they are necessary only to repair poor designs -- if you (1) develop a complete conceptual model and (2) formalize it properly using the RDM, (3) adhering to the three core principles of database design, you should end up with a relational database in both 1NF and 5NF.

Here we apply this knowledge to the typical request for "normalization" help we presented in Part 1.

Friday, May 31, 2019

Normalization and Further Normalization Part 1: Databases Representing ... What?




Note: This is a re-write of older posts (which now link here), to bring them into line with the McGoveran formalization, re-interpretation, and extension[1] of Codd's RDM.
“A particular bug-bear and a mistake that +90% of "data modelers" make, is analyzing "point in time" views of the business data and "normalizing" those values hence failing to consider change over time and the need to reproduce historic viewpoints. Let’s say we start with this list of data-items for a Sales-Invoice (completely omitting details of what’s been sold):
SALES-INVOICE
 {Invoice-Date,
  Customer-Account-ID,
  Customer Name,
  Invoice-Address-Line-1,
  Invoice-Address-Line-2,
  Invoice-Address-Line-3,
  Invoice-Address-Line-4,
  Invoice-Address-Postcode,
  Net-Amount,
  VAT,
  Total-Amount
 };
Nearly every time, through the blind application of normalization we get this ... there’s even a term for it -- it’s called "over-normalization":
SALES-INVOICE
 {Invoice-Date,
  Customer-Account-Id
   REFERENCES Customer-Account,
  Net-Amount,
  VAT,
  Total-Amount
 };

CUSTOMER-ACCOUNT
 {Customer-Account-Id,
  Customer-Name,
  Invoice-Address
   REFERENCES Address
 };

ADDRESS
 {Address-Line-1,
  Address-Line-2,
  Address-Line-3,
  Address-Line-4,
  Postcode
 };”
A measure of scarcity of foundation knowledge in the industry are the attempts to correct a plethora of common misconceptions[2] that suffer from the very misconceptions they aim to correct. One of the most common fallacies is confusion of levels of representation[3] that takes two forms[4]. We have written extensively about the logical-physical confusion (LPC)[5,6,7,8] underlying "denormalization for performance"[9], and the conceptual-logical conflation (CLC) that lumps conceptual with data modeling[10,11,12], inhibiting understanding that the latter is formalization of the former. 

Saturday, May 11, 2019

Understanding Data Modeling Part 5: Conclusions



In Part 1 we presented some foundation knowledge with which to debunk misconceptions lurking in the "data modeling" mess in the industry that Friesendal has tried to catalog, and argued that it can help overcome it. In Part 2 we applied this knowledge to the first two industry "data models" considered by Friesendal -- the E/RM and RDM. In Part 3, we applied it to OO/UML and (yet a formally undefined) GDM, and in Part 4 to Fact Modeling (FM).

Here we apply it to Friesendal's conclusions.

Saturday, April 20, 2019

Understanding Data Modeling Part 2: "E/RM" and "RDM"




In Part 1 we presented some foundation knowledge with which to debunk misconceptions lurking in the industry's modeling mess that Friesendal has tried to map. We now proceed to apply it to the various industry "data models" considered by Friesendal, and his understanding thereof. In this part, we apply this knowledge to the first two industry "data models" considered by Friesendal -- the E/RM and RDM.


"Entity-Relationship Model"


“One of the first formal attempts at a framework for Data Modeling was the Entity-Relationship data model paradigm proposed [in 1976] by Peter Chen. Notice that in the original Chen-style, the attributes are somewhat independent and the relationships between entities are named and carry cardinalities ("how many" participants in each end of the relationship) ... Attributes are related to their "owner" entity" in what other people called "functional dependencies".”

Sunday, December 2, 2018

What Is a Data Model, and What It Is Not




“The term data model is used in two distinct but closely related senses. Sometimes it refers to an abstract formalization of the objects and relationships found in a particular application domain, for example the customers, products, and orders found in a manufacturing organization. At other times it refers to a set of concepts used in defining such formalizations: for example concepts such as entities, attributes, relations, or tables. So the "data model" of a banking application may be defined using the entity-relationship "data model". This article uses the term in both senses.”
--Data Model, Wikipedia

What a True Data Model Is


Few practitioners realize that Codd invented the Relational Data Model (RDM) as the first exemplar of a data model, a concept that he formalized in 1980 as follows:


Sunday, November 25, 2018

Data and Meaning Part 1: The RDM Is Applied Theory




“Fabian - With respect, maybe it's time to' shake the formal foundations' of data management, especially given the rising costs and increasing segregation of silos.”
“John, if I were to say what I really think, I would be accused of insulting, so I won't. You don't need to respect me, but you better respect formal foundations. Since they are what gives SOUNDNESS to data management practice, what you are really saying is that you don't care about soundness -- do you really intend to take this position? I would not be surprised, because the industry has long "shook" the formal foundations and lack of soundness is precisely what characterizes it. But because there is no longer proper education, practitioners are totally unaware of the relationship between formal foundations and soundness, everything is ad-hoc and arbitrary, yet they fail to recognize the consequences.”[1]
--LinkedIn.com
Thus an exchange with John Gorman on LinkedIn, in which he posed several questions (that I answered in the last week's post[2]), the subject being the importance of not confusing levels of representation, and, more specifically, avoiding conceptual-logical conflation (CLC)[3].

Somebody posted a link to my answers on Linkedin and in a comment on it John linked to a Richard Feynman YouTube lecture on "the general differences between the interests and customs of the mathematicians and the physicists". To which I responded that my very point is that, just like physics is not the mathematics used to describe it (a central issue in quantum mechanics), conceptual modeling is not data modeling, the latter is the representation of the former in the database -- they are distinct[2]. This brought to mind some older columns I published on the All Analytics website that no longer exists, so this series is a revision thereof.

Saturday, November 10, 2018

Conceptual Modeling Is Not Data Modeling



“Ok, now that we have those two (Parts 3 and 4 of your series) 'on the table' so to speak, perhaps you would address these questions...
1. Would it be safe to say that facts expressed in a Conceptual model should be verifiable in reality?

2. Are the following facts logically equivalent or are they different:

a) The car with license number 62-JZK-6 has the color aquamarine blue
b) De auto met kenteken 62-JZK-6 heeft de kleur aquamarijnblauw

3. If a previously true fact is found in reality to be verifiably false, would that mean the Conceptual model is wrong or the Logical model, or reality?”

“I'm going to add another:

4. How does RDM handle temporal changes to the 'truth' of statement 2a) when:

a) The owner of the car paints it black.
b) The owner of the license plate legally transfers it to a truck.
c) The owner of the car replaces every single part except the chassis.”

John O'Gorman asked me these questions in a LinkedIn exchange[1] in response to my comments in another exchange on modeling[2], where I alerted to the confusion of levels of representation common in the industry, particularly conceptual-logical conflation(CLC)[3]: calling conceptual modeling data modeling both reflects and induces it.

Online exchanges are not a proper vehicle for learning, particularly foundation knowledge. Which is why I publish free blog posts, and papers and books, to which to refer interested serious data professionals. It just so happened that my just posted four-part series covers the subject at hand[4], so I referred to it, as well as other writings (the answers are already there if one cares to read them). I will not discuss the whole exchanges -- read them and judge for yourself -- but I promised to answer the questions here, where I can do them justice.

John raises primarily conceptual, not data model issues -- the latter are subservient to decisions in the former -- but then asks "how does RDM handle..." From experience, I recognize implicit doubts that the RDM can. As far as we know there is no formal data model[5] that is a superior alternative to the RDM with respect to "handling" conceptual issues (in fact, there is no other formal data model -- i.e., that satisfies Codd's definition -- period).


Since most of the issues involved are covered by McGoveran's work in progress[6] (in which my multi-part series is rooted), to ensure consistency with it I passed the questions by him. As he too pointed out, "Answers that work in all situations require highly complicated discussions and lots of time, and trying to teach someone without proper experience and educational background would be very cumbersome, or an oversimplication via online exchanges." 


Here's what's possible within the constraints of a blog post -- the serious reader is referred to our writings.

Sunday, October 28, 2018

Understanding Conceptual vs. Data Modeling Part 3: Don't Conflate Reality and Data




In Part 1 and Part 2  we explained that between 1975-81, when the E/RM and RDM were introduced, there was no distinction between an informal conceptual and a formal logical level. In 1980, however, Codd defined a formal data model and in the later 80s the conceptual-logical-physical levels of representation emerged. If applied to the two models:

  • Only the RDM satisfies the definition;
  • The E/RM can be used at the conceptual level to model reality, the latter can be used to model data at the logical level (i.e., formalize conceptual models as logical models for database representation).
Current practitioners, however, continue to confuse levels of representation and confuse/conflate types of model. So much so, that in my presentations I used to draw an imaginary line dividing the room into two sections, and move to the right section to discuss one level/model, and to the left section to discuss another.

Consider the question "does data modeling slow down an application development process?". I will set aside the notion of "speeding up" application development by skipping altogether "data modeling" (whichever way it is meant), and focus on the response.

Saturday, September 29, 2018

Understanding Conceptual vs.Data Modeling Part 2: E/RM Models Reality, RDM Models Data




Re-write 10/17/18
Revised 11/1/18

In Part 1 we explained that when the RDM and the E/RM were introduced, the distinct conceptual-logical-physical levels of representation had not yet emerged, and a data model had not yet been formally defined. But in 1980 Codd defined a formal data model as a combination of (1) data structures, (2) integrity constraints, and (3) operators on the structures[1], and later on the three-fold trinity of levels came into being. Given a conceptual level distinct from the logical, do the RDM and the E/RM satisfy the definition -- are they data models in today's terms?

Recall from Part 1 that the RDM has all three components and is defined in purely logical terms, so it is a data model. But the E/RM definition intermingles conceptual and logical terminology, and therefore is not consistent with two distinct levels. Moreover, as a data model E/RM is incomplete:

“The E/RM is not a data model as formally defined by Codd: no explicit structural component except sets classified in various ways, no explicit manipulative component except implied set operations, and very limited integrity (keys).”
--David McGoveran
Contrary to claims, Date does not exactly say that the E/RM is a data model:
“[It] is not even clear that the E/R "model" is truly a data model at all, at least in the sense in which we have been using that term in this book so far (i.e., as a formal system involving structural, integrity, and manipulative aspects). Certainly the term "E/R modeling" is usually taken to mean the process of deciding the structure (only) of the database, although [it does deal with] certain integrity aspects also, mostly having to do with keys ... However, a charitable reading of [Chen's original E/RM paper] would suggest that the E/R model is indeed a data model, but one that is essentially just a thin layer on top of the relational model (it is certainly not a candidate for replacing the relational model, as some have suggested).”[2]
Note that even if, charitably, the E/RM is considered a data model, it is not up to the RDM.

Wednesday, November 8, 2017

Understanding Conceptual vs. Data Modeling Part 1: Data Model - The RDM Is, the E/RM Isn't



Re-write 10/16/18
“E/RM is a data model -- So says Date, Chen, etc. So says the majority of current industry experts ... With very strong references to Codd (who he worked with), Date elegantly explains the differences between RM and E/RM -- but clearly believes both are data models (even allowing for the charitable comment). If we take a RDB as the ultimate target implementation of data, and an E/RM (or extended) can correctly design all the artifacts that are implemented, this means it is modeling the data. Granted, an E/RM does not explicitly model some of the non-structural aspects of the original Codd definition.”

“Out of interest, is there a common Relational Modeling tool, that is not also an E/RM tool and models the full Codd definition? There are also several other methods of modeling data -- E/RM is more a mechanism to represent the data. If E/RMs are used by IT professionals across the world to direct the design and build of the majority of applications guided by standard methodologies, is the view of this argument that these were all build wrongly? Regardless of success? Is the inferred conclusion that only the RM models data, and ERM, [or] any other techniques do not? [If so] that is a little limiting.”

Objects, Properties, and Ontological Commitment


We are culturally and linguistically conditioned to conceptualize the world as objects with properties. Objects in a universe thereof that share common properties are of the same type and form a class, distinguishing them from objects that are not and do not. Applying a class definition to the universe  selects out the group of objects of that type from the universe.

Philosophical ontology is the study of being, existence, reality, as well as the basic categories of being, and their relationships -- what entities exist or may be said to exist, and how they may be grouped, related, and subdivided according to similarities and differences. 

Note: 'Object' is used in the general, not OO sense. Ontology, as used herein, should not be confused with "computer science ontology", whereby the term ontology was usurped, and is understood by programmers as meaning a conceptual graph of directed semantic relationships among objects (and only sometimes among object types).

Conceptual modeling (1) identifies types of objects of interest, and (2) formulates business rules (BR) that specify their properties and relationships and, as such, makes an ontological commitment. Any approach to conceptual modeling must consider the ontological commitment upon which it is based, which has major implications for the data model used to formalize conceptual models as logical models for computable database representation -- it must be consistent with that commitment.

Unfortunately, due to lack of foundation knowledge in the industry[1], practitioners -- both vendors and users -- are largely unaware of, and oblivious to ontological underpinning and their implications for database technology and practice, one reason why they not only stagnated, but regressed in the last five decades. In this multipart series we explain the important distinction between conceptual, and data modeling (aka logical database design), which requires a formal data model. The E/RM is not, and while it can be used for conceptual modeling of reality, not data, we outline a new conceptual modeling approach that makes a different ontological commitment and requires adjustments to the RDM, both necessary for genuine progress.



Sunday, August 27, 2017

Object Orientation, Relational Database Design, Logical Validity and Semantic Correctness



Note: This is a 8/24/17 rewrite of a 5/20/13 post to bring it in line with McGoveran's formal exposition of Codd's RDM [1] and its correct interpretation.

08/25/17: I have added formal definitions of logical validity and semantic correctness. 
09/01/17: Minor revisions. 
09/02/17: Added references.
03/15/18: Minor revisions.


Here's what's wrong with last week's picture, namely:
"In my experience, using an object model in both the application layer and in the database layer results in an inefficient system. This are my personal design goals:
- Use a relational data model for storage
- Design the database tables using relational rules including 3rd normal form
- Tables should mirror logical objects, but any object may encompass multiple tables
- Application objects, whether you are using an OO language or a traditional language using structured programming techniques should parallel application needs which most closely correspond to individual SQL statements than to tables or "objects". --LinkedIn.com

Sunday, July 10, 2016

Levels of Representation: Relationships, Rules, Relations and Constraints




What's Wrong with Last Week's Picture (Question about relational model )

There are relationships at both the conceptual and logical representation levels. Confusing them is bad conceptual modeling and database design.

Relationships, Rules and Relations

AT: "In my personal understanding, a relation is defined as a set of tuples. Then ... "in the relational model every relation represents a relationship". And then a quote from Chen: "each tuple of entities ... is a relationship". If I use the first and the second statements - I can say that a relationship is a set of tuples. The third statement says that a relationship is a tuple. So far, is a relationship a set of an element of a set? (Or may be a set of sets?)".
A relation is a set of tuples that is a subset of the Cartesian product of the domains, i.e., it is a relationship of domains at the logical level (Chen may have been misquoted: a tuple is a set of attribute values (i.e., a relationship thereof, that represents a fact about a single entity).

Sunday, June 12, 2016

Levels of Representation: Conceptual Modeling, Logical Design and Physical Implementation



From last week:

What's wrong with this picture? (Kinds of Data Models, LinkedIn.com)

David Hay: "Part of the ... confusion as to what exactly was meant by “data modeling”--conceptual, logical or physical--is that most data modeling activities seem to focus on achieving good relational database designs ... my approach is the portrayal of the underlying structure of an enterprise’s data--without regard for any technology that might be used to manage it ... a “conceptual data model” ... that represents the business."
Nothing raises uncertainty whether to laugh or cry better than attempts to dispel confusion which suffer from the very confusion they purport to dispel.

Sunday, January 17, 2016

Conceptual (Business) Modeling, Logical Database Design and Physical Implementation



A serious problem in the database field is not just that many data professionals do not know and understand the RDM, but also that they believe they do and criticize it. It is relatively easy to detect such critics. In "Recognizing and Treating Tableitis" Gordon Everest tries to be humorous, but it ends up more sad than funny.

View My Stats