Follow @DBDebunk
Follow @ThePostWest
Note: This two part series is a rewrite of of an older post (which now links here), to bring it into line with the McGoveran formalization and interpretation [1] of Codd's true RDM.
(Continued from Part 1)
In Part 1 I how several data practitioners failed to pinpoint the relational violation by a a conditional uniqueness constraint that should have been obvious with foundation knowledge. The closest one came was "more than one kind of business entity here [that] share the same properties (not attributes)", but still missed the implications.
Saturday, April 7, 2018
Tuesday, March 27, 2018
Name the Relational Violation Part 1: Conditional Uniqueness Constraint
Follow @DBDebunk
Follow @ThePostWest
Note: This is a rewrite of of an older post (which now links here), to bring it into line with the McGoveran formalization and interpretation [1] of Codd's true RDM.
The heart is in the right place, but the violation is obvious. Yet, the several responses did not pinpoint it.
Note: This is a rewrite of of an older post (which now links here), to bring it into line with the McGoveran formalization and interpretation [1] of Codd's true RDM.
"I'm seeing more [data] professionals implementing the following type of conditional unique constraints, typically related to the use of 'soft deletes'):TABLEX (SK,A,B,C,ACT_FLAG)Uniqueness is defined for (A,B) iff ACT_FLAG='Yes'. Any row with a ACT_FLAG = 'No' is excluded from the unique requirement."
"Most SQL DB implementations I know of do not allow this type of constraint to be enforced declaratively. Instead it relies on tricks within the index specification for enforcement."
"This conditional application of unique constraints troubles me. The prevalent use of a surrogate primary key avoids duplicates in the table as a whole. But this approach seems to declare a business rule that can be turned off and on based on the value of a non-key column. It definitely feels there is more than one kind of business entity here -- but they do share the same attributes. I also see a similar design pattern when OO classes are mapped to a table during implementation (when the implementation approach is to combine classes into a single table and an attempt is made to enforce two types of 'uniqueness'."
"However, I cannot definitively find a specific rule/guideline within relational theory that it violates. I know there are design alternatives. The fact that there is no true always-on business key other than the surrogate key IS an issue, but the fact that technically the surrogate PK prevents duplicates is almost always presented as a counter argument."
The heart is in the right place, but the violation is obvious. Yet, the several responses did not pinpoint it.
Saturday, March 24, 2018
Data Modeling and NoSQL
Follow @DBDebunk
Follow @ThePostWest
Revised 3/25/18
Revised 3/25/18
I have written extensively on the three levels of representation and four types of model and I won't repeat it here -- readers can refresh their memory if necessary[1,2]. Everest's comments are at best ambiguous with respect to the levels and models (e.g., by data modeling he means business modeling, and his "business data model" lumps together business model and data model). It is to avoid such ambiguities and the resulting confusion that I recommend the three-fold terminology of conceptual modeling, logical database design and physical implementation, eschewing data modeling[3]. Here I will rely on my earlier writings to address strictly the issue of data modeling in the NoSQL context raised by Everest."To the question How relevant is data modeling in the world of NoSQL? I give the following answer.
The main purpose of data modeling is to understand the business, some application domain, some users world. The model becomes a representation of that world -- the "things" in it, the relationships among those things and any constraints on those things or relationships. A secondary purpose is to build a database to contain information which pertains to and describes that domain."
"Generally we speak of the model coming first, then the implementation, and finally, the data gets collected and stored according to the model. Hence, the business data model should not be concerned with issues of physical stored representation, or the transformations/manipulations/constraints which are imposed to facilitate implementation in some data (storage) management system. That could be a relational DBMS, or a NoSQL tool".
" ... increasingly the data already exists in some form. Which leaves us with the task of figuring out what it means, what it represents -- that is, understanding the data as it represents some user domain. NoSQL tools are often designed to deal with existing data and to process it more efficiently (that may be an oversimplification!). Either way, you must understand the business in order to make sense of the data."
--Gordon Everest, LinkedIn.com
Saturday, March 17, 2018
Physical Independence Part 2: Logical-physical Confusion
Follow @DBDebunk
Follow @ThePostWest
Note: This is a rewrite of older posts (which now link here), to bring them into line with the McGoveran formalization and interpretation [1] of Codd's true RDM.
Revised 3/17/18
(Continued from Part 1)
This is the second part of my response today to an old DBDebunk query:
Note: This is a rewrite of older posts (which now link here), to bring them into line with the McGoveran formalization and interpretation [1] of Codd's true RDM.
Revised 3/17/18
(Continued from Part 1)
This is the second part of my response today to an old DBDebunk query:
"You constantly remind us that the relational model is a logical model having no connection to any physical model (so I infer). You also indicate how no commercial product fully implements the relational model. Therefore, how do we make use of the relational model when dealing with the physical constructs of a commercial database program (Oracle, Access, DB2, etc.)?" --DBDebunk.comIn Part 1 I explained physical independence (PI) and claimed that the industry has failed to internalize its importance. Here I provide evidence to that effect and discuss some consequences.
Monday, March 5, 2018
Physical Independence Part 1: Don't Mix Model with Implementation
Follow @DBDebunk
Follow @ThePostWest
Note: This is a rewrite of several older posts (which now link here), to bring them into line with the McGoveran formalization and interpretation [1] of Codd's true RDM.
Note: This is a rewrite of several older posts (which now link here), to bring them into line with the McGoveran formalization and interpretation [1] of Codd's true RDM.
"You constantly remind us that the relational model is a logical model having no connection to any physical model (so I infer). You also indicate how no commercial product fully implements the relational model. Therefore, how do we make use of the relational model when dealing with the physical constructs of a commercial database program (Oracle, Access, DB2, etc.)?" --DBDebunk.com query
Monday, February 26, 2018
Sunday, February 18, 2018
This Week
Follow @DBDebunk
Follow @ThePostWest
1. Database Truth
"The RDM is a formal system. It has two parts. Semantics its outside the formal language (which is Deductive Subsystem), but not outside the interpretation (i.e., application) of that language (Interpretation Subsystem). Without an Interpretation Subsystem there is no possibility of applying the formal system and it remains an abstract game of symbols."--David McGoveran
"Codd's 1979 paper described a way to "capture" semantics using the relational formalism. That formalism doesn't tell you how to discover semantics, but if you have them, then he showed (at least to some degree) how to express those semantics relationally."
"Semantics is about applying the RDM to some subject. In effect, what you do is restrict the power of the abstract formalism so that it is more closely aligned with your intended use. In my terminology, that means you:
- Create axioms (expressed as constraints), limiting the vocabulary to the subject matter (and making it finite and usually fairly small); and,
- Restrict the possible interpretations that can be used consistently with the resulting subset of the formalism."
2. Do You Know What's Wrong With This Picture?
"When someone refers to a relation in a database course, what does that mean?"--What is a relation in database terminology?, StackOverflow.com
"It means that it is time to go to Wikipedia."
"A relation in the context of modeling a problem will include the fields and possibly the identification of fields which have relationships with other relations."
"A relation is a data structure which consists of a heading and an unordered set of tuples which share the same type."
"I was trying to show that, in SQL, a relation is more than just a table. Queries return relations. And within a query, relational math is happening, with many intermediate results, that themselves are relations."
"A relation is an abstract structure which contains a set of attributes, and a relvar is the dataset status in a particular moment of this relation. The first one can be considered as the table definition with columns, and the second one is dataset in this table."
"Tuples need not have a key (or any way of locating them?) Having tried to answer this question so that I could explain it to my students, I am forced to the conclusion that the theory has nothing whatsoever to do with "data" in the usual sense. Perhaps information Theory would have been a better basis for data systems, rather than Mathematics? Computer applications are notably different from Physics, which math was created to model, and its child, Engineering. I think data was never intended to be "true", it must be useful."
Subscribe to:
Posts (Atom)