DATABASE DEBUNKINGS: Pred

Monday, February 5, 2024

METALOGICAL PROPERTIES Part 2: Assertion Predicate

In Part 1 we introduced in the conceptual model (CM) the metalogical designation property. It represents, in the absence of known shared defining properties of an entity type, the designation by a group's definer that an entity identifier (aka assigned name) or property value is a member of the group. Such a group is not a group of entities, but a group of name and property values. In the logical model (LM), it is formalized as a designation predicate (DP) and defines a domain.
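As a rough illustration only (not the authors' formalization), a group defined purely by designation can be pictured as an enumerated set of values, with the designation predicate as a membership test over it. The name `DESIGNATED_REGIONS`, its members, and the function name are hypothetical, chosen just for this sketch.

```python
# Hypothetical group defined purely by designation: the definer simply
# lists which values belong, with no shared defining property assumed.
DESIGNATED_REGIONS = frozenset({"North", "South", "East", "West"})


def designation_predicate(value: str) -> bool:
    """True when the definer has designated `value` as a member of the group."""
    return value in DESIGNATED_REGIONS
```

In this picture the set of values for which the predicate is true plays the role of the domain.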

In Part 2, we introduce the metalogical assertion property. It represents the assertion by an authorized database user that a specific entity, represented by a tuple, either does or does not correspond to an actual entity in the real world.

METALOGICAL PROPERTIES PART 1: Designation Property

Follow @DBDebunk Follow @ThePostWest

with David McGoveran

One purpose of our contributions here is to suggest a vocabulary that avoids confusion not just within the formal logical level, but also between conceptual and logical terminologies, which is widespread in the industry and is exacerbated by limitations of natural language (NL). We use the following terminology in our approach to conceptual modeling:

Objects are:

- Primitive (basic entities);

- Compound:

  - groups of related entities;

  - multigroups (groups of related groups).

Properties are:

- Individual (of basic entities);

- Collective:

  - Of groups: relationships among entities within a group;

  - Of multigroups: relationships among groups within a multigroup.

Note: It is a McGoveran insight that relationships between objects at a lower aggregate level are properties of the object at the higher aggregate level which the former comprise (LOGIC FOR SERIOUS DATABASE FOLK, forthcoming; see draft chapters at http://www.alternativetech.com/ATpubs_dir.html). For classification of properties as first, second, third and fourth order (1OP, 2OP, 3OP and 4OP) see RELATIONSHIPS AND THE RDM Parts 1-3 (https://www.dbdebunk.com/2023/03/relationships-and-rdm-v2-part-1.html). All such properties can be expressed logically as constraints in a FOPL-based relational data sublanguage, which is beyond the scope of this discussion.

ENTITIES, PROPERTIES AND CODD'S SLEIGHT OF HAND

Note: This is a revision of an earlier post.

RDM is an application to database management of mathematical relation theory (MRT) that is consistent with simple set theory (SST) expressible in first order predicate logic (SST/FOPL). It is used to formalize conceptual models of reality symbolically as logical models for database representation.

In RDM a domain can "appear" in multiple relations: the domain represents an abstract property, while attributes defined on it represent that property in the contexts of the specific entity groups that relations represent. For example, attribute SALARY in relation EMPLOYEES represents the property represented by domain MONEY in the context of entities of type Employee, and attribute BUDGET in relation DEPARTMENTS represents it in the context of entities of type Department.
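The SALARY/BUDGET example can be pictured with a small Python sketch. It is an illustration under assumptions, not a relational implementation: the `Money` class stands in for domain MONEY, the two `NamedTuple` classes stand in for tuples of EMPLOYEES and DEPARTMENTS, and the identifiers `emp_id`, `dept_id`, `shares_domain` and all sample values are invented.

```python
from decimal import Decimal
from typing import NamedTuple


class Money(Decimal):
    """Hypothetical stand-in for domain MONEY (an abstract property)."""


class Employee(NamedTuple):
    emp_id: int
    salary: Money   # attribute SALARY: MONEY in the context of employees


class Department(NamedTuple):
    dept_id: int
    budget: Money   # attribute BUDGET: MONEY in the context of departments


# Relations pictured as sets of tuples; the sample values are invented.
EMPLOYEES = frozenset({Employee(1, Money("52000")), Employee(2, Money("61000"))})
DEPARTMENTS = frozenset({Department(10, Money("250000"))})


def shares_domain(a: object, b: object) -> bool:
    """Two attribute values are drawn from the same domain in this sketch
    when they have exactly the same type."""
    return type(a) is type(b)
```

Here a salary and a budget share a domain and so can be meaningfully compared, whereas a salary and an employee identifier cannot.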

PREDICATE LOGIC, SEMANTICS AND RDM (sms)

Note: In "Setting Matters Straight" posts I debunk online pronouncements that involve fundamentals which I first post on LinkedIn. The purpose is to induce practitioners to test their foundation knowledge against our debunking, where we explain what is correct and what is fallacious. For in-depth treatments check out the POSTS and our PAPERS, LINKS and BOOKS (or organize one of our on-site/online SEMINARS, which can be customized to specific needs). Questions and comments are welcome here and on LinkedIn.

“As I have said many times, if the original relational model had been based on predicate logic and also the semantics and rules of definitions we'd all be better off now. It wasn't. Full stop.”
--Ronald Ross, LinkedIn.com

Assessing such arguments normally requires clarification of what exactly is meant by "the relational model". Ross does refer specifically to the "original" -- which we take to mean that introduced by Codd in 1969-70 -- but given the massive misuse and abuse in the industry, perceptions of it may well be corrupted (Nobody Understands the Relational Model: Semantics, Relational Closure and Database Correctness). Moreover, there are many predicate logic (PL) systems and many ways of categorizing them (1st vs n-th order being only one way) -- we assume Ross means RDM is based on none.

DATABASE RELATIONS, TABLES AND SEMANTIC CONSISTENCY

by David McGoveran with Fabian Pascal

Note: In "Setting Matters Straight" posts I debunk online Q&As that involve fundamentals which I first post on LinkedIn. The purpose is to induce practitioners to test their foundation knowledge against our debunking, where we explain what is correct and what is fallacious. For in-depth treatments check out the POSTS and our PAPERS, LINKS and BOOKS (or organize one of our on-site/online SEMINARS, which can be customized to specific needs). Questions and comments are welcome here and on LinkedIn.

“In a RDBMS, a table is columned rows, as in you treat individual rows as an actual entity while the columns are its attributes. In an excel tab, you can create a column, but it doesn't have to have all the same data types in that column, nor does one row have to represent one entity. It's more free form ... All in all, RDB is relational because it's column based rows and constrained to that format, while non relational can have free form like an excel. When you have rows that are uniform (constrained to what the column should be), you create entities as tables, and link them through columns to keep track of the relationships.”
--Quora.com

I posted this on LinkedIn as one of my "To Laugh or Cry?" items which, unlike "Setting Matters Straight" posts, are beyond debunking. But the exchange that followed made me realize that there is, nevertheless, pedagogical value to it: it expresses something important, but poorly, due to the author's lack of foundation knowledge.

Nobody Understands the Relational Model: Semantics, Relational Closure and Database Correctness Part 2

with David McGoveran

(Title inspired by Richard Feynman)

In Part 1 we explained that all database relations are, mathematically, relations, but not all relations are database relations, which are in both 1NF and 5NF. We also agreed with a statement in a LinkedIn discussion ending as follows: "Update anomalies are not as big of a problem as an algebra where relations aren't closed under join". Unfortunately, update anomalies, closure, and how relational operators were defined are all interrelated and represent an even "bigger problem". Update anomalies are not "bugs", let alone irrelevant, but actually a reflection of that much bigger problem.

In this second part we delve into that problem.

Understanding Relational Constraints

“The data in a relational database is stored in form of a table. A table makes the data look organized. Yet in some cases we might face issues while working with the data like repetition. We might want enforce rules on the data to avoid such technical problems. Theses rules are called constraints. A constraint can be defined as a rule that has to enforced on the data to avoid faults. There are three kinds of constraints: entity, referential and semantic constraints. Listed below are the differences between these three constraints:
1. Entity constraints -- primary key, foreign key, unique, NULL -- are posed within a table and used to enforce uniqueness and to define no value [respectively].
2. Referential constraints -- foreign key -- are enforced with more than one table for referring other tables for analysis of the data.
3. Semantic constraints -- datatypes -- are enforced in a table on the values of a specific attribute and help the data segregate according to its type. Example: name varchar2(30).”
--GeeksforGeeks.com

Before we tackle the main subject, let's get some misconceptions out of the way. As we have explained so many times:

- Data is not "stored in a form of a table" -- it can be stored in any number of physical formats, at the discretion of DBMS designers and DBAs. Physical independence is a core advantage of the RDM.
- A table does not "make the data look organized". Data is by definition organized -- be it relationally or not -- otherwise it would be random noise, not data. A database relation can be visualized as an R-table, but tables do not play any role in RDM.
- While some "repetition" (i.e., redundancy) is prevented by constraints (e.g., uniqueness), other redundancy is avoided by database design (e.g., 5NF DB relations).
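To make the uniqueness case concrete, here is a minimal Python sketch of what a uniqueness constraint checks. It is an illustration only: in a DBMS the constraint would be declared and enforced by the system rather than coded by hand, and the function name, the attribute names EMP_ID and NAME, and the sample data are all hypothetical. Redundancy that is avoided by design (e.g., 5NF) is not shown.

```python
from typing import Iterable, Mapping, Sequence


def violates_uniqueness(tuples: Iterable[Mapping[str, object]],
                        key: Sequence[str]) -> bool:
    """Return True if any two tuples agree on every attribute in `key`."""
    seen = set()
    for t in tuples:
        key_value = tuple(t[a] for a in key)
        if key_value in seen:
            return True
        seen.add(key_value)
    return False


# Invented sample data; EMP_ID and NAME are hypothetical attribute names.
employees = [
    {"EMP_ID": 1, "NAME": "Fred"},
    {"EMP_ID": 2, "NAME": "Mary"},
]
```

Calling `violates_uniqueness(employees, ["EMP_ID"])` on the sample data reports no violation; adding a second tuple with `EMP_ID` 1 would.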

And now to constraints.

TYFK: Facts, Properties, Relationships, Domains, Relations, Tuples

Note: Each "Test Your Foundation Knowledge" post presents one or more misconceptions about data fundamentals. To test your knowledge, first try to detect them, then proceed to read our debunking, reflecting the current understanding of the RDM, distinct from whatever has passed for it in the industry to date. If there isn't a match, you can review references which explain and correct the misconceptions. You can acquire further knowledge by checking out our POSTS, BOOKS, PAPERS, LINKS (or, better, organize one of our on-site SEMINARS, which can be customized to specific needs).

A statement from a 1986 book that "Data are facts represented by values -- numbers, character strings, or symbols -- which carry meaning in a certain context" triggered the following response on LinkedIn:

“...In contrast, Date and Darwen (2000) say:
Domains are the things that we can talk about.
Relations are the truths we utter about those things.
Thus, the declarative sentence "Fred is in the kitchen." is a fact that links the domains Person[s] and Place[s] with the predicate "is in". The complete relation might be made up of three facts:
Fred is in the kitchen.
Mary is in the garden.
Arthur is in the garden.
This seems to be more precise than the 1986 statement.”

To which the book author responded:

“...back then we did not have the refinement, clarity, nor precision from people like Sjir Nijssen and Terry Halpin regarding facts, or elementary fact sentences, which today you and I know are the bedrock of data modeling. Facts are expressed in sentences (with domains and predicates).”

Unfortunately none of this is sufficiently clear and precise to prevent confusion and it inhibits understanding of the RDM.

RE-WRITE

See: https://www.dbdebunk.com/2023/08/entities-properties-and-codds-sleight.html

Friday, November 27, 2020

OBG: Missing Data -- "Horizontal Decomposition" Part 2

Note: To demonstrate the correctness and stability of a sound foundation relative to the industry's fad-driven "cookbook" practices, I am re-publishing "Oldies But Goodies" material from the old DBDebunk.com (2000-06), so that you can judge for yourself how well my arguments hold up and whether the industry has progressed beyond the misconceptions those arguments were intended to dispel. I may break long pieces into multiple posts, revise, and/or add comments and references.

In Part 1 we re-published a reader's response to "horizontal decomposition" -- Hugh Darwen's How to Handle Missing Information without Using NULLs -- in comparison to our The Final NULL in the Coffin: A Relational Solution to Missing Data. Here's Hugh's response.

TYFK: Misconceptions About the Relational Model

“The most popular data model in DBMS is the Relational Model. It is more scientific a model than others. This model is based on first-order predicate logic and defines a table as an n-ary relation. The main highlights of this model are:

Data is stored in tables called relations.

Relations can be normalized, [in which case] values saved are atomic values.

Each row in a relation contains a unique value.

Each column in a relation contains values from a same domain.”

Each "Test Your Foundation Knowledge" post presents one or more misconceptions about data fundamentals. To test your knowledge, first try to detect them, then proceed to read our debunking, which is based on the current understanding of the RDM, distinct from whatever has passed for it in the industry to date. If there isn't a match, you can acquire the knowledge by checking out our POSTS, BOOKS, PAPERS, LINKS (or, better, organize one of our on-site SEMINARS, which can be customized to specific needs).

TYFK: What Is A Database Relationship?

Note: This is a re-write of an earlier post. About TYFK posts (Test Your Foundation Knowledge) see the post insert below.

“Here two or more table[s] are related with each other. This is Database relationship. Database relationship is used a lot ... [in] relational database management systems ... shortly called RDBMS. Here is Join_data [sic] table and Interview_data table. For creating a relational database management system both of the table[s] must have a common field. Here Employee_ID is a common field ... Database relationship types: One-To-One relation, One-To-many relation, Many-to-many relation. Minimum one common field is essential in all the tables. The data type of common field and field size will be same in all the tables.”

First try to detect the misconceptions, then check against our debunking. If there isn't a match, you can acquire the necessary foundation knowledge in our POSTS, BOOKS, PAPERS, LINKS or, better, organize one of our on-site SEMINARS, which can be customized to specific needs.

Fourth Order Properties Part 1: Association Relations vs. Foreign Keys

“We have Building, Room, and Bed entities. Logically, if this is in the scope of some hypothetical hotel, then each one of those entities is dependent on their parent to exist ... you cannot have a bed without a room. Also, that room wouldn't exist without its parent, Building. So, why have I rarely seen this identifying relationship introduced? When I was learning databases, everything was apparently "non-identifying". When is this type of relationship necessary, if at all? I see the issue arises when that BED can exist without a BUILDING. If you were to INSERT into the BED table, you are constraint [sic] to provide a building_id, as the building_id is part of that BED's primary key. Couldn't you avoid an identifying relationship by giving each table its own surrogate primary key? Is this the correct representation of an identifying relationship? I could avoid that by just giving each table its own ID. At the end of the day, this is about IDENTIFYING relationships, not their existence, which is how I've been logically determining if something is an "identifying relationship" If that were the case, then any 1:N relationship could be "identifying" but that's not how you define identifying or non-identifying.”

“Interesting -- I’d never heard this term before. I’ve hears it referred to as a cached ID though, as that 2nd ID isn’t required, but may be beneficial for performance purposes. For this example with 3 levels it’s not a huge joint statement, but for some systems with 12 tables the joins get unpleasant. I’ve never started a system with this additional id, but I have added one later on once the need was there and the profiling led to this being the best solution for our specific situation. Usually though, just creating a view that does the joins for me has been easier. I’ll be curious what has led others to use this approach.”

“It's not really introduced because it's way more towards academic than functional.”

--Reddit.com

Such questions, and ad-hoc terms like "identifying relationships"[1], come up because practice is driven by intuition and experience (if any), without the benefit of foundation knowledge[2]. Whether practitioners know/like it or not, a database is a formal computable representation of an informal conceptual model[3] and, therefore, data modeling (i.e., logical database design)[4] is impossible without (1) a well-defined and complete conceptual model and (2) a formal data model with which to formalize it as a logical model[5], and the two should not be confused[6]. Otherwise all bets are off.

Here's how foundation knowledge should have informed modeling and design.

Understanding Conceptual vs. Data Modeling Part 4: Properties-object Modeling

Revised 6/26/19.

In Part 1 and Part 2 we explained that when the RDM (1969-70) and the E/RM (1976) were introduced, there was no distinction between a conceptual and a logical level -- the conceptual-logical-physical distinction of levels of representation emerged in the mid-80s. Only in 1980 did Codd specify three components of a formal data model -- structure, integrity, manipulation. While the RDM satisfies the specification, the E/RM does not: it is a conceptual modeling approach, weaknesses of which have been elaborated elsewhere[1]. In Part 3 we presented a common example of conceptual-logical conflation (CLC), and corresponding confusion of types of model (conceptual, logical, physical, and data).

As promised, here we outline a new conceptual modeling approach derived by David McGoveran from his work formalizing Codd's RDM. It makes an ontological commitment different from that by conventional modeling, which requires revision and extension of the RDM -- an objective of David's effort.

Understanding Conceptual vs. Data Modeling Part 3: Don't Conflate Reality and Data

In Part 1 and Part 2 we explained that when the RDM (1969-70) and the E/RM (1976) were introduced, there was no distinction between an informal conceptual and a formal logical level. In 1980, however, Codd defined a formal data model and in the later 80s the conceptual-logical-physical levels of representation emerged. If applied to the two models:

- Only the RDM satisfies the definition;
- The E/RM can be used at the conceptual level to model reality, while the RDM can be used to model data at the logical level (i.e., formalize conceptual models as logical models for database representation).

Current practitioners, however, continue to confuse levels of representation and confuse/conflate types of model. So much so, that in my presentations I used to draw an imaginary line dividing the room into two sections, and move to the right section to discuss one level/model, and to the left section to discuss another.

Consider the question "does data modeling slow down an application development process?". I will set aside the notion of "speeding up" application development by skipping altogether "data modeling" (whichever way it is meant), and focus on the response.

Understanding Conceptual vs. Data Modeling Part 2: E/RM Models Reality, RDM Models Data

Re-write 10/17/18
Revised 11/1/18

In Part 1 we explained that when the RDM and the E/RM were introduced, the distinct conceptual-logical-physical levels of representation had not yet emerged, and a data model had not yet been formally defined. But in 1980 Codd defined a formal data model as a combination of (1) data structures, (2) integrity constraints, and (3) operators on the structures[1], and later on the trinity of levels came into being. Given a conceptual level distinct from the logical, do the RDM and the E/RM satisfy the definition -- are they data models in today's terms?
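As a rough picture of the three components Codd lists, the following Python sketch pairs a structure (heading and body), an integrity constraint checked when the relation is built, and one operator (projection). It is a toy under stated assumptions, not Codd's or McGoveran's formalization: the `Relation` class, the `row` helper, and the attribute names used with them are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Iterable, Tuple

# One tuple, held as (attribute, value) pairs sorted by attribute name.
Row = Tuple[Tuple[str, object], ...]


def row(**values: object) -> Row:
    """Build a hashable tuple from attribute=value keyword arguments."""
    return tuple(sorted(values.items()))


@dataclass(frozen=True)
class Relation:
    heading: FrozenSet[str]                                # (1) structure
    body: FrozenSet[Row]
    constraint: Callable[[Row], bool] = lambda r: True     # (2) integrity

    def __post_init__(self) -> None:
        for r in self.body:
            if {attr for attr, _ in r} != set(self.heading):
                raise ValueError("tuple does not match heading")
            if not self.constraint(r):
                raise ValueError("integrity constraint violated")

    def project(self, attrs: Iterable[str]) -> "Relation":  # (3) manipulation
        """Keep only `attrs`; duplicates collapse because the body is a set."""
        keep = frozenset(attrs)
        body = frozenset(tuple(p for p in r if p[0] in keep) for r in self.body)
        return Relation(keep, body)
```

Because the operator returns another `Relation`, its result can be fed to further operators, which is the property a model lacking an explicit manipulative component cannot offer.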

Recall from Part 1 that the RDM has all three components and is defined in purely logical terms, so it is a data model. But the E/RM definition intermingles conceptual and logical terminology, and therefore is not consistent with two distinct levels. Moreover, as a data model E/RM is incomplete:

“The E/RM is not a data model as formally defined by Codd: no explicit structural component except sets classified in various ways, no explicit manipulative component except implied set operations, and very limited integrity (keys).”

--David McGoveran

Contrary to claims, Date does not exactly say that the E/RM is a data model:

“[It] is not even clear that the E/R "model" is truly a data model at all, at least in the sense in which we have been using that term in this book so far (i.e., as a formal system involving structural, integrity, and manipulative aspects). Certainly the term "E/R modeling" is usually taken to mean the process of deciding the structure (only) of the database, although [it does deal with] certain integrity aspects also, mostly having to do with keys ... However, a charitable reading of [Chen's original E/RM paper] would suggest that the E/R model is indeed a data model, but one that is essentially just a thin layer on top of the relational model (it is certainly not a candidate for replacing the relational model, as some have suggested).”[2]

Note that even if, charitably, the E/RM is considered a data model, it is not up to the RDM.

RE-WRITE

See: https://www.dbdebunk.com/2018/09/designation-property-and-assertion.html

Sunday, July 15, 2018

Understanding Relations Part 3: Debunking Conventional Wisdom

(See Part 1 and Part 2)

“A common term used in database design is a "relational database" -- but a database relation is not the same thing and does not imply, as its name suggests, a relationship between tables. Rather, a database relation simply refers to an individual table in a relational database. In a relational database, the table is a relation because it stores the relation between data in its column-row format. The columns are the table's attributes, while the rows represent the data records. A single row is known as a tuple to database designers.”

“A relation, or table, in a relational database has certain properties.”

“First off, its name must be unique in the database, i.e. a database cannot contain multiple tables of the same name.”

“Next ... as with the table names, no attributes can have the same name.”

“Next, no tuple (or row) can be a duplicate. In practice, a database might actually contain duplicate rows, but there should be practices in place to avoid this, such as the use of unique primary keys (next up). Given that a tuple cannot be a duplicate, it follows that a relation must contain at least one attribute (or column) that identifies each tuple (or row) uniquely. This is usually the primary key. This primary key cannot be duplicated. This means that no tuple can have the same unique, primary key. The key cannot have a NULL value, which simply means that the value must be known.”

“Further, each cell, or field, must contain a single value. For example, you cannot enter something like "Tom Smith" and expect the database to understand that you have a first and last name; rather, the database will understand that the value of that cell is exactly what has been entered.”

“Finally, all attributes—or columns—must be of the same domain, meaning that they must have the same data type. You cannot mix a string and a number in a single cell.”

“All these properties, or constraints, serve to ensure data integrity, important to maintain the accuracy of data.” --Definition of Database Relation

It is easy to discern when explanations of relational features are not grounded in the formal foundations of the RDM[1], but in industry practices. Here are some further clarifications and corrections.

Understanding Relations Part 1: Tables? So What?

Note: This is a re-write of two older posts (which now link here), to bring them into line with the McGoveran formalization and interpretation of Codd's real RDM, including his own refinements, corrections, and extensions[1].

“Put simply, a "relation" is a table, the heading being the definition of the structure and the rows being the data.”

“In simple English: relation is data in tabular format with fixed number of columns and data type of each column. This can be a table, a view, a result of a subquery or a function etc.”

“Practically, a "Relation" in relational model can be considered as a "Table" in actual RDBMS products(Oracle, SQL Server, MySQL, etc), and "Tuples" in a relation can also be considered as "Rows" or "Records" in a table.”

“In common usage, however, when someone refers to a "relation" in a database course, they are referring to a tabular set of data either permanently stored in the database (a table) or derived from tables according to a mathematical description (a view or a query result).”

“In SQL RDBMSes (such as MS SQL Server and Oracle] tables are permently stored relations, where the column names defined in the data dictionary form the "heading" and the rows are the "tuples" of the relation. Then from a table, a query can return a different relation.”

“Data is stored in two-dimensional tables consisting of columns (fields) and rows (records). Multi-dimensional data is represented by a system of relationships among two-dimensional tables.”

“I read [that] "Relations are multidimensional. They are not flat. They are not two dimensional. Don't let the term table mislead you." on the back cover of CJ Date's DATABASE IN DEPTH. Can anyone help how to visualize this multidimensional nature of relations?”

Because SQL DBMSs have been sold as relational databases (which they are not), and in SQL the data structure is the table, in the absence of foundation knowledge[2] most practitioners think that relational databases consist of tables, but do not ask themselves why and how that is significant for database practice. The subtitle of this post is a question I used to ask in presentations years ago that was always met with silence. I see no evidence of improvement -- in fact, it's gotten worse. To emulate Feynman, "Nobody understands the RDM".

That such a simple and commonly understood structure can visualize relations is an advantage of the RDM, but a table is not a relation and, SQL notwithstanding, confusing the two reflects a lack of understanding of the RDM, misses its significance for database practice, and prevents taking full advantage of its benefits.

Note: The table is the preferred way to picture relations, but there are others (e.g., array).

First, the fundamentals.

Name the Relational Violation Part 2: Self-defeating Constraint

Note: This two-part series is a rewrite of an older post (which now links here), to bring it into line with the McGoveran formalization and interpretation[1] of Codd's true RDM.

(Continued from Part 1)

In Part 1 I showed how several data practitioners failed to pinpoint the relational violation by a conditional uniqueness constraint that should have been obvious with foundation knowledge. The closest one came was "more than one kind of business entity here [that] share the same properties (not attributes)", but still missed the implications.

POSTS

Monday, February 5, 2024

Tuesday, January 9, 2024

with David McGoveran

Thursday, August 17, 2023

Monday, June 19, 2023

Thursday, August 4, 2022

by David McGoveran with Fabian Pascal

Thursday, November 11, 2021

Saturday, September 4, 2021

Thursday, August 5, 2021

Thursday, June 10, 2021

Friday, November 27, 2020

Sunday, June 28, 2020

Sunday, May 10, 2020

Saturday, March 2, 2019

Saturday, November 3, 2018

Sunday, October 28, 2018

Saturday, September 29, 2018

Tuesday, September 11, 2018

Sunday, July 15, 2018

Sunday, June 24, 2018

Saturday, April 7, 2018