Showing posts with label DM. Show all posts
Showing posts with label DM. Show all posts

Friday, October 13, 2023

EVERYBODY THINK THEY KNOW FIRST NORMAL FORM, BUT NOBODY DOES



“I have read this article in an effort to boost my academic knowledge on data modeling a bit and still have no idea what this academic author wanted to say. Apparently First Normal Form (1NF) doesn't get enough respect and then proceeds to talk about Non-First Normal Form (NFNF). But what about First Normal Form (1NF) damnit.”

By sheer chance this was posted on LinkedIn just after I published my new paper The First Normal Form: A Definitive Guide.

PRACTICAL DATABASE FOUNDATIONS

FIRST NORMAL FORM

A DEFINITIVE GUIDE

(September 2023)

Fabian Pascal

 

Table of Contents

 Introduction

1.      The Normal Form

2.      The First Normal Form

3.      Domain Decomposability & Atomicity

4.      1NF & Tables

5.      SQL & 1NF

5.1.     Repeating Groups & Repeated Attributes

5.2.   Information Principle & SQL
 

Sunday, April 30, 2023

RELATIONSHIPS AND THE RDM V2 Part 3: SEMANTIC CONSTRAINTS



Note: This is a multipart re-write of a previous series that, when completed, is intended to replace it.

In Part 1 we documented the differences between mathematical and database relations (see table in Part 1). We attributed the fallacy that the RDM can express only one type of relationship -- between relations using FKs -- to the industry being unaware of the adaptation of math relations for database management. We intimated that some of the additional features of database relations express relationships other than between relations.

In Part 2 we identified the intra-group c-relationships (and the corresponding within-relation l-relationships) in our approach to conceptual modeling:

  • Properties-entities relationships

- general dependencies

  • Properties Relationships
  • Entities Relationships

- entity uniqueness
- functional dependencies (FD)
- entity supertype-subtypes relationships

and used a simple conceptual model (CM) of six entity groups to illustrate them:

Customers (cID, cname, FICO, discount)
Products (pID, pname, price)
Salesmen (sID, sname, sales, salary, commission)
Orders (oID, pID, cID, sID, date, amount)
Order Items (oID, iID, pID, quantity)

Database design is the use of a data model (DM) (here, RDM) to formalize conceptual models (CM) -- including c-relationships -- as logical models (LM) for database representation, so it must be able to convert the business rules (BR) that express those relationships in specialized natural language at the conceptual level to formal constraints in a FOPL-based data sublanguage at the logical level.

Our intention is to demonstrate that the RDM can express all c-these relationships, but we face a difficulty.

Sunday, March 19, 2023

ON PROPERTIES IN CONCEPTUAL MODELING (rm)



Note: Reader mail (rm) posts are exchanges with my readers that raise fundamental issues. I may improve language for clarity and amplify with Ed. Notes for the benefit of readers.

“Your post Understanding Conceptual vs. Data Modeling Part 1: Data Model - The RDM Is, the E/RM Isn't is well done. However, concepts and relationships can be perceived and modeled without formulating or specifying properties. Chen did that in his ER diagrams. And informally, everyone does it as a mental model every day. I suppose anyone can define conceptual modeling however they wish to.  But at its minimum and most abstract, which is what conceptual modelling is usually understood to be, it can be done without formulating or specifying properties.” --GR

Saturday, February 4, 2023

CONCEPTUAL MODELING, LOGICAL DATABASE DESIGN AND PHYSICAL IMPLEMENTATION (sms)



Note: In "Setting Matters Straight" posts I debunk online pronouncements that involve fundamentals which I first post on LinkedIn. The purpose is to induce practitioners to test their foundation knowledge against our debunking, where we explain what is correct and what is fallacious. For in-depth treatments check out the POSTS and our PAPERS, LINKS and BOOKS (or organize one of our on-site/online SEMINARS, which can be customized to specific needs). Questions and comments are welcome here and on LinkedIn.

“A conceptual data model usually just includes the main concepts (entities) required to store information and the relationships that exist between these entities. We don’t usually include any details about each piece of information. We can consider the conceptual stage as an initial model, without all the details required to create a database.

A logical data model is probably the most-used data model. It goes beyond the conceptual model; it includes entities, relationships, details on entities’ different attributes, and unique ways to identify entities (primary keys) and establish the relationships between them (foreign keys).

A physical data model is usually derived from a logical data model for a particular relational database management system (RDBMS), thus taking into account all technology-specific details. One big difference between logical and physical data models is that we now need to use table and column names rather than specifying entity and attribute names. This allows us to adapt to the limits and conventions of the desired database engine. We also provide the actual data types and constraints that allows us to store the desired information.”
--Vertabelo.com

Sunday, January 22, 2023

CONCEPTUAL BUSINESS RULES AND LOGICAL CONSTRAINTS (sms)



Note: In "Setting Matters Straight" posts I debunk online pronouncements that involve fundamentals which I first post on LinkedIn. The purpose is to induce practitioners to test their foundation knowledge against our debunking, where we explain what is correct and what is fallacious. For in-depth treatments check out the POSTS and our PAPERS, LINKS and BOOKS (or organize one of our on-site/online SEMINARS, which can be customized to specific needs). Questions and comments are welcome here and on LinkedIn.

What's right/wrong about this database picture?

“Other than constraints on cardinality, business rules are not generally represented on data models of either kind. Even in the case of business data models, the models are supposed to represent fundamental structures, while business rules represent variable constraints.”

                                                                    --TDan.com

Monday, January 2, 2023

 NEW "DATA MODELS" 5.2 (t&n)



Note: "Then & Now" (T&N) is a new version of what used to be the "Oldies but Goodies" (OBG) series. To demonstrate the superiority of a sound theoretical foundation relative to the industry's fad-driven "cookbook" practices, as well as the evolution/progress of RDM, I am re-visiting my 2000-06 debunkings, bringing them up to my with my knowledge and understanding of today. This will enable you to judge how well my arguments have held up and appreciate the increasing gap between scientific progress and the industry’s stagnation, if not outright regress.

This is a re-published series of several DBDebunk 2002 posts on Simon Wlliams' Lazy Software so-called "Associative Model of Data" (AMD), academic claims of its superiority over RDM ("The Associative Data Model Versus the Relational model") and predictions of the demise of the latter ("The decline and eventual demise of the Relational Model of Data").

  • Part 1 was an email exchange among myself (FP), Chris Date (CJD) and Lee Fesperman (LF) in reaction to Williams' claims that triggered the series.
  • Part 2 was my response to a reader's email questioning our dismissal of Williams's claims.
  • Part 3 was my email exchange with Williams where he provided his definition of a data model on which I conditioned any discussion with him and I debunked it.
  • Part 4 is my response to a reader's comments on my previous posts in the series.
  • Part 5.1 provided the background for my critique of Edward Hurley's report on Simon Williams's Lazy Software and his so-called "Associative Model of Data" (AMD).

Tuesday, December 13, 2022

NEW "DATA MODELS" 5.1 (t&n)



Note: "Then & Now" (T&N) is a new version of what used to be the "Oldies but Goodies" (OBG) series. To demonstrate the superiority of a sound theoretical foundation relative to the industry's fad-driven "cookbook" practices, as well as the evolution/progress of RDM, I am re-visiting my 2000-06 debunkings, bringing them up to my with my knowledge and understanding of today. This will enable you to judge how well my arguments have held up and appreciate the increasing gap between scientific progress and the industry’s stagnation, if not outright regress.

 

This is a re-published series of several DBDebunk 2001 exchanges about Simon Wlliams' so-called "Associative Model of Data" (AMD), academic claims of its superiority over RDM ("The Associative Data Model Versus the Relational model") and predictions of the demise of the latter ("The decline and eventual demise of the Relational Model of Data").

Part 1 was an email exchange among myself (FP), Chris Date (CJD) and Lee Fesperman (LF) in reaction to Williams' claims that started the series.
Part 2 was my response to a reader's email questioning our dismissal of Williams's claims.
Part 3 was my email exchange with Williams, where he provided his definition of a data model on which I conditioned any discussion with him and where I debunked it.
Part 4 is my response to a reader's comments on Parts 1-3.

Thursday, November 10, 2022

NEW "DATA MODELS" 4 (t&n)



Note: "Then & Now" (T&N) is a new version of what used to be the "Oldies but Goodies" (OBG) series. To demonstrate the superiority of a sound theoretical foundation relative to the industry's fad-driven "cookbook" practices, as well as the evolution/progress of RDM, I am re-visiting my 2000-06 debunkings, bringing them up to my with my knowledge and understanding of today. This will enable you to judge how well my arguments have held up and appreciate the increasing gap between scientific progress and the industry’s stagnation, if not outright regress.

This is a re-published series of several DBDebunk 2001 exchanges on Simon Wlliams' so-called "Associative Model of Data" (AMD), academic claims of its superiority over RDM ("The Associative Data Model Versus the Relational model") and predictions of the demise of the latter ("The decline and eventual demise of the Relational Model of Data").

Part 1 was an email exchange among myself (FP), Chris Date (CJD) and Lee Fesperman (LF) in reaction to Williams' claims that started the series. Part 2 was my response to a reader's email questioning our dismissal of Williams's claims. Part 3 was my email exchange with Williams where he provided his definition of a data model on which I conditioned any discussion with him and I debunked it. Part 4 is my response to a reader's comments on my previous posts in the series.

Saturday, October 29, 2022

NEW "DATA MODELS" 3 (t&n)



Note: "Then & Now" (T&N) is a new version of what used to be the "Oldies but Goodies" (OBG) series. To demonstrate the superiority of a sound theoretical foundation relative to the industry's fad-driven "cookbook" practices, as well as the evolution/progress of RDM, I am re-visiting my 2000-06 debunkings, bringing them up to my with my knowledge and understanding of today. This will enable you to judge how well my arguments have held up and appreciate the increasing gap between scientific progress and the industry’s stagnation, if not outright regress.

This is a re-published series of several DBDebunk 2001 exchanges on Simon Wlliams' so-called "Associative Model of Data" (AMD), academic claims of its superiority over RDM ("The Associative Data Model Versus the Relational model") and predictions of the demise of the latter ("The decline and eventual demise of the Relational Model of Data").

Part 1 was the email exchange among myself (FP), Chris Date (CJD) and Lee Fesperman (LF) in reaction to Williams' claims that started the series. Part 2 was my response to a reader's email questioning our dismissal of Williams's claims.  Part 3 is my email exchange with Williams: he provided his "definition" of a data model on which I conditioned any discussion with him and I proved my point by debunking it.

Sunday, October 16, 2022

NEW "DATA MODELS" 2 (t&n)



Note: "Then & Now" (t&n) is a new version of what used to be the "Oldies but Goodies" (obg) series. To demonstrate the superiority of a sound theoretical foundation relative to the industry's fad-driven "cookbook" practices, as well as the evolution/progress of RDM, I am re-visiting my 2000-06 debunkings, bringing them up to my with my knowledge and understanding of today. This will enable you to judge how well my arguments have held up and appreciate the increasing gap between scientific progress and the industry’s stagnation, if not outright regress.

This is a re-published series of several DBDebunk 2001 exchanges on Simon Wlliams' so-called "Associative Model of Data" (AMD), academic claims of its superiority over RDM ("The Associative Data Model Versus the Relational model") and predictions of the demise of the latter ("The decline and eventual demise of the Relational Model of Data").

Part 1 was the email exchange among myself (FP), Chris Date (CJD) and Lee Fesperman (LF) in reaction to Simon Williams' claims that started the series. Part 2 is my response to a reader's email questioning our dismissal of Williams's claims.  (The reader's comments are in quotes.)

Saturday, September 17, 2022

NEW "DATA MODELS" 1 (t&n)



Note: "Then & Now" (T&N) is a new version of what used to be the "Oldies but Goodies" (OBG) series. To demonstrate the superiority of a sound theoretical foundation relative to the industry's fad-driven "cookbook" practices, as well as the evolution/progress of RDM, I am re-visiting my 2000-06 debunkings, bringing them up to my with my knowledge and understanding of today. This will enable you to judge how well my arguments have held up and appreciate the increasing gap between scientific progress and the industry’s stagnation, if not outright regress.

“Codd's aim was to free programmers from having to know the physical structure of data. Our aim is to free them in addition from having to know its logical structure.”

                                                 --Simon Williams, LazySoft

This series is a re-publication of several DBDebunk 2001 posts in response to Simon Wlliams' so-called "Associative Model of Data", academic claims of superiority over RDM ("The Associative Data Model Versus the Relational model") and predictions of the demise of the latter ("The decline and eventual demise of the Relational Model of Data").

Part 1 is the email exchange among myself (FP), Chris Date (CJD) and Lee Fesperman (LF) in reaction to Williams' claims that started the whole thing.

Saturday, May 21, 2022

NO RDBMS WITHOUT RELATIONAL DOMAINS (obg)



Note: To demonstrate the correctness and stability due to a sound theoretical foundation relative to the industry's fad-driven "cookbook" practices, I am re-publishing as "Oldies But Goodies" material from the old DBDebunk.com (2000-06), Judge for yourself how well my arguments hold up and whether the industry has progressed beyond the misconceptions those arguments were intended to dispel. I may revise, break into parts, and/or add comments and/or references. You can acquire foundation knowledge by checking out our POSTS, BOOKS, PAPERS, LINKS (or, even better, organize one of our on-site SEMINARS, which can be customized to specific needs).

The following is an email exchange with a reader and DBMS designer.

ON DATA TYPES AND WHAT A DBMS IS

(originally published in 2001)

Reader:
"I would like to hear your (or Date's) opinion on The Suneido Database … it seems to me self-contradictory. They aren't typed ... so how can they define operators, or even the idea of domains. They also say they include administrative commands, which as far as I understand isn't allowed in the THIRD MANIFESTO. While they do not claim to be an implementation of the Manifesto, their claims that their database language was created by CJ Date do not sound appropriate."

 "They don't know what [domains (distinct from programming data types)] are and what their function in the RDM is. That's common for all DBMS vendors, the claims of which should be always taken with more than a grain of salt."

Friday, March 18, 2022

ENTITIES & RECORDS (sms)



 

Note: "Setting Matters Straight" is a new format: I post on LinkedIn an online Q&A involving data fundamentals to encourage readers to test their foundation knowledge, which they can then compare with our debunking here, where we confirm what is correct and correct what is fallacious (with clarifications, wherever necessary). For in-depth treatment check out the POSTS and our PAPERS, LINKS and BOOKS (or organize one of our on-site/online SEMINARS, which can be customized to specific needs).

Q: “What is the relationship between an entity and a record?”

A: “In the context of a database design, an ‘entity’ is a type or category of persons, places, things or events. It’s a collectivisation of the nouns in a system about which you wish to keep data. For example, Employee might be the name of an entity in your system. A ‘record’ is a collection of data about a specific entity, a particular person or place, an identifiable thing, or a single event. For example, Name: ‘Dave Voorhis’, StartYear: 2019, Salary: £1,398,293 might be a record of one Employee entity in your system.”

A: “Database, file, and recordset are basically the same thing. They are collections of information or data. Each database or file or recordset typically has some sort of common purpose or definition. Like a database (relational, hierarchical, etc.) of data of a business process. A File is again a collection of data such as all transactions to be posted. A recordset is also basically a file.

Entity and table are basically the same thing. While you have the grouping of all the data, and entity (logical view) and a table (physical view) are the same. As Dave said, it is a logical grouping of a specific piece of data.

File, recordset, record, row or line are basically the same. A .csv file is a grouping of records. A file is a grouping of records. A row is an individual grouping of data from a relational database.

The last is element or attribute or field. This is the individual piece of data like Transaction_Amount or First Name.”
--Quora.com
A simple and the answer oversimplifies. But things seem simple only in the absence of foundation knowledge. Practitioners use different terms for the same thing, or the same word for different things, but that must be corrected, not accepted or validated.

Saturday, February 19, 2022

NO UNDERSTANDING WITHOUT FOUNDATION KNOWLEDGE PART 6: DEBUNKING AN ONLINE EXCHANGE 5 (obg)



Note: To demonstrate the correctness and stability offered by a sound theoretical foundation (relative to the industry's fad-driven "cookbook" practices), I am re-publishing as "Oldies But Goodies" material from the old (2000-06) DBDebunk.com, so that you can judge for yourself how well my arguments hold up and whether the industry has progressed beyond the misconceptions those arguments were intended to dispel. I may revise, break into parts, and/or add comments and/or references, which I enclose in square brackets).

A 2001 review of my third book triggered an exchange on SlashDot. This six-part series comprises my debunking at the time of both the review and the exchange in the chronological (slightly out of the)  order of the original publication.
Part 1: Clarifications on a Review of My Book Part 1 @DBDebunk.com
Part 2: Slashing a SlashDot Exchange Part 1 @DBAzine.com
Part 3: Slashing a SlashDot Exchange Part 2 @DBAzine.com
Part 4: Slashing a SlashDot Exchange Part 3 @DBAzine.com
Part 5: Slashing a SlashDot Exchange Part 4 @DBAzine.com
Part 6: Clarifications on a Review of My Book Part 2 @DBDebunk.com

CLARIFICATIONS ON A DISCUSSION OF MY BOOK PART 2

(originally posted 2/21/01)

In Part 1 debunked a review of my book @Slashdot.Org. In parts 2-5 I tackled the discussion generated there by the review. In this last part I focus on the discussion of data hierarchies covered in chapter 7 of my book [the in-vogue re-emergent graph fad].

“Chapter 7 discusses data hierarchies and trees. In a nutshell: there are no trees in SQL. The author is distressed by this. Given that a foreign key is basically a pointer, you can store trees in databases, it might not be pretty and there may not be easy way to read them and it might not be a good thing to do - but if you feel the need then get right in there. Of course I could be totally wrong about this.”
Confusing keys with pointers is one of the major errors many practitioners make ]. One intentional core advantage of the RDM is precisely that it prohibits pointers -- both physical and, as in object-orientation, logical. Exposing pointers to users has caused many unnecessary problems and complications, but offered no benefit (Don't Mix Pointers and Relations and Don't Mix Pointers and Relations - Please! in Date's RELATIONAL DATABASE WRITINGS 1994-1997). There is an easy way to demonstrate that relational keys are not, like object IDs (OID), pointers, but values: they represent uniquely identifying names/attributes of rel world entities. Pointers are system-generated internals and have no real world counterpart. The desirability of a data model that produces logical models that are faithful representations of the real world, without adding artifacts of their own. Indeed, as Date points out in Why The Object Model' is Not a Data Model in his above-mentioned book, the fact that "in the object world all the references to objects are by means of their corresponding OIDs explains why -- as is well known -- OO systems typically provide (a) two different equality comparison operators, equal OID vs. equal value and (b) two different assignment operators, assign OID vs. assign value.  Note the added complication -- what is the benefit?

Sunday, December 5, 2021

HOW NOT TO EXPLAIN THE RELATIONAL MODEL (tyfk)



Note: Each "Test Your Foundation Knowledge" post presents one or more misconceptions about data fundamentals. To test your knowledge, first try to detect them, then proceed to read our debunking, reflecting the current understanding of the RDM, distinct from whatever has passed for it in the industry to date. If there isn't a match, you can review references -- reflecting the current understanding of the RDM, distinct from whatever has passed for it in the industry to date -- which explain and correct the misconceptions. You can acquire further knowledge by checking out our POSTS, BOOKS, PAPERS, LINKS (or, better, organize one of our on-site SEMINARS, which can be customized to specific needs).

“The key idea is "Parent-Child" relationship. Entities ~ Relations ~ Tables (tilde stands for "more or less like"). Concept of a Table resonates with most of the people just as everybody intuitively grasps a concept of "rows and columns” but might struggle with "tuples and attributes". Explain relations and relationships, 1:1, 1:N, N:N etc. Explain rationale for this way of collecting and storing data, touch upon data normalization, and tell a few anecdotes about cost of storage back in 1970 and Y2K problem it have caused; add that we have inadvertently created Y10K problem while fixing it (not exactly true but not wrong either). Show an ERD diagram, trace the relationships, introduce SQL, maybe run a few simple SELECT queries to help your listeners visualize it, including equijoin and ORDER BY. Save other JOIN types, data types and other, more advanced topics, and for the next encounter.”
--Quora.com

 An excellent example that validates my claim of lack of foundation knowledge in the industry: most "explainers" of RDM have acquired relational jargon, but do not know or understand it at all.

Friday, February 12, 2021

TYFK: What Is a Relational Database and Why Is It Important?



Note: Each "Test Your Foundation Knowledge" post presents one or more misconceptions about data fundamentals. To test your knowledge, first try to detect them, then proceed to read our debunking, which is based on the current understanding of the RDM, distinct from whatever has passed for it in the industry to date. If there isn't a match, you can acquire the knowledge by checking out our POSTS, BOOKS, PAPERS, LINKS (or, better, organize one of our on-site SEMINARS, which can be customized to specific needs).

“The most popular data model in DBMS is the Relational Model. It is more scientific a model than others. This model is based on first-order predicate logic and defines a table as an n-ary relation. The main highlights of this model are:
  • Data is stored in tables called relations.
  • Relations can be normalized.
  • In normalized relations, values saved are atomic values.
  • Each row in a relation contains a unique value.
  • Each column in a relation contains values from a same [sic] domain.”
--What is a relational database and why is it important, Quora.com

Friday, January 1, 2021

OBG: Database Design and Guaranteed Correctness Part 2



Note: This is a re-write of an earlier post (which now links here), to bring it into line with the current understanding of the RDM derived from McGoveran formalization and interpretation of Codd's work[1]. Reference [9] is also an important re-write and is recommended pre-requisite for this post.

Continued from Part 1

 “The term database design can be used to describe many different parts of the design of an overall database system. Principally, and most correctly, it can be thought of as the logical design of the base data structures used to store the data. In the relational model these are the tables and views ... However, the term database design could also be used to apply to the overall process of designing, not just the base data structures, but also the forms and queries used as part of the overall database application within the database management system(DBMS). The process of doing database design generally consists of a number of steps which will be carried out by the database designer. Usually, the designer must:

  • Determine the data to be stored in the database.
  • Determine the relationships between the different data elements.
  • Superimpose a logical structure upon the data on the basis of these relationships.
Within the relational model the final step above can generally be broken down into two further steps, that of determining the grouping of information within the system, generally determining what are the basic objects about which information is being stored, and then determining the relationships between these groups of information, or objects.”
--What is a Relational Database, Quora.com
There is, typically, much vagueness and confusion here and instead of debunking it makes more sense to provide a rigorous description of what database design really is: formalization of a conceptual model -- expressed as business rules -- as a logical model for representation in the database using a formal data model. If the data model is the RDM, the logical model consists of relations constrained for semantic consistency with the conceptual mode, the constraints being formalizations of the business rules.

Saturday, September 26, 2020

TYFK: Nothing to Do With Relational



Each "Test Your Foundation Knowledge" post presents one or more misconceptions about data fundamentals. To test your knowledge, first try to detect them, then proceed to read our debunking, which is based on the current understanding of the RDM, distinct from whatever has passed for it in the industry to date. If there isn't a match, you can acquire the knowledge by checking out our POSTS, BOOKS, PAPERS, LINKS (or, better, organize one of our on-site SEMINARS, which can be customized to specific needs).

 

 “The relational calculus is good in describing sets. But it´s bad at describing relations between data in different sets. Explicit identities (primary keys) need to be introduced and normalization is needed to avoid update inconsistencies due to duplication of data. To say it somewhat bluntly: The problem with the relational calculus and RDBMS etc. is the focus on data. It´s seems to be so important to store the data, that connecting the data moves to the background. That might be close to how we store filled in paper forms. But it´s so unlike how the mind works. There is no data stored in your brain. If you look at the fridge in your kitchen, there is no tiny fridge created in your brain so you can take the memory of your fridge with you, when you leave your kitchen.” --Weblogs.asp.net

The lack of foundation knowledge exposed by the above paragraph is so complete that its claims are practically upside down and backwards.

Fundamentals

As we have demonstrated, in mathematical set theory a relation (set) is a subset of a cross-product of domains (sets). In other words, it is a set that is a relationship among sets. Being abstract (i.e., having no real world meaning), the values of mathematical relations can be arbitrary.

The RDM is an application of simple set theory expressible in first order predicate logic (SST/FOPL) to database management: a relational database represents a conceptual model of some reality, namely (facts about) a multigroup in the real world -- a collection of related entity groups -- each database relation representing one such group; a database is also a set of related relations. The values in database relations (i.e., the data) are, thus, not arbitrary, but must be consistent with the conceptual model: relations and the database as a whole are semantically constrainted to be so consistent: (1) individual properties of entities and (2) collective properties of (a) groups (i.e., relationships among entities within groups), and (b) the multigroup (i.e., relationships among groups).

A primary key (PK) represents names given in the real world to entities of a given type, and the corresponding PK constraint (uniqueness) enforces consistency of a relation with the distinguishability of those entities in the real world, the facts about which it represents. These are not RDM artifacts, but rather part of the adaptation of SST/FOPL to database management.

For the primary advantage of the RDM -- guaranteed correctness of query results (i.e., inferences made from the database) -- to materialize, logical database design must adhere to three core principles which, jointly, imply fully normalized relations (5NF).
In fact, in RDM relations are in 5NF by definition, otherwise they are not relations -- relational algebra (RA) operations lose information and all bets are off.

The RA is the manipulative component of the RDM -- a collection of primitive and derived set operations on relations that describe relationships among relations. For example, the join operation r1 JOIN r2 describes a relationship between r1 and r2 relation, the result itself a relation. Note that since every result of a RA operation on even one relation is always a relation and still describes a relationship -- between the "input" and "output" relations.

A data model -- and, industry claims notwithstanding, the only one satisfying Codd's definition that has been formalized is the RDM -- is by nature focused on data. However, the RDM supports physical independence (PI) and, thus, not concerned with how data is physically stored and accessed. The notion of "files stored in paper form" is an example of the common and entrenched logical-physical confusion (LPC) due to failure to understand the distinction between a logical relation and its tabular visualization on a physical medium, induced/reinforced by the industry's "direct image" implementation of SQL DBMSs.

Conclusion


We rephrase the above paragraph as follows:

“The relational algebra describes relationships among relations (sets). Primary keys are one of the adaptations of the SST/FOPL for database management: a PK constraint -- uniqueness -- represents formally in the database a within-group relationship among all its entities.

Mandatory adherence to three core design principles jointly imply full normalization, which is necessary to guarantees correctness of query results. True RDBMSs:

  • Implement the RA for logical data retrieval independent of how the data is physically stored and accessed. SQL DBMSs notwithstanding, vendors are free to store data whichever way they want as long as they don't expose it to users in applications.
  • Enforce relational constraints that are formal database representations of relationships in the conceptual model represented by the database.”

 The "brain" stuff is sheer nonsense.





Friday, September 4, 2020

OBG: Relationships and Relations




Note: To demonstrate the stability afforded by a sound foundation relative to the industry's fad-driven cookbook practices, I am re-publishing under "Oldies But Goodies" material from the old DBDebunk.com (2000-06), so that you can judge for yourself how well my debunkings hold up, and whether the industry has progressed beyond the misconceptions they were intended to dispel. I may break long pieces into more pointed parts, and add comments and references to further reading.

From "Little Relationship to Relational" originally posted on March 29, 2001.

“Given the depth and complexity of Codd's thought, not to mention the arcane terms in which he often expressed himself, it is not difficult to grasp why so many of his key points have been widely misunderstood. Even programmers still often misconstrue the technical term "relational". The relational in relational theory refers to relations and not relationships. A relation is a special set of similar objects commonly modeled as entities or as database tables. Relationships may exist between these relations and if your relations are entities you could easily represent the whole thing using a Relational Entity Relationship approach. To elucidate a simple practical example, if you had a company table and an employee table and each company record could have many employee records associated with it, you would have two relations and one relationship. The relations would be the sets of similar objects found in the Employee and Company tables and the relationship would be the association between them. In this case one company to many employees.”
Codd's thought was very deep indeed--new implications are still being derived from his original ideas--and one major objective of relational technology, now almost forgotten, is simplicity. There is little that is complex in relational technology and, in fact, it is the most simple approach possible. Any other general approach is more complex.

It is true that Codd, as a mathematician, did not present his ideas in a way comprehensible to the average practitioner. But it is also true that he had to use different terminology in order to distinguish his precise concepts from the fuzzy, problematic terms already used in the industry. It is also true that, as I argued in the first editorial launching this site, practitioners are so steeped in complex implementation details and devoid of education in fundamentals, that they have a hard time understanding simple logical concepts. It is rather ironic that the author of the article himself reveals some misunderstanding of his own. To clarify:

  • formally a relation is a set of tuples, representing propositions about the real world.
  • informally, a relational table can be viewed as representing an "entity type", with rows representing "entities" of that type.
But note carefully that:
  • "entity" has no precise, formal definition
  • "relationship" can and should be regarded as a special case of "entity"

Comments on re-publication: 
  • A relation is a relationship among domains that is constrained semantically to represent in the database real world relationships within and among entity groups. 
  • We no longer use R-table as a substitute for relation -- it is a visualization of a relation on some physical medium that plays no role in RDM. Note that constraints are not visible in a R-table.
  • A relationship can be (1) among entities within an entity group, in which case it is a collective property of the group and is represented by a constraint or (2) between groups, in which case it is represented by an associative relation.


Further Reading

The Interpretation and Representation of Database Relations (Codd 1969-70)

Logical Symmetric Access, Data Sublanguage, Kinds of Relations, Redundancy and Consistency (Codd 1969-70)

What Relations Really Are and Why They Are Important

Understanding Relations series

Levels of Representation: Conceptual Modeling, Logical Database Design and Physical Implementation

Understanding Conceptual vs. Data Modeling series

Conceptual Modeling Is Not Data Modeling

Relationships and the RDM series

Relations & Relationships

Relationships, Rules, Relations and Constraints


What Is A Database Relationship




Thursday, August 20, 2020

TYFK: Relations, Tables, Domains and Normalization



Each "Test Your Foundation Knowledge" post presents one or more misconceptions about data fundamentals. To test your knowledge, first try to detect them, then proceed to read our debunking, which is based on the current understanding of the RDM, distinct from whatever has passed for it in the industry to date. If there isn't a match, you can acquire the knowledge by checking out our POSTS, BOOKS, PAPERS, LINKS (or, better, organize one of our on-site SEMINARS, which can be customized to specific needs).

“The most popular data model in DBMS is the Relational Model. It is more scientific a model than others. This model is based on first-order predicate logic and defines a table as an n-ary relation. The main highlights of this model are:

  • Data is stored in tables called relations.
  • Relations can be normalized. In normalized relations, values saved are atomic values.
  • Each row in a relation contains a unique value.
  • Each column in a relation contains values from a same domain.”
View My Stats