Thursday, August 4, 2022

DATABASE RELATIONS, TABLES AND SEMANTIC CONSISTENCY



by David McGoveran with Fabian Pascal

 

Note: In "Setting Matters Straight" posts I debunk online Q&As that involve fundamentals which I first post on LinkedIn. The purpose is to induce practitioners to test their foundation knowledge against our debunking, where we explain what is correct and what is fallacious. For in-depth treatments check out the POSTS and our PAPERS, LINKS and BOOKS (or organize one of our on-site/online SEMINARS, which can be customized to specific needs). Questions and comments are welcome here and on LinkedIn.

“In a RDBMS, a table is columned rows, as in you treat individual rows as an actual entity while the columns are its attributes. In an excel tab, you can create a column, but it doesn't have to have all the same data types in that column, nor does one row have to represent one entity. It's more free form ... All in all, RDB is relational because it's column based rows and constrained to that format, while non relational can have free form like an excel. When you have rows that are uniform (constrained to what the column should be), you create entities as tables, and link them through columns to keep track of the relationships.”
--Quora.com
I posted this on LinkedIn as one of my "To Laugh or Cry?" items which, unlike "Setting Matters Right" posts, are beyond debunking. But the exchange that followed made me realize that there is, nevertheless, pedagogical value to it: it expresses something important, but poorly due to author's lack of foundation knowledge.

Wednesday, July 13, 2022

MISSING DATA AND MULTI-RELATION QUERY RESULTS (t&n)



Note: "Then & Now" (T&N) is a new version of what used to be the "Oldies but Goodies" (OBG) series. To demonstrate the superiority of a sound theoretical foundation relative to the industry's fad-driven "cookbook" practices, as well as the evolution/progress of RDM, I am re-visiting my 2000-06 debunkings, bringing them up to my with my knowledge and understanding of today. This will enable you to judge how well my arguments have held up and appreciate the increasing gap between scientific progress and the industry’s stagnation, if not outright regress.

On NULLs and Multi-Table Relvars

(first published 04/05/02)

"I had a question about the missing-values suggestion in PRACTICAL ISSUES IN DATABASE MANAGEMENT, page 234. You write:
"Table operations would have to be modified to yield results with as many tables as there are types of propositions with only known values."
How would this be represented in a language like Tutorial D, where relvars are required to be strongly typed? One possible idea is to make use of type inheritance. Suppose I had a domain of tuple values {x,a,b,c} (all integers, say) where x is not allowed to be missing but a, b, and c are allowed to be missing. Suppose we extended the domains of a, b, and c with an "imaginary" special value that we will never represent, which I will show for diagram purposes only as '?'. Then the domain can be split into parts:
XABC {x,a,b,c} possrep: {X: int, A: int, B: int, C: int}
XAB {x,a,b,'?'} possrep: {X: int, A: int, B: int}
XAC {x,a,'?',c} possrep: {X: int, A: int, C: int}
XBC {x,'?',b,c} possrep: {X: int, B: int, C: int}
XA {x,a,'?','?'} possrep: {X: int, A: int}
XB {x,'?',b,'?'} possrep: {X: int, B: int}
XC {x,'?','?',c} possrep: {X: int, C: int}
X {x,’?','?','?'} possrep: {X: int}
Using Mr. Date's specialization by constraint idea, we can inherit all the subtuple types from the main tuple type. Updates could make a tuple change type. A relation of relations of XABC type could be used to return results of a query. Each relation within the relation would contain one subtype.

However, the exponential explosion of possible subtypes would be very difficult to handle, practically speaking. As you admit in your book, a real DBMS might have to handle thousands of small subtables. This cannot be passed off as an "implementation detail" since table operations "yield results" at the user presentation level. No matter how efficient the underlying system might be, this seems unacceptable. Perhaps we have to fall back on default values after all."

Sunday, July 3, 2022

RELATIONS, DATABASE RELATIONS AND TABLES (sms)



Note: In "Setting Matters Straight" posts I debunk online Q&As that involve fundamentals which I first post on LinkedIn. The purpose is to induce practitioners to test their foundation knowledge against our debunking, where we explain what is correct and what is fallacious. For in-depth treatments check out the POSTS and our PAPERS, LINKS and BOOKS (or organize one of our on-site/online SEMINARS, which can be customized to specific needs). Questions and comments are welcome here and on LinkedIn.

Q: “What is a relation in database?”

A: “Relational databases were so named in 1970 by computer scientist E. F. Codd because the tables are themselves relations, which is a mathematical term. What makes a relation (aka a table) a relation? Basically:
  • A relation has a heading, which names a finite set of columns.
  • Columns are defined by their name and their type.
  • A relation has a finite set of tuples (aka rows), and every tuple has the same set of columns (i.e. same name and type) as those named in the heading.
  • Being finite sets, both the set of columns in the heading and the set of tuples in the relation have no duplicates and no inherent order.
See? There’s nothing about relationships between tables in the definition of a relation. You could have a relational database that contains just one relation. If there’s any relationship described in a relation, it’s actually the relationship between the columns within a relation. That is, the value "Pittsburgh" goes with the value "Steelers" on a given row because the relation is defined as "pro football teams by city" and therefore there’s a linkage between some values in the set of football teams and the set of city names.”  --Quora.com

Sunday, June 26, 2022

REPEATING GROUPS AND 1NF (t&n)



09/19/23: For the latest on this subject see: FIRST NORMAL FORM - A DEFINITIVE GUIDE

 

“A commonly used example of a table that is not in 2-NF is one with repeated attributes (i.e. child1, child2, child3). However, after examining the definition of 2NF in your book PRACTICAL ISSUES IN DATABASE MANAGEMENT, it seems to me that tables such as these do in fact satisfy 2NF. Am I missing something?” --Reader

Saturday, June 11, 2022

ORDER & RELATIONAL DATABASES (sms)



Note: In "Setting Matters Straight" I post on LinkedIn online Q&As that involve fundamentals under the header "What's Right and Wrong with this Database Picture" and then debunk them here. The purpose is to induce practitioners to test their foundation knowledge against our debunking, where we explain what is correct and what is fallacious. For in-depth treatments check out the POSTS and our PAPERS, LINKS and BOOKS (or organize one of our on-site/online SEMINARS, which can be customized to specific needs). Questions and comments are welcome here and on LinkedIn.

Q: “I'm not sure what this means: "The order of the rows and columns is immaterial to the DBMS?" -- could anyone explain?”

A: “It means two things:
The engine is under no obligation to insert new rows immediately following the previously inserted row(s)... During processing of selects, the optimizer is free to use any index it finds efficient to use or none at all... For this reason, if the order of returned data is important to your processing, then you must include an ORDER BY clause.”

Q: “How do you reorder fields in the database?”

A: “Depends on how you define "reorder". What view of your data are you trying to set the order. Are you in Table Design view? ... Are you looking at form? The answer is different depending on what you are referring to.”
--Quora.com

Saturday, May 21, 2022

NO RDBMS WITHOUT RELATIONAL DOMAINS (obg)



Note: To demonstrate the correctness and stability due to a sound theoretical foundation relative to the industry's fad-driven "cookbook" practices, I am re-publishing as "Oldies But Goodies" material from the old DBDebunk.com (2000-06), Judge for yourself how well my arguments hold up and whether the industry has progressed beyond the misconceptions those arguments were intended to dispel. I may revise, break into parts, and/or add comments and/or references. You can acquire foundation knowledge by checking out our POSTS, BOOKS, PAPERS, LINKS (or, even better, organize one of our on-site SEMINARS, which can be customized to specific needs).

The following is an email exchange with a reader and DBMS designer.

ON DATA TYPES AND WHAT A DBMS IS

(originally published in 2001)

Reader:
"I would like to hear your (or Date's) opinion on The Suneido Database … it seems to me self-contradictory. They aren't typed ... so how can they define operators, or even the idea of domains. They also say they include administrative commands, which as far as I understand isn't allowed in the THIRD MANIFESTO. While they do not claim to be an implementation of the Manifesto, their claims that their database language was created by CJ Date do not sound appropriate."

 "They don't know what [domains (distinct from programming data types)] are and what their function in the RDM is. That's common for all DBMS vendors, the claims of which should be always taken with more than a grain of salt."

Monday, May 2, 2022

RELATION PROLIFERATION (sms)



Note: "Setting Matters Straight" is a new format: I post on LinkedIn an online Q&A involving data fundamentals that I subsequently debunk in a post here. This is to encourage readers to test their foundation knowledge against our debunking here, where we confirm what is correct and correct what is fallacious. For in-depth treatments check out the POSTS and our PAPERS, LINKS and BOOKS (or organize one of our on-site/online SEMINARS, which can be customized to specific needs). Questions and comments are welcome here and on LinkedIn.

Q: “How do I avoid too many relations in databases?”

A: “You don’t. Every relation is there to store meaningful data, hopefully you do not define database relations for data that are not to be stored in your database.”

A: “By following proper design principles. Normalization, standard data patterns, and progressing from logical to physical always. Never denormalize (or avoid normalizing in the first place) because performance never trumps accuracy. It really doesn't matter how fast you get the wrong answer.”
--Quora.com
View My Stats