Showing posts with label LV. Show all posts
Showing posts with label LV. Show all posts

Saturday, June 1, 2024

SMS: DOMAINS & SQL




I am working on entirely new papers (not re-writes) in the PRACTICAL DATABASE FOUNDATIONS series. I have already published two:

  • THE FIRST NORMAL FORM - A DEFINITIVE GUIDE
  • PRIMARY KEYS - A NEW UNDERSTANDING

available for ordering from the PAPERS page, and two more:

  • RELATIONAL DATABASE DOMAINS: A DEFINITIVE GUIDE
  • DATABASE RELATIONS: A DEFINITIVE GUIDE

are in progress and forthcoming, respectively.

In the process I am coming across common and entrenched industry "pearls" that I am using for my "Setting Matters Straight" (SMS) and "To Laugh or Cry" (TLC) posts on Linkedin. I do those posts to enable the few thinking database professionals left realize how scarce foundation knowledge is, and to illustrate fallacies that abound in the industry, of which they are unaware, and which the papers are intended to dispel.

Time permitting, I may expose and dispel some of those fallacies, treated in more depth in the papers, such that those thinking professionals can test their knowledge and decide whether the papers are a worthy educational investment.

Here's one.

 “A domain in most SQL usage is essentially an alias name for an existing type + restrictions on an existing type that can be used in a column. As for an attribute, it's essentially a COLUMN in SQL, a field in other types of databases, etc.”
Can you identify the fallacies before you proceed?

Saturday, February 4, 2023

CONCEPTUAL MODELING, LOGICAL DATABASE DESIGN AND PHYSICAL IMPLEMENTATION (sms)



Note: In "Setting Matters Straight" posts I debunk online pronouncements that involve fundamentals which I first post on LinkedIn. The purpose is to induce practitioners to test their foundation knowledge against our debunking, where we explain what is correct and what is fallacious. For in-depth treatments check out the POSTS and our PAPERS, LINKS and BOOKS (or organize one of our on-site/online SEMINARS, which can be customized to specific needs). Questions and comments are welcome here and on LinkedIn.

“A conceptual data model usually just includes the main concepts (entities) required to store information and the relationships that exist between these entities. We don’t usually include any details about each piece of information. We can consider the conceptual stage as an initial model, without all the details required to create a database.

A logical data model is probably the most-used data model. It goes beyond the conceptual model; it includes entities, relationships, details on entities’ different attributes, and unique ways to identify entities (primary keys) and establish the relationships between them (foreign keys).

A physical data model is usually derived from a logical data model for a particular relational database management system (RDBMS), thus taking into account all technology-specific details. One big difference between logical and physical data models is that we now need to use table and column names rather than specifying entity and attribute names. This allows us to adapt to the limits and conventions of the desired database engine. We also provide the actual data types and constraints that allows us to store the desired information.”
--Vertabelo.com

Monday, January 2, 2023

 NEW "DATA MODELS" 5.2 (t&n)



Note: "Then & Now" (T&N) is a new version of what used to be the "Oldies but Goodies" (OBG) series. To demonstrate the superiority of a sound theoretical foundation relative to the industry's fad-driven "cookbook" practices, as well as the evolution/progress of RDM, I am re-visiting my 2000-06 debunkings, bringing them up to my with my knowledge and understanding of today. This will enable you to judge how well my arguments have held up and appreciate the increasing gap between scientific progress and the industry’s stagnation, if not outright regress.

This is a re-published series of several DBDebunk 2002 posts on Simon Wlliams' Lazy Software so-called "Associative Model of Data" (AMD), academic claims of its superiority over RDM ("The Associative Data Model Versus the Relational model") and predictions of the demise of the latter ("The decline and eventual demise of the Relational Model of Data").

  • Part 1 was an email exchange among myself (FP), Chris Date (CJD) and Lee Fesperman (LF) in reaction to Williams' claims that triggered the series.
  • Part 2 was my response to a reader's email questioning our dismissal of Williams's claims.
  • Part 3 was my email exchange with Williams where he provided his definition of a data model on which I conditioned any discussion with him and I debunked it.
  • Part 4 is my response to a reader's comments on my previous posts in the series.
  • Part 5.1 provided the background for my critique of Edward Hurley's report on Simon Williams's Lazy Software and his so-called "Associative Model of Data" (AMD).

Thursday, November 10, 2022

NEW "DATA MODELS" 4 (t&n)



Note: "Then & Now" (T&N) is a new version of what used to be the "Oldies but Goodies" (OBG) series. To demonstrate the superiority of a sound theoretical foundation relative to the industry's fad-driven "cookbook" practices, as well as the evolution/progress of RDM, I am re-visiting my 2000-06 debunkings, bringing them up to my with my knowledge and understanding of today. This will enable you to judge how well my arguments have held up and appreciate the increasing gap between scientific progress and the industry’s stagnation, if not outright regress.

This is a re-published series of several DBDebunk 2001 exchanges on Simon Wlliams' so-called "Associative Model of Data" (AMD), academic claims of its superiority over RDM ("The Associative Data Model Versus the Relational model") and predictions of the demise of the latter ("The decline and eventual demise of the Relational Model of Data").

Part 1 was an email exchange among myself (FP), Chris Date (CJD) and Lee Fesperman (LF) in reaction to Williams' claims that started the series. Part 2 was my response to a reader's email questioning our dismissal of Williams's claims. Part 3 was my email exchange with Williams where he provided his definition of a data model on which I conditioned any discussion with him and I debunked it. Part 4 is my response to a reader's comments on my previous posts in the series.

Sunday, August 28, 2022

NOBODY UNDERSTANDS DATABASE DESIGN 1 (sms)



Note: In "Setting Matters Straight" posts I debunk online pronouncements that involve fundamentals which I first post on LinkedIn. The purpose is to induce practitioners to test their foundation knowledge against our debunking, where we explain what is correct and what is fallacious. For in-depth treatments check out the POSTS and our PAPERS, LINKS and BOOKS (or organize one of our on-site/online SEMINARS, which can be customized to specific needs). Questions and comments are welcome here and on LinkedIn.

In a previous SMS post I debunked an attempt to express something important about database practice that was handicapped by lack of foundation knowledge. Here is another example.

“This Codd guy might have been onto something. Unfortunately, normalization is usually taught in a somewhat backwards, overly technical way. If you start with concepts, connections between them and details about them, you usually are already at a fairly high normal form without going through any formal normalization steps.”
--LinkedIn.com

Saturday, August 20, 2022

DATABASE RELATIONS, DATABASE DESIGN & CORRECTNESS (sms)



Note: In "Setting Matters Straight" posts I debunk online Q&As that involve fundamentals which I first post on LinkedIn. The purpose is to induce practitioners to test their foundation knowledge against our debunking, where we explain what is correct and what is fallacious. For in-depth treatments check out the POSTS and our PAPERS, LINKS and BOOKS (or organize one of our on-site/online SEMINARS, which can be customized to specific needs). Questions and comments are welcome here and on LinkedIn.

“...The relational model organizes data through relations (aka tables). You then normalize it in one of six forms. By normalizing data you:
- Reduce redundancy
- Ensure consistency
- Optimize for atomic inserts, updates and deletes
The biggest drawback ... are keys that let you join different tables across multiple systems.”
                                                                      --LinkedIn.com

Saturday, May 21, 2022

NO RDBMS WITHOUT RELATIONAL DOMAINS (obg)



Note: To demonstrate the correctness and stability due to a sound theoretical foundation relative to the industry's fad-driven "cookbook" practices, I am re-publishing as "Oldies But Goodies" material from the old DBDebunk.com (2000-06), Judge for yourself how well my arguments hold up and whether the industry has progressed beyond the misconceptions those arguments were intended to dispel. I may revise, break into parts, and/or add comments and/or references. You can acquire foundation knowledge by checking out our POSTS, BOOKS, PAPERS, LINKS (or, even better, organize one of our on-site SEMINARS, which can be customized to specific needs).

The following is an email exchange with a reader and DBMS designer.

ON DATA TYPES AND WHAT A DBMS IS

(originally published in 2001)

Reader:
"I would like to hear your (or Date's) opinion on The Suneido Database … it seems to me self-contradictory. They aren't typed ... so how can they define operators, or even the idea of domains. They also say they include administrative commands, which as far as I understand isn't allowed in the THIRD MANIFESTO. While they do not claim to be an implementation of the Manifesto, their claims that their database language was created by CJ Date do not sound appropriate."

 "They don't know what [domains (distinct from programming data types)] are and what their function in the RDM is. That's common for all DBMS vendors, the claims of which should be always taken with more than a grain of salt."

Saturday, January 8, 2022

NO UNDERSTANDING WITHOUT FOUNDATION KNOWLEDGE PART 2: DEBUNKING AN ONLINE EXCHANGE 1 (obg)



Note: To demonstrate the soundness and stability conferred by a sound theoretical foundation (relative to the industry's fad-driven "cookbook" practices), I am re-publishing as "Oldies But Goodies" material from the old (2000-06) DBDebunk.com, so that you can judge for yourself how well my arguments hold up and whether the industry has progressed beyond the misconceptions those arguments were intended to dispel. In re-publishing I may revise, break into or merge parts and/or add comments and/or references that I enclose in square brackets). 

A 2001 review of my third book triggered an exchange on SlashDot. This six-part series comprises my debunking at the time of both the review and the exchange in the chronological (slightly out of the)  order of the original publication.
Part 1: Clarifications on a Review of My Book Part 1 @DBDebunk.com
Part 2: Slashing a SlashDot Exchange Part 1 @DBAzine.com
Part 3: Slashing a SlashDot Exchange Part 2 @DBAzine.com
Part 4: Slashing a SlashDot Exchange Part 3 @DBAzine.com
Part 5: Slashing a SlashDot Exchange Part 4 @DBAzine.com
Part 6: Clarifications on a Review of My Book Part 2 @DBDebunk.com

Saturday, January 1, 2022

SCHEMA & PERFORMANCE: NEVER THE TWINE SHALL MEET



One of the core objectives of this site (and my work) has been to demonstrate that there will not be progress in data management as long as the industry and trade media require and promote exclusively (mainly tool) experience in the absence of foundation knowledge. I have published and analyzed ample evidence that relational language and terminology are used without grasping what it actually means -- a good way to gauge lack of foundation knowledge.

Recently I posted a four part series titled "Nobody Understands the Relational Model" showing that even a practitioner steeped in the RDM does not really understand it. Consider now a practitioner's mistake at the beginning of career -- "a bad database schema and what it did to system performance" -- which, he claims, belatedly taught him a lesson. Hhhhmmm, did it, really?

Saturday, December 11, 2021

NOBODY UNDERSTANDS THE RELATIONAL MODEL: SEMANTICS, CLOSURE & CORRECTNESS PART 4



with David McGoveran

(Title inspired by Richard Feynman)

In Parts 1 and Part 2 we provided some clarifications following a discussion on LinkedIn about our contention that, conventional wisdom notwithstanding, database relations -- distinct from mathematical relations -- are by definition not just in 1NF, but also in 5NF, as a consequence of which the RA, as currently defined for 1NF closure, produces what the industry calls "update anomalies" and, thus, is not a proper algebra. In Part 3 we used that information to debunk some leftover misunderstandings in the discussion.

We conclude in Part 4 with comments on a private exchange that followed the public one on LinkedIn regarding the difference between the McGoveran (DMG) and Date and Darwen's (TTM)
interpretations of the RDM, which can be summarized as follows:

Friday, November 19, 2021

THE FATE OF FADS: XML DBMS (obg)



Note: To demonstrate the correctness and stability due to a sound theoretical foundation relative to the industry's fad-driven "cookbook" practices, I am re-publishing as "Oldies But Goodies" material from the old DBDebunk.com (2000-06), so that you can judge for yourself how well my arguments hold up and whether the industry has progressed beyond the misconceptions those arguments were intended to dispel. I may revise, break into parts, and/or add comments and/or references.

Remember XML DBMS? At one point it was the fad of the day, similar to today's NoSQL or the old new "knowledge graph" -- "the future that you ignored at the peril of being left behind". As I predicted, it went the way of all fads (ODBMS, Associative DBMS, you name them) together with their "data models" that were nothing of the sort. My prediction was grounded in the same sound foundations I rely on today -- unlike the industry we are progressing it -- that fads lack and which were and still are dismissed, evidence be damned.

Here's a typical example (comments on republication in square brackets).

Saturday, September 4, 2021

Understanding Relational Constraints



“The data in a relational database is stored in form of a table. A table makes the data look organized. Yet in some cases we might face issues while working with the data like repetition. We might want enforce rules on the data to avoid such technical problems. Theses rules are called constraints. A constraint can be defined as a rule that has to enforced on the data to avoid faults. There are three kinds of constraints: entity, referential and semantic constraints. Listed below are the differences between these three constraints:
1. Entity constraints -- primary key, foreign key, unique, NULL -- are posed within a table and used to enforce uniqueness and to define no value [respectively].    
2. Referential constraints -- foreign key -- are enforced with more than one table for referring other tables for analysis of the data.
3. Semantic constraints -- datatypes -- are  enforced in a table on the values of a specific attribute and help the data segregate according to its type. Example: name varchar2(30).”
--GeeksforGeeks.com
Before we tackle the main subject, let's get some misconceptions out of the way. As we have explained so many times:

  • Data is not "stored in a form of a table" -- it can be stored in any number of physical formats, at the discretion of DBMS designers and DBAs. Physical independence is a core advantage of the RDM.
  • A table does not "make the data look organized". Data is by definition organized -- be it relationally or not -- otherwise it would be random noise not data.  A database relation can be visualized as a R-table, but tables do not play any role in RDM.
  • While some "repetition" (i.e., redundancy) is prevented by constraints (e.g., uniqueness), others are avoided by database design (e.g., 5NF DB relations).

And now to constraints.

Tuesday, August 31, 2021

TYFK: Normalized, Fully Normalized, Non-Normalized, Denormalized -- Clearing the Mess



Note: Each "Test Your Foundation Knowledge" post presents one or more misconceptions about data fundamentals. To test your knowledge, first try to detect them, then proceed to read our debunking, reflecting the current understanding of the RDM, distinct from whatever has passed for it in the industry to date. If there isn't a match, you can review references -- reflecting the current understanding of the RDM, distinct from whatever has passed for it in the industry to date -- which explain and correct the misconceptions. You can acquire further knowledge by checking out our POSTS, BOOKS, PAPERS, LINKS (or, better, organize one of our on-site SEMINARS, which can be customized to specific needs).

“A non-normalized database is a disorganized one, where nobody has bothered to work out where the facts should be stored. It is like a stack of paper files that has been tossed down the stairs. We are not interested in non-normalized databases.

A normalized database has been organized so that each fact is stored in exactly one place (2nf and greater) and no more than one fact is stored in each place (1nf). In a normalized database there is a place for everything and everything is in its place.

A denormalized database is a normalized database that has had redundancies deliberately re-introduced for some practical gain. Most denormalizing means adding columns to tables that provide values you would otherwise have to calculate as needed. Values are copied from table to table, calculations are made within a row, and totals, averages and other aggregrations are made between child and parent tables.”
--database-programmer.blogspot.com

Thursday, July 22, 2021

Documents and Databases



'These new data technologies were developed because there are new usage scenarios for data — which do not fit into the relational model.'
--Reddit.com

Don't let the NoSQL label fool you. It's the relational model (RDM), not SQL, that its proponents are really dismissing. The main argument, as advanced in a recent LinkedIn exchange, is that lots of information "cannot be represented in rows and columns". IOW, the RDM is not general enough -- there are certain types of information that it is not suited for. Ignoring the tabular nonsense, the response from David McGoveran, is important enough to restate here.
“Information consists of facts (i.e., propositions asserted to be true) about objects, properties, and relationships among objects and properties. We have shown that a database relation -- which a R-table visualizes -- is constrained to represent a set of facts about (properties of) a group of entities with within-group relationships among properties and entities and cross-group relationships. Yet we are told that document information "do not fit" in a relational structure. They are referred to as "unstructured" (which, if they were, they would contain random noise, not information).

But documents don't lack structure. Rather, they are multistructured: have complex multi-level/type structures -- lots of content, metadata, interpretations, and internal relationships (formatting, semantic, structural or syntactic, and so on). At one level of analysis, they are just documents that have subject matter or content involving objects, properties and relationships. At another they might relate to that of other data (e.g., other documents). How we represent knowledge and in how much detail is determined by which of the structures we choose to represent and that always partially determines the class of queries we can express. This is precisely what Codd understood and tried to address via the RDM.
--David McGoveran

And there's the rub: which type of data (facts) at which document level is of interest? Take this post. There are facts about it (e.g., author, title, date and so on). There are facts in it (its content). Either can can be readily represented relationally, for example:

POSTS (AUTHOR,TITLE,DATE,CONTENT)

where CONTENT is a column defined on a text, PDF, or HTML domain with built-in operators applicable to values of either of those types (e.g., a substring operator for text). Facts at other levels (e.g., grammatical, or semantic) could be of interest and would require multi-table representation. One must choose the type/level of information of interest to represent relationally in a database. We can choose to not do the analysis and modeling of the content of documents, but that does not mean that they are unstructurable as facts. More often than not data professionals don’t know what type of facts are to be represented, or are unfamiliar with data modeling and relational fundamentals. Product advocates avoid to say that without investing time and effort in analysis and modeling one cannot ask the same questions of and produce results equivalent to those from relational databases (i.e., make precise inferences from data that are guaranteed to be correct -- logically valid and semantically consistent). In fact, the use of such products trades upfront structuring effort for subsequent prohibitive manipulation effort.

As David points out, "complaints about RDM are not about knowledge representation, but knowledge discovery -- the problem, for example, that Google Search, analytics and data integration face and attempt to solve. It's an expensive, imprecise, and difficult problem", but it is distinct from what database management does and the two should not be confused.

 

 

 

 

Saturday, July 10, 2021

Relational Misconceptions Part 2: RDM is Applied Theory



In Part 1 we showed (yet again) how even those with their heart in the right (relational) place can't help being affected by the common and entrenched industry misconceptions, in this case about relationships, relations and tables. More often than not authors exhibit the very misconceptions they try to debunk.

We left the author distinguishing sets (with unordered, unique elements) from tables (lists of ordered, possibly duplicate rows). 

Thursday, June 10, 2021

RE-WRITE



See: https://www.dbdebunk.com/2023/08/entities-properties-and-codds-sleight.html

Thursday, March 25, 2021

OBG: Don't Confuse Levels of Representation Part 1



Note: To demonstrate the correctness and stability due to a sound theoretical foundation relative to the industry's fad-driven "cookbook" practices, I am re-publishing as "Oldies But Goodies" material from the old DBDebunk.com (2000-06), so that you can judge for yourself how well my arguments hold up and whether the industry has progressed beyond the misconceptions those arguments were intended to dispel. I may revise, break into parts, and/or add comments and/or references.

This is an email exchange with readers in response to my article Normalization and Performance: Never the Twain Shall Meet.

Saturday, January 23, 2021

"Codd was wrong" and "You're teaching the gospel" Betray Lack of Foundation Knowledge



Note: I have documented and debunked these misconceptions so many times that I will no longer reference them -- the reader motivated to gain genuine understanding should use the (1) blog labels (2) Blogger search (3) POSTS page to locate the relevant posts.

I have long claimed that a core problem in the industry is the vast majority of practitioners who use relational terminology, do not know/understand what it means, yet are convinced they do -- the less the understanding, the greater the convinction. A recent LinkedIn exchange provided -- as if it were needed -- yet another example. It was triggered by my comment:

“How many know today that a relation is by definition in 5NF, otherwise it's not a relation, the relational algebra has "anomalies" and all bets are off? IMO, none! If you need to "do" normalization, you did not design correctly, which means you don't understand the RDM.”
that prompted the following reaction:
“Is that really true? You construct a table and fill it full of garbage. It may not even be in 1NF, but is it not still a "relation" of columns, even if it's not a relation of rows or attributes? Codd had no real conception of syntax as separate from semantics, I don't think relational theory has a clear position on this. This is where Kimball and dimensional systems differ from Codd's relational, it made some effort (not a lot) to distinguish syntactic and semantic elements.”
--Joshua Stern

Sunday, November 22, 2020

OBG: Missing Data -- "Horizontal Decomposition" Part 1



Note: To demonstrate the correctness and stability of a sound foundation relative to the industry's fad-driven "cookbook" practices, I am re-publishing as "Oldies But Goodies" material from the old DBDebunk.com (2000-06), so that you can judge for yourself how well my arguments hold up and whether the industry has progressed beyond the misconceptions those arguments were intended to dispel. I may break long pieces into multiple posts, revise, and/or add comments and references.
 

“I'm excited to share a data.world research partnership with Prof Leonid Libkin and Paolo Guagliardo from The University of Edinburgh. Our goal is to understand how NULL values are used in the real word to bridge theory and practice. Please help us by participating in a survey.”


Thus a recent announcement on LinkedIn, which triggered reactions in praise of this "much needed effort".

Sigh! SQL's NULL is a blunder unworthy of research. The commonly used "NULL value" is a contradiction in terms, indicating that industry surveys are not a path to enlightening. The real issue is, of course, missing data, which is governed by long studied and well understood logic[1,2,3,4], though apparently not in the industry and today's academia.

In 2004 we published The Final NULL in the Coffin: A Relational Solution to Missing Data (a paper revised since) that we believe is theoretically sound and, importantly, consistent with McGoveran's work re-interpreting, extending and formalizing Codd's RDM[5]. At the time it generated a series of exchanges with readers, which were posted at the old DBDebunk (2000-2006). In light of the above they warrant re-production.

I start with the first, split in three parts: In this Part 1 a reader's reaction to both our solution and Hugh Darwen's "horizontal decomposition" alternative, How to Handle Missing Information without Using NULLs; Hugh's reply is in Part 2 and mine -- re-written to bring up to date with current state of knowledge and for clarity --
is in Part 3.

Note: In a later book Darwen dedicated a chapter to a "multi-relation" approach which seems an allusion to our solution.

Saturday, October 17, 2020

Understanding Codd's 12 Rules for RDBMS



In response to an online publication of a book appendix regurgitating Codd's 12 famous rules (some of which were, typically, incorrect[1]) I posted earlier a clarification of the rules. This is a revision thereof for better consistency with the new understanding of the RDM based on McGoveran's re-interpretation, extension and formalization[2] of Codd's work.

View My Stats