Showing posts with label RA. Show all posts
Showing posts with label RA. Show all posts

Sunday, May 28, 2023

INTENSION, EXTENSION AND R-TABLES (t&n)



Note: "Then & Now" (t&n) is a new version of what used to be the "Oldies but Goodies" (OBG) series. To demonstrate the superiority of a sound theoretical foundation relative to the industry's fad-driven "cookbook" practices, as well as the disregarded evolution/progress of RDM, I am re-visiting my old debunkings, bringing them up to the current state of knowledge. This will enable you to judge how well arguments have held up and realize the increasing gap between industry stagnation --  and scientific progress.

THEN: THE IMPORTANCE OF RELATIONAL TERMINOLOGY (t&n)

(email exchange with a reader originally published September 2002)

“Saw your latest and once again I think you have hit one of the many protruding nails on the head. Understanding one's data is so central and so crucial and yet so often ignored.

All this talk (not from you, I note) of silver bullets. Nothing new and I wonder if the paying customers and the big-ticket so-called technology strategy companies will ever wise up. Edward de Bono wrote of 'porridge words' that distract thought from the matter at hand. When used sparingly, they can facilitate new lines of thought but when, as they are in this field, they are used so casually and often they blur the real issues. All this technicalese of XML etcetera has this effect on me.

During one of the few times an employer allowed me to help people with logical design, I was having difficulty because the customer's IT staff knew very little English and had perhaps even less database background. I hit on the idea of explaining tables as relations and relations as sentences - sentences that must have the same 'size and shape'. Their faces seemed to light up and when they agreed that they had overloaded some of their tables, I was very pleased with myself. I felt vindicated a few weeks later when I read an article about predicates and propositions that Hugh Darwen had written in the now defunct DBPD magazine, put these thoughts much more precisely than I could, . Of course, the changes created new problems because the database product, like so many others, gave precious few ways to map the logical design to the physical one. But I regarded these as preferable problems since the staff was much more interested in the more concrete physical optimization techniques.

Without any disrespect to Dr. Codd (who I once met but was too awe-struck to ask any questions of), I have often thought that the language used by everybody in the field, with words such as "tables", nearly always brings connotations of physical arrangements to the mind of anybody who has done traditional programming. This seems unfortunate to me. Especially after I read Mr. McGoveran's proposals for results that might embody more than one table. (I wonder if these might not be part of the key for much better physical integration of databases with their visualization for users, not to mention smarter engines.)

I came across a site https://www.mcjones.org/System_R/ the other day, where a bunch of the System R people reminisced about its development on the occasion of, I think, the 25th anniversary of one of Codd's early papers. Presumably Mr. Date was absent from this gathering so that he could write his own most interesting history, which I remember reading five or six years ago. Anyway, I was struck again by how often their design decisions were either determined or distorted by physical considerations. And now, when many of the obstacles have been overcome courtesy of Moore's and other laws, some of those clever people seem regretful.

Also, please let me submit an historical, non-technical 'nit' to Mr. Date - I remember him writing that Codd did not coin the database term 'normalization of relations' as a result of R.M. Nixon's foreign policy excursion with China. But I also remember reading what I recall was an original interview with Dr. Codd in the DBMS magazine where he stated that this was the case. It's not really important, perhaps I'm just sensitive to it because I live in a country that established relations with modern China a year earlier!”

Saturday, March 4, 2023

ON NORMALIZATION AND THE SCIENTIFIC METHOD (t&n)



(originally published August 2002)

Note: "Then & Now" (T&N) is a new version of what used to be the "Oldies but Goodies" (OBG) series. To demonstrate the superiority of a sound theoretical foundation relative to the industry's fad-driven "cookbook" practices, as well as the evolution/progress of RDM, I am re-visiting my 2000-06 debunkings, bringing them up to my with my knowledge and understanding of today. This will enable you to judge how well my arguments have held up and appreciate the increasing gap between scientific progress and the industry’s stagnation, if not outright regress.

Then ...

Email exchange with a reader:
“I find [your article in DM Review] to contradict your stated devotion to scientific methods and the value of theory. You present a single example of denormalization, then proceed to draw a conclusion about denormalization in general. In addition, the example chosen is not typical of real world denormalizations.” In order to be half-way consistent with your own ideals, you would need to present at a minimum an exhaustive list of the types of denormalizations used in practice, along with an objective list of the pros and cons of each.  I would expect that if this were undertaken, you would end up with a more balanced view, and some exceptions to your black-and-white conclusions. Of course, to prove your point scientifically would require far more effort than this, if indeed it were at all possible to prove or disprove your statements. This brings me to my key point: if your contention is not falsifiable, it does not belong in the realm of true science at all, instead it belongs in the domain of mere opinion and belief. Please tell us how you have proved your propositions, or else refrain from claiming that you are working from a sound scientific foundation and everyone else is somehow misguided. Relational algebra has nothing to say about real-world performance.”

Thursday, November 10, 2022

NEW "DATA MODELS" 4 (t&n)



Note: "Then & Now" (T&N) is a new version of what used to be the "Oldies but Goodies" (OBG) series. To demonstrate the superiority of a sound theoretical foundation relative to the industry's fad-driven "cookbook" practices, as well as the evolution/progress of RDM, I am re-visiting my 2000-06 debunkings, bringing them up to my with my knowledge and understanding of today. This will enable you to judge how well my arguments have held up and appreciate the increasing gap between scientific progress and the industry’s stagnation, if not outright regress.

This is a re-published series of several DBDebunk 2001 exchanges on Simon Wlliams' so-called "Associative Model of Data" (AMD), academic claims of its superiority over RDM ("The Associative Data Model Versus the Relational model") and predictions of the demise of the latter ("The decline and eventual demise of the Relational Model of Data").

Part 1 was an email exchange among myself (FP), Chris Date (CJD) and Lee Fesperman (LF) in reaction to Williams' claims that started the series. Part 2 was my response to a reader's email questioning our dismissal of Williams's claims. Part 3 was my email exchange with Williams where he provided his definition of a data model on which I conditioned any discussion with him and I debunked it. Part 4 is my response to a reader's comments on my previous posts in the series.

Saturday, October 29, 2022

NEW "DATA MODELS" 3 (t&n)



Note: "Then & Now" (T&N) is a new version of what used to be the "Oldies but Goodies" (OBG) series. To demonstrate the superiority of a sound theoretical foundation relative to the industry's fad-driven "cookbook" practices, as well as the evolution/progress of RDM, I am re-visiting my 2000-06 debunkings, bringing them up to my with my knowledge and understanding of today. This will enable you to judge how well my arguments have held up and appreciate the increasing gap between scientific progress and the industry’s stagnation, if not outright regress.

This is a re-published series of several DBDebunk 2001 exchanges on Simon Wlliams' so-called "Associative Model of Data" (AMD), academic claims of its superiority over RDM ("The Associative Data Model Versus the Relational model") and predictions of the demise of the latter ("The decline and eventual demise of the Relational Model of Data").

Part 1 was the email exchange among myself (FP), Chris Date (CJD) and Lee Fesperman (LF) in reaction to Williams' claims that started the series. Part 2 was my response to a reader's email questioning our dismissal of Williams's claims.  Part 3 is my email exchange with Williams: he provided his "definition" of a data model on which I conditioned any discussion with him and I proved my point by debunking it.

Thursday, August 4, 2022

DATABASE RELATIONS, TABLES AND SEMANTIC CONSISTENCY



by David McGoveran with Fabian Pascal

 

Note: In "Setting Matters Straight" posts I debunk online Q&As that involve fundamentals which I first post on LinkedIn. The purpose is to induce practitioners to test their foundation knowledge against our debunking, where we explain what is correct and what is fallacious. For in-depth treatments check out the POSTS and our PAPERS, LINKS and BOOKS (or organize one of our on-site/online SEMINARS, which can be customized to specific needs). Questions and comments are welcome here and on LinkedIn.

“In a RDBMS, a table is columned rows, as in you treat individual rows as an actual entity while the columns are its attributes. In an excel tab, you can create a column, but it doesn't have to have all the same data types in that column, nor does one row have to represent one entity. It's more free form ... All in all, RDB is relational because it's column based rows and constrained to that format, while non relational can have free form like an excel. When you have rows that are uniform (constrained to what the column should be), you create entities as tables, and link them through columns to keep track of the relationships.”
--Quora.com
I posted this on LinkedIn as one of my "To Laugh or Cry?" items which, unlike "Setting Matters Right" posts, are beyond debunking. But the exchange that followed made me realize that there is, nevertheless, pedagogical value to it: it expresses something important, but poorly due to author's lack of foundation knowledge.

Wednesday, July 13, 2022

MISSING DATA AND MULTI-RELATION QUERY RESULTS (t&n)



Note: "Then & Now" (T&N) is a new version of what used to be the "Oldies but Goodies" (OBG) series. To demonstrate the superiority of a sound theoretical foundation relative to the industry's fad-driven "cookbook" practices, as well as the evolution/progress of RDM, I am re-visiting my 2000-06 debunkings, bringing them up to my with my knowledge and understanding of today. This will enable you to judge how well my arguments have held up and appreciate the increasing gap between scientific progress and the industry’s stagnation, if not outright regress.

On NULLs and Multi-Table Relvars

(first published 04/05/02)

"I had a question about the missing-values suggestion in PRACTICAL ISSUES IN DATABASE MANAGEMENT, page 234. You write:
"Table operations would have to be modified to yield results with as many tables as there are types of propositions with only known values."
How would this be represented in a language like Tutorial D, where relvars are required to be strongly typed? One possible idea is to make use of type inheritance. Suppose I had a domain of tuple values {x,a,b,c} (all integers, say) where x is not allowed to be missing but a, b, and c are allowed to be missing. Suppose we extended the domains of a, b, and c with an "imaginary" special value that we will never represent, which I will show for diagram purposes only as '?'. Then the domain can be split into parts:
XABC {x,a,b,c} possrep: {X: int, A: int, B: int, C: int}
XAB {x,a,b,'?'} possrep: {X: int, A: int, B: int}
XAC {x,a,'?',c} possrep: {X: int, A: int, C: int}
XBC {x,'?',b,c} possrep: {X: int, B: int, C: int}
XA {x,a,'?','?'} possrep: {X: int, A: int}
XB {x,'?',b,'?'} possrep: {X: int, B: int}
XC {x,'?','?',c} possrep: {X: int, C: int}
X {x,’?','?','?'} possrep: {X: int}
Using Mr. Date's specialization by constraint idea, we can inherit all the subtuple types from the main tuple type. Updates could make a tuple change type. A relation of relations of XABC type could be used to return results of a query. Each relation within the relation would contain one subtype.

However, the exponential explosion of possible subtypes would be very difficult to handle, practically speaking. As you admit in your book, a real DBMS might have to handle thousands of small subtables. This cannot be passed off as an "implementation detail" since table operations "yield results" at the user presentation level. No matter how efficient the underlying system might be, this seems unacceptable. Perhaps we have to fall back on default values after all."

Saturday, June 11, 2022

ORDER & RELATIONAL DATABASES (sms)



Note: In "Setting Matters Straight" I post on LinkedIn online Q&As that involve fundamentals under the header "What's Right and Wrong with this Database Picture" and then debunk them here. The purpose is to induce practitioners to test their foundation knowledge against our debunking, where we explain what is correct and what is fallacious. For in-depth treatments check out the POSTS and our PAPERS, LINKS and BOOKS (or organize one of our on-site/online SEMINARS, which can be customized to specific needs). Questions and comments are welcome here and on LinkedIn.

Q: “I'm not sure what this means: "The order of the rows and columns is immaterial to the DBMS?" -- could anyone explain?”

A: “It means two things:
The engine is under no obligation to insert new rows immediately following the previously inserted row(s)... During processing of selects, the optimizer is free to use any index it finds efficient to use or none at all... For this reason, if the order of returned data is important to your processing, then you must include an ORDER BY clause.”

Q: “How do you reorder fields in the database?”

A: “Depends on how you define "reorder". What view of your data are you trying to set the order. Are you in Table Design view? ... Are you looking at form? The answer is different depending on what you are referring to.”
--Quora.com

Saturday, May 21, 2022

NO RDBMS WITHOUT RELATIONAL DOMAINS (obg)



Note: To demonstrate the correctness and stability due to a sound theoretical foundation relative to the industry's fad-driven "cookbook" practices, I am re-publishing as "Oldies But Goodies" material from the old DBDebunk.com (2000-06), Judge for yourself how well my arguments hold up and whether the industry has progressed beyond the misconceptions those arguments were intended to dispel. I may revise, break into parts, and/or add comments and/or references. You can acquire foundation knowledge by checking out our POSTS, BOOKS, PAPERS, LINKS (or, even better, organize one of our on-site SEMINARS, which can be customized to specific needs).

The following is an email exchange with a reader and DBMS designer.

ON DATA TYPES AND WHAT A DBMS IS

(originally published in 2001)

Reader:
"I would like to hear your (or Date's) opinion on The Suneido Database … it seems to me self-contradictory. They aren't typed ... so how can they define operators, or even the idea of domains. They also say they include administrative commands, which as far as I understand isn't allowed in the THIRD MANIFESTO. While they do not claim to be an implementation of the Manifesto, their claims that their database language was created by CJ Date do not sound appropriate."

 "They don't know what [domains (distinct from programming data types)] are and what their function in the RDM is. That's common for all DBMS vendors, the claims of which should be always taken with more than a grain of salt."

Monday, May 2, 2022

RELATION PROLIFERATION (sms)



Note: "Setting Matters Straight" is a new format: I post on LinkedIn an online Q&A involving data fundamentals that I subsequently debunk in a post here. This is to encourage readers to test their foundation knowledge against our debunking here, where we confirm what is correct and correct what is fallacious. For in-depth treatments check out the POSTS and our PAPERS, LINKS and BOOKS (or organize one of our on-site/online SEMINARS, which can be customized to specific needs). Questions and comments are welcome here and on LinkedIn.

Q: “How do I avoid too many relations in databases?”

A: “You don’t. Every relation is there to store meaningful data, hopefully you do not define database relations for data that are not to be stored in your database.”

A: “By following proper design principles. Normalization, standard data patterns, and progressing from logical to physical always. Never denormalize (or avoid normalizing in the first place) because performance never trumps accuracy. It really doesn't matter how fast you get the wrong answer.”
--Quora.com

Monday, April 25, 2022

RELATIONAL DATABASES & SET THEORY (sms)



Note: "Setting Matters Straight" is a new format: I post on LinkedIn an online Q&A involving data fundamentals that I subsequently debunk in a post here. This is to encourage readers to test their foundation knowledge against our debunking here, where we confirm what is correct and correct what is fallacious. For in-depth treatments check out the POSTS and our PAPERS, LINKS and BOOKS (or organize one of our on-site/online SEMINARS, which can be customized to specific needs). Questions and comments are welcome here and on LinkedIn.

Q: “To what extent is relational database theory related to set theory?”

A: “Relational database theory is indeed closely derived from set theory. Many operations in relational data are directly related to common operations one does with sets. In fact, SQL has keywords for them that should sound familiar to someone who has just taken a class in Discrete Mathematics:
  • UNION
  • INTERSECT
  • DIFFERENCE (called MINUS in Oracle)
Even the structure of a table is set-oriented. A table is a set of rows, and a row is a set of columns, and those columns must match the set of columns defined in the table's header.”

--Quora.com

Sunday, April 10, 2022

QUOTA QUERIES (sms)



Note: "Setting Matters Straight" (SMS) is a new format: I post on LinkedIn an online Q&A involving data fundamentals that I subsequently debunk in a post here. This is to encourage readers to test their foundation knowledge against our debunking here, where we confirm what is correct and correct what is fallacious. For in-depth treatments check out the POSTS and our PAPERS, LINKS and BOOKS (or organize one of our on-site/online SEMINARS, which can be customized to specific needs). 

Q: “How do you return the most recent record in SQL?”

A: “There are many ways of doing it. I would suggest (first thing came to my mind):
Select Top 1
from YourTable
order by TablePrimaryKey Desc;”
A: “If you mean "the last inserted record which has no datetime stamp field" ... you have a few options.
  • If you cannot use date/time -- your next best bet would be an auto-increment/sequence field, which assigns increasing numbers to each inserted record.
  • If that’s not available, you would have to rely on business logic e.g. order # or some such.
Some vendors, like Oracle, provide ROWID pseudocolumn for each record which might help in some quick’n’dirty cases -- it is not guaranteed to be sequential but could be (e.g., when table has had no DELETE operations).”    --Quora.com

If you don't know, I set matters straight @dbdebunk.com.

Sunday, January 30, 2022

NOBODY UNDERSTANDS WHAT A DATA MODEL IS (tyfk)



 “A data model is a collection of concepts ... used to describe the structure of a database...data types, relationships and constraints...is basically a conceptualization between attributes and entities ...
The building blocks in the data model are as follows:
  • Entity − An entity represents a particular type of object in the real world.
  • Entity set − Sets of entities of the same type which share the same properties are called entity Sets.
  • Attribute − An attribute is a characteristic of an entity.
  • Constraints − A constraint is a restriction placed on the data. It is helpful to ensure data integrity.
  • Relationship − A relationship describes an association among entities.
--TutorialsPoint.com

Fallacies, Misconceptions and Confusion

  • A data model:

- does not describe (just) the structure of a database.
- is not "a conceptualization between attributes and entities" (whatever that means).

  • Entities, entity sets and relationships are not building blocks of a data model.

Friday, December 17, 2021

NO UNDERSTANDING WITHOUT FOUNDATION KNOWLEDGE PART 2: DEBUNKING A BOOK REVIEW (obg)



Note: To demonstrate the correctness and stability offered by a sound theoretical foundation (relative to the industry's fad-driven "cookbook" practices), I am re-publishing as "Oldies But Goodies" material from the old (2000-06) DBDebunk.com, so that you can judge for yourself how well my arguments hold up and whether the industry has progressed beyond the misconceptions those arguments were intended to dispel. I may revise, break into parts, and/or add comments and/or references, which I enclose in square brackets).

A 2001 review of my third book triggered an exchange on SlashDot. This six-part series comprises my debunking at the time of both the review and the exchange in the chronological (slightly out of the)  order of the original publication.
Part 1: Clarifications on a Review of My Book Part 1 @DBDebunk.com
Part 2: Slashing a SlashDot Exchange Part 1 @DBAzine.com
Part 3: Slashing a SlashDot Exchange Part 2 @DBAzine.com
Part 4: Slashing a SlashDot Exchange Part 3 @DBAzine.com
Part 5: Slashing a SlashDot Exchange Part 4 @DBAzine.com
Part 6: Clarifications on a Review of My Book Part 2 @DBDebunk.com 

Clarifications of a Review of My Book Part 1

(originally posted 1/14/2001)
“Many of us ... do not think that harmony is the great goal, or unity or peacefulness, [and] actually quite like hard questions for their own sake, and enjoy ... the life of the mind. To the question of how to live, the answer is "by disagreement.” --Christopher Hitchens

Let me say, first and foremost, that as the subtitle of the book -- A REFERENCE FOR THE THINKING PRACTITIONER -- indicates, it is targeted at the minority of practitioners who think clearly, independently and critically. It should not be a surprise, then, that those not belonging to that (alas, very small) target audience don't see its practical value. As I said so many times, if my work gained mass appeal, I would wonder what I was doing wrong. This is the sad reality, whether we like it or not. In fact, to be consistent I will go one step further: I don't assume that positive reviews are any better than negative ones -- they are frequently grounded in as faulty reasoning and/or ignorance as the critiques.

Let me also make clear that I do not place all of the blame on  the individual database practitioners or users. Rather, problems are rooted in a systemic, much more profound societal and business culture that fails to instill and encourage foundation knowledge and independent, critical thinking, which not only does not reward, but actually punishes such. This is true to a degree in all societies, of course, but in the US the problem is much more acute (there can hardly be a better demonstration of the horrendous implications of this than how the election was covered, perceived and accepted by most of the press and public) [I wrote this prior to the last two elections -- I leave it to the reader to judge the steepness of the subsequent regress.]

Saturday, December 11, 2021

NOBODY UNDERSTANDS THE RELATIONAL MODEL: SEMANTICS, CLOSURE & CORRECTNESS PART 4



with David McGoveran

(Title inspired by Richard Feynman)

In Parts 1 and Part 2 we provided some clarifications following a discussion on LinkedIn about our contention that, conventional wisdom notwithstanding, database relations -- distinct from mathematical relations -- are by definition not just in 1NF, but also in 5NF, as a consequence of which the RA, as currently defined for 1NF closure, produces what the industry calls "update anomalies" and, thus, is not a proper algebra. In Part 3 we used that information to debunk some leftover misunderstandings in the discussion.

We conclude in Part 4 with comments on a private exchange that followed the public one on LinkedIn regarding the difference between the McGoveran (DMG) and Date and Darwen's (TTM)
interpretations of the RDM, which can be summarized as follows:

Friday, November 19, 2021

THE FATE OF FADS: XML DBMS (obg)



Note: To demonstrate the correctness and stability due to a sound theoretical foundation relative to the industry's fad-driven "cookbook" practices, I am re-publishing as "Oldies But Goodies" material from the old DBDebunk.com (2000-06), so that you can judge for yourself how well my arguments hold up and whether the industry has progressed beyond the misconceptions those arguments were intended to dispel. I may revise, break into parts, and/or add comments and/or references.

Remember XML DBMS? At one point it was the fad of the day, similar to today's NoSQL or the old new "knowledge graph" -- "the future that you ignored at the peril of being left behind". As I predicted, it went the way of all fads (ODBMS, Associative DBMS, you name them) together with their "data models" that were nothing of the sort. My prediction was grounded in the same sound foundations I rely on today -- unlike the industry we are progressing it -- that fads lack and which were and still are dismissed, evidence be damned.

Here's a typical example (comments on republication in square brackets).

Thursday, November 11, 2021

Nobody Understands the Relational Model: Semantics, Relational Closure and Database Correctness Part 2



 with David McGoveran 

(Title inspired by Richard Feynman)

In Part 1 we explained that all database relations are, mathematically, relations, but not all relations are database relations, which are in both 1NF and 5NF and we agreed with a statement in a LinkedIn discussion ending as follows: "Update anomalies are not as big of a problem as an algebra where relations aren't closed under join". Unfortunately, update anomalies, closure, and how relational operators were defined are all interrelated and represent an even "bigger problem". Update anomalies are not "bugs", let alone irrelevant, but actually a reflection of  that much bigger problem.

In this second part we delve into that problem.

Wednesday, October 27, 2021

Nobody Understands the Relational Model: Semantics, Relational Closure and Database Correctness Part 1



 with David McGoveran 

(Title inspired by Richard Feynman)

“As currently defined, relational algebra produces anomalies when applied to non-5NF relations. Since an algebra cannot have anomalies, they should have raised a red flag that RA was not defined quite right, especially defining "relation" as a 1NF table and claiming algebraic closure because 1NF was preserved. Being restricted to tabular representation as the "language" for relationships is like being restricted to arithmetic when doing higher mathematics like differential calculus -- you need more expressive power, not less! Defining RA operations in terms of table manipulations aided initial learning and implementations by making data management look simple and VISUAL. Unfortunately, it was never grasped how much was missing, let alone how much more "intelligent" the RA and the RDBMS needed to be made to fix the problems. And I can see that those oversights were, in part, probably due to having to spend so much time correcting the ignorance in the industry."
--David McGoveran
I recently posted the following Fundamental Truth of the Week on LinkedIn, together with links to more detailed discussions of 1NF and 5NF (see References):
“According to conventional understanding of the RDM (such as it is) [and I don't mean SQL], a relation is in at least first normal form (1NF) -- it has only attributes drawn from simple domains (i.e., no "nested relations") -- the formal way of saying that a relation represents at the logical level an entity group from the conceptual level that has only individual entities -- no groups thereof -- as members. 1NF is required for decidability of the data sublanguage.

However, correctness, namely (1) system-guaranteed logical validity (i.e., query results follow provably from the database) and (2) by-design semantic consistency (of query results with the conceptual model) requires that relations are in both 1NF and fifth normal form (5NF). Formally, the only dependencies that hold in a 5NF relation are functional dependencies of non-key attributes on the PK -- for each PK value there is exactly one value of every corresponding non-key attribute value. This is the formal way of saying that a relation represents facts about a group of entities of a single type.

Therefore we now contend that database relations are BY DEFINITION in both 1NF and 5NF, otherwise all bets are off.”
It triggered a discussion that raised some fundamental issues for which an online exchange is too limiting. This post offers further clarifications, including comments by David McGoveran, on whose interpretation of the RDM (LOGIC FOR SERIOUS DATABASE FOLKS, forthcoming) I rely on. The portions of my interlocutor in the discussion are in quotes.

Sunday, September 19, 2021

TYFK: Calculated Attributes -- Redundancy, Full Normalization and Relational Theory



Note: Each "Test Your Foundation Knowledge" post presents one or more misconceptions about data fundamentals. To test your knowledge, first try to detect them, then proceed to read our debunking, reflecting the current understanding of the RDM, distinct from whatever has passed for it in the industry to date. If there isn't a match, you can review references -- reflecting the current understanding of the RDM, distinct from whatever has passed for it in the industry to date -- which explain and correct the misconceptions. You can acquire further knowledge by checking out our POSTS, BOOKS, PAPERS, LINKS (or, better, organize one of our on-site SEMINARS, which can be customized to specific needs).

“If you have shopping cart, you probably have some field "TOTAL" somewhere that stores the final amount due for the customer. It so happens that such a thing violates relational theory...”

“Having a "TOTAL" field in your "order" table *might* violate relational theory, but if you make it so that only a trigger can update it based on what's in your "order_item" table, then I think it's fine. You still get data integrity and that is what matters.”

“I still fail to see what you mean by the "calculated TOTALS field" (attribute, really) violates the Relational Model.”

“The result of having the field ... is what is called a DELETE ANOMALY.”

“Most denormalizing means adding columns to tables that provide values you would otherwise have to calculate as needed.”

“There are four practical problems with a fully normalized database, three of which I have listed before. I will list them all here for completeness:
* No calculated values. Calculated values are a fact of life for all applications, but a normalized database lacks them. The burden of providing calculated values must be taken up by somebody somehow. Denormalization is one approach to this, though there are others.
--Database Programmer blog

“...I'm now working with IT to normalize part of the database to remove calculated fields...:
`lineitems`.`extended total` = `lineitems`.`units` * `biditems`.`price`.
`jobs`.`jobvalue` = the sum of related `lineitems`.`extended total` records
`orders`.`ordervalue` = the sum of related `jobs`.`jobvalue` records.”
--mySQL.com

Do calculated attributes (not fields!) violate relational theory and must be "normalized" out of them? Determining that requires foundation knowledge that is scarce in the industry, which has a poor and outdated understanding of the RDM.

Saturday, September 11, 2021

OBG: Data Warehouses Are Non-Relational Application Views



Note: To demonstrate the correctness and stability due to a sound theoretical foundation relative to the industry's fad-driven "cookbook" practices, I am re-publishing as "Oldies But Goodies" material from the old DBDebunk.com (2000-06), Judge for yourself how well my arguments hold up and whether the industry has progressed beyond the misconceptions those arguments were intended to dispel. I may revise, break into parts, and/or add comments and/or references. You can acquire foundation knowledge by checking out our POSTS, BOOKS, PAPERS, LINKS (or, even better, organize one of our on-site SEMINARS, which can be customized to specific needs).


ON DATA WAREHOUSES

(originally posted 11/10/2001)

“It is dawning on me that data warehouse techniques (as advocated by Ralph Kimball) are a response to perceived SQL DBMS performance weaknesses and nothing more. Dimensional modeling is a physical implementation design to deal with what may already be out of date performance tests to back up said design. I agree that where data warehouse logical design deviates from the relational data model there will be trouble down the road. I can attest to the high cost involved in maintaining a data warehouse. I am now questioning three purported benefits of using the current popular data warehouse design techniques.

Physical DBMS designs like the star schema produce faster more predictable query results than a normalized one (I realize normalization is a strictly logical concept but it does appear to have its direct physical mappings in current DBMS systems). Well, I am about to find out. We will be performing some benchmarks. What really are the query times performed on our OLTP system vs. a Star schema for a few relevant reports.

Data warehouses offload query processing from the OLTP system. This is true, but may not be necessary. One needs to thoroughly analyze the traffic on the OLTP system to see if offloading is necessary. We are looking into simply replicating the database (or part of it) for reporting purposes. Replication is far simpler to maintain than a data warehouse.

Data warehouse designs are simpler to understand than a relational one, thereby query construction is easier. I think this is more due to the fact that designers have had a bad habit of throwing up on a wall a full ER chart of their systems. Saying in effect Look how great I am, no one will ever understand this!  Creating ER diagrams of subsystems to describe important characteristics of a database can also be simple to understand.

Item #3 brings up a question. What do you think of OLAP tools such a Microsoft s Analysis Services? We have discovered you can use the tool without redesigning your database. We have seen some pretty fast query times if we take the option of allowing the tool to store data extracted from the production database into its own proprietary format. The opposition to its adoption here is the learning curve to master the MDX query language."
 

Fabian Pascal: The fact is that so-called "dimensional modeling" is logical, not physical modeling. Kimball does not present his "techniques" as just performance-maximization and, what is more, they won’t necessarily yield better performance. He also seems to believe, erroneously, that star-schemas are fully normalized designs. The real solution to performance problems is true RDBMSs with better implementations, not ad-hoc logical designs. Like so many, Kimball simply does not understand relational technology and confuses levels of representation.

I consider warehouses a regression to the good old days of application-biased files -- which we discarded for application-neutral and DBMSs, because they did not prove cost-effective. They are application-specific views of databases that are not derived via relational algebra from relational databases, but are rather non-relational SQL tables. The industry has a long and profitable history of recycling relabeled old failures, witness XML and graph throwbacks to hierarchic databases.

Aside from being necessary for soundness, fully normalized relational databases are the easiest to understand if (1) one thoroughly knows and understands the segment of reality to be represented in the database -- the business, that is -- and (2) data fundamentals. Arbitrary designs such as star-schemas are cope-outs, due to poor knowledge and understanding of the latter.

I am not familiar with the product, but I am generally wary of what vendors do.


Note on re-publication: See also Data Warehouses and the Logical-Physical Confusion.

 

 

 

 

Saturday, July 10, 2021

Relational Misconceptions Part 2: RDM is Applied Theory



In Part 1 we showed (yet again) how even those with their heart in the right (relational) place can't help being affected by the common and entrenched industry misconceptions, in this case about relationships, relations and tables. More often than not authors exhibit the very misconceptions they try to debunk.

We left the author distinguishing sets (with unordered, unique elements) from tables (lists of ordered, possibly duplicate rows). 

View My Stats