DATABASE DEBUNKINGS: LDI

Showing posts with label LDI. Show all posts

Sunday, April 23, 2023

THE DENORMALIZATION ILLUSION (t&n)

Note: "Then & Now" (t&n) is a new version of what used to be the "Oldies but Goodies" (OBG) series. To demonstrate the superiority of a sound theoretical foundation relative to the industry's fad-driven "cookbook" practices, as well as the disregarded evolution/progress of RDM, I am re-visiting my old debunkings, bringing them up to the current state of knowledge. This will enable you to judge how well arguments have held up and realize the increasing gap between industry stagnation -- and scientific progress.

DENORMALIZATION, PERFORMANCE & INTEGRITY

(Email exchange with reader originally published August 2002)

Then ...

“I'd like to comment on your other recent articles: on denormalization. Of course you prove that denormalization does not improve performance, because you pay for it by maintaining integrity. But, when people say that de-normalization improves performance, they usually mean just on one side. For example, I can merge DEPT and EMP tables into a third table DE and achieve a better query performance by replacing a join by a simple select from the new table. If this is the most frequent and most important operation in my application (vs. updates, inserts, deletes), then my overall performance will be improved (and that's what usually happens in DW). But if the opposite is true, then performance will suffer. I didn't see these considerations in your articles ...

Many people, yes, but not nobody. I always considered the cost of denormalization. I know many people in this field that do the same; however, I do agree with you that many people, especially those "younger" ones learning from more "modern" books on database design, especially those in the OO field, are not aware, and what's worse, don't even want to be aware.

That's exactly how I always thought and when I had discussions with people, that's what I always said to them (not that it made a big difference in their thinking). However, when I read your articles on this topic, I had another thought. As you always say (and again, I fully agree with you on this), we must always separate logical and physical. I always considered denormalization as one of the things done at the physical level. So, denormalization shouldn't even be your concern, because it has nothing to do with the relational model. The rule I always follow is that whatever I do at the physical level, it should not destroy my logical model, which must stay normalized. If I denormalize to achieve some performance gains for a selected set of functions, then I do pay for it by writing additional logic to preserve the integrity and by creating views that represent the entities on my logical model, which I had to "destroy". So as long as I separate these two levels, I don't think I'm in any conflict with the relational model. Of course if DBMS gave me more options in physical design while protecting the integrity of my logical model, I wouldn't have to do this myself.

Theoretically, I think the way you do, and that's why I enjoy reading your columns. But I also have to deliver practical results to my users. Unfortunately, I can't go to my users and tell them that their response time is slow because of Oracle's technology. And I don't believe screaming at Oracle will do me any good either (and yes I know what you will say to this). So until that mysterious technology you mentioned many times is implemented, I have to do what I can.”

NEW "DATA MODELS" 5.2 (t&n)

Follow @DBDebunk Follow @ThePostWest

Note: "Then & Now" (T&N) is a new version of what used to be the "Oldies but Goodies" (OBG) series. To demonstrate the superiority of a sound theoretical foundation relative to the industry's fad-driven "cookbook" practices, as well as the evolution/progress of RDM, I am re-visiting my 2000-06 debunkings, bringing them up to my with my knowledge and understanding of today. This will enable you to judge how well my arguments have held up and appreciate the increasing gap between scientific progress and the industry’s stagnation, if not outright regress.

This is a re-published series of several DBDebunk 2002 posts on Simon Wlliams' Lazy Software so-called "Associative Model of Data" (AMD), academic claims of its superiority over RDM ("The Associative Data Model Versus the Relational model") and predictions of the demise of the latter ("The decline and eventual demise of the Relational Model of Data").

Part 1 was an email exchange among myself (FP), Chris Date (CJD) and Lee Fesperman (LF) in reaction to Williams' claims that triggered the series.
Part 2 was my response to a reader's email questioning our dismissal of Williams's claims.
Part 3 was my email exchange with Williams where he provided his definition of a data model on which I conditioned any discussion with him and I debunked it.
Part 4 is my response to a reader's comments on my previous posts in the series.
Part 5.1 provided the background for my critique of Edward Hurley's report on Simon Williams's Lazy Software and his so-called "Associative Model of Data" (AMD).

NOBODY UNDERSTANDS THE RELATIONAL MODEL: SEMANTICS, CLOSURE & CORRECTNESS PART 4

Follow @DBDebunk Follow @ThePostWest

with David McGoveran

(Title inspired by Richard Feynman)

In Parts 1 and Part 2 we provided some clarifications following a discussion on LinkedIn about our contention that, conventional wisdom notwithstanding, database relations -- distinct from mathematical relations -- are by definition not just in 1NF, but also in 5NF, as a consequence of which the RA, as currently defined for 1NF closure, produces what the industry calls "update anomalies" and, thus, is not a proper algebra. In Part 3 we used that information to debunk some leftover misunderstandings in the discussion.

We conclude in Part 4 with comments on a private exchange that followed the public one on LinkedIn regarding the difference between the McGoveran (DMG) and Date and Darwen's (TTM)
interpretations of the RDM, which can be summarized as follows:

TYFK: Calculated Attributes -- Redundancy, Full Normalization and Relational Theory

Follow @DBDebunk Follow @ThePostWest

Note: Each "Test Your Foundation Knowledge" post presents one or more misconceptions about data fundamentals. To test your knowledge, first try to detect them, then proceed to read our debunking, reflecting the current understanding of the RDM, distinct from whatever has passed for it in the industry to date. If there isn't a match, you can review references -- reflecting the current understanding of the RDM, distinct from whatever has passed for it in the industry to date -- which explain and correct the misconceptions. You can acquire further knowledge by checking out our POSTS, BOOKS, PAPERS, LINKS (or, better, organize one of our on-site SEMINARS, which can be customized to specific needs).

“If you have shopping cart, you probably have some field "TOTAL" somewhere that stores the final amount due for the customer. It so happens that such a thing violates relational theory...”

“Having a "TOTAL" field in your "order" table *might* violate relational theory, but if you make it so that only a trigger can update it based on what's in your "order_item" table, then I think it's fine. You still get data integrity and that is what matters.”

“I still fail to see what you mean by the "calculated TOTALS field" (attribute, really) violates the Relational Model.”

“The result of having the field ... is what is called a DELETE ANOMALY.”

“Most denormalizing means adding columns to tables that provide values you would otherwise have to calculate as needed.”

“There are four practical problems with a fully normalized database, three of which I have listed before. I will list them all here for completeness:
* No calculated values. Calculated values are a fact of life for all applications, but a normalized database lacks them. The burden of providing calculated values must be taken up by somebody somehow. Denormalization is one approach to this, though there are others.”
--Database Programmer blog

“...I'm now working with IT to normalize part of the database to remove calculated fields...:
`lineitems`.`extended total` = `lineitems`.`units` * `biditems`.`price`.
`jobs`.`jobvalue` = the sum of related `lineitems`.`extended total` records
`orders`.`ordervalue` = the sum of related `jobs`.`jobvalue` records.”
--mySQL.com

Do calculated attributes (not fields!) violate relational theory and must be "normalized" out of them? Determining that requires foundation knowledge that is scarce in the industry, which has a poor and outdated understanding of the RDM.

Normalization -- Will They Ever Learn?

Follow @DBDebunk Follow @ThePostWest

“To Normalize or not to Normalize? that really isn't a question. few things to consider:
Normalization is supposed to protect from data anomalies, but not prevent us from using data encapsulation is the magic trick that allows you to do what you want without breaking rules.what are your experiences with normalization?”

--LinkedIn

This is a question that at this time need -- and should -- not be asked anymore, and the fact that it still is is one confirmation -- among many -- that there is no progress in data management. According to the current understanding of the RDM:

Database relations are both normalized (in 1NF) and fully normalized (in 5NF) by definition, otherwise they are not relations and the relational algebra (RA) does not work;
Adherence to three database design principles produces 1NF and 5NF relational databases;
Consequently, there should not be such a thing as "doing" normalization (to 1NF) and further normalization (to 5NF) except to repair databases that are non-relational due to failure to adhere to the principles.

Note: The three design principles are fundamental to SST/FOPL foundation of the RDM, but were never understood even by relational proponents. I do not know what encapsulation has to do with this.

OBG: Database Design and Guaranteed Correctness Part 2

Follow @DBDebunk Follow @ThePostWest

Note: This is a re-write of an earlier post (which now links here), to bring it into line with the current understanding of the RDM derived from McGoveran formalization and interpretation of Codd's work[1]. Reference [9] is also an important re-write and is recommended pre-requisite for this post.

Continued from Part 1

“The term database design can be used to describe many different parts of the design of an overall database system. Principally, and most correctly, it can be thought of as the logical design of the base data structures used to store the data. In the relational model these are the tables and views ... However, the term database design could also be used to apply to the overall process of designing, not just the base data structures, but also the forms and queries used as part of the overall database application within the database management system(DBMS). The process of doing database design generally consists of a number of steps which will be carried out by the database designer. Usually, the designer must:

Determine the data to be stored in the database.
Determine the relationships between the different data elements.
Superimpose a logical structure upon the data on the basis of these relationships.
Within the relational model the final step above can generally be broken down into two further steps, that of determining the grouping of information within the system, generally determining what are the basic objects about which information is being stored, and then determining the relationships between these groups of information, or objects.”
--What is a Relational Database, Quora.com

There is, typically, much vagueness and confusion here and instead of debunking it makes more sense to provide a rigorous description of what database design really is: formalization of a conceptual model -- expressed as business rules -- as a logical model for representation in the database using a formal data model. If the data model is the RDM, the logical model consists of relations constrained for semantic consistency with the conceptual mode, the constraints being formalizations of the business rules.

OBG: Data Independence and "Physical Denormalization"

Follow @DBDebunk Follow @ThePostWest

Note: I am re-publishing some of the articles and reader exchanges from the old DBDebunk (2000-06). How well do they hold up -- have industry knowledge and practice progressed? Judge for yourself and appreciate the difference between a sound foundation and the fad-driven cookbook approach.

January 2, 2001

"... one of the "4 great lies" is "I denormalize for performance." You state that normalization is a logical concept and, since performance is a physical concept, denormalization for performance reasons is impossible (i.e., it doesn't make sense). What term would you use to describe changing the physical database design to be different from the logical design to enhance performance? Because normalization is a logical concept, you imply that this is not called denormalization."

The RDM and Model Stability

Follow @DBDebunk Follow @ThePostWest

“3rd normal form data models in data warehousing efforts struggle when changes impact parent child relationships. These impacts cause cascading changes to the data model, the queries, and the loading processes. [For example:]

There are bank accounts

Each account belongs to exactly one customer

A customer can have more than one account

The bank introduces a new product: joint accounts, which means an account can now have more than one owner. It is clear that the 3NF model has to be extended in order to keep this new information; the data vault models seems to be able to fulfill the new requirement.

Some banks propose joint accounts, some don’t, therefore some use M:N relation between client and accounts and others 1:N. A model which is good for any possible case is actually awful model because it describes nothing: by looking at this model you can’t say if joint accounts exist among bank's products.”

--Data Vault and Model (in)Stability

Data warehousing/vault[1] are a red herring here -- the real issue is data independence. Some corrections and clarifications first:

Normal forms do not pertain to the data model itself -- the RDM -- but to relations in logical models created using strictly the RDM[2].
3NF is insufficient -- relations are in 5NF by definition, otherwise correctness is not guaranteed[3].
The RDM was introduced as a database representation superior to old directed graph -- hierarchic and network (CODASYL) -- systems for conceptual models focused on relationships among entity groups, rather than among individual entities[4]. Graph database representation (nodes and edges) corresponds to a worldview at the conceptual level of parents-children (network) relationships, of which parent-children (hierarchy) is a special case. The relational representation (relations) corresponds to M:N relationships among entity groups, of which M:1 is a special case[5].

Note: Correctness -- logical and semantic[6] -- requires adherence to three principles of database design that jointly imply 5NF[7].

Comments on a Stonebraker Article

Follow @DBDebunk Follow @ThePostWest

These comments were prompted by a LinkedIn post referencing Michael Stonebraker's Those Who Forget the Past Are Doomed to Repeat It -- something I often reiterate myself -- where he argues:

“Over the past decade, there have been a number of DBMSs introduced (typically labeled as NoSQL) which utilize a network or hierarchical data model. MongoDB and Cassandra come immediately to mind as examples. Some such systems support networks through the concepts of "links" and some support hierarchical data using a nested data model often utilizing JSON. In my opinion, these systems have not internalized lessons from history.

“At the SIGFIDET (now SIGMOD) annual conference in 1974, there was a "Great Debate" over the merits of the relational model versus the network and hierarchical models ... Basically, the argument was about which model [relational or network] was a better fit for structured data (as opposed to documents, e-mails, etc.) and boiled down to two questions:

Question 1: Are high-level data sublanguages a good idea?
Question 2: Are tables the best data structure or should one use a network or hierarchy?”

“The last 45 years have definitely affirmed Codd’s position on both issues ... The conclusion from the 1970s was that the relational model provides superior data independence, compared to the network and hierarchical [graph] models. Forty-five years later, this conclusion is still true. If you want to insulate yourself from the changes that business conditions dictate, use a relational DBMS. If you want the successor to the successor to your job to thank you for your wise decision, use a relational model.”

I couldn't agree more, having repeatedly argued this myself. But he misses some old aspects that the industry has failed to recognize, has ignored, or dismissed[1]; and some important new aspects due to a new understanding of Codd's work[2].

Data Sublanguage Part 1: Relational vs. Computational Completeness

Follow @DBDebunk Follow @ThePostWest

Note: I have revised the "Logical Access, Data Sublanguage, Kinds of Relations, Database Redundancy, and Consistency" paper in the "Understanding the Real RDM" series" (available from the PAPERS page) for consistency with this post.

“Recently I have read that SQL is actually a data sublanguage and not a programming language like C++ or Java or C# ... The answers ... have the pattern of "No, it is not. Because it's not Turing complete.", etc, etc. ... I am a bit confused, because since you can develop things through SQL, I thought it is similar to other programming languages ... I am curious about knowing why exactly is SQL not a programming language? Which features does it lack? (I know it can't do loops, but what else more?)”

--StackOverflow.com

“The SQL operators were meant to implement the relational algebra as proposed by Dr. Ted Codd. Unfortunately Dr. Codd based some of his ideas on a "extended set theory", which was an idea formulated and described in a 1977 paper by D. L. Childs ... But Childs’ extensions were not ideally suited, which is explained in quite some detail in [a] book ... by Professor Gary Sherman & Robin Bloor [who] argue that mainstream Zermelo-Fraenkel set theory (Cantor), would have been a better starting point. One key issue is that sets should be able to be sets of sets.”

--Dataversity.net

The concept of a sublanguge cannot be understood without foundation knowledge and familiarity with the history of the database management field, both lacking in the industry.

Graph Databases: They Who Forget the Past...

Follow @DBDebunk Follow @ThePostWest

Out of the plethora of misconceptions common in the industry[1], quite a few are squeezed into this paragraph:

“The relational databases that emerged in the ’80s are efficient at storing and analyzing tabular data but their underlying data model makes it difficult to connect data scattered across multiple tables. The graph databases we’ve seen emerge in the recent years are designed for this purpose. Their data model is particularly well-suited to store and to organize data where connections are as important as individual data points. Connections are stored and indexed as first-class citizens, making it an interesting model for investigations in which you need to connect the dots. In this post, we review three common fraud schemes and see how a graph approach can help investigators defeat them.”

--AnalyticBridge.DataScienceCentral.com

Relational databases did not emerge in the 80s (SQL DBMSs did);

There is no "tabular data" (the relational data structure is the relation, which can be visualized as a table on a physical medium[2], and SQL tables are not relations);

Analysis is not a DBMS, but an application function (while database queries, as deductions, are an important aspect of analysis, and computational functions can be added to the data sublanguage (as in SQL), the primary function of a DBMS is data management)[3];

A data model has nothing to do with storage (storage and access methods are part of physical implementation, which determines efficiency/performance[4]).

Here, however, we will focus on the current revival (rather than emergence) of graph DBMSs claimed superior -- without any evidence or qualifications -- to SQL DBMSs (not relational, which do not exist) that purportedly "make it difficult to connect data scattered across multiple tables". This is a typical example of how lack of foundation knowledge and of familiarity with the history of the field inhibit understanding and progress[5].

Class, Type, Set, Relvar, and Relation

Follow @DBDebunk Follow @ThePostWest

Note: This is a rewrite of a part of an older post (now redirecting here), to bring into line with McGoveran's formalization, re-interpretation, and extension of Codd's RDM[1] (the rewrite of the other part was posted last week).

“[According to Date] relvar ≠ class. [But i]n simple terms, class applies to a collection of values allowed by a predicate, regardless of whether such a collection could actually exist. Every set has a corresponding class, although a class may have no corresponding set ... in mathematical logic, a relation is a class (and trivially also a set), which contributes to confusion.”

“In modern programming parlance, class is generally distinguished from type only in that the latter refers to primitive (system-defined) data definitions, while class refers to higher-level (user-defined) data definitions. This distinction is almost arbitrary, and in some contexts, type and class are actually synonymous.”

Class, type, and set are often used interchangeably in the industry. Relations are neither class, nor type, and Date's relvars must be placed properly in their formal context. While details regarding these concepts vary with the flavor of set theory, they are sufficiently well defined to be distinguishable in each of the three formal foundations of the RDM, simple set theory (SST), mathematical relation theory, and first order predicate logic (FOPL).

What Is a Data Model, and What It Is Not

Follow @DBDebunk Follow @ThePostWest

“The term data model is used in two distinct but closely related senses. Sometimes it refers to an abstract formalization of the objects and relationships found in a particular application domain, for example the customers, products, and orders found in a manufacturing organization. At other times it refers to a set of concepts used in defining such formalizations: for example concepts such as entities, attributes, relations, or tables. So the "data model" of a banking application may be defined using the entity-relationship "data model". This article uses the term in both senses.”

--Data Model, Wikipedia

What a True Data Model Is

Few practitioners realize that Codd invented the Relational Data Model (RDM) as the first exemplar of a data model, a concept that he formalized in 1980 as follows:

Order Is For Society, Not Databases

Follow @DBDebunk Follow @ThePostWest

8/18/18: I have re-written this post for a better explanation. If you read it prior to the revision, you should re-read it.

“I learned that there is no concept of order in terms of tuples (e.g. rows) in a table, but according to wikipedia "a tuple is an ordered list of elements". Does that mean that attributes do have an order? If yes why would they be treated differently, couldn't one add another column to a table (which is why the tuples don't have order)? [OTOH], "In this notation, attribute–value pairs may appear in any order." Does this mean attributes have no order?”

--Do the “columns” in a table in a RMDB have order?

“Is it possible to reorder rows in SQL database? For example, how can I swap the order of 2nd row and 3rd row's values? The order of the row is important to me since i need to display the value according to the order [and] 'Order by' won't work for me. For example, I put a list of bookmarks in database. I want to display based on the result I get from query. (not in alphabet order). Just when they are inserted. But user may re-arrange the position of the bookmark (in any way he/she wants). So I can't use 'order by'. An example is how the bookmark display in the bookmark in firefox. User can switch position easily. How can I mention that in DB?”

--How can I reorder rows in sql database

While some data professionals may know that rows and columns of "database tables" are "unordered", few of them know what that means, and understand why. This is due to two, not unrelated, of the many common misconceptions[1] rooted in the lack of foundation knowledge in the industry, namely that relational databases consist of tables[2], and logical-physical confusion (LPC)[3]. They obscure understanding of the RDM and its practical implications, which is reflected in the answers to the above questions. Instead of debunking them, this post fills the gap in knowledge such that you can debunk them yourself -- try it before and after you read it.

Understanding Relations Part 1: Tables? So What?

Follow @DBDebunk Follow @ThePostWest

Note: This is a re-write of two older posts (which now link here), to bring them into line with the McGoveran formalization and interpretation of Codd's real RDM, including his own refinements, corrections, and extensions[1]

“Put simply, a "relation" is a table, the heading being the definition of the structure and the rows being the data.”

“In simple English: relation is data in tabular format with fixed number of columns and data type of each column. This can be a table, a view, a result of a subquery or a function etc.”

“Practically, a "Relation" in relational model can be considered as a "Table" in actual RDBMS products(Oracle, SQL Server, MySQL, etc), and "Tuples" in a relation can also be considered as "Rows" or "Records" in a table.”

“In common usage, however, when someone refers to a "relation" in a database course, they are referring to a tabular set of data either permanently stored in the database (a table) or derived from tables according to a mathematical description (a view or a query result).”

“In SQL RDBMSes (such as MS SQL Server and Oracle] tables are permently stored relations, where the column names defined in the data dictionary form the "heading" and the rows are the "tuples" of the relation. Then from a table, a query can return a different relation.”

“Data is stored in two-dimensional tables consisting of columns (fields) and rows (records). Multi-dimensional data is represented by a system of relationships among two-dimensional tables.”

“I read [that] "Relations are multidimensional. They are not flat. They are not two dimensional. Don't let the term table mislead you." on the back cover of CJ Date's DATABASE IN DEPTH. Can anyone help how to visualize this multidimensional nature of relations?”

Because SQL DBMSs have been sold as relational databases (which they are not), and in SQL the data structure is the table, in the absence of foundation knowledge[2] most practitioners think that relational databases consist of tables, but do not ask themselves why and how is that significant for database practice. The subtitle of this post is a question I used to ask in presentations years ago that always got silence. I see no evidence of improvement -- in fact, it's gotten worse. To emulate Feynman, "Nobody understands the RDM".

That such a simple and commonly understood structure can visualize relations is an advantage of the RDM, but a table is not a relation and, SQL notwithstanding, confusing the two reflects a lack of understanding of the RDM, misses its significance for database practice, and prevents taking full advantage of its benefits.

Note: The table is the preferred way to picture relations, there are others (e.g., array).

First, the fundamentals.

DBMS for Analytics: Risky Business Without Foundation Knowledge, Part 2

Follow @DBDebunk Follow @ThePostWest

Note: This was originally posted at AllAnalytics, which no longer exists, so some links to other posts there no longer work, but I left them in to alert the reader that I have written on those specific subjects. Other links work.

In this two-part series I alert analysts that correct interpretation and assessment of media/industry claims without being misled requires a good grasp of data fundamentals. In Part 1, I discussed the logical-physical confusion and the erroneous missclassification of DBMSs as relational and non-relational underlying the argument that the latter are superior to the former for analytics applications. In Part 2, I discuss a third misconception behind the claim.

Understanding Conceptual vs. Data Modeling Part 1: Data Model - The RDM Is, the E/RM Isn't

Follow @DBDebunk Follow @ThePostWest

Re-write 10/16/18

“E/RM is a data model -- So says Date, Chen, etc. So says the majority of current industry experts ... With very strong references to Codd (who he worked with), Date elegantly explains the differences between RM and E/RM -- but clearly believes both are data models (even allowing for the charitable comment). If we take a RDB as the ultimate target implementation of data, and an E/RM (or extended) can correctly design all the artifacts that are implemented, this means it is modeling the data. Granted, an E/RM does not explicitly model some of the non-structural aspects of the original Codd definition.”

“Out of interest, is there a common Relational Modeling tool, that is not also an E/RM tool and models the full Codd definition? There are also several other methods of modeling data -- E/RM is more a mechanism to represent the data. If E/RMs are used by IT professionals across the world to direct the design and build of the majority of applications guided by standard methodologies, is the view of this argument that these were all build wrongly? Regardless of success? Is the inferred conclusion that only the RM models data, and ERM, [or] any other techniques do not? [If so] that is a little limiting.”

Objects, Properties, and Ontological Commitment

We are culturally and linguistically conditioned to conceptualize the world as objects with properties. Objects in a universe thereof that share common properties are of the same type and form a class, distinguishing them from objects that are not and do not. Applying a class definition to the universe selects out the group of objects of that type from the universe.

Philosophical ontology is the study of being, existence, reality, as well as the basic categories of being, and their relationships -- what entities exist or may be said to exist, and how they may be grouped, related, and subdivided according to similarities and differences.

Note: 'Object' is used in the general, not OO sense. Ontology, as used herein, should not be confused with "computer science ontology", whereby the term ontology was usurped, and is understood by programmers as meaning a conceptual graph of directed semantic relationships among objects (and only sometimes among object types).

Conceptual modeling (1) identifies types of objects of interest, and (2) formulates business rules (BR) that specify their properties and relationships and, as such, makes an ontological commitment. Any approach to conceptual modeling must consider the ontological commitment upon which it is based, which has major implications for the data model used to formalize conceptual models as logical models for computable database representation -- it must be consistent with that commitment.

Unfortunately, due to lack of foundation knowledge in the industry[1], practitioners -- both vendors and users -- are largely unaware of, and oblivious to ontological underpinning and their implications for database technology and practice, one reason why they not only stagnated, but regressed in the last five decades. In this multipart series we explain the important distinction between conceptual, and data modeling (aka logical database design), which requires a formal data model. The E/RM is not, and while it can be used for conceptual modeling of reality, not data, we outline a new conceptual modeling approach that makes a different ontological commitment and requires adjustments to the RDM, both necessary for genuine progress.

Understanding the Division of Labor between Analytics Applications and DBMS

Follow @DBDebunk Follow @ThePostWest

I am coming across, on the one hand, instructions on how to do "analytics with SQL" and, on the other, tools purporting to enable "analytics without SQL." They are an umpteenth iteration of essentially similar ideas during my 30-plus years in data management and reflect common and entrenched fundamental misconceptions that I have documented and analyzed the costly consequences of in my writings and teachings. They will keep repeating, inhibiting genuine progress, as long as data fundamentals are ignored or dismissed. One of the least understood is the distinction between DBMS and application functions.

Object Orientation, Relational Database Design, Logical Validity and Semantic Correctness

Follow @DBDebunk Follow @ThePostWest

Note: This is a 8/24/17 rewrite of a 5/20/13 post to bring it in line with McGoveran's formal exposition of Codd's RDM [1] and its correct interpretation.

08/25/17: I have added formal definitions of logical validity and semantic correctness.
09/01/17: Minor revisions.
09/02/17: Added references.
03/15/18: Minor revisions.

Here's what's wrong with last week's picture, namely:

"In my experience, using an object model in both the application layer and in the database layer results in an inefficient system. This are my personal design goals:
- Use a relational data model for storage
- Design the database tables using relational rules including 3rd normal form
- Tables should mirror logical objects, but any object may encompass multiple tables- Application objects, whether you are using an OO language or a traditional language using structured programming techniques should parallel application needs which most closely correspond to individual SQL statements than to tables or "objects". --LinkedIn.com

On View Updating (C. J. Date and D. McGoveran)

Follow @DBDebunk Follow @ThePostWest

My recent posts on denormalization [1], identical relations [2] and the POOD [3,4,5] based on D. McGoveran (DMG) interpretation of Codd's RDM, triggered online reactions [6,7] and some comments in place that reflect the current understanding of the RDM. One of my readers referred me back to a 2004 exchange triggered by an exchange @old dbdebunk.com with both CJD and DMG on view updating--an important aspect discussed in my posts--on which the two perspectives differ. Last week I asked what's wrong with CJD's position in the exchange. Here is the original exchange, albeit in abbreviated form (I made minor revisions for clarity and added references to more recent sources the reader may want to consult, such as the 2013 CJD book on the subject [8], which also purports to describe some of DMG's more recent thinking), in which DMG counters with his position and adds a new comment. Throughout, I've changed "relvar predicate" (still used by CJD) to "relation predicate", as preferred in the DMG interpretation.

POSTS

Sunday, April 23, 2023

DENORMALIZATION, PERFORMANCE & INTEGRITY

Then ...

Monday, January 2, 2023

Saturday, December 11, 2021

Sunday, September 19, 2021

Monday, February 1, 2021

Friday, January 1, 2021

Monday, July 20, 2020

Friday, December 20, 2019

Friday, November 1, 2019

Sunday, September 22, 2019

Wednesday, March 27, 2019

Saturday, February 16, 2019

Sunday, December 2, 2018

What a True Data Model Is

Wednesday, August 15, 2018

Sunday, June 24, 2018

Friday, December 29, 2017

Wednesday, November 8, 2017

Objects, Properties, and Ontological Commitment

Monday, October 2, 2017

Sunday, August 27, 2017

Tuesday, December 20, 2016