Each "Test Your Foundation Knowledge" post presents one or more misconceptions about data fundamentals. To test your knowledge, first try to detect them, then proceed to read our debunking, which is based on the current understanding of the RDM, distinct from whatever has passed for it in the industry to date. If there isn't a match, you can acquire the knowledge by checking out our POSTS, BOOKS, PAPERS, LINKS (or, better, organize one of our on-site SEMINARS, which can be customized to specific needs).
“The most popular data model in DBMS is the Relational Model. It is more scientific a model than others. This model is based on first-order predicate logic and defines a table as an n-ary relation. The main highlights of this model are:
- Data is stored in tables called relations.
- Relations can be normalized. In normalized relations, values saved are atomic values.
- Each row in a relation contains a unique value.
- Each column in a relation contains values from a same domain.”
SUPPORT THIS SITE
DBDebunk was maintained and kept free with the proceeds from my @AllAnalitics
column. The site was discontinued in 2018. The content here is not available
anywhere else, so if you deem it useful, particularly if you are a regular
reader, please help upkeep it by purchasing publications, or donating. On-site
seminars and consulting are available.Thank you.
LATEST UPDATES
-12/24/20: Added 2021 to the POSTS page
-12/26/20: Added “Mathematics, machine learning and Wittgenstein to LINKS page
LATEST PUBLICATIONS (order from PAPERS and BOOKS pages)
- 08/19 Logical Symmetric Access, Data Sub-language, Kinds of Relations,
Database Redundancy and Consistency, paper #2 in the new UNDERSTANDING THE
REAL RDM series.
- 02/18 The Key to Relational Keys: A New Understanding, a new edition
of paper #4 in the PRACTICAL DATABASE FOUNDATIONS series.
- 04/17 Interpretation and Representation of Database Relations, paper
#1 in the new UNDERSTANDING THE REAL RDM series.
- 10/16 THE DBDEBUNK GUIDE TO MISCONCEPTIONS ABOUT DATA FUNDAMENTALS, my
latest book (reviewed by Craig Mullins, Todd Everett, Toon Koppelaars, Davide
Mauri).
USING THIS SITE
- To work around Blogger limitations, the labels are mostly abbreviations or
acronyms of the terms listed on the FUNDAMENTALS
page. For detailed
instructions on how to understand and use the labels in conjunction with the
that page, see the ABOUT
page. The 2017 and 2016 posts,
including earlier posts rewritten in 2017 were relabeled accordingly. As other
older posts are rewritten, they will also be relabeled. For all other older
posts use Blogger search.
- The links to my columns there no longer work. I moved only the 2017 columns
to dbdebunk, within which only links to sources external to AllAnalytics may
work or not.
SOCIAL MEDIA
I deleted my Facebook account. You can follow me:
- @DBDdebunk on Twitter: will link to new posts to this site, as well as
To Laugh or Cry? and What's Wrong with This Picture? posts, and my exchanges on
LinkedIn.
- The PostWest blog for monthly samples of global Antisemitism – the
only universally acceptable hatred left – as the (traditional) response to the
existential crisis of decadence and decline of Western civilization
(including the US).
- @ThePostWest on Twitter where I comment on global
#Antisemitism/#AntiZionism and the Arab-Israeli conflict.
------------------------------------------------------------------------------------------------------------------
Fundamentals
No data model compliant with Codd's definition thereof -- a theory-based combination of structure/integrity and manipulation -- other than the RDM has been formalized (those who pretend otherwise, specifiy -- precisely, please! -- the components of any such model and the theory on which it is based).
It is the dual theoretical grounding of the RDM -- simple set theory (SST) expressible in first order predicate logic (FOPL) -- that makes it "scientific", namely it enables a relational database system -- a RDBMS and properly designed database -- to ensure correctness, as well as a host of other advantages.
Most practitioners believe SQL DBMSs are relational, but they are not; neither are databases designed as required by the RDM. While SQL was intended to be a relational data sublanguage, it violates the RDM, it is not just a data sublanguage, and is a poorly designed language.
While a database relation can be displayed as a R-table on some physical medium -- tuples as rows, attributes as columns -- it is not a table: the row-column arrangement on the medium plays no part in the RDM.
The RDM is mute on how relation data is represented in storage -- any physical structure is permitted, provided it is not exposed to applications and users -- which is referred to as physical independence (PI).
Note: A SQL table, while not a relation, is a logical structure, the data of which may be stored (base) or not (view). A direct image representation as an indexed sequential file is common, but the RDM neither mandates, nor prohibits it, or any other implementation. SQL DBMSs, however, usually restrict to it.
“There are many physical structures that could be used to represent a logical relation in storage (e.g., a linked list with multiple orderings, or an indexed file storing the join of the relations, all maintained transparently by the DBMS), but products do not offer such choices -- only one is usually possible (essentially one or two flat files, possibly ordered by some key in SQL DBMSs). Because of this 1:1 mapping of a logical model to one specific physical representation, practitioners end up modifying the former and manifest a change in the latter, creating the (false) impression that denormalization improves performance.” --David McGoveranIndustry misconceptions nowithstanding, a database relation is by definition in both first normal form (1NF) and fifth normal form (5NF), otherwise it is not a relation and all bets are off.
- 1NF means attributes draw their values from simple domains, the values of which are treated as atomic ("non-decomposable") by the data sublanguage (i.e., even if they have internal structure/components, they are not accessible to the language).
- 5NF means the only attribute dependencies that hold in a relation are functional dependencies of the non-key attributes on the primary key (PK) (informally, the relation represents (facts about) a group of entities of a single type).
A relation attribute represents either a property that describes entities (in which case it is non-key), or a name that identifies (i.e., references) them (in which case it is a PK).
The Debunking
- The popular model is the "SQL model", confused with RDM, which it is not (and, strictly speaking, is not even a proper data model by Codd's definition); no other Codd-compliant data models have been formalized).
- The RDM is based on SST expressible in FOPL; its structure/integrity is the database semantically constrained relation, which is displayable as -- but is not -- a R-table; tables play no part in RDM.
- tuples display as rows and each has a unique PK value;
- attributes display as columns -- each draws its values from a simple domain.
- Data in relations are not stored in tables, which are a logical -- not physical storage -- structure; a direct image representation as a sequential file is the storage commonly implemented by SQL DBMSs.
- By definition relations are not only normalized (in 1NF) -- attributes are defined on simple domains with non-decomposable values -- but also fully normalized (in 5NF).
No comments:
Post a Comment