Each "Test Your Foundation Knowledge" post presents one or more misconceptions about data fundamentals. To test your knowledge, first try to detect them, then read our debunking, which is based on the current understanding of the relational data model (RDM), distinct from whatever has passed for it in the industry to date. If there isn't a match, you can acquire the knowledge by checking out our POSTS, BOOKS, PAPERS, LINKS (or, better, organize one of our on-site SEMINARS, which can be customized to specific needs).
“The relational calculus is good in describing sets. But it's bad at describing relations between data in different sets. Explicit identities (primary keys) need to be introduced and normalization is needed to avoid update inconsistencies due to duplication of data. To say it somewhat bluntly: The problem with the relational calculus and RDBMS etc. is the focus on data. It's seems to be so important to store the data, that connecting the data moves to the background. That might be close to how we store filled in paper forms. But it's so unlike how the mind works. There is no data stored in your brain. If you look at the fridge in your kitchen, there is no tiny fridge created in your brain so you can take the memory of your fridge with you, when you leave your kitchen.” --Weblogs.asp.net
The lack of foundation knowledge exposed by the above paragraph is so complete that its claims are practically upside down and backwards.
Fundamentals
As we have demonstrated, in mathematical set theory a relation (set) is a subset of a cross-product of domains (sets). In other words, it is a set that is a relationship among sets. Being abstract (i.e., having no real world meaning), the values of mathematical relations can be arbitrary.
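To make the set-theoretic definition concrete, here is a minimal Python sketch. The domains `D1` and `D2`, the relation `R` and the helper `is_relation_on` are hypothetical, chosen only for illustration. It shows that a mathematical relation is simply some subset of the cross-product of its domains: nothing beyond domain membership restricts which tuples it contains, which is why its values can be arbitrary.

```python
from itertools import product

# Two small domains (sets); in pure set theory their values carry no meaning.
D1 = {1, 2, 3}
D2 = {"a", "b"}

# The cross (Cartesian) product D1 x D2: every ordered pair.
cross = set(product(D1, D2))

# A mathematical relation on D1 and D2 is ANY subset of that product.
R = {(1, "a"), (2, "b"), (3, "a")}

def is_relation_on(r, *domains):
    """True if every tuple in r takes its i-th value from the i-th domain."""
    return all(
        len(t) == len(domains) and all(v in d for v, d in zip(t, domains))
        for t in r
    )

print(len(cross))                 # 6
print(R <= cross)                 # True: R is a subset of the cross-product
print(is_relation_on(R, D1, D2))  # True
```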
The RDM is an application of simple set theory expressible in first order predicate logic (SST/FOPL) to database management: a relational database represents a conceptual model of some reality, namely (facts about) a multigroup in the real world -- a collection of related entity groups -- with each database relation representing one such group; a database is also a set of related relations. The values in database relations (i.e., the data) are, thus, not arbitrary, but must be consistent with the conceptual model: relations and the database as a whole are semantically constrained to be consistent with (1) the individual properties of entities and (2) the collective properties of (a) groups (i.e., relationships among entities within groups) and (b) the multigroup (i.e., relationships among groups).
A primary key (PK) represents the names given in the real world to entities of a given type, and the corresponding PK constraint (uniqueness) enforces consistency of a relation with the real-world distinguishability of the entities whose facts it represents. These are not RDM artifacts, but rather part of the adaptation of SST/FOPL to database management.
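A minimal sketch of what a PK constraint checks, assuming a relation is represented as a header of attribute names plus a Python set of tuples. The `employees` data, `HEADER` and `satisfies_pk` are hypothetical. The constraint rejects any state of the relation in which two distinct tuples carry the same key value, i.e., in which two supposedly distinguishable real-world entities would share one name.

```python
# Illustrative representation only: a header plus a set of tuples.
HEADER = ("emp_no", "name", "dept")

def satisfies_pk(tuples, header, key):
    """PK (uniqueness) constraint: no two distinct tuples agree on all
    key attributes."""
    idx = [header.index(a) for a in key]
    seen = set()
    for t in tuples:
        k = tuple(t[i] for i in idx)
        if k in seen:
            return False
        seen.add(k)
    return True

employees = {
    (101, "Ann", "Sales"),
    (102, "Bob", "Sales"),
}
# Adding a second, different tuple named 101 breaks the correspondence
# between the relation and distinguishable entities in the real world.
bad = employees | {(101, "Ann", "Marketing")}

print(satisfies_pk(employees, HEADER, ("emp_no",)))  # True
print(satisfies_pk(bad, HEADER, ("emp_no",)))        # False
```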
For the primary advantage of the RDM -- guaranteed correctness of query results (i.e., inferences made from the database) -- to materialize, logical database design must adhere to three core principles which, jointly, imply fully normalized relations (5NF). In fact, in the RDM relations are in 5NF by definition; otherwise they are not relations, relational algebra (RA) operations lose information, and all bets are off.
The RA is the manipulative component of the RDM -- a collection of primitive and derived set operations on relations that describe relationships among relations. For example, the join operation r1 JOIN r2 describes a relationship between relations r1 and r2, the result itself being a relation. Note that every result of an RA operation, even one on a single relation, is always a relation and still describes a relationship: between the "input" and "output" relations.
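The following Python sketch illustrates this closure property under the simplifying assumption that a relation is a pair of an attribute header and a frozenset of tuples. `natural_join` and `restrict` are hypothetical helpers, natural join is only one form of JOIN, and no DBMS is claimed to work this way. Whether applied to two relations or to one, each operation returns another relation of the same shape.

```python
def natural_join(r1, r2):
    """Natural join: combine tuples that agree on all common attributes."""
    h1, body1 = r1
    h2, body2 = r2
    common = [a for a in h1 if a in h2]
    extra = [a for a in h2 if a not in h1]
    header = tuple(h1) + tuple(extra)
    body = frozenset(
        x + tuple(y[h2.index(a)] for a in extra)
        for x in body1
        for y in body2
        if all(x[h1.index(a)] == y[h2.index(a)] for a in common)
    )
    return header, body

def restrict(r, predicate):
    """Restriction on a single relation: keep tuples satisfying predicate."""
    header, body = r
    return header, frozenset(
        t for t in body if predicate(dict(zip(header, t)))
    )

EMP = (("emp_no", "dept"), frozenset({(101, "Sales"), (102, "IT")}))
DEPT = (("dept", "floor"), frozenset({("Sales", 1), ("IT", 3)}))

joined = natural_join(EMP, DEPT)
print(joined[0])  # ('emp_no', 'dept', 'floor')
sales = restrict(joined, lambda t: t["dept"] == "Sales")
print(sales[1])   # frozenset({(101, 'Sales', 1)})
```

Because each call returns a (header, body) pair, the operations compose: the output of `natural_join` feeds straight into `restrict`, which is what it means for the algebra to be closed over relations.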
A data model -- and, industry claims notwithstanding, the only one satisfying Codd's definition that has been formalized is the RDM -- is by nature focused on data. However, the RDM supports physical independence (PI) and is, thus, not concerned with how data is physically stored and accessed. The "filled in paper forms" analogy is an example of the common and entrenched logical-physical confusion (LPC): a failure to understand the distinction between a logical relation and its tabular visualization on a physical medium, induced/reinforced by the industry's "direct image" implementation of SQL DBMSs.
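A minimal sketch of physical independence, assuming two hypothetical storage layouts (a row list in arbitrary order and a column-wise dictionary) chosen only for illustration. Whatever the layout, the logical relation derived from it is the same set of tuples, so a query defined on the relation cannot depend on how, or in what order, the data happens to be stored.

```python
HEADER = ("emp_no", "name")

# Hypothetical physical layout 1: rows stored in arbitrary order.
row_store = [(102, "Bob"), (101, "Ann")]

# Hypothetical physical layout 2: values stored column by column.
col_store = {"emp_no": [101, 102], "name": ["Ann", "Bob"]}

def logical_from_rows(rows):
    """Map a row layout to the logical relation (an unordered set)."""
    return frozenset(rows)

def logical_from_columns(cols, header):
    """Map a column layout to the same kind of logical relation."""
    return frozenset(zip(*(cols[a] for a in header)))

# Same logical relation, regardless of physical representation.
print(logical_from_rows(row_store) == logical_from_columns(col_store, HEADER))  # True
```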
Conclusion
We rephrase the above paragraph as follows:
“The relational algebra describes relationships among relations (sets). Primary keys are one of the adaptations of SST/FOPL to database management: a PK constraint -- uniqueness -- formally represents in the database a within-group relationship among all of the group's entities.
The three core design principles, adherence to which is mandatory, jointly imply full normalization, which is necessary to guarantee correctness of query results. True RDBMSs:
- Implement the RA for logical data retrieval independent of how the data is physically stored and accessed. SQL DBMSs notwithstanding, vendors are free to store data however they want, as long as they don't expose the storage details to users and applications.
- Enforce relational constraints that are formal database representations of relationships in the conceptual model represented by the database.”
The "brain" stuff is sheer nonsense.