Monday, October 24, 2016
My October post @All Analytics.
Be that as it may, practitioners insist that performance improves when they denormalize databases, because "bundling" facts into fewer relations reduces joins. But even if this were always true -- it is not -- performance gains, if any, do not come from denormalization per se, but from trading off integrity for performance.
What many data professionals miss is that the redundancy introduced by denormalization must be controlled by the DBMS to ensure data integrity, which requires special integrity constraints that, it turns out, involve the very joins denormalization is intended to avoid, defeating its purpose. These constraints are practically never declared and enforced, which creates the illusion that denormalization improves performance at no cost.
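To see why, consider a minimal sketch (the schema and all names here are illustrative assumptions, not from the column): a denormalized orders table that repeats each customer's city on every row, and the constraint required to control that redundancy.

    -- Denormalized: customer and order facts bundled into one relation,
    -- so each customer's city is repeated on every one of their orders.
    CREATE TABLE orders_d (
      order_id INT PRIMARY KEY,
      cust_id  INT NOT NULL,
      city     VARCHAR(40) NOT NULL,  -- redundant across rows per customer
      amount   DECIMAL(10,2) NOT NULL
    );

    -- Controlling the redundancy means guaranteeing that all rows for a
    -- given customer agree on city. In standard SQL that is an assertion
    -- over a self-join -- the very join denormalization meant to avoid.
    CREATE ASSERTION orders_d_city_consistent CHECK (
      NOT EXISTS (
        SELECT 1
        FROM orders_d o1 JOIN orders_d o2 ON o1.cust_id = o2.cust_id
        WHERE o1.city <> o2.city
      )
    );

Few SQL DBMSs implement CREATE ASSERTION, so in practice the check would have to be written as triggers or application code -- one reason such constraints are practically never declared and enforced.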
Read it all. (Please comment there, not here)
Sunday, October 16, 2016
THE DBDEBUNK GUIDE TO MISCONCEPTIONS ABOUT DATA FUNDAMENTALS Available to Order
When I discussed with a book publisher the idea of a guide/reference to misconceptions about data fundamentals, whose objective -- distinct from the usual cookbooks -- is to help data professionals base their practice on understanding rather than recipes, he said "they are not interested in understanding, only in succeeding in their jobs". Apparently, the former is no longer a factor in the latter. Given my increasingly poor experiences with publishers -- they pay little and do very little -- it was time to stop bothering with them and self-publish.
THE DBDEBUNK GUIDE TO MISCONCEPTIONS ABOUT DATA FUNDAMENTALS - A DESK REFERENCE FOR THE THINKING DATA PROFESSIONAL AND USER is now available for purchase ($35, order via the BOOKS page (not to be confused with the RECOMMENDED BOOKS page); contact me by email for volume discounts).
Monday, October 10, 2016
This Week
1. Quote of the Week
"Legion is a Hadoop MapReduce tool that turns big, messy data sources into clean, normalized flat files ready for ingestion into relational tables in a data warehouse (e.g., Postgres COPY)." --GitHub.com2. To Laugh or Cry?
- Everything you need to know about data modelling for document databases
- Data denormalization is broken
- When To Avoid JSONB In A PostgreSQL Schema
Wednesday, September 28, 2016
Re-write
Follow @DBDebunk
Follow @ThePostWest
My September column @All Analytics:
I've posted a revision here.
Monday, September 26, 2016
This Week
1. Quote of the Week
"Which leads to another bad experience: the pernicious use of foreign keys. In the ORMs I've used, links between classes are represented in the data model as foreign keys which, if not configured carefully, result in a large number of joins when retrieving the object. (A recent count of one such table in my work resulted in over 600 attributes and 14 joins to access a single object, using the preferred query methodology.)
...When you have foreign keys, you refer to related identities with an identifier. In your application, "identifier" takes on various meanings, but usually it's the memory location (a pointer). In the database, it's the state of the object itself. These two things don't really get along because you can really only use database identifiers in the database (the ultimate destination of the data you're working with)." --wozniak.ca
Monday, September 19, 2016
The Principle of Orthogonal Database Design Part I
Note: This is an 11/24/17 re-write of Part I of a three-part series that replaced several older posts (the pages of which now redirect here), to bring it in line with the McGoveran formalization and interpretation [1] of Codd's true RDM.
"The principle of orthogonal design (abbreviated POOD) ... is the second of the two principles of database design, which seek to prevent databases from being too complicated or redundant, the first principle being the principle of full normalization (POFN). Simply put, it says that no two relations in a relational database should be defined in such a way that they can represent the same facts. As with database normalization, POOD serves to eliminate uncontrolled storage redundancy and expressive ambiguity, especially useful for applying updates to virtual relations (views). Although simple in concept, POOD is frequently misunderstood ... is a restatement of the requirement that a database is a minimum cover set of the relational algebra. The relational algebra allows data duplication in the relations that are the elements of the algebra. One of the efficiency requirements of a database is that there be no data duplication. This requirement is met by the minimum cover set of the relational algebra." --Wikipedia.orgWell, not quite.
Monday, September 12, 2016
This Week
1. Quote of the Week
"Data sense-making does not benefit from the relational data model. Dr. Codd’s rules for relational modeling were designed to improve efficiencies in data processing and storage, not to make data more intelligible. In fact, structuring data relationally makes the work of data sensemaking more difficult, which is why Dr. Kimball created dimensional data modeling and also why an entire industry of middleware products emerged to hide the complexities of relational models." --Stephen Few, PerceptualEdge.com
2. To Laugh or Cry?