My October post @All Analytics.
Be that as it may, practitioners insist that performance improves when they denormalize databases, because "bundling" facts into fewer relations reduces joins.
But even if this were always true -- it is not -- performance gains, if
any, do not come from denormalization per se, but from trading off
integrity for performance.
What many data professionals miss is that the redundancy introduced by
denormalization must be controlled by the DBMS to ensure data integrity,
which requires special integrity constraints
that, it turns out, involve the very joins that denormalization is
intended to avoid, defeating its purpose. These constraints are
practically never declared and enforced, which creates the illusion that denormalization improves performance at no cost.
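To make that concrete, here is a minimal sketch of the kind of constraint denormalization necessitates. The tables and columns (CUSTOMERS, ORDERS, CITY) are hypothetical, and CREATE ASSERTION, while part of the SQL standard, is not supported by most SQL DBMSs, so in practice the check would have to be approximated with triggers or application code -- which is precisely why it is practically never enforced.

-- Normalized: each fact recorded once.
--   CUSTOMERS (CUSTOMER_ID, CITY, ...)
--   ORDERS    (ORDER_ID, CUSTOMER_ID, ...)
--
-- Denormalized: CITY is copied into ORDERS to "save" the join.
--   ORDERS (ORDER_ID, CUSTOMER_ID, CITY, ...)
--
-- Controlling the resulting redundancy requires a constraint that
-- re-introduces the very join denormalization was meant to avoid:
CREATE ASSERTION ORDERS_CITY_CONSISTENT CHECK (
  NOT EXISTS (SELECT *
              FROM   ORDERS O
              JOIN   CUSTOMERS C ON C.CUSTOMER_ID = O.CUSTOMER_ID
              WHERE  O.CITY IS DISTINCT FROM C.CITY));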
Read it all. (Please comment there, not here)
When I discussed with a book publisher the idea of a guide/reference to misconceptions about data fundamentals -- one whose objective, unlike that of the usual cookbooks, is to help data professionals base their practice on understanding rather than recipes -- he said "they are not interested in understanding, only in succeeding in their jobs". Apparently, the former is no longer a factor in the latter. Given my steadily deteriorating experiences with publishers -- they pay and do very little -- it was time to stop bothering with them and self-publish.
THE DBDEBUNK GUIDE TO MISCONCEPTIONS ABOUT DATA FUNDAMENTALS - A DESK REFERENCE FOR THE THINKING DATA PROFESSIONAL AND USER is now available for purchase: $35, via the BOOKS page (not to be confused with the RECOMMENDED BOOKS page); contact me by email for volume discounts.
1. Quote of the Week
"Legion is a Hadoop MapReduce tool that turns big, messy data sources into clean, normalized flat files ready for ingestion into relational tables in a data warehouse (e.g., Postgres COPY)." --GitHub.com
2. To Laugh or Cry?