My May post @All Analytics:
We are constantly told how data scientists
must be “jacks of many skills”, but one of the most important is rarely
included in the list.

Very few databases are properly designed. Many SQL databases are
denormalized inadvertently, or intentionally (and erroneously) "for performance". They
require special constraints to control data redundancy and prevent
inconsistencies, which are practically never enforced. Analysts cannot,
therefore, take database consistency for granted. Furthermore, to issue
sensible queries and ensure correct results and interpretation thereof,
it’s not enough for analysts to know the types of fact represented in
the database; they must also know whether and how the database designer
has chosen to bundle -- nest or merge -- those facts, and how to
disentangle them for analysis.
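
To make the redundancy point concrete, here is a minimal sketch of the kind of check an analyst might run before trusting a denormalized table. Everything in it is hypothetical and for illustration only: the `orders` table, its columns, and the helper `find_fd_violations` are my assumptions, not taken from any real schema. It assumes that `customer_id` should determine `customer_city`, and it lists the customers for whom the table disagrees with itself because no constraint enforces that dependency.

```python
import sqlite3


def find_fd_violations(conn, table, determinant, dependent):
    """Return determinant values that map to more than one dependent value.

    Hypothetical helper: the identifiers are interpolated directly into the
    SQL text, so pass only trusted table and column names.
    """
    query = (
        f"SELECT {determinant} FROM {table} "
        f"GROUP BY {determinant} "
        f"HAVING COUNT(DISTINCT {dependent}) > 1 "
        f"ORDER BY {determinant}"
    )
    return [row[0] for row in conn.execute(query)]


# Hypothetical denormalized table: customer facts are merged into order facts,
# so each customer's city is repeated on every one of that customer's orders.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "order_id INTEGER PRIMARY KEY, customer_id INTEGER, customer_city TEXT)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (1, 10, "Boston"),
        (2, 10, "Boston"),
        (3, 20, "Denver"),
        (4, 20, "Dallas"),  # contradicts row 3: same customer, different city
    ],
)

violations = find_fd_violations(conn, "orders", "customer_id", "customer_city")
print(violations)  # customers whose city is recorded inconsistently
```

A check like this only detects the inconsistency after the fact. In a fully normalized design the city would be recorded once per customer in its own table, and the contradiction could not arise in the first place.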
Read it all. (Please comment there, not here)