One of the clearest indications of poor foundation knowledge in data management practice is misuse and abuse of terminology. Many data professionals are inducted into the industry without a formal education, via programming and software tools, and use terms indiscriminately, as jargon, without understanding them. This has produced weak DBMS implementations and poorly designed databases that put the correctness of databased analytics at risk.
For example, 'class' in data management is confused with the programming class, an important distinction between class and 'set' is missed, and a 'relation' -- which is a set -- is often thought of as a class.
In object-oriented programming, a class is a code template for creating objects that encapsulate data and behavior, each object being an instantiation of a class (i.e., a specific application of the template). In data management, however, a class is not code, but a formal, well-defined concept from set theory, one-half of the theoretical foundation of the Relational Data Model (RDM), the understanding of which is critical for proper conceptual modeling, database design, and valid analytics.
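The contrast can be made concrete with a minimal Python sketch. Everything in it is hypothetical and chosen only for illustration: the `Employee` template, the small `universe` of objects, and the `is_acme_employee` membership rule. The first half shows a class in the programming sense; the second half only approximates the set-theoretic sense, in which a membership rule applied to a universe of objects induces a set.

```python
from dataclasses import dataclass


# Programming sense: a class is a code template; each object is an
# instantiation that bundles data with behavior.
@dataclass(frozen=True)
class Employee:
    name: str
    annual_salary: int

    def monthly_salary(self) -> float:
        return self.annual_salary / 12


ann = Employee("Ann", 60_000)

# Set-theoretic sense (approximated): no template and no behavior, only a
# membership rule applied to a universe of objects.
# Hypothetical universe: (kind, name, employer) triples.
universe = [
    ("person", "Ann", "Acme"),
    ("person", "Bob", None),
    ("invoice", "INV-17", "Acme"),
]


def is_acme_employee(obj) -> bool:
    """Hypothetical membership rule: a person employed by Acme."""
    kind, _name, employer = obj
    return kind == "person" and employer == "Acme"


# Applying the rule to the universe induces the set of members.
acme_employees = {obj for obj in universe if is_acme_employee(obj)}
```

Note that nothing is "instantiated" in the second half: the members exist in the universe independently, and the rule merely determines which of them belong to the group.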
A class is a group of objects that share the properties required for group membership (i.e., are of the same type). As I explained in Data Meaning: Analytics vs. Data Mining, there are two kinds of properties required for group membership:
- Individual properties shared by class members
- Collective properties that arise from relationships among (1) individual properties and (2) all members.
Conceptual (or business) modeling formulates business rules that define object classes of interest, each of which is jointly defined by several types of rules that specify the properties required for membership. Applying the rules to corresponding object universes induces sets of members, and these sets are represented formally in the database by relations:
- Facts about each group's members are represented by 'tuples' (displayed as rows);
- Individual properties are represented by 'attributes' defined on 'domains' (displayed as columns);
- Collective property rules are represented by 'constraints' on relations (that constrain them to be consistent with the rules).
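The correspondence above can be sketched in Python, with the caveat that this is an illustration under stated assumptions and not how a DBMS represents or enforces anything. The `EmployeeTuple` type, the `DOMAINS` checks, and the uniqueness rule on `emp_id` are all hypothetical: a relation is modeled as a set of tuples, each attribute is checked against its domain, and one collective property (no two members share an `emp_id`) is expressed as a constraint on the relation as a whole.

```python
from typing import NamedTuple


class EmployeeTuple(NamedTuple):
    """A fact about one member; the field names play the role of attributes."""
    emp_id: int
    dept: str
    salary: int


# Hypothetical domains: the values each attribute is allowed to take.
DOMAINS = {
    "emp_id": lambda v: isinstance(v, int) and v > 0,
    "dept": lambda v: v in {"Sales", "R&D"},
    "salary": lambda v: isinstance(v, int) and 0 < v <= 200_000,
}


def satisfies_domains(t: EmployeeTuple) -> bool:
    """Individual properties: every attribute value lies in its domain."""
    return all(check(getattr(t, attr)) for attr, check in DOMAINS.items())


def key_is_unique(relation: frozenset) -> bool:
    """Collective property (assumed rule): no two members share an emp_id."""
    ids = [t.emp_id for t in relation]
    return len(ids) == len(set(ids))


def is_consistent(relation: frozenset) -> bool:
    """A relation is consistent with the rules if all constraints hold."""
    return all(satisfies_domains(t) for t in relation) and key_is_unique(relation)


employees = frozenset({
    EmployeeTuple(1, "Sales", 50_000),
    EmployeeTuple(2, "R&D", 70_000),
})

# Adding a second fact with emp_id 1 violates the collective rule.
violating = employees | {EmployeeTuple(1, "R&D", 65_000)}
```

Here `is_consistent(employees)` holds while `is_consistent(violating)` does not. The point of the sketch is that the uniqueness rule cannot be checked by inspecting any single tuple: it is a property of the group, which is why it must be stated as a constraint on the relation.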
Failure by data professionals to understand these fundamentals gives rise to a common mistake: asking whether one or more tables (not even guaranteed to be R-tables) are properly designed without specifying the rules, which denote the meaning assigned by the database designer to the relations represented by the tables, and the corresponding constraints that enforce the rules in the database. Knowledge of both is often insufficient.
In these circumstances, databases are not guaranteed to be properly designed and constrained, and the 'logical validity' and 'semantic correctness' of datasets retrieved by queries for analysis cannot be assumed. Insofar as analytics are concerned, all bets are off.