The Web is chock-full of unnoticed and unquestioned pronouncements by novices or "experts", many self-taught, that are (1) wrong or (2) gobbledygook. Any attempt to demonstrate the lack of foundation knowledge underlying these misconceptions and their practical implications is usually dismissed as "theory, not practical", attacked as "insulting ad hominem", or ignored altogether, regardless of the amount and quality of the supporting evidence and the logic of the argument. This is understandable: in the absence of foundation knowledge and the ability to reason, it is by definition impossible to comprehend and appreciate corrections that require them.
Practitioners who cannot detect such misconceptions, or understand their practical implications and importance, are insufficiently prepared for a professional career in data management. Worse, they cannot associate problems with their real causes and, thus, cannot come up with proper solutions, which explains the industry's "cookbook approach" and its succession of fads.
What about you? This is another batch in the regular Test Your Foundation Knowledge series of posts: online statements reflecting common misconceptions that are difficult to discern without foundation knowledge. You can test yours by trying to debunk them in Comments, including which category, (1) or (2), they fall into. If you can't, proper education is in order.
Sunday, August 25, 2019
Meaning Criteria and Entity Supertype-Subtypes Relationships
Note: This is a re-write of a previous post.
"I have a database for a school ... [with] numerous tables obviously, but consider these:How would somebody who "does not know past, or new requirements, modeling, and database design" and messes with a working database just because "he heard something about (insert your favorite fad here)" figure out correct from bad answers? Particularly if the answers suffer from the same lack of foundation knowledge as the question?
CONTACT - all contacts (students, faculty): has fields such as LAST, FIRST, ADDR, CITY, STATE, ZIP, EMAIL;"Do you think it is a good idea to have a single table hold such info? Or, would you have had the tables FACULTY and STUDENT store LAST, FIRST, ADDR and other fields? At what point do you denormalize for the sake of being more practical? What would you do when you want to close out one year and start a new year? If you had stand-alone student and faculty tables then you could archive them easily, have a school semester and year attached to them. However, as you go from one year to the next information about a student or faculty may change. Like their address and phone for example. The database model now is not very good because it doesn’t maintain a history. If Student A was in school last year as well but lived somewhere else would you have 2 contact rows? 2 student rows? Or do you have just one of each and have a change log. Which is best?"
FACULTY - hire info, login/password, foreign key to CONTACT;
STUDENT - medical comments, current grade, foreign key to CONTACT."
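For concreteness, here is a minimal SQL sketch of the design the poster describes (the table and column names are the poster's; the keys, types, and constraints are my assumptions):

CREATE TABLE CONTACT (
  CONTACT_ID INTEGER PRIMARY KEY,
  LAST  VARCHAR(40) NOT NULL,
  FIRST VARCHAR(40) NOT NULL,
  ADDR  VARCHAR(80),
  CITY  VARCHAR(40),
  STATE CHAR(2),
  ZIP   CHAR(10),
  EMAIL VARCHAR(80)
);

CREATE TABLE FACULTY (
  CONTACT_ID    INTEGER PRIMARY KEY REFERENCES CONTACT, -- foreign key to CONTACT
  HIRE_DATE     DATE NOT NULL,
  LOGIN         VARCHAR(30) NOT NULL,
  PASSWORD_HASH VARCHAR(60) NOT NULL                    -- a hash, not plaintext
);

CREATE TABLE STUDENT (
  CONTACT_ID    INTEGER PRIMARY KEY REFERENCES CONTACT, -- foreign key to CONTACT
  MEDICAL_NOTES VARCHAR(200),
  CURRENT_GRADE CHAR(2)
);

Whether this is a good design cannot be decided from the tables alone: it depends on the conceptual model -- specifically, on whether students and faculty are entity subtypes of a contact supertype, which is what the meaning criteria discussed in this post bear on.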
Friday, June 21, 2019
Data Meaning and Mining: Knowledge Representation and Discovery
Note: This is a re-write -- prompted by a LinkedIn exchange -- of two columns I published @All Analytics.
“Scientific research experiments require assignment of data to tables, which is difficult when the scientists do not know ahead of time what analysis to run on the data, a lack of knowledge that severely limits the usefulness of relational [read: SQL] databases.”
NoSQL databases are recommended in such cases. But what does "scientists do not know ahead of time what analysis to run" really mean?
Data, Information, and Knowledge
One way to view the difference between data, information, and knowledge is:
“1. Data: Categorized sequences of values representing some properties of interest, but if and how they are related is unknown (e.g., research variables in scientific experiments);
2. Information: Properties further organized in named combinations -- "objects", but how they are related is unknown (e.g., "runs", or "cases" in scientific experiments);
3. Knowledge: Relationships among properties and among objects of different types are known.”
--David McGoveran
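As an illustration of these three levels (the tables and names below are mine, purely for illustration), consider how the same experimental measurements might be declared in SQL as knowledge about them accumulates:

-- Data: categorized values; whether and how they are related is
-- unknown, so each research variable stands alone.
CREATE TABLE TEMPERATURE_READING (VALUE DECIMAL(5,2) NOT NULL);
CREATE TABLE PRESSURE_READING    (VALUE DECIMAL(7,2) NOT NULL);

-- Information: properties organized into named "objects" -- here,
-- one row per experimental run -- but relationships among runs and
-- other objects are still unknown.
CREATE TABLE RUN (
  RUN_ID      INTEGER PRIMARY KEY,
  TEMPERATURE DECIMAL(5,2) NOT NULL,
  PRESSURE    DECIMAL(7,2) NOT NULL
);

-- Knowledge: relationships among properties and among objects of
-- different types are known and can be declared as constraints.
CREATE TABLE EXPERIMENT (
  EXPERIMENT_ID INTEGER PRIMARY KEY,
  HYPOTHESIS    VARCHAR(200) NOT NULL
);
ALTER TABLE RUN ADD COLUMN EXPERIMENT_ID INTEGER REFERENCES EXPERIMENT;
ALTER TABLE RUN ADD CHECK (PRESSURE > 0);

On this view, "scientists do not know ahead of time what analysis to run" describes having data, and perhaps information, but not yet knowledge -- and no DBMS, SQL or NoSQL, can supply the missing relationships for them.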
Friday, June 14, 2019
Normalization and Further Normalization Part 3: Understanding Database Design
Note: This is a re-write of two older posts, to bring them into line with McGoveran's formalization, re-interpretation, and extension[1] of Codd's RDM.
In Part 1 we explained that for a database to be relational, database design must adhere to three core principles, in which case it consists of relations that are by definition in both 1NF and 5NF. In Part 2 we showed that whether tables visualize relations (i.e., are R-tables) can be determined only with reference to the conceptual model that the database designer intended the database to represent (not what any users might think it does). This is obscured by the common and entrenched confusion/conflation of levels of representation and, consequently, of types of model -- conceptual, logical, physical, and data model -- that we have so often debunked[2].
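A minimal illustration (mine, not from the post): the table

CREATE TABLE ASSIGNMENT (
  EMP_ID  INTEGER NOT NULL,
  PROJ_ID INTEGER NOT NULL,
  PRIMARY KEY (EMP_ID, PROJ_ID)
);

is an R-table only relative to the conceptual model the designer intended it to represent -- say, "employee EMP_ID is assigned to project PROJ_ID". Nothing in the table itself reveals that meaning: a user who reads the rows as "employee EMP_ID manages project PROJ_ID" is interpreting the very same table against a different conceptual model.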
Sunday, June 2, 2019
Normalization and Further Normalization Part 2: If You Need Them, You're Doing It Wrong
In Part 1 we outlined some fundamentals of database design, namely the distinction between normalization (to 1NF) and further normalization (to "full" 5NF), and explained that they are necessary only to repair poor designs: if you (1) develop a complete conceptual model, (2) formalize it properly using the RDM, and (3) adhere to the three core principles of database design, you should end up with a relational database in both 1NF and 5NF.
Here we apply this knowledge to the typical request for "normalization" help we presented in Part 1.
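As a generic illustration of the kind of repair further normalization performs (the example is mine, not the one from that request), consider a design that bundles two distinct types of facts in one table:

-- Poor design: which course a student takes, and who teaches a course,
-- bundled together; INSTRUCTOR depends on COURSE_ID alone, a part of the key.
CREATE TABLE ENROLLMENT_BAD (
  STUDENT_ID INTEGER NOT NULL,
  COURSE_ID  INTEGER NOT NULL,
  INSTRUCTOR VARCHAR(40) NOT NULL,
  PRIMARY KEY (STUDENT_ID, COURSE_ID)
);

-- Further normalization repairs it by separating the two types of facts;
-- a complete conceptual model, properly formalized, yields this directly.
CREATE TABLE COURSE (
  COURSE_ID  INTEGER PRIMARY KEY,
  INSTRUCTOR VARCHAR(40) NOT NULL
);
CREATE TABLE ENROLLMENT (
  STUDENT_ID INTEGER NOT NULL,
  COURSE_ID  INTEGER NOT NULL REFERENCES COURSE,
  PRIMARY KEY (STUDENT_ID, COURSE_ID)
);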
Friday, May 31, 2019
Normalization and Further Normalization Part 1: Databases Representing ... What?
Note: This is a re-write of older posts (which now link here), to bring them into line with the McGoveran formalization, re-interpretation, and extension[1] of Codd's RDM.
“A particular bug-bear and a mistake that +90% of "data modelers" make, is analyzing "point in time" views of the business data and "normalizing" those values hence failing to consider change over time and the need to reproduce historic viewpoints. Let’s say we start with this list of data-items for a Sales-Invoice (completely omitting details of what’s been sold):
SALES-INVOICE
{Invoice-Date,
Customer-Account-ID,
Customer Name,
Invoice-Address-Line-1,
Invoice-Address-Line-2,
Invoice-Address-Line-3,
Invoice-Address-Line-4,
Invoice-Address-Postcode,
Net-Amount,
VAT,
Total-Amount
};
Nearly every time, through the blind application of normalization we get this ... there’s even a term for it -- it’s called "over-normalization":
SALES-INVOICE
{Invoice-Date,
Customer-Account-Id
REFERENCES Customer-Account,
Net-Amount,
VAT,
Total-Amount
};
CUSTOMER-ACCOUNT
{Customer-Account-Id,
Customer-Name,
Invoice-Address
REFERENCES Address
};
ADDRESS
{Address-Line-1,
Address-Line-2,
Address-Line-3,
Address-Line-4,
Postcode
};”
A measure of the scarcity of foundation knowledge in the industry is the plethora of attempts to correct common misconceptions[2] that suffer from the very misconceptions they aim to correct. One of the most common fallacies is confusion of levels of representation[3], which takes two forms[4]. We have written extensively about the logical-physical confusion (LPC)[5,6,7,8] underlying "denormalization for performance"[9], and about the conceptual-logical conflation (CLC) that lumps conceptual modeling together with data modeling[10,11,12], inhibiting the understanding that the latter is a formalization of the former.
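To make the commenter's concern concrete (the sketch below is mine): if addresses are simply updated in place in the decomposed design, reprinting last year's invoice shows today's address. The culprit, however, is arguably not normalization but a conceptual model that omits time; model address history explicitly and the decomposed design reproduces historic viewpoints:

CREATE TABLE CUSTOMER_ACCOUNT (
  CUSTOMER_ACCOUNT_ID INTEGER PRIMARY KEY,
  CUSTOMER_NAME       VARCHAR(60) NOT NULL
);

-- One row per address per validity period, not one mutable address.
CREATE TABLE CUSTOMER_ADDRESS (
  CUSTOMER_ACCOUNT_ID INTEGER NOT NULL REFERENCES CUSTOMER_ACCOUNT,
  EFFECTIVE_FROM      DATE NOT NULL,
  ADDRESS_LINE_1      VARCHAR(60),
  ADDRESS_LINE_2      VARCHAR(60),
  ADDRESS_LINE_3      VARCHAR(60),
  ADDRESS_LINE_4      VARCHAR(60),
  POSTCODE            VARCHAR(10),
  PRIMARY KEY (CUSTOMER_ACCOUNT_ID, EFFECTIVE_FROM)
);

CREATE TABLE SALES_INVOICE (
  INVOICE_ID          INTEGER PRIMARY KEY,
  INVOICE_DATE        DATE NOT NULL,
  CUSTOMER_ACCOUNT_ID INTEGER NOT NULL REFERENCES CUSTOMER_ACCOUNT,
  NET_AMOUNT          DECIMAL(10,2) NOT NULL,
  VAT                 DECIMAL(10,2) NOT NULL,
  TOTAL_AMOUNT        DECIMAL(10,2) NOT NULL
);

-- The address in force on an invoice's date is the CUSTOMER_ADDRESS row
-- with the latest EFFECTIVE_FROM not after INVOICE_DATE.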