Note: I have revised the "Logical Access, Data Sublanguage, Kinds of Relations, Database Redundancy, and Consistency" paper in the "Understanding the Real RDM" series" (available from the PAPERS page) for consistency with this post.
“Recently I have read that SQL is actually a data sublanguage and not a programming language like C++ or Java or C# ... The answers ... have the pattern of "No, it is not. Because it's not Turing complete.", etc, etc. ... I am a bit confused, because since you can develop things through SQL, I thought it is similar to other programming languages ... I am curious about knowing why exactly is SQL not a programming language? Which features does it lack? (I know it can't do loops, but what else more?)”
--StackOverflow.com
“The SQL operators were meant to implement the relational algebra as proposed by Dr. Ted Codd. Unfortunately Dr. Codd based some of his ideas on a "extended set theory", which was an idea formulated and described in a 1977 paper by D. L. Childs ... But Childs’ extensions were not ideally suited, which is explained in quite some detail in [a] book ... by Professor Gary Sherman & Robin Bloor [who] argue that mainstream Zermelo-Fraenkel set theory (Cantor), would have been a better starting point. One key issue is that sets should be able to be sets of sets.”
--Dataversity.net
The concept of a sublanguge cannot be understood without foundation knowledge and familiarity with the history of the database management field, both lacking in the industry.
The
Web is chockful of unnoticed/unquestioned pronouncements by novices or
"experts", many self-taught, that are (1) wrong, or (2) gobbledygook.
Any attempt to demonstrate lack of foundation knowledge underlying these
misconceptions and their practical implications are usually dismissed
as "theory, not practical", attacked as "insulting ad-hominem", or
ignored altogether, regardless of the amount and quality of the
supporting evidence and argument logic. This is understandable: in the
absence of foundation knowledge and ability to reason, it is by
definition impossible to comprehend and appreciate corrections that
require them.
Practitioners who cannot detect such misconceptions
and understand their practical implications and the importance thereof
are insufficiently prepared for a professional career in data
management. Worse, they cannot associate problems with their real causes
and, thus, cannot come up with proper solutions, which explains the
industry's "cookbook approach" and succession of fads.
What about
you? This is another batch in the Test Your Foundation Knowledge
regular series of posts of online statements reflecting common
misconceptions that are difficult to discern without foundation
knowledge. You can test yours by trying to debunk them in Comments,
including what category, (1) or (2) do they fall in? If you can't,
proper education is in order.
Note: This is a re-write of a previous post.
"I have a database for a school ... [with] numerous tables obviously, but consider these:
CONTACT - all contacts (students, faculty): has fields such as LAST, FIRST, ADDR, CITY, STATE, ZIP, EMAIL;
FACULTY - hire info, login/password, foreign key to CONTACT;
STUDENT - medical comments, current grade, foreign key to CONTACT."
"Do
you think it is a good idea to have a single table hold such info? Or,
would you have had the tables FACULTY and STUDENT store LAST, FIRST,
ADDR and other fields? At what point do you denormalize for the sake of
being more practical? What would you do when you want to close out one
year and start a new year? If you had stand-alone student and faculty
tables then you could archive them easily, have a school semester and
year attached to them. However, as you go from one year to the next
information about a student or faculty may change. Like their address
and phone for example. The database model now is not very good because
it doesn’t maintain a history. If Student A was in school last year as
well but lived somewhere else would you have 2 contact rows? 2 student
rows? Or do you have just one of each and have a change log. Which is
best?"
How
would somebody who "does not know past, or new requirements, modeling,
and database design" and messes with a working database just because "he
heard something about (insert your favorite fad here)" figure out
correct from bad answers? Particularly if the answers suffer from the
same lack of foundation knowledge as the question?
Note: This is a re-write -- prompted by a LinkedIn exchange -- of two columns I published @All Analytics.
“Scientific research experiments that "require assignment of data to tables, which is difficult when the scientists do not know ahead of time what analysis to run on the data, a lack of knowledge that severely limits the usefulness of relational [read: SQL] databases.”
NoSQL are recommended in such cases. But what does "scientists do not know ahead of time what analysis to run" really mean?
Data, Information, and Knowledge
One way to view the difference between data, information, and knowledge is:
“1. Data: Categorized sequences of values representing some properties of interest, but if and how they are related is unknown (e.g., research variables in scientific experiments);
2. Information: Properties further organized in named combinations -- "objects", but how they are related is unknown (e.g., "runs", or "cases" in scientific experiments);
3. Knowledge: Relationships among properties and among objects of different types are known.”
--David McGoveran
Note: This is a re-write of two older posts, to bring them into line with McGoveran's formalization, re-interpretation, and extension[1] of Codd's RDM.
In Part 1 we explained that for a database to be relational, database design must adhere to three core principles, in which case it consists of relations that are by definition in both 1NF and 5NF. In Part 2 we showed that whether tables visualize relations (i.e., are R-tables) can be determined only with reference to the conceptual model that the database designer intended the database to represent (not what any users might think it does). This is obscured by the common and entrenched confusion/conflation of levels of representation and, consequently, of types of model -- conceptual, logical, physical, and data model -- that we have so often debunked[2].
In Part 1 we outlined some fundamentals of database design, namely the distinction between normalization to 1NF, and further normalization (to "full" 5NF), and explained that they are necessary only to repair poor designs -- if you (1) develop a complete conceptual model and (2) formalize it properly using the RDM, (3) adhering to the three core principles of database design, you should end up with a relational database in both 1NF and 5NF.
Here we apply this knowledge to the typical request for "normalization" help we presented in Part 1.