Friday, September 27, 2019

Data Sublanguage Part 2: Data Manipulation and Definition




Revised 10/10/2019.

In Part 1 we showed that Codd intended in 1969 to base the RDM on axiomatic set theory (AST) and second order logic (SOL) to accommodate relation-valued domains (RVD) (i.e., sets of sets), but that for the benefit of relational advantages and to avoid SOL problems he had to trade off the expressive power of AST/SOL for the simple set theory (SST) of proper sets (i.e., relations in normal form) expressible in first order predicate logic (FOPL) and, thus, computational for relational completeness[1]. He retained the power of the former for applications by hosting a relationally complete FOPL-based language expressing the RA in computationally complete programming languages (CCL).

We also alerted to an important, but unnoticed detail: data sublanguage appeared in the 1970 paper -- in 1969 Codd referred to retrieval sublanguage. This can be understood only with reference to the theoretical foundation of the RDM.


Wednesday, September 25, 2019

Testing Your Foundation Knowledge



The Web is chockful of unnoticed/unquestioned pronouncements by self-taught novices or "experts" that are (1) wrong, or (2) gobbledygook. Any attempt to demonstrate lack of foundation knowledge underlying these misconceptions and their practical implications are usually dismissed as "theory that is not practical", attacked as "insulting ad-hominem", or ignored altogether, regardless of the amount and quality of the supporting evidence and argument logic. This is understandable: in the absence of foundation knowledge and ability to reason, it is by definition impossible to comprehend and appreciate corrections that require them.

I have always contended that practitioners who cannot detect such misconceptions, and understand their practical implications and the importance thereof are insufficiently prepared for a professional career in data management. Worse, neither can they associate problems with their real causes and, thus, cannot come up with proper solutions, which explains the industry's "cookbook approach" and succession of fads.

What about you? This is another batch in the Test Your Foundation Knowledge regular series of posts of online statements reflecting common misconceptions that are difficult to discern without foundation knowledge. You can test yours by trying to debunk them in Comments -- what category, (1) or (2) do they fall in? 

Sunday, September 22, 2019

Data Sublanguage Part 1: Relational vs. Computational Completeness




Note: I have revised the "Logical Access, Data Sublanguage, Kinds of Relations, Database Redundancy, and Consistency" paper in the "Understanding the Real RDM" series" (available from the PAPERS page) for consistency with this post.

“Recently I have read that SQL is actually a data sublanguage and not a programming language like C++ or Java or C# ... The answers ... have the pattern of "No, it is not. Because it's not Turing complete.", etc, etc. ... I am a bit confused, because since you can develop things through SQL, I thought it is similar to other programming languages ... I am curious about knowing why exactly is SQL not a programming language? Which features does it lack? (I know it can't do loops, but what else more?)”
--StackOverflow.com
“The SQL operators were meant to implement the relational algebra as proposed by Dr. Ted Codd. Unfortunately Dr. Codd based some of his ideas on a "extended set theory", which was an idea formulated and described in a 1977 paper by D. L. Childs ... But Childs’ extensions were not ideally suited, which is explained in quite some detail in [a] book ... by Professor Gary Sherman & Robin Bloor [who] argue that mainstream Zermelo-Fraenkel set theory (Cantor), would have been a better starting point. One key issue is that sets should be able to be sets of sets.”
--Dataversity.net

The concept of a sublanguge cannot be understood without foundation knowledge and familiarity with the history of the database management field, both lacking in the industry.

Tuesday, September 17, 2019

Testing Your Foundation Knowledge



The Web is chockful of unnoticed/unquestioned pronouncements by novices or "experts", many self-taught, that are (1) wrong, or (2) gobbledygook. Any attempt to demonstrate lack of foundation knowledge underlying these misconceptions and their practical implications are usually dismissed as "theory, not practical", attacked as "insulting ad-hominem", or ignored altogether, regardless of the amount and quality of the supporting evidence and argument logic. This is understandable: in the absence of foundation knowledge and ability to reason, it is by definition impossible to comprehend and appreciate corrections that require them.

Practitioners who cannot detect such misconceptions and understand their practical implications and the importance thereof are insufficiently prepared for a professional career in data management. Worse, they cannot associate problems with their real causes and, thus, cannot come up with proper solutions, which explains the industry's "cookbook approach" and succession of fads.

What about you? This is another batch in the Test Your Foundation Knowledge regular series of posts of online statements reflecting common misconceptions that are difficult to discern without foundation knowledge. You can test yours by trying to debunk them in Comments, including what category, (1) or (2) do they fall in? If you can't, proper education is in order.


Sunday, August 25, 2019

Meaning Criteria and Entity Supertype-Subtypes Relationships




Note: This is a re-write of a previous post.
"I have a database for a school ... [with] numerous tables obviously, but consider these:
CONTACT - all contacts (students, faculty): has fields such as LAST, FIRST, ADDR, CITY, STATE, ZIP, EMAIL;
FACULTY - hire info, login/password, foreign key to CONTACT;
STUDENT - medical comments, current grade, foreign key to CONTACT."
"Do you think it is a good idea to have a single table hold such info? Or, would you have had the tables FACULTY and STUDENT store LAST, FIRST, ADDR and other fields? At what point do you denormalize for the sake of being more practical? What would you do when you want to close out one year and start a new year? If you had stand-alone student and faculty tables then you could archive them easily, have a school semester and year attached to them. However, as you go from one year to the next information about a student or faculty may change. Like their address and phone for example. The database model now is not very good because it doesn’t maintain a history. If Student A was in school last year as well but lived somewhere else would you have 2 contact rows? 2 student rows?  Or do you have just one of each and have a change log. Which is best?"
How would somebody who "does not know past, or new requirements, modeling, and database design" and messes with a working database just because "he heard something about (insert your favorite fad here)" figure out correct from bad answers? Particularly if the answers suffer from the same lack of foundation knowledge as the question?

Friday, June 21, 2019

Data Meaning and Mining: Knowledge Representation and Discovery




Note: This is a re-write -- prompted by a LinkedIn exchange -- of two columns I published @All Analytics.
“Scientific research experiments that "require assignment of data to tables, which is difficult when the scientists do not know ahead of time what analysis to run on the data, a lack of knowledge that severely limits the usefulness of relational [read: SQL] databases.”
NoSQL are recommended in such cases. But what does "scientists do not know ahead of time what analysis to run" really mean?

Data, Information, and Knowledge


One way to view the difference between data, information, and knowledge is:
“1. Data: Categorized sequences of values representing some properties of interest, but if and how they are related is unknown (e.g., research variables in scientific experiments);
2. Information: Properties further organized in named combinations -- "objects", but how they are related is unknown (e.g., "runs", or "cases" in scientific experiments);
3. Knowledge: Relationships among properties and among objects of different types are known.”

--David McGoveran


View My Stats