The Real Data Science: Tables -- So What?
During the hyping of fads such as "Data Science", all you hear about is the "huge opportunities for enterprises to gain hitherto unimagined insights" and very little about the potential to tell enterprises really big lies, which can arise from 100% correct data in poorly designed databases. That's because what passes for "Data Science" is not science, let alone science of data.
Most data professionals know that relational databases consist of tables, but so what? Provably correct query results are guaranteed by the real data science--the RDM--if and only if tables are well-designed and properly constrained R-tables and the DBMS truly and fully supports the RDM. Unfortunately, more often than not tables are neither, and SQL DBMSs don't. As a result, databases are harder to understand, queries don't always make sense, and results are hard to interpret or outright wrong.
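A minimal sketch of how 100% correct data in a poorly designed table can produce a wrong answer, using Python's standard sqlite3 module. The table names, column names and figures (emp_dept, dept_budget and so on) are hypothetical, invented for illustration only. SQLite is an SQL DBMS, not a true RDBMS, so the redesign below only approximates properly constrained R-tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical poorly designed table: one row per employee, with the
# department's budget repeated on every row. Every row states a true fact.
conn.execute("CREATE TABLE emp_dept (emp TEXT, dept TEXT, dept_budget INTEGER)")
conn.executemany(
    "INSERT INTO emp_dept VALUES (?, ?, ?)",
    [("Ann", "Sales", 100), ("Bob", "Sales", 100), ("Cy", "R&D", 200)],
)

# A plausible-looking query that counts the Sales budget twice.
naive_total = conn.execute(
    "SELECT SUM(dept_budget) FROM emp_dept"
).fetchone()[0]

# Redesign (an assumption of this sketch): one table per kind of fact,
# each with a declared key.
conn.execute("CREATE TABLE dept (dept TEXT PRIMARY KEY, budget INTEGER NOT NULL)")
conn.execute(
    "CREATE TABLE emp (emp TEXT PRIMARY KEY, "
    "dept TEXT NOT NULL REFERENCES dept(dept))"
)
conn.execute("INSERT INTO dept SELECT DISTINCT dept, dept_budget FROM emp_dept")
conn.execute("INSERT INTO emp SELECT emp, dept FROM emp_dept")

# The same question asked of the redesigned tables.
correct_total = conn.execute("SELECT SUM(budget) FROM dept").fetchone()[0]

print(naive_total, correct_total)  # 400 300
```

No row in emp_dept is false, yet the first total overstates the combined budget. With one table per kind of fact and a key on each, the same question has a single obvious and correct formulation.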
You will learn:
- The Real Data Science
- Relations and databases
- 5NF R-tables
- "Table arithmetic"
- RDM and SQL
6:30 PM, Tuesday, September 15, 2015
Microsoft
1065 La Avenida Building 1
Mountain View, CA
For details see Meetup.
1. Quote of the Week
Later, as use of RDBMS became more widespread, the complexity associated with design of a RDBMS was also well documented ... The associative database model is claimed to offer advantages over RDBMS ... “two fundamental data structures” as “‘Items’ and a set of ‘Links’ that connect them together ... Items, which have “a unique identifier, a name and a type” and Links, which have “a unique identifier, together with the unique identifiers of three other things, that represent the source, verb and target of a fact that is recorded about the source in the database ... “each of the three things identified by the source, verb and target may each be either a link or an item.”
--Homan, J. V. and Kovacs, P. J., A Comparison of the Relational Database Model and the Associative Database Model, Issues in Information Systems, Vol. X, No. 1, 2009.
2. To Laugh or Cry?
A Comparison of the Relational Database Model and the Associative Database Model
3. Online Debunkings
4. Interesting Elsewhere
Why Domain Expertise is More Important than Algorithms
5. And now for something completely different
My August post @All Analytics.
One of the most common misconceptions is the view of the relational data
model (RDM) as “just theory”, implying it is not practical. But the RDM
is theory adapted and applied to the practical needs of database
management and it serves as the scientific foundation that guarantees,
among other benefits, provably logically correct query results.
Read it all. (Please comment there, not here)
HOUSEKEEPING
- New Appendix to paper #3: While working on my book, I collected all comments by readers and replies by me (edited) and David McGoveran and added them as Appendix B. It further clarifies some of the aspects of the proposed relational/2VL solution to missing data. Those who ordered the paper in 2014 and 2015 should email me for a copy.
Why even the most intelligent software architects don't understand the Relational Model
1. Quotes of the Week
In 15-20 years from now: Information will stay only in XML (no more tuples, no more objects). Imperative languages as we know them today (Java, C, C++, C#) will be gone. We will program with some extension of XQuery, or in any case a declarative dataflow/workflow language specially --Daniela Florescu, 2010 Interview
Exactly 20 years ago I wrote this article: "Storing and Querying XML Data using an RDMBS". I curse myself every day for doing so. I should be damned by the fires of hell for ever opening my mouth and letting people believe that one can REASONABLY use SQL to query hierarchical, complex structures like XML or JSON. NO, PEOPLE. YOU CAN NOT! --Daniela Florescu, 2015, LinkedIn.com
2. To Laugh or Cry?
SQL Will Inevitably Come To NoSQL Databases
3. Online Debunkings
Data Scientists: The talent crunch (that isn't)
4. Interesting
5. And now for something completely different
My July post @All Analytics:
One would expect “data scientists” to be keen on the dual scientific
foundation of database management -- the relational data model (RDM) --
but they know little beyond “related tables” and, in fact, complain that
more often than not data “do not fit” into them. Much of that is the
result of poor education and an almost exclusive focus on software tool
training. Even the analyst intent on acquiring foundation knowledge is
more likely to be misled than enlightened by published information.