Tuesday, January 3, 2017

Understanding the Relational Data Model: A New Series of Papers



"Nowadays, anyone who wishes to combat lies and ignorance and to write the truth must overcome at least five difficulties. He must have:
  1. The keenness to recognize it, although it is everywhere concealed;
  2. The courage to write the truth when truth is everywhere opposed;
  3. The skill to manipulate it as a weapon;
  4. The judgement to select in whose hands it will be effective, and
  5. The cunning to spread the truth among such persons."
--Berthold Brecht
A rather accurate explanation of why it has been so difficult to dispel the misuse and abuse of the Relational Data Model since inception. To the point that most of its core practical benefits have failed to materialize, with the IT industry regressing all the way back to its pre-relational and even pre-database state:
  • Graph DBMSs;
  • XML;
  • JSON;
  • NoSQL;
  • Application-specific databases and DBMSs;
  • "Unstructured data";
  • No integrity enforcement;
  • A cacophony of imperative programming languages rather than declarative data sublanguages (suffixed with QL, just like old non-relational DBMSs were with /R). 

Saturday, December 24, 2016

This Week with Season's Greetings





Data Sublanguages, Programming and Data Integrity

My December post @All Analytics

Both data science employers and candidates stress the eclectic nature of the required skills, programming in particular. Indeed, coding has acquired such an elevated role, that it now entirely replaces education. Aside from the societal destructive consequences of this trend, in the context of data management it is a regressive self-fulfilling prophecy that obscures and disregards the core practical objective of database management to minimize programming. You can frequently encounter it in comments like:
"Anything you can model in a DBMS you can model in Java. The next paradigm shift is business rules centralized in Java business objects, rather than hard-coded in SQL for better manageability, scalability, etc. The only ones that should reside in a database are referential integrity (and sometimes even that isn't really necessary). Don't let pushy DBAs tell you otherwise -- integrity constraints slow down development as well as performance."
Upside down and backwards.

Read it all (and comment there, not here, please).


THE DBDEBUNK GUIDE TO MISCONCEPTIONS OF DATA FUNDAMENTALS available to order here.

("What's Wrong with this Picture" will return in 2017)


1. Quote of the Week

"The value of the model may be diminishing in certain enterprises, since busy with deliverables." --Harshendu Desai, LinkedIn.com

3. To Laugh or Cry?

5 Reasons Relational Databases Hold Back Your Business

4. Added to the LINKS page

  • What a Database Really Is: Predicates and Propositions
  • The Logical Fallacies

5. Of Interest


And now for something completely different


New at The PostWest (check it out)

My take of the week

Choosing not to veto, Obama lets anti-settlement resolution pass at UN Security Council
The press refused to publish Obama's Chicago speech to the Palestinian lobby to hide his anti-semitism. He was never as troubled by Assad, or Putin, or Erdogan as he was by Netanyahu.  That's because Jews have always been a soft target (Barak Obama's Israeli Settlements Canard).  If that is not anti-semitism, I don't know what is.
When the US is in the same camp with Russia, China, Iran and Turkey and her acts are cheered by Hamas, Islamic Jihad and Hezbollah, she has sold out and moved to the dark side.

America (like most other countries) is occupied Indian land via atrocities (and not by people who returned to their own country, like the Jews did). So when America returns its settlements to the Indians, Israel will return its "settlements" (which Israelis got when they defended themselves from "being thrown into the sea"). Until then moralizing and selling out Israel to genocidal terrorists is hypocritical anti-semitism, just like everyone else's (see below).


Global Hypocritical Anti-semitism

UN

 
US

EU

Article of the week

Israel and the Occupation Myth

Video of the week
The Red Disaster. The "life" in Romania during the 60s. The Jews did the worst due to deep anti-semitism. America paid Ceausescu to get us out, but neither she nor Europe wanted us. Had there been no Israel, we would have probably starved to death, not necessarily in a rotten jail. Nobody talks about us, or the hundreds of thousands of the Jewish refugees kicked out from the Arab countries, none of whom were murderous, but everybody is obsessed with the suffering of the Palestinians, who are genocidal.

Pinch-me of the week

Ahmad Tibi urges Israelis not to ‘live by the sword’. As if she is allowed to live without it.

Book of the week (Purchase via this link to support the site)
Bard, M., MYTHS AND FACTS: A GUIDE TO THE ARAB-ISRAELI CONFLICT

Note: I will not publish or respond to anonymous comments. If you want to say something, stand behind it. Otherwise don't bother, it'll be ignored.

Tuesday, December 20, 2016

On View Updating (C. J. Date and D. McGoveran)



My recent posts on denormalization [1], identical relations [2] and the POOD [3,4,5] based on D. McGoveran (DMG) interpretation of Codd's RDM, triggered online reactions [6,7] and some comments in place that reflect the current understanding of the RDM. One of my readers referred me back to a 2004 exchange triggered by an exchange @old dbdebunk.com with both CJD and DMG on view updating--an important aspect discussed in my posts--on which the two perspectives differ. Last week I asked what's wrong with CJD's position in the exchange. Here is the original exchange, albeit in abbreviated form (I made minor revisions for clarity and added references to more recent sources the reader may want to consult, such as the 2013 CJD book on the subject [8], which also purports to describe some of DMG's more recent thinking), in which DMG counters with his position and adds a new comment. Throughout, I've changed "relvar predicate" (still used by CJD) to "relation predicate", as preferred in the DMG interpretation.

Monday, December 12, 2016

This Week



THE DBDEBUNK GUIDE TO MISCONCEPTIONS OF DATA FUNDAMENTALS available to order here.

1. What's wrong with this picture

"First of all, let me say that I no longer regard view updating as a fully solved problem. A year ago or so I thought it was--but then Hugh Darwen started to ask some hard questions and I realized I was wrong. (David McGoveran will probably disagree with me here.) That said, I remain optimistic that the problem is solvable. The discussion in my 8th Ed. is generally along the right lines, though it gets some of the details wrong." --C. J. Date

2. Quote of the Week

"SQL is the lingua franca for retrieving structured data. Existing semantics for SQL, however, either do not model crucial features of the language (e.g., relational algebra lacks bag semantics, correlated subqueries, and aggregation)." --Konstantin Weitz, homotopytypetheory.org

3. To Laugh or Cry?

Why is MongoDB wildly popular? It's a data structure thing

Monday, December 5, 2016

Prediction, Explanation and the November Surprise



Note: My November post @All Analytics, which I reposted here.  

Given the overhyped promise of "data science", the "shock" at the broad failure to predict the election outcome was inevitable. Skimming through the media and technical accounts, it looks like a better understanding of prediction and explanation is necessary for less surprises and sounder analytics. Let's take two examples (oversimplified somewhat to make the point). 

First, a game-theoretic account derived from observed behavior in a 2-player game in which one player gets a sum of money and decides how to share it with another, who can only accept or reject the offer: even though accepting any offer as better than nothing is rational, "we don’t behave rationally ... [but] emotionally ... we reject offers we consider unfair".

"… there’s been plenty of economic growth inside the U S--vastly increasing the pile of money to be divided. But ... The first player consists of those people who have benefitted from globalization and trade: the “elites”, derisively referred to as “the 1%”. And the second player ... everyone ... who aren’t in those upper income echelons ... are seeing the pile of money in the game growing ever bigger. And ... the other player keeps an ever-larger share of that pile for themselves ... Trump allowed them to channel their feelings into a rejection of the proposal that has been made—on trade, immigration, and globalisation, and dividing up those spoils ...[and they threw] everything out". --What voters do when they feel screwed--the economic theory)

Second, a complex algorithm that runs a multitude of sophisticated simulations on a "raft of carefully collected public and private polling numbers, as well as ground-level voter and early voting data”. Assume that “the raft” consists of, vote predictors—vote correlates discovered by computers (Whatdidn't Clinton’s data-driven campaign's algorithm named Ada see?).

Suppose (1) an appropriate hypothesis in the form of a correlation at the aggregate level between variables measuring affinity to the first and second player and vote had been derived in the former case which proved accurate and (2) the algorithm in the latter case produced an equally accurate prediction.  Is there any difference between the two approaches?

For those who equate prediction with explanation, the answer is yes. For those for whom explanation is about the past and prediction about the future, the question does not come up. But these are views that obscure rather than enlighten.

In both cases there is a data pattern in the form of predictive correlations. In the first case a theory of individual behavior specifies the causal mechanism—the individual behavior that explains how the pattern is produced--why it exists at the aggregate level. In the second case, the mechanism is of no particular interest and is not specified. In general explained behavioral predictions are more reliable than those without.

Data patterns discovered by computers explained can produce insights—causal mechanisms— for theory development, this is what data mining should be about. That's the context of discovery in science, which requires predictions from the theory developed from the discovered patterns to be tested in the context of validation on different data. But because, unlike in natural science, human behavior is not governed by unchanging universal laws, it is easier to explain post-hoc than to predict. Given the pressure for prediction in industry and politics, the temptation not to bother with the second context is too strong. 

In this age of "big data", "data mining", "data lakes" and machine learning the important difference between prediction and explanation should be understood and kept firmly in mind when performing analytics and assessing their results.

See also Unthinking Machines.

 




Re-write



See the rewrite
Class, Type, Relation and Domain in Database Management



View My Stats