DATABASE DEBUNKINGS

Sunday, September 29, 2013

Testing Your Foundation Knowledge

Follow @DBDebunk Follow @ThePostWest

Expertise in a field and the ability to convey it to others are distinct; the latter requires different motivation, skills and talent. Top technical experts are, for inherent reasons, more often than not poor communicators, whether verbally or in writing, Codd being an excellent example. That's one of the core reasons for poor foundation knowledge in data management in general, and for the poor appreciation of the relational model in particular.

In a previous post I started a little experiment: I asked both readers who think they know and understand the relational model (RM) and those who do not but want to, to comment on whether a theoretically correct explanation of data fundamentals offered by reader PK was helpful and, if not, why not. I promised to draw some conclusions regarding the difficulty of dispelling misconceptions without losing either theoretical rigor, or the audience--a non-trivial task for an educator in an industry that deems theory impractical.

I can't say the responses exactly answered my question (I recommend reading the comments, though). But let me, as promised, try my hand at making better sense of both the explanation and the comments (for an in-depth treatment see paper #1, Business Modeling for Database Design). Let me know if it helps.

Sunday, September 22, 2013

Site Update

1. Schedule reminder

September 23rd, 10:00am, San Francisco, CA
The CWA, Missing Data and the Last NULL in the Coffin
Presentation, Oaktable Conference, Oracle OpenWorld

October 8, Milan, Italy
Denormalization for Performance: A Costly Illusion
Public presentation, UGISS SQLSaturday

October 9-10, 2013, Milan, Italy
Business Modeling for Database Design
Private seminar sponsored by Microsoft and organized by SolidQ
Contact: Davide Mauri, SolidQ

2. Quote of the Week

I am constructing a new website ... using node.js. Its aim is to have many subscriber (people who offer help and people who need help) it should be scalable in different language. I have to decide wich is the more suitable db. I am thinking about to have two db (mongodb and postgress) for site languages and people account, people should vote other people ability. As db experts could you give me some suggestions? What would think could be a good db choice?
--LinkedIn.com

3. To Laugh or Cry?

Can anyone guide about using DB2

4. Online

My latest All Analytics column: Understanding Data Independence
Exchange I participated in: Not convinced about this whole Big Data thing

5. There were several posts on this site about the Meijer article, the letter to the editor supporting it, and the reactions by David McGoveran and C. J. Date to both. But I missed the one by my fellow relationlander Erwin Smout: A letter by Carl Hewitt. At one point he writes:

At any rate, I'm still left wondering what mr. Hewitt's problem is here.

I don't know why he wonders -- it is pretty obvious to me.

6. The frequency of fads has been increasing and the time between them decreasing. Today pushing a "new thing" starts before the last fad is exhausted: The Next Wave of Data Management

7. And now for something completely different: How the US Crushed Youth Resistance

Sunday, September 15, 2013

Re-write

See Data Model: The RDM Is, the E/RM Isn't

Sunday, September 8, 2013

Site Update

1. Schedule reminder

September 23rd, 10:00am, San Francisco, CA
The CWA, Missing Data and the Last NULL in the Coffin
Presentation, Oaktable Conference, Oracle OpenWorld

October 8, Milan, Italy
Denormalization for Performance: A Costly Illusion
Public presentation, UGISS SQLSaturday

October 9-10, 2013, Milan, Italy
Business Modeling for Database Design
Private seminar sponsored by Microsoft and organized by SolidQ
Contact: Davide Mauri, SolidQ

2. Quote of the Week

Q: One of the main resistences of RDBMS users to pass to a NoSQL product are related to the complexity of the model: Ok, NoSQL products are super for BigData and BigScale but what about the model?

A: Actually graphs are the way we (people) think and organization data in our head, as computer people it is on[e] of the most popular way[s] we are taught to think about data, so this should be natural.
--slideshare.net

3. To Laugh or Cry?

"Splunk for Big Data"

4. My comment at Robert Young's blog

No Mas!! No Mas!!

5. Something I argued long before they did.

Think Big Data Is All Hype? You're Not Alone

6. And now for something completely different.

High-tech toilets vulnerable to hackers

No comment.

Sunday, August 25, 2013

Site Update

1. Schedule update

September 23rd, 10:00am, San Francisco, CA
The CWA, Missing Data and the Last NULL in the Coffin
Presentation, Oaktable Conference, Oracle OpenWorld

October 8, Milan, Italy
Denormalization for Performance: A Costly Illusion
Public presentation, UGISS SQLSaturday

October 9-10, 2013, Milan, Italy
Business Modeling for Database Design
Private seminar sponsored by Microsoft and organized by SolidQ
Contact: Davide Mauri, SolidQ

2. Quote of the Week

How many software programs are mathematically provable. And yet everybody still writes software and for the most part it works. Relational theory and SQL was very important for establishing a standard across vendors to a point. And yet switching relational database vendors is still very expensive proposition because the standards don't address the features that users need and use everyday that are not part of the standard. At the heart of the system the relational model can still be enforced. But a product lives and dies not on whether it is mathematically provable but it's features set, efficiency and cost to develop in.
--LinkedIn.com

3. To Laugh or Cry?

Please help with my data model design

If this was student homework, it is an excellent example of how database management should not be learned, and confirmation that the "cookbook approach" has been substituted for education. Ironically, it's in the forum's "Relational theory" section. Had theory been taught, such questions would not have been asked.

4. Two online exchanges I participated in

Modeling for NoSQL, Schemaless & Unstructured Data

Predictable--it was just a matter of time. My latest post at All Analytics is quite apropos: Real Data Science: General Theories of Data.

Five myths about big data

In this context, consider In Silicon Valley, age can be a curse.

5. And now for something completely different

Not entirely unrelated:

Facebook boosts connections, not happiness study

The Curse of Self-Service (h/t Davide Mauri)

Sunday, August 11, 2013

Site Update

A while ago my friend Stephen Henley published his opinion on Missing Data, which questioned the thoughts of C. J. Date, Hugh Darwen and myself on the subject, thoughts that were neither well formed nor definitive at the time. Since then Date has proposed a default values scheme, which he has subsequently renounced; Darwen has published How To Handle Missing Information Without Using NULL; and I have proposed a relational solution in the recently revised paper #3, The Last NULL in the Coffin.

In this context, I dedicate this update (except the last item) to NULL. Whatever differences may exist among the above-mentioned relational proponents, we do agree that NULL is certainly not a solution to the problems of missing data.

Time permitting, I may post some belated comments on Henley's piece.

1. QUOTE OF THE WEEK

If SQL is based on relational algebra which is based on set theory where the concept of null set (empty set) is an axiom of the theory. In this theory empty set is not the same thing as nothing. A point that confuses many people.

Relational algebra is based on 3VL predicates, that is, the answer to any predicate can have three states true, false or unknown. Unknown is caused by the use of a operator on an the absence of a value (null). Within relational algebra null is not to be treated as a value but merely a marker of unknown (absence of a value).

None of this is rocket science and I suggest doesn't result in bad implications. I suggest the so called "bad implications" are only introduced as people use null as a patch for problems for example the division by zero. indeterminate state, open ended ranges, data states to name a few. That is, the issue is not the concept of null but its abuse as a patch for other issues.
--LinkedIn.com

2. TO LAUGH OR CRY?

Why shouldn't we allow NULLs?, stackexchange.com

3. An ONLINE exchange I participated in.

NULL Handling in Databases, LinkedIn.com

4. And now for something completely different.

An astonishing act of statistical chutzpah
Why Great Teachers Are Fleeing the Profession
The ABCs of MOOCs

What does this say about the educational system?

Tuesday, July 30, 2013

The Final NULL in the Coffin: A Relational Solution to Missing Data

Order via the PAPERS page

v.3 (August 2013)

The relational data model is based on the two-valued logic (2VL) of the real world: every proposition about the real world is unequivocally true or false. But our knowledge of the real world is usually imperfect—some data is missing—which means that we don't always know whether propositions are true or not; 2VL no longer applies and data integrity and database query results are no longer guaranteed to be enforceable and provably logically correct with respect to the real world.

Missing data has possibly been the thorniest aspect of database management: without a logically sound yet practical solution, data professionals and users are left between a rock and a hard place. They must either (a) rely on SQL's arbitrary and flawed implementations of three-valued logic (3VL) based on NULLs and risk results that are easy to misinterpret, or erroneous in ways that are hard to discern, or (b) undertake in applications a prohibitively complex, error-prone and unreliable burden that belongs in the DBMS.
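To make option (a) concrete, here is a minimal sketch of two query results that are easy to misinterpret once NULLs are present. It is not taken from the paper: the table `emp`, its rows and the function name `run_demo` are hypothetical, and SQLite (via Python's standard `sqlite3` module) merely stands in for an SQL DBMS.

```python
import sqlite3


def run_demo():
    """Return counts that show how NULL changes query results (hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE emp (name TEXT PRIMARY KEY, dept TEXT)")
    conn.executemany(
        "INSERT INTO emp VALUES (?, ?)",
        [("Ann", "Sales"), ("Bob", "IT"), ("Cy", None)],  # Cy's dept is missing
    )

    def count(sql):
        # Every query below returns a single COUNT(*) value.
        return conn.execute(sql).fetchone()[0]

    result = {
        "total": count("SELECT COUNT(*) FROM emp"),
        # A predicate and its negation no longer partition the table:
        # for Cy both comparisons evaluate to "unknown", so he is in neither count.
        "in_sales": count("SELECT COUNT(*) FROM emp WHERE dept = 'Sales'"),
        "not_in_sales": count("SELECT COUNT(*) FROM emp WHERE dept <> 'Sales'"),
        # 'HR' matches no known dept, yet NOT IN evaluates to "unknown"
        # because of the NULL in the subquery result, so no row qualifies.
        "hr_not_in": count(
            "SELECT COUNT(*) FROM emp WHERE 'HR' NOT IN (SELECT dept FROM emp)"
        ),
    }
    conn.close()
    return result


if __name__ == "__main__":
    for key, value in run_demo().items():
        print(key, value)
```

With these three rows, the "in Sales" and "not in Sales" counts add up to 2 rather than 3, and the `NOT IN` query selects nothing. Neither result is flagged by the DBMS, so a user who does not know a NULL is involved can easily draw the wrong conclusion.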

This paper illustrates some of the drawbacks of the many-valued logic (nVL, n > 2) approach to missing data and SQL’s NULL scheme and proposes a solution within the 2VL/relational framework that:

Guarantees data integrity and logically correct query results;
Avoids the complications and problems of nVL/NULLs;
Requires no changes to the relational model;
Is largely transparent to users;
Keeps users better apprised of the existence and effects of missing data.

The proposed solution requires research into its implications for data manipulation and integrity enforcement before it is implemented, but we believe it is theoretically sound and implementable in a truly relational DBMS (TRDBMS) using technologies that, unlike SQL, support full physical data independence, e.g., the TransRelational™ Model (TRM).

Table of Contents

Introduction
"Inapplicable Data": Nothing's Missing
Missing Data: Into the Unknown
SQL’s 3VL: NULL
Known Unknowns: Metadata
A 2VL Relational Solution
The Practicality of Theory
2VL vs. NULL in the Real World
Relation Proliferation
The TransRelational™ Model
Conclusion
Some Misconceptions Debunked
References
