Sunday, August 25, 2013

Site Update




1. Schedule update
September 23rd, 10:00am, San Francisco, CA
The CWA, Missing Data and the Last NULL in the Coffin
Presentation, Oaktable Conference, Oracle OpenWorld
October 8, Milan, Italy
Denormalization for Performance: A Costly Illusion
Public presentation, UGISS SQLSaturday
October 9-10, 2013, Milan, Italy
Business Modeling for Database Design
Private seminar sponsored by Microsoft and organized by SolidQ
Contact: Davide Mauri, SolidQ

2. Quote of the Week
How many software programs are mathematically provable. And yet everybody still writes software and for the most part it works. Relational theory and SQL was very important for establishing a standard across vendors to a point. And yet switching relational database vendors is still very expensive proposition because the standards don't address the features that users need and use everyday that are not part of the standard. At the heart of the system the relational model can still be enforced. But a product lives and dies not on whether it is mathematically provable but it's features set, efficiency and cost to develop in.
--LinkedIn.com

3. To Laugh or Cry?
Please help with my data model design
If this is student homework, it is an excellent example of how database management should not be learned, and it confirms that a "cookbook approach" has been substituted for education. Ironically, it was posted in the forum's "Relational theory" section. Had theory been taught, such questions would not have been asked. 


4. Two online exchanges I participated in
Predictable--it was just a matter of time. My latest post at All Analytics is quite apropos: Real Data Science: General Theories of Data.
In this context, consider In Silicon Valley, age can be a curse.


5. And now for something completely different

Not entirely unrelated:
Facebook boosts connections, not happiness study
The Curse of Self-Service (h/t Davide Mauri)




Sunday, August 11, 2013

Site Update




A while ago my friend Stephen Henley published his opinion on Missing Data, which questioned the thoughts of C. J. Date, Hugh Darwen and myself on the subject, thoughts that were not yet well formed or definitive at the time. Since then, Date has proposed a default-values scheme, which he subsequently renounced; Darwen has published How To Handle Missing Information Without Using NULL; and I have proposed a relational solution in the recently revised paper #3, The Last NULL in the Coffin.

In this context, I dedicate this update (except the last item) to NULL. Whatever differences may exist among the above-mentioned relational proponents, we do agree that NULL is certainly not a solution to the problem of missing data.

Time permitting, I may post some belated comments on Henley's piece.


1. QUOTE OF THE WEEK
If SQL is based on relational algebra which is based on set theory where the concept of null set (empty set) is an axiom of the theory. In this theory empty set is not the same thing as nothing. A point that confuses many people.

Relational algebra is based on 3VL predicates, that is, the answer to any predicate can have three states true, false or unknown. Unknown is caused by the use of a operator on an the absence of a value (null). Within relational algebra null is not to be treated as a value but merely a marker of unknown (absence of a value).

None of this is rocket science and I suggest doesn't result in bad implications. I suggest the so called "bad implications" are only introduced as people use null as a patch for problems for example the division by zero. indeterminate state, open ended ranges, data states to name a few. That is, the issue is not the concept of null but its abuse as a patch for other issues. 
--LinkedIn.com

2. TO LAUGH OR CRY?

Why shouldn't we allow NULLs?, stackexchange.com


3. An ONLINE exchange I participated in.

NULL Handling in Databases, LinkedIn.com


4. And now for something completely different.

An astonishing act of statistical chutzpah
Why Great Teachers Are Fleeing the Profession
The ABCs of MOOCs

What does this say about the educational system?




Tuesday, July 30, 2013

The Final NULL in the Coffin: A Relational Solution to Missing Data




Order via the PAPERS page


NEW! THE FINAL NULL IN THE COFFIN: A RELATIONAL SOLUTION TO MISSING DATA NEW!

v.3 (August 2013)

The relational data model is based on the two-valued logic (2VL) of the real world: every proposition about the real world is unequivocally true or false. But our knowledge of the real world is usually imperfect (some data is missing), which means that we don't always know whether propositions are true or false. 2VL then no longer applies: data integrity can no longer be guaranteed to be enforceable, nor query results to be provably logically correct, with respect to the real world.

Missing data has possibly been the thorniest aspect of database management. Without a logically sound yet practical solution, data professionals and users are caught between a rock and a hard place. They must either (a) rely on SQL's arbitrary and flawed implementation of three-valued logic (3VL) based on NULLs, and risk results that are easy to misinterpret or wrong in ways that are hard to discern; or (b) take on in applications a prohibitively complex, error-prone and unreliable burden that belongs in the DBMS.
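To make the risk in (a) concrete, here is a minimal sketch using Python's built-in sqlite3 module, assuming SQLite's NULL handling is representative of SQL's 3VL in general (other SQL DBMSs may differ in detail). The emp table and its rows are hypothetical. A comparison with NULL yields neither true nor false, so a condition that is a tautology in 2VL silently drops a row, and COUNT quietly ignores the missing value.

```python
import sqlite3

# In-memory database; the emp table and its rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (name TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?)",
    [("Ann", 50000), ("Bob", 60000), ("Cy", None)],  # Cy's salary is missing (NULL)
)

# NULL = NULL is neither true nor false: SQL yields UNKNOWN, returned as None.
unknown = conn.execute("SELECT NULL = NULL").fetchone()
print(unknown)  # (None,)

# In 2VL, "salary = 50000 OR salary <> 50000" is true for every row.
# Under SQL's 3VL it is UNKNOWN for Cy, and WHERE keeps only TRUE rows,
# so Cy silently disappears from a query that should return everyone.
survivors = [
    name
    for (name,) in conn.execute(
        "SELECT name FROM emp WHERE salary = 50000 OR salary <> 50000 ORDER BY name"
    )
]
print(survivors)  # ['Ann', 'Bob']

# Aggregates skip NULLs without warning: COUNT(*) counts 3 rows,
# COUNT(salary) counts only the 2 non-NULL salaries.
counts = conn.execute("SELECT COUNT(*), COUNT(salary) FROM emp").fetchone()
print(counts)  # (3, 2)
```

Nothing in the results signals that a row was excluded or a value ignored; the user has to know in advance where NULLs lurk.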

This paper illustrates some of the drawbacks of the many-valued logic (nVL, n > 2) approach to missing data and SQL’s NULL scheme and proposes a solution within the 2VL/relational framework that:
  • Guarantees data integrity and logically correct query results;
  • Avoids the complications and problems of nVL and NULLs;
  • Requires no changes to the relational model;
  • Is largely transparent to users;
  • Keeps users better apprised of the existence and effects of missing data.
The proposed solution requires research into its implications for data manipulation and integrity enforcement before it can be implemented, but we believe it is theoretically sound and implementable in a truly relational DBMS (TRDBMS) using technologies that, unlike SQL, support full physical data independence, e.g., the TransRelational™ Model (TRM).


Table of Contents
  • Introduction
  • “Inapplicable Data”: Nothing's Missing
  • Missing Data: Into the Unknown
  • SQL’s 3VL: NULL
  • Known Unknowns: Metadata
  • A 2VL Relational Solution
  • The Practicality of Theory
  • 2VL vs. NULL in the Real World
  • Relation Proliferation
  • The TransRelational™ Model
  • Conclusion
  • Some Misconceptions Debunked
  • References




Sunday, July 28, 2013

Site Update




1.
Some housekeeping. Posting to both the blog and multiple static pages is a bit of a hassle, and I also face some work on my seminars and papers. Until further notice:
  • There will be one post/week--alternating articles and Site Updates (I may skip the latter on certain weeks, if absolutely necessary);
  • Quotes and links to LAUGH/CRY? and FP ONLINE will be posted directly into Site Update posts (like below); the respective static Pages will be updated at the end of each month.
A tool that automates posts and page updates in one shot would help. I looked into it, but for various reasons (including Google's Blogger updates), nothing is available (if you know of one, preferably from experience, please recommend it).

2.
Quote of the Week:
...the relational model has no relationships since Codd decreed that all relationships must be represented by foreign keys, which are exactly the same as "attributes" ... Consider if we had a bunch of tables, each containing the thing A. Now what is the population of A? It cannot be found in any one of the tables. It is actually the union of all the populations of A plus more if we allow A to exist (i.e., be of interest to us) but does not appear in any of the tables. That would be the case of a master reference list of "codes" for which we would then build a separate table. But even that is insufficient. We would also have to define and enforce referential integrity everywhere an A appeared. All of this is handled explicitly and correctly in ORM -- we model objects (each one appears only once in a data model diagram) and relationships. There are no attributes. As I said before, an attribute is an object playing a role in a relationship with another object.
--LinkedIn.com
3.
To Laugh or Cry?
What’s the Best Way for Structured Data Computing in Java?
4.
FP Online:
Let's innovate....database
5.
Good advice:
Designing a Database: 7 Things You don't Want To Do
But why does it bother me?

6.
And now for something completely different.
NSA claims inability to search agency's own emails
Clueless doctor sleeps through math class, reinvents calculus…and names it after herself. At least the doctor reinvented something outside her own field; data professionals do it all the time within theirs.
You can't make these things up.



Monday, July 15, 2013

Site Update (UPDATED)




07/19/13: I have also added my latest post at All Analytics to the FP ONLINE page.

07/18/13: This update referred to items that were erroneously dated 7/3/13 instead of 7/15/13. This has now been corrected. 



1.
The 'Quote of the Week' was posted to the QUOTES page.

2.
A 'To Laugh or Cry' item was posted on the LAUGH/CRY page.

Everything should be as simple as possible, but not simpler.
--Albert Einstein

3.
A link to an exchange I participated in was posted on the FP ONLINE page.

4.
And now for something completely different.

If You Search, Advertise on, Invest in, or Have Kids Who Use Google, You Must See

Too much power is always dangerous, no matter who holds it.




Monday, July 8, 2013

Relational Theory and Database Practice




I shared the links to my recent three-part series on foreign keys (and integrity constraints in general) on LinkedIn. Comments on the second installment raised an important issue about keys, discussed in more depth in Business Modeling for Database Design, that deserves attention.
NK: Let me first affirm my position that I believe foreign keys are the fundamental bases on which relational database managements system operate. Foreign keys provide the relationship in database normalization. Foreign keys are like the framework of a building structure. While some developers may have the notion that constraints and integrity checks can be handled better at the application layer, I would want to refer them to tools like ER Studio, ERWIN, and Visual Studio ... A good database design starts at the logical design level. Abstracting constraints and integrity checks from this layer to the application layer can lead to corrupt database designs. A simple case in point; How would you enforce a unique constraint on a table with 10 million rows? Will it make better sense to have a unique index on the table\field or have the application layer enforce the constraint?
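The choice at stake is between a constraint declared once in the database and enforced by the DBMS, and checks coded separately in each application. As a minimal sketch, assuming SQLite via Python's standard sqlite3 module and a hypothetical dept/emp schema, the constraints below are declared in the database and the DBMS rejects violating rows regardless of which application submits them. SQLite happens to implement a UNIQUE constraint with an index, but that is an implementation choice hidden from users; note also that SQLite enforces foreign keys only when `PRAGMA foreign_keys` is on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is enabled on the connection.
conn.execute("PRAGMA foreign_keys = ON")

# Hypothetical schema: every constraint is declared once, in the database.
conn.executescript("""
CREATE TABLE dept (
    dept_no   INTEGER PRIMARY KEY,
    dept_name TEXT NOT NULL UNIQUE
);
CREATE TABLE emp (
    emp_no  INTEGER PRIMARY KEY,
    email   TEXT NOT NULL UNIQUE,
    dept_no INTEGER NOT NULL REFERENCES dept (dept_no)
);
""")
conn.execute("INSERT INTO dept VALUES (1, 'Sales')")
conn.execute("INSERT INTO emp VALUES (100, 'ann@example.com', 1)")
conn.commit()  # commit the setup so a later rollback cannot undo it

def try_insert(params):
    """Insert one emp row; return None on success or the DBMS's error message."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO emp VALUES (?, ?, ?)", params)
        return None
    except sqlite3.IntegrityError as exc:
        return str(exc)

# Duplicate email: rejected by the DBMS no matter which application issues it.
dup_error = try_insert((101, "ann@example.com", 1))
# Nonexistent department: rejected by the foreign key constraint.
fk_error = try_insert((102, "bob@example.com", 9))
# A valid row is accepted.
ok = try_insert((103, "cy@example.com", 1))

print(dup_error)
print(fk_error)
print(ok)  # None
```

Applications still see the DBMS's error and can report it, but none of them has to re-implement the check.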