Monday, October 1, 2012

Normalization, Further Normalization, Ease of Use, Integrity and Performance



Revised: 10/15/16
"Normalization was invented in the 70's as a way to put some structure around how developers were storing data in a database, in addition to trying to save disk space. You need to remember this was a time when 1MB was billions of dollars and a computer needing 1GB of data storage was inconceivable. It was a way to squeeze as much data into as small a space as possible." --Tom Phillips, social.technet.microsoft.com
The relational model was so poorly understood when it was first published (1969-70) that it would not surprise me if a belief existed then that normalization saves storage space, although I do not see what such a belief was grounded in. But there is no justification for that belief to persist in 2012, whatever else one thinks of normalization.

For the multiple advantages of full normalization (5NF)--chief among them semantic correctness of query results (i.e., no anomalous side-effects)--see the just-published THE DBDEBUNK GUIDE TO MISCONCEPTIONS ABOUT DATA FUNDAMENTALS (available via the BOOKS page).
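
To make "no anomalous side-effects" concrete, here is a minimal sketch in Python of an update anomaly. It is only an illustration under stated assumptions: the employee/department data and all names are hypothetical, and the redundancy shown is the kind removed well below 5NF, not a demonstration of 5NF itself. The point is that the normalized design states each fact once, so an update cannot leave the data contradicting itself; whether it also uses less storage is beside the point.

```python
# Hypothetical data: names and columns are illustrative, not from the post.

# Denormalized: the department's city is repeated on every employee row.
denormalized = [
    {"emp": "Ann", "dept": "Sales", "dept_city": "Boston"},
    {"emp": "Bob", "dept": "Sales", "dept_city": "Boston"},
]

# Normalized: each fact is stated exactly once.
employees = [
    {"emp": "Ann", "dept": "Sales"},
    {"emp": "Bob", "dept": "Sales"},
]
departments = {"Sales": {"dept_city": "Boston"}}


def cities_for_dept_denormalized(rows, dept):
    """Return every city the denormalized rows claim for a department."""
    return {row["dept_city"] for row in rows if row["dept"] == dept}


def cities_for_dept_normalized(depts, dept):
    """Return the single city recorded for a department."""
    return {depts[dept]["dept_city"]}


# An update that touches only one of the redundant copies.
denormalized[0]["dept_city"] = "Chicago"
# The same change in the normalized design is a single-fact update.
departments["Sales"]["dept_city"] = "Chicago"

inconsistent = cities_for_dept_denormalized(denormalized, "Sales")
consistent = cities_for_dept_normalized(departments, "Sales")

print(sorted(inconsistent))  # ['Boston', 'Chicago']
print(sorted(consistent))    # ['Chicago']
```

After the update, the denormalized rows give two answers to "where is Sales located?", so any query over them can return a semantically incorrect result. The normalized design has only one place to change and therefore only one answer.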

Thursday, September 27, 2012

A Note on Education vs. Training



Gene Wirchenko drew my attention to an Infoworld article by Andrew Oliver, Ill-informed haters go after MongoDB, which is a response of sorts to the articles critical of MongoDB that I commented on in previous posts. The gist of the article is described as: "NoSQL databases like MongoDB are great for some tasks but not for others. Is it MongoDB's fault if misguided developers use it to solve the wrong problem?"
With any new technology comes a wave of marketing happy talk, which in turn leads to inexperienced developers "jumping on the train" of a new fad. Inevitably, these newbies find themselves disappointed that the technology doesn't deliver on their inflated expectations.
Oliver correctly identifies the core systemic problem in database management, one that I have been warning of for almost my entire 25+ year career in the field: the lethal combination of thoroughly hyped ad-hoc products and technologies, proliferated by vendors unfamiliar with the foundation and history of the field, and database professionals and users equally unfamiliar with them. Neither do I find fault with the advice he offers at the end of his article:
"Take blogs with a grain of salt ... make sure you understand the technology before using it on a critical project. If you don't heed this advice, some writer for Infoworld on a short deadline in a slow news week might decide to ridicule you!"
although I don't think ridicule by journalists, who are even less knowledgeable about what they cover, is the most serious consequence.

But he fails to make the connection between a major source of the problem and the effectiveness of his advice.

The IT industry in general and the database field in particular rely almost exclusively on tool experience. Practitioners are inducted into the field mainly via practice with specific tools that happen to be in vogue at specific times; job descriptions don't require much beyond that; and academia has been turned away from science and education into a research and certification vehicle for vendors and their tools, a trend which Dijkstra attacked decades ago much better than I can. I experienced this personally on more than one occasion. To recall two:
  • When I offered a presentation on data fundamentals to a reputable computer science department, there was no interest, as they were too busy with "XML research".
  • When I tried to teach an introductory course in database management at a local university by developing a syllabus on data fundamentals, I was quickly disabused of that illusion by a demand to use a specific book and teach Oracle.
A lot is being made of the "higher education bubble", the exploding cost of an academic education and the burdensome indebtedness caused by it. Among its many implications there is an insidious one. Enormous pressure is exerted, for obvious reasons, on academia to turn from educational to vocational: from employers and vendors (via various incentives that are hard to resist) and from students, particularly those sponsored by employers or vendors.

I do not believe that knowledge of and experience with tools alone is sufficient to address the problems underlying the database field. Without foundation knowledge, including the history of the field, relabeled old discarded products will continue to proliferate and practitioners will lack the capacity to avoid being seduced by hype.

Indeed, one could argue that the attraction of the so-called "schema-less" NoSQL products is due to the difficulty of thinking conceptually and logically about requirements and of evaluating technologies and products critically, because the necessary knowledge and the ability to reason and abstract--distinct from tool experience--have not been inculcated. The commonly used assertions "different databases for different purposes" and "the right tool for the right task" are trivial and trite and can be misleading without the benefit of foundation knowledge external to the tools themselves.

Note very carefully that I do not mean to imply that tool experience is unimportant, which would be nonsense. Rather, I claim that it is necessary but insufficient for intelligent functioning in the database field, as it probably is in many other fields.

Wednesday, September 26, 2012

Mountain View Presentation



I will be giving a presentation to the Silicon Valley SQL Server User Group:

Foundation Knowledge for Database Professionals

October 16, 6:30pm

1065 La Avenida
Building 1
Mountain View, CA (map)

It is open to all. Please join us and invite anybody who might be interested.

More details here

Quote of the Week



I've read a few things about NoSQL (technology or movement? That is the question!). As my colleague Stuart McLachlan rephrased the question: If it's a movement, the real question is whether it's a religious or bowel movement.
--SQL/PostSQL/NoSQL, artfulopinions.blogspot.com

To Laugh or Cry?



To Laugh or Cry? items are specifically selected as lost causes (usually, the problems lie with either the author of the item, or with those who respond to it, or both). For that reason I normally do not comment on them. This week's piece is somewhat unusual in that the target is not the author, but rather a vendor. The following item was brought to my attention by Matt Rogish.

Diego Basch, I’ll Give MongoDB Another Try. In Ten Years.

Saturday, September 22, 2012

Davide and David on NoSQL



After publishing my 3-parter on NoSQL I came across a post by Davide Mauri and an interview with David McGoveran on the subject, to which I would like to add some clarifications (this is not a debunking, as they hardly call for one).

Mauri asks: "Do NoSQL people really want to drop the relational model?"

Thursday, September 20, 2012
