DATABASE DEBUNKINGS

Thursday, September 27, 2012

A Note on Education vs. Training

Follow @DBDebunk Follow @ThePostWest

Gene Wirchenko drew my attention to an Infoworld article by Andrew Oliver, Ill-informed haters go after MongoDB, which is kind of a response to the articles critical of MongoDB which I commented on in previous posts. The gist of the article is described as: "NoSQL databases like MongoDB are great for some tasks but not for others. Is it MongoDB's fault if misguided developers use it to solve the wrong problem?"

With any new technology comes a wave of marketing happy talk, which in turn leads to inexperienced developers "jumping on the train" of a new fad. Inevitably, these newbies find themselves disappointed that the technology doesn't deliver on their inflated expectations.

Oliver correctly identifies the core systemic problem in database management, one that I have been warning of for almost my entire 25+ career in the field: the lethal combination of proliferation of thoroughly hyped ad-hoc products and technologies by vendors unfamiliar with the foundation and history of the field to database professionals and users equally unfamiliar with same. Neither do I find fault with the advice he offers at the end of his article:

"Take blogs with a grain of salt ... make sure you understand the technology before using it on a critical project. If you don't heed this advice, some writer for Infoworld on a short deadline in a slow news week might decide to ridicule you!

although I don't think the ridicule by journalists, even less knowledgeable about what they cover, is the most serious consequence.

But he fails to make the connection between a major source of the problem and the effectiveness of his advice.

The IT industry in general and the database field in particular rely almost exclusively on tools experience. Practitioners are inducted in the field mainly via practice with specific tools that happen to be in vogue at specific times; job descriptions don't require much beyond that; and academia has been turned away from science and education into a research and certification vehicle for vendors and their tools, a trend which Dijkstra has attacked decades ago much better than I can. I experienced this personally on more than one occasion. To recall two:

When I offered a presentation on data fundamentals to a reputable computer science department, there was no interest, as they were too busy with "XML research".
When I tried to teach an introductory course in database management at a local university by developing a syllabus on data fundamentals, I was quickly disabused of that illusion by a demand to use a specific book and teach Oracle.

A lot is being made on the "high education bubble", the exploding cost of an academic education and the burdening indebtness caused by it. Among its many implications there is an insidious one. Enormous pressure is exerted, for obvious reasons, on academia to turn from educational to vocational: from employers and vendors (via various incentives that are hard to resist) and from students, particularly those sponsored by employers or vendors.

I do not believe that knowledge of and experience with tools alone is sufficient to address the problems underlying the database field. Without foundation knowledge, including the history of the field, relabeled old discarded products will continue to proliferate and practitioners will lack the capacity to avoid being seduced by hype.

Indeed, one could argue that the attraction of the so-called "schema-less" NoSQL products is due to the difficulty to think conceptually and logically about requirements and evaluate technologies and products critically because the necessary knowledge and ability to reason and abstract--distinct from tool experience--have not been inculcated. The commonly used assertion "different databases for different purposes" or "the right tool for the right task" are trivial and trite and can be misleading without the benefit of foundation knowledge external to the tools themselves.

Note very carefully that I do not mean to imply that tool experience is unimportant, which would be nonsense. Rather, I claim that it is necessary but insufficient for intelligent functioning in the database field, as it probably is in many other fields.

Wednesday, September 26, 2012

Mountain View Presentation

Follow @DBDebunk Follow @ThePostWest

I will be giving a presentation to the Silicon Valley SQL Server User Group:

Foundation Knowledge for Database Professionals

October 16, 6:30pm

Microsoft

1065 La Avenida
Building 1
Mountain View, CA (map)

It is open to all. Please join us and invite anybody who might be interested.

More details here.

Quote of the Week

Follow @DBDebunk Follow @ThePostWest

I've read a few things about NoSQL (technology or movement? That is the question!). As my colleague Stuart McLachlan rephrased the question: If it's a movement, the real question is whether it's a religious or bowel movement.
--SQL/PostSQL/NoSQL, artfulopinions.blogspot.com

To Laugh or Cry?

Follow @DBDebunk Follow @ThePostWest

To Laugh or Cry? items are specifically selected as lost causes (usually, the problems lie with either the author of the item, or with those who respond to it, or both). For that reason I normally do not comment on them. This week's piece is somewhat unusual in that the target is not the author, but rather a vendor. The following item was brought to my attention by Matt Rogish.

Diego Basch, I’ll Give MongoDB Another Try. In Ten Years.

Davide and David on NoSQL

Follow @DBDebunk Follow @ThePostWest

After publishing my 3-parter on NoSQL I came across a post by Davide Mauri and an interview with David McGoveran on the subject, to which I would like to add some clarifications (this is not a debunking, as they hardly justify such).

Mauri asks: "Do NoSQL people really want to drop the relational model?"

PRACTICAL DATABASE FOUNDATIONS PAPER #1: BUSINESS MODELING FOR DATABASE DESIGN

Follow @DBDebunk Follow @ThePostWest

Pls see the PAPERS page for current version of this paper, when it becomes available.

Wednesday, September 19, 2012

Data Models and Usefulness

Follow @DBDebunk Follow @ThePostWest

In my 3-part article on dynamic schema, NoSQL and the relational model I admitted that I did not know much about NoSQL products in general and MongoDB in particular. Nevertheless, it was not difficult to figure out what problems would be faced using a docubase for database management purposes.

Then Matt Rogish alerted me to Why I Migrated Away From MongoDB. I suggest you read it in the context of my article and see how close my suspicions were to reality. Here are some quotes worth pondering (emphasis mine):

Alas, not being aware of the mathematics behind relational algebra, I could not see clearly the trap I was falling into - document databases are remarkably hard to run aggregations on and aggregating the data and presenting meaningrful statistics on your receipts is one of the core features of digiDoc. Without the powerful aggregation features that we take for granted in RDBMSs, I would constantly be fighting with unweildy map-reduce constructs when all I want is SUM(amount) FROM receipts WHERE <foo> GROUP BY <bar>.

People keep complaining that JOINs make your data hard to scale. Well, the converse is also true - Not having JOINs makes your data an intractable lump of mud.

...when I last looked something as simple as case-insensitive search did not exist. The recommended solution was to have a field in the model with all your search data in it in lower case [FP: heh, heh]... And if I add a new field to the model? Time to regenerate all the search strings. I can only come to the conclusion that mongodb is a well-funded and elaborate troll.

...somewhere along the stack of mongodb, mongoid and mongoid-map-reduce, somewhere there, type information was being lost.

Then it might have been the lack of an enforced schema? Thinking about it though, schemas are wonderful. They take all the constraints about your data and put it in one place. Without a schema, this constraint checking would be spread all over my application. A document added a month ago and a document added yesterday could look completely different and I’d have no way of knowing. Such fuzzy schemaless data models encourage loose thinking and undisciplined object orientation.

Anybody with foundation knowledge would expect these problems. As I argued so many times: a data structure determines manipulation and, therefore, usefulness for a given informational purpose.

Oh, and in this context, see my AllAnalytics posts and the exchanges with readers:

Knowing What a Database Is
Unstructured Data

POSTS