Sunday, May 5, 2013

Theory: As Far From Religion As One Can Get



In So What is a 'Large Database' JS states:
The points you make here, and consistently ... center pretty clearly on distinction between logical models and physical implementations. Products that sacrifice the logical model for various practical considerations (speed, size, cost, etc. - at least in the short term), reinforce the general lack of focus on, or understanding of, the relational model, as well as diminishing appreciation of the distinction betweenlogical and physical.
Physical data independence (PDI) is, indeed, a core advantage of the relational model, but hardly the only one I have focused on over the years. And the relational model is hardly the only component of the foundation knowledge that is increasingly lacking in the industry.

Thursday, May 2, 2013

Site Update



1.
Details of my keynote address at the Northern California Oracle User Group Spring 2013 conference is on the SCHEDULE page.

BTW: If you live in San Francisco, attend the conference on 5/22 and can give me rides to and/or from Pleasanton, or know somebody who can, it will be greatly appreciated. Please email me at the address on the ABOUT page.
 `
2.
The 'Quotes of the Week' were posted on the QUOTES page.

3.
A 'To Laugh or Cry' item was posted on the LAUGH/CRY page.

Carl Hewitt's "response" to Date and McGoveran letter to the editor criticizing  his previous nonsense. Incidentally, somebody Googled "chris date mcgovern [sic] carl hewitt" and here's the blurb that comes up:
Carl Hewitt - Wikipedia, the free encyclopedia
Carl Hewitt is Board Chair of the International Society for Inconsistency ... which was developed in the early 1970s by Sussman, Hewitt, Chris Reeve, and David ...
4.
A link to an online exchange I participated in was posted to the FP ONLINE page.

5.
The first installment of my Debunking Corner for the Northern California Oracle User Group Journal Spring 2013 issue has been published. A link to the journal PDF was posted on the FP ONLINE page (scroll down).

6.
The Costly Illusion: Normalization, Integrity and Performance paper has been revised to correct an error (see Understanding Further Normalization: 2NF).

7.
A Bing search that hit my site: "optimal database for complex xml schemas nosql". I don't think that's what the author had in mind.

8.
From a LinkedIn Profile:
Mary Hart
B2B Tech Marketing Copywriter/Professional Liar, Greater Boston Area
I would appreciate the honesty but for for the logical paradox.


Sunday, April 28, 2013

Tables, Full Normalization and Business Rules



REVISED: 10/16/16
 

Often somebody produces a table and asks if it is fully normalized (in 5NF) and, if not, in what normal form it is. This is an indication of poor grasp of data fundamentals.

Consider ASSIGNMENTS:
 EMP# ENAME    PROJECT       DEPT#
===================================
 100  Spenser  Sys Support   E21
 100  Spenser  Comp Svcs     E21
 100  Spenser  Supp Svcs     E21
 160  Pianka   Info Center   D11
 310  Setright Documentation D11
 310  Setright Mfg Systems   D11
 150  Adamson  Info Center   D11
-----------------------------------
First, a normal form is a property of a relation, not a table (a R-table is only a "visual shorthand" for a relation -- a special kind of table that visualizes a relation on some physical medium (e.g., paper) -- and the two should not be confused.

Second, the normal form of a relation is determined from attribute dependencies. Formally, a relation is fully normalized (in 5NF) if and only if the only dependencies that hold in it are functional dependencies (FD) of the non-key attributes on the key (i.e., there is exactly one value of each non-key attribute for every key value, but not vice-versa). Since a key represents an entity identifier, this condition exists only when, informally, a relation represents entities of a single type (why?) Is this true for the relation pictured by ASSIGNMENTS?

The fact is that whether a relation represents a single type of entity -- and, therefore, is fully normalized -- cannot be ascertained from sheer visual inspection of the table picturing it. It requires knowledge of what the underlying relation means, namely the type(s) of entity specified by the business rules in the corresponding conceptual model that the relation represents.

For example, if the rules:

  • R1: Every employee is identified by an employee number.
  • R2: Every employee has an employee name.
  • R3: Every employee works in a department.
  • R4: Every project assignment is identified by an employee number and a project name.
model two types of entity:
  • Employees: {emp. number} --> {employee name, department number}
  • Project assignments: {employee number, project}
then the relation represents both and, consequently, is not in 5NF. In fact, without a well defined and complete conceptual specification of entity types, you can't even tell whether relations have keys (if they do not, they are not relation) and, if they do, what the key is.

Database design adhering to the Principle of Full Normalization (POFN) do not "bundle" entity types and produce 5NF relations, obviating the need for further normalization. For the advantages of full normalization and the drawbacks of "denormalization for performance" illusion, see paper #2 and the recently published DBDEBUNK GUIDE TO MISCONCEPTIONS ABOUT DATA FUNDAMENTALS.

Explicit further normalization is necessary only to "repair" poorly designed non-5NF relations by replacing them 5NF projections. For example, ASSIGNMENTS with EMPLOYEES and PROJ_ASSIGNS pictured by R-tables:

EMP# ENAME    DEPT#
====================
 100  Spenser  E21
 160  Pianka   D11
 310  Setright D11
 150  Adamson  D11
--------------------

 EMP# PROJECT
=================  
 100  Sys Support
 100  Comp Svcs
 100  Supp Svcs
 160  Info Center
 310  Mfg Systems
 310  Comp Svcs
 150  Info Center
------------------
each representing a single entity type. The repair is possible because the following holds:
ASSIGNMENTS{EMP#,ENAME, DEPT#} JOIN ASSIGNMENTS{EMP#,PROJECT} = PROJ_ASSIGNMENTS
where the left-hand side is a join of two projections of ASSIGNMENTS (i.e., no information is lost).




Thursday, April 25, 2013

Site Update



1.
My keynote address at the Northern California Oracle User Group Spring 2013 conference was added to the SCHEDULE.

BTW: If you live in San Francisco, attend the conference on 5/22 and can give me rides to and/or from Pleasanton, or know somebody who can, it will be greatly appreciated. Please email me at the address on the About page.

2.
A link to my latest All Analytics column was posted on the ONLINE page.

Incidentally, since with the discovery by business of analytics as some sort of "new data science", overnight born-again BI experts proliferate like frogs after heavy rain. It suggest a similar poverty of foundation knowledge and rich debunking targets.
Please submit any pearls you come across that could be targets interesting from a data perspective.

3.
The 'Quote of the Week' was posted on the QUOTES page.

4.
A 'To Laugh or Cry' item was posted on the LAUGH/CRY page.

Many years ago I wrote something about what I called the "kitchen sink" approach to data management, but this one takes the cake. All the following are included:
  • Key-value pair programming language
  • Entity Attribute Value database model
  • Relational Database Management System, specifically Postgres 9
  • Objects and object metadata
  • SQL client interface (returns objects of various types)
  • Procedural SQL [FP: Huh?]
  • Schema of "Sprout data model" [FP: Wonder what that is]
  • Objects (tables, views) are accessed with their resource identifier
  • High level syntax-independent [FP: Wow!!!!]
and much more (check out, in particular, the bulleted list of features).

5.
A link to an online exchange I participated in was posted to the FP ONLINE page.

6.
Consider the topics in Jonathan Lewis' Oracle Mechanisms Webinar in the context of my argument that, given so many physical/implementation factors that affect performance, why the instinct to attribute poor performance to (logical) denormalization?  And there are many more than those tackled by Jonathan.

7.
While checking hits to this site, I noticed that one of them was due to the  following Google search: "My data model is a better model of reality than your data model. What would your response be?"

Well?



Thursday, April 18, 2013

Site Update



1.
My keynote address at the Northern California Oracle User Group Spring 2013 conference was added to the SCHEDULE.

2.
A link to my latest All Analytics column was posted on the ONLINE page.

3.
The Quote of the Week was posted on the QUOTES page.

There was a comment to my recent Un-muddling Modeling, Part 1 that the conceptual and logical models do not require the relationship concept. However, this does not mean we cannot refer to relationships that are implicit in the models and that is usually in response to arguments like this one.

4.
A 'To Laugh or Cry' item was posted on the LAUGH/CRY page.

Nokia Entertainment: Why we went Mongo

An excellent example of how products are selected in the absence of foundation knowledge.

Related.

Ideas to integration data sets from structured and unstructured data

Bay Area coding boot camps promise to launch tech careers   
SAN FRANCISCO -- Looking for a career change, Ken Shimizu decided he wanted to be a software developer, but he didn't want to go back to college to study computer science.

5.
A link to an online exchange I participated in was posted to the FP ONLINE page.

How to I create a logical data model for Geospatial Data?

6.
Big Data Is Just For Big Companies - And Other BS

There are two related core cycles in IT: centralization/decentralization/re-centralization and corporatization/democratization/re-corporatization.

7.
I have often referred to the difficulty of conveying informally the formal without losing either the rigor, or the audience. David Portas, one of the few knowledgeable practitioners, demonstrates some of that difficulty in his comments to the following post by Hugo Kornelis: NULL - The database's black hole

8.
Enjoy.

Big Data Dilbert


View My Stats