Monday, September 12, 2016

This Week



1. Quote of the Week
"Data sense-making does not benefit from the relational data model. Dr. Codd’s rules for relational modeling were designed to improve efficiencies in data processing and storage, not to make data more intelligible. In fact, structuring data relationally makes the work of data sensemaking more difficult, which is why Dr. Kimball created dimensional data modeling and also why an entire industry of middleware products emerged to hide the complexities of relational models." --Stephen Few, PerceptualEdge.com

2. To Laugh or Cry?

Saturday, September 3, 2016

Relation Predicates and Identical Relations



Note: This is a 11/25/17 re-write of an earlier post, to bring it in line with the McGoveral formalization and interpretation [1] of Codd's real RDM.

Here's what's wrong with the last wrong picture I posted, namely:

Q: "Can you have 2 tables, VIEWS and DOWNLOADS, with identical structure in a good DB schema (item_id, user_id, time). Some of the records will be identical but their meaning will be different depending on which table they are in. The "views" table is updated any time a user views an item for the first time. The "downloads" table is updated any time a user downloads an item for the first time. Both of the tables can exist without the other."

A1:"I don't think that there is a problem, per se. From a E/R modeling point of view I don't see a problem with that, as long as they represent two semantically different entities."

A2:"Are you saying that both tables have an 'item_id' Primary Key? In this case, the fields have the same name, but do not have the same meaning. One is a 'view_id', and the other one is a 'download_id'. You should rename your fields consequently to avoid this kind of misunderstanding."

A3: "Chris Date and Dave McGoveran formalised the Principle of Orthogonal Design. Roughly speaking it means that in database design you should avoid the possibility of allowing the same tuple in two different relvars. The aim being to avoid certain types of redundancy and ambiguity that could result."

A4: "When designing a DB there are lots of different parameters, and some (e.g.: performance) may take precedence. Case in point: even if the structures (and I suppose indexing) are identical, maybe "views" has more records and will be accessed more often. This alone could be a good reason not to burden it with records from the downloads." --StackOverflow.com

Monday, August 22, 2016

This Week



1. What's wrong with this picture?
"Can you have 2 tables
- VIEWS
- DOWNLOADS
with identical structure in a good DB schema (item_id, user_id, time). Some of the records will be identical but their meaning will be different depending on which table they are in. The "views" table is updated any time a user views an item for the first time. The "downloads" table is updated any time a user downloads an item for the first time. Both of the tables can exist without the other.

"I don't think that there is a problem, per se from a E/R modelling point of view, as long as they represent two semantically different entities."
"Are you saying that both tables have an 'item_id' Primary Key? In this case, the fields have the same name, but do not have the same meaning. One is a 'view_id', and the other one is a 'download_id'. You should rename your fields consequently to avoid this kind of misunderstanding."
 "Chris Date and Dave McGoveran formalised the Principle of Orthogonal Design. Roughly speaking it means that in database design you should avoid the possibility of allowing the same tuple in two different relvars. The aim being to avoid certain types of redundancy and ambiguity that could result."

"When designing a DB there are lots of different parameters, and some (e.g.: performance) may take precedence. Case in point: even if the structures (and I suppose indexing) are identical, maybe "views" has more records and will be accessed more often. This alone could be a good reason not to burden it with records from the downloads."
--Can you have 2 tables with identical structure in a good DB schema?, StackOverflow.com

Monday, August 8, 2016

This Week



1. What's wrong with this picture?
Q: "My understanding has always been that a primary key should be immutable, and my searching since reading this answer has only provided answers which reflect the same as a best practice. Under what circumstances would a primary key value need to be altered after the record is created?"

A: "When a primary key is chosen that is not immutable?" --StackExchange.com

2. Quote of the Week

"We start with a value. This may be a number, a date, or a chunk of text. Domain refers to the meaning for a value or a set of possible values. When we have a value with a consistently and widely used set of units, value domains, or applications, we call this value a data element. A data element may be a ticket number, a temperature reading, a hair color. (Some modeling approaches omit the notion of data elements or domains.)
In all the techniques and stages of data modeling, the concepts of entity and attribute are universal. An entity is the “thing” that must be manipulated as a data object. Entity represents an indivisible concept that consists of data elements. Each data element in the entity is called an attribute. Conversely, we could say that we build an entity by attributing data elements to it. You create an entity as a whole, and you delete it as a whole." --What a Concept! Is Logical Data Modeling Obsolete? -- LinkedIn.com
Note: In the preface to my PRACTICAL DATABASE FOUNDATIONS series of papers I deplored an author's (considered an expert) reliance on an industry ANSI committee as a starting point in an explanation of conceptual, logical and physical data fundamentals--there is practically 0 chance of soundness and 100% chance of confusion. So when this article started with "In the common usage established by ANSI in 1975, data modeling goes from abstract to concrete in three steps" I knew I would not read very far. Sure enough, I stopped almost immediately, after the above quote. If you don't understand why, I recommend you check out my papers.

Friday, July 29, 2016

Data Science, Coding, the Automation Paradox, and the Silicon Valley State




My July post @All Analytics.

Indeed, if what Linus Torvald is admiringly quoted as saying is true -- "Don’t ever make the mistake [of thinking] that you can design something better than what you get from ruthless massively parallel trial-and-error with a feedback cycle. That’s giving your intelligence much too much credit." -- why bother with education, science, and theory at all? Just “plug your coding skills” by getting a Microsoft or IBM “Professional Degree”, or even take a free coding course by your favorite billionaire.

Read it all. (Please comment there, not here) 

Monday, July 25, 2016

Duplicates: Stating the Same Fact More Than Once Does Not Make it Truer, Only Redundant




Here's what what wrong with last week's picture, namely:
RB: "From the tabular point of view, does it make sense why we can't have duplicate rows in a relation?"

John Sullivan: "As with everything else in life, it depends what you are trying to do (and exactly what you mean when you talk about a DBMS table v. a formal relation). From an operational (transactional) database point of view, for obvious reasons, you don't want duplicate rows (enforce a natural key). But if you're analysing data from various legacy sources (e.g. spreadsheets) it might be useful. Then again, you might introduce a surrogate key to give you more control over what's going on - again, depends on what you are trying to do." --LinkedIn.com

One of my readers once wondered why "database professionals understand uniqueness via keys, but don't seem to understand why duplicate rows should be prohibited and the consequences of breaking relational closure": 

View My Stats