Friday, July 29, 2016

Data Science, Coding, the Automation Paradox, and the Silicon Valley State




My July post @All Analytics.

Indeed, if what Linus Torvald is admiringly quoted as saying is true -- "Don’t ever make the mistake [of thinking] that you can design something better than what you get from ruthless massively parallel trial-and-error with a feedback cycle. That’s giving your intelligence much too much credit." -- why bother with education, science, and theory at all? Just “plug your coding skills” by getting a Microsoft or IBM “Professional Degree”, or even take a free coding course by your favorite billionaire.

Read it all. (Please comment there, not here) 

Monday, July 25, 2016

Duplicates: Stating the Same Fact More Than Once Does Not Make it Truer, Only Redundant




Here's what what wrong with last week's picture, namely:
RB: "From the tabular point of view, does it make sense why we can't have duplicate rows in a relation?"

John Sullivan: "As with everything else in life, it depends what you are trying to do (and exactly what you mean when you talk about a DBMS table v. a formal relation). From an operational (transactional) database point of view, for obvious reasons, you don't want duplicate rows (enforce a natural key). But if you're analysing data from various legacy sources (e.g. spreadsheets) it might be useful. Then again, you might introduce a surrogate key to give you more control over what's going on - again, depends on what you are trying to do." --LinkedIn.com

One of my readers once wondered why "database professionals understand uniqueness via keys, but don't seem to understand why duplicate rows should be prohibited and the consequences of breaking relational closure": 

Sunday, July 17, 2016

This Week



1. What's wrong with this picture? 
RB: "From the tabular point of view, does it make sense why we can't have duplicate rows in a relation?"

John Sullivan: "As with everything else in life, it depends what you are trying to do (and exactly what you mean when you talk about a DBMS table v. a formal relation). From an operational (transactional) database point of view, for obvious reasons, you don't want duplicate rows (enforce a natural key). But if you're analysing data from various legacy sources (e.g. spreadsheets) it might be useful. Then again, you might introduce a surrogate key to give you more control over what's going on - again, depends on what you are trying to do." --Question about the relational model


Sunday, July 10, 2016

Levels of Representation: Relationships, Rules, Relations and Constraints




What's Wrong with Last Week's Picture (Question about relational model )

There are relationships at both the conceptual and logical representation levels. Confusing them is bad conceptual modeling and database design.

Relationships, Rules and Relations

AT: "In my personal understanding, a relation is defined as a set of tuples. Then ... "in the relational model every relation represents a relationship". And then a quote from Chen: "each tuple of entities ... is a relationship". If I use the first and the second statements - I can say that a relationship is a set of tuples. The third statement says that a relationship is a tuple. So far, is a relationship a set of an element of a set? (Or may be a set of sets?)".
A relation is a set of tuples that is a subset of the Cartesian product of the domains, i.e., it is a relationship of domains at the logical level (Chen may have been misquoted: a tuple is a set of attribute values (i.e., a relationship thereof, that represents a fact about a single entity).

Thursday, July 7, 2016

NoSQL, Big Data Analytics, and the Loss of Knowledge and Reason



My June post @All Analytics:

The data management industry operates like the fashion industry. Its most persistent characteristic is migration from fad to fad. Every few years -- the number keeps getting smaller -- some "new" problem is discovered, for which the solution is so magical, that it is extended everywhere to everything, whether it is applicable or not. But many of these problems are old and fundamental and some of the “solutions” bring them back, rather than solve them. ...

Read it all. (Please comment there, not here)



Saturday, July 2, 2016

This Week



1. What's wrong with this picture?

AT: Well, I think I am a bit confused now. In my personal understanding, a relation is defined as a set of tuples. Then ... "in the relational model every relation represents a relationship". And then a quote from Chen: "each tuple of entities ... is a relationship". If I use the first and the second statements - I can say that a relationship is a set of tuples. The third statement says that a relationship is a tuple. So far, is a relationship a set of an element of a set? (Or may be a set of sets?)
GE: I argue that there is essentially no difference between relationships between entity (type tables) and between an entity and its attributes. They both represent relationships between two populations of things. Something is an attribute by virtue of there being a relationship. If relationships are represented by foreign keys and the entity tables must be in 1NF, as in the relational model, then all relationships must be at most Many-to-One (a very unnecessary limitation when modeling some user domain).
TF: The relational model was a mathematical construct, derived from set theory. Hence that particular terminology. The entity-relationship model is essentially a directed graph model, where relationships are prominent residents. Not so in the relational model (despite the name), where relationships (between relations, mind you) are not visible and in the SQL implementations is reduced to constraints. Relationships are about structure, which is as important as meaning (the semantics of the terms used in the universe being modeled).
2. Quote of the Week
"In Relational Theory sometimes the relationships, where we do our Joins are much more important than the attributes on an Entity." (quoted in LinkedIn.com exchange)

Sunday, June 19, 2016

This Week



1. What's wrong with this picture?

This week's picture is the one of the state of knowledge about keys that Martijn Evers painted in Kinds of Keys: On the Nature of Key Classifications, that I had already commented on. As a result of discussions I've been having with David McGoveran in the context of our forthcoming books (his LOGIC FOR SERIOUS DATABASE FOLKS, my DBDEBUNK GUIDE TO FUNDAMENTAL DATA MANAGEMENT MISCONCEPTIONS), I've decided to rewrite my
comments On Kinds of Keys on the subject. I refer the reader to Martijn's article for a refresher--my rewrite will be posted next week.

2. Quote of the Week

There are no rules of normalization for non-relational databases. Effectively, you start out by denormalizing everything. Which means you're designing the data organization to serve specific queries. So follow the same principle in NoSQL databases as you would for denormalizing a relational database: design your queries first, then the structure of the database is derived from the queries. --Bill Karwin, What is a good way to design a NoSQL database

View My Stats