Friday, April 11, 2014

Weekly Update




1. Quote of the Week
Now that we have seen a lot of information about NoSQL databases, it is interesting to drop back and look around at how much NoSQL stuff we already have in our organizations. I had never thought of a file system as a database, but it is. The comparison is fascinating. File systems don’t impose any structure on the data that is stored in any given file. There is a key value relationship to each file. There is little control over concurrency beyond file locking. This is very similar to NoSQL, with locking only at the aggregate level. File systems are cheap; everyone has one and they hold huge amounts of data on multiple nodes. --Book review, NoCOUG Journal

2. To Laugh or Cry?
Find GUID in Database

3. Online exchanges

4. Interesting elsewhere (corrected first link and added a second):

5. And now for something completely different
Joe Biden wants to nominate Obama for sainthood
The VP of the world's "superpower".
Stanford opening new lab to study bad science
And tomorrow we'll need a lab to study bad science in the study of bad science.










Tuesday, April 1, 2014

Analytics = Manipulation of Data Structure



In What the $&@#^ is Applied Big Data, venture capitalist Greg Sands raises an issue about which I, as a scientist (and not just a "data scientist"), have often expressed concern.
"The world is awash in data. Figuring out what to do with it is the problem. The press is littered with reports about Big Data. Many CIOs report that their CEOs have come to them and said, “We need some of that Big Data.” That often means make sure we’re collecting all the available data, often deploying a new Hadoop-based infrastructure to store and analyze it. After this elaborate process and extensive investment, they’ll start mining to figure out if there are critical insights that come out of the data. We see many entrepreneurs that start the same way. Aggregate data and look for a problem."

Monday, March 24, 2014

Simplicity: Forgotten, Misunderstood, Underrated Relational Objective



In a LinkedIn exchange I argued that an optimal generality-to-simplicity ratio (the ability to represent a maximal range of reality with minimal complexity) and a 1:1 correspondence between informal business modeling constructs and formal logical database constructs are beneficial. I also claimed that, insofar as formally defined data models are concerned, the relational model scores best on both.

One of the responses I got was
GE: Though I might come up with slightly different lists, in general, I agree with your expression of criteria for selecting a primary key and of generality and simplicity, but disagree with your conclusion that "RM scores better than any other modeling scheme."
Let me take his points one by one.

Sunday, March 23, 2014

Weekly Update




Note: I usually receive notifications of comments on my posts, which I then publish if they are not spam. Today, however, I noticed a whole bunch of old comments awaiting moderation, some from 2013, of which I was not notified. I have just published and replied to them. Apologies to the commenters.

1. Quote of the Week
Schemaless describes the storage engine, not the data. Data has schema. No Data is ever Schemaless. Schemaless DBs merely describe a feature of themselves, not the data they store. Namely that they don't store and enforce this schema in addition to the data.
One advantage of this Is that you can change the schema "easily" - helps with up time. Now if you don't evolve the old data with each schema change you can end up with multiple schemas stored in your backend and no way of knowing which data is of which schema without some form of analysis of the data. Show me the Front end that can deal with evolving schemas without knowing about them ;) Point being Schema's always there wether any tier deals with it explicitly or not. Something has to manage it. --LinkedIn.com
2. To Laugh or Cry?
Big data means the reign of the relational database is over
3. Online
What is the best way to explain Normalization 1NF,2NF and 3NF
4. And now for something completely different
Samsung’s entire leadership team is paid less than individual executives at Google, Apple





Tuesday, March 18, 2014

Science, Religion, EAV and the Relational Model




Note: This is an 11/06/17 revision. Thanks to Erwin Smout for reviewing a draft and suggesting improvements.

The claims that (1) the relational data model (RDM) is old and, by implication, obsolete -- the industry has purportedly "progressed" -- and (2) promoting it as a superior alternative to NoSQL, Hadoop and other "modern" data management technologies is "religious" in character are routine. They popped up again in a LinkedIn exchange and I responded as I usually do, by asking why the promotion of a scientific approach is deemed religious, while pushing ad-hoc alternatives is not.

Sunday, March 9, 2014

Weekly Update




1. My presentation at SQLSaturday:
The Last NULL in the Coffin: A Relational 2VL Solution to Missing Data
March 15, 2014 11:15am
Microsoft Technology Center, 1065 La Avenida, Building One, Mountain View, CA, 94043
2. Descriptions of all available courses are now posted on the SEMINARS page. Contact me if you are interested in public or private sessions, with possible customization for particular needs.

3. Quote of the Week
1: - a picture is only one representation of a data model: What is the adequate data structure to store a data model: Picture, Text (UML), formal (Gellish), Database) - is there an API to access, create and manipulate data models - do you handle hundreds of types (Entity, Relationship, Attribute): - do you handle one conceptual data model and derived consistent (external) submodels - document : which meta-attributes must be maintained to describe the elements of the data model. How to create different views of the data model (Pictures with different views, detailed printed documents)
2: - How to check the data model with instance data and queries based on the requirements.
3. - Use the ER-Model for Instance data (instead of the relational model) and thereby avoid the impedance mismatch:
4: A query language to access instance data which is based on the Entity-Relationship-Model and NOT on the relational model" --LinkedIn.com
Ugh!

4. To Laugh or Cry?
mysql - additional information on normalization 
5. Online
Do you use Composite Primary Keys to design a good, solid data model?
6. Elsewhere
http://image-store.slidesharecdn.com/a606a81e-9bd4-11e3-8016-22000a9aa8cc-large.png
The database field.

7. And now for something completely different
Zombie Studies Gain Ground on College Campuses








Wednesday, February 26, 2014

Anatomy of a Data Management Project: Distribution Independence



The term "distributed" is thrown around a lot these days. Hype notwithstanding, just as with analytics and data science, distribution in data management is nothing new. In fact, SQL vendors (IBM, Sybase, Ingres, Oracle) -- frequently criticized today for non-scalability -- tackled distribution decades ago. The non-relational systems preceding SQL were not amenable to it, and SQL is the closest to the relational model the industry allows you to get.
