Thursday, December 11, 2014

Happy Holidays!





To all my readers and colleagues, Happy Hanukkah, Merry Xmas and a healthy, prosperous and Happy New Year!!!!

Sunday, December 7, 2014

Weekly Update



1. Quote of the Week
Relational is/was a way for humans to understand how computers could organise data. From a day back when disks were expensive. --LinkedIn.com

2. To Laugh or Cry?
How Google Will Use Firebase to Supercharge Its Cloud Computing
Another reinvention of a (square) wheel.


3. Online debunkings
Calendar supertype

4. Interesting elsewhere
On Persistence and Data Management
An oldie but goodie; check out my comment.


5. And now for something completely different

Fascinating:
John Cleese on the Black Knight and Douglas Adams' High Heels

About The PostWest:
White House exempts Syria airstrikes from tight standards on civilian deaths
Remember all the fuss about Israel not doing enough to prevent civilian deaths? The hypocrisy!




Wednesday, December 3, 2014

Analytics & SQL Tables



My December blog @All Analytics. 

Manipulating/extracting data from SQL databases and interpreting results without knowledge of what the source tables mean is almost certain to lead analysis astray. To ensure sensible analysis and properly interpreted results, the conscientious analyst may have to do some digging that requires basic database knowledge. Here's why.

Read it all. (Please comment there, not here)








Sunday, November 30, 2014

SQL's Incomplete Set-lization, Part 2




by Erwin Smout


[FP: Two weeks ago I posted a debunking of an article blaming some SQL sins. Erwin has some additional comments.]

1. Multisets


From the original article:
It is beyond any doubt that set is the basis of mass data computation. Although SQL has the concept of set, it is limited to describing simple result set, and it does not take the set as a basic data type to enlarge its application scope.
Sidestepping several possible nitpicks (e.g., that SQL allows duplicate rows and thus, in its basic form, has bag, not set, algebra), the intention behind the complaint is mostly accurate.
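The bag-versus-set nitpick can be illustrated with a minimal sketch. It uses Python's standard sqlite3 module; the emp table, its columns and its rows are hypothetical, made up purely for illustration. Projecting on city without DISTINCT returns a duplicate row, so the result is a bag rather than a set.

```python
import sqlite3

def project_city(distinct):
    """Project a hypothetical emp table on city, with or without DISTINCT."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE emp (name TEXT, city TEXT)")
    conn.executemany(
        "INSERT INTO emp VALUES (?, ?)",
        [("Ann", "Paris"), ("Bob", "Paris"), ("Cy", "Rome")],
    )
    keyword = "DISTINCT " if distinct else ""
    rows = conn.execute(f"SELECT {keyword}city FROM emp ORDER BY city").fetchall()
    conn.close()
    return rows

bag_result = project_city(distinct=False)  # plain SQL projection keeps duplicates
set_result = project_city(distinct=True)   # DISTINCT restores set semantics

print(bag_result)
print(set_result)
```

Only the second query behaves like relational projection; the first is what SQL gives by default.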

Sunday, November 23, 2014

Weekly Update UPDATE 2



Housekeeping: I have added a link to Nijssen's paper The Entity-Relationship Model Considered Harmful to FUNDAMENTALS on the HOME page. UPDATE 2: The paper is fine if read with a PDF viewer other than Adobe Reader XI (11.0.09).


1. Quotes of the Week
Platfora’s mission is to empower customers to transform their businesses into fact-based enterprises. Platfora's Big Data Analytics Platform masks the complexity of Hadoop, making it easy for customers to understand all the facts in their business... --Platfora.com
Q: I don't know what the different between detect inference in database and prevent it, any help?
A: Why would you want to prevent inferences that a DMBS makes? That's where the power of it is. --LinkedIn.com
2. To Laugh or Cry?
Graphs: A Better Database Abstraction
3. Online debunkings

4. Interesting elsewhere
You Too May Be A Victim Of Developaralysis
H/t Will Sisson.

5. And now for something completely different 
  • About The PostWest
If they do this:
Fatah official calls for blood to 'purify' Jerusalem of Jews
PA airs 'anti-Semitic' film as tensions mount in Jerusalem
Four killed in terror attack at Jerusalem synagogue
then obviously we should do this:
Croatia likely to recognize Palestine as a state MidEast
Sweden To Recognize State Of Palestine
Spanish Parliament Calls on Rajoy to Recognize Palestine
UK lawmakers vote to recognize Palestine as a state
and this
EU threatens 'further action' to protect two-state solution
EU considering 'sanctions' against Israel over settlements
Makes perfect sense. So what else is new?



Friday, November 7, 2014

Relational Fidelity and Analytics Integrity




My November blog post @All Analytics:


I have shown in previous posts that reliance on sheer visual inspection of database tables for data analysis is a risky proposition, with high probability of misinterpretation. All the more so when databases are complex, with wide and/or long tables. The analyst needs to know table interpretations -- their real-world meaning derived from the business rules with which the database must be consistent. The problem is that they are left out of the tables because DBMSs do not understand them, nor are they usually documented in the database (as they well should be), because database professionals underestimate their importance.

Read it all. (Please comment there, not here)









Sunday, November 2, 2014

Weekly Update



1. Quote of the Week
Those who argue for natural keys typically do so from a position of philosophical purity, as is the case in the Simple Talk article you cited. In my (25+) years of experience, people who argue from this position are long on education and short on real-world experience. In the real world just about every natural key I've ever come across is subject to duplication and/or redefinition. There are very few cases outside of smallish code tables where it is practical to take the philosophical high ground regarding natural keys. --StackExchange.com

2. To Laugh or Cry?
R2G a Tool for Migrating Relations to Graphs
H/t Erwin Smout.

3. Online debunkings

4. Interesting elsewhere
The Delusions of Big Data
Must read.

5. And now for something completely different
Ebola-- Failures of Imagination
Not to worry, America, Ebola will go to India first.
Can you detect the stealth animals hiding in all these pictures?
Fascinating.

About The PostWest:
Jihadism is OK as long as it kills Jews. Nice people. Let's ...
Irish parliament calls on government to recognize Palestine
... give them a state. Really?
Pat Condell: 'Boo Hoo Palestine'




Sunday, October 26, 2014

Weekly Update




1. Quote of the Week
Q: What's the key technical skills for Data Modeling? 
A: Erwin or Rational or phycial [sic] modelling or conceptual modelling or Logical modelling. --LinkedIn.com 
NULL means data is not available, nothing more...--LinkedIn.com
In case you were wondering.


2. To Laugh or Cry?

Oldie, but goodie from old dbdebunk:
On a Pile of ... what?

3. Online debunkings

4. Elsewhere

5. And now for something completely different
CDC blames cuts for Ebola response, pays millions in bonuses

About The PostWest:
Never again? Think again: they're at it again, to finish the job.
The Bible's Buried Secrets
Fascinating.



Sunday, October 19, 2014

Precision, Procedurality and SQL, Part 1



 by Erwin Smout and Fabian Pascal

"To be as precise as we possibly can is not a luxurious mannerism that the academic prig can afford himself in his (supposedly!) sheltered environment; for people facing the problems of "the real world" it is a Must." --E.W. Dijkstra, An Open Letter to L. Bass


From In Some Cases illustrating drawbacks of SQL in data computing and analytics
The computing power of SQL for mass structured data is complete, that is to say, it is impossible to find anything that SQL cannot compute. But its support layer is too low, which can lead to over-elaborate operation in practical application.
One of the four aspects of this "over-elaboration" is "computation without substep", but before we comment on it, the article glosses over an important matter.

Sunday, October 12, 2014

Weekly Update



Housekeeping: I have added FUNDAMENTALS links on HOME page to:

 1. Quote of the Week
I am teaching a database design course next year. What do you think should be covered in an introductory course? --LinkedIn.com
I have a requirement for an ERwin data modeler (Logical, Physical, 3NF and Star Schema). --LinkedIn.com

2. To Laugh or Cry?
What would be key entities in Automotive Industry MDM

3. Online debunkings

4. Must read elsewhere
Out of the Tar Pit

5. And now for something completely different
Ig Nobel Prize Winners
Cry, don't laugh.
Hey There Little Electron, Why Won't You Tell Me Where You Came From
Fascinating.
Israel is holding back channel talks with the 'Palestinian Authority' relating to Gaza, in which it is making concessions and receiving nothing in return.
The Tower cites a Wall Street Journal report that indicates that Western negotiators are so desperate for a deal with Iran that they are offering more significant sanctions relief for a deal that would not stop Iran from developing nuclear weapons.
Why the West is PostWest: The Blackmailer Paradox (Aumann is Nobel laureate in economics).




Monday, October 6, 2014

Tools Too Good to Be True



My October post @All Analytics

The Wired article Ex-Googler Shares His Big-Data Secrets With the Masses touts a new tool that "mimics the way web giants like Google and Facebook rapidly analyze enormous amounts of online information." The article calls the tool "simple for analysts to query data from anywhere in a company with a single tool, regardless of where that data is stored, without the need to learn new programming languages."

Read it all. (Please comment there, not here)



 






Sunday, September 28, 2014

Weekly Update




1. Quote of the Week
Further to that point, in my mind you can have a database that is both relational and schema-less, in the sense that it is relational if the only thing in it is relations but it is schema-less if any data updating operation is allowed to change the number or degree etc of said relations, rather than that being reserved for so called data-definition operations. --LinkedIn.com
2. To Laugh or Cry?
Turning dirty data words into sweet talk
And an oldie, but goodie
Gardner to DBAs, BI Vendors Reinvent Yourselves
3. Online Debunkings
4. Elsewhere

An old classic:
Unskilled and Unaware of It
and a related consequence
How Our Botched Understanding of Science Ruins Everything
5. And now for something completely different
"Enjoy":
John Oliver: Nuclear Weapons
Fascinating:
Freaky Physics Experiment May Prove Our Universe Is A Two-Dimensional Hologram
About the PostWest:
PA: Israelis Must Return to Their Countries of Origin
How about the Arabs in Palestine, most of whom originate in immigrants from Arab countries attracted to Palestine by jobs created by Jewish development?
If it's not Jews doing it, who cares. It's not so much care for Palestinians as it is hate of the Jews.
Anyway, nice people. Let's give them a state.
US Providing Indirect Military Aid to Hezbollah
Afghanistan and Iraq redux.
Go Easy on Iran So It Fights ISIS? That's Absurd
Indeed: Why should the US allow an enemy nuclear weapons, when fighting ISIS is Iran's own existential interest anyway? Let radical Shia and Sunni duel it out.
World Council of Churches Demands Israel Release Terrorists
Ah, yes, religion is the source of morality.




Sunday, September 21, 2014

New Paper on Domains



Please see the PAPERS page for the current version of the paper, when it becomes available.

Sunday, September 14, 2014

Weekly Update




Housekeeping:

  • Added a LINKS page to the top site menu, with links to items I deem worth reading. Added a few from my old site and new ones will be added as I come across them.
  • Overhauled the FUNDAMENTALS list of sources at the right of the HOME page. It now includes links to the bibliographies of E. F. Codd, D. McGoveran, C. J. Date, H. Darwen and myself. 


1. Quote of the Week
The future in data modeling is Object Role Modeling (ORM).  It is a far superior way to approach data modeling (compared to any record-based methods such as relational) that avoids all the pitfalls of "Table Think" and the necessity of normalization. --LinkedIn.com
2. To Laugh or Cry?
Survey data model, what is the best approach?
3. Online Debunkings
Dr. Robin Bloor: Big Data is “nonsense”
4. Elsewhere
5. And now for something completely different
Senator Challenges Zuckerberg
American patriot.
The Exorcisms of Anneliese Michel
Fascinating.

@The PostWest:



Sunday, August 31, 2014

Weekly Update




1. Quote of the Week
I use the word grain in the same sense as Kimball although I use it across Facts and Dimensions. I like the term over things like Uniqueness, Constraint, Key because it is a term that business readily understand, and can be used prior to the formal identification of a final key. In Dimensional Modelling it is a best practice for the Fact tables to have such a grain (a composite key across associated Dimensions) and is also necessary for many Dimensions (to assist with updates, type-2 logic etc.) - to the point where it is a best practice to identify the grain (Row Natural Key, Source Id, etc.) of the Dimension. The use of a surrogate key on either Dimensions or Facts must be backed by this level of rigor if data integrity is to be maintained. It also forces modeller consideration of source system issues such as multi-source key uniqueness, reuse of keys, deletions, etc. To clarify a little on Dimensions, the grain of an example type-1 customer dimension would be 'a customer id', the grain of an example type 2 customer dimension would be 'a customer id + as at time'. So 'grain' means the defined uniqueness for a row in the table. Generally, this also has the advantage of calling out poorly designed structures that have not established their relational uniqueness correctly - the cause of the irritating duplicate row issue in a Surrogate Key-oriented Fact table.--LinkedIn.com
Got that?

2. To Laugh or Cry?

3. Online Debunkings

4. And now for something completely different
Inside The Mind Of Leonardo
Fascinating.

Two books that every American should read (but won't).

@The PostWest




Tuesday, August 26, 2014

Data Analysts: Know Your Business Rules



My August Post @All Analytics

To ensure data operations make sense and results correspond to the real world and are interpreted correctly, analysts need to know the business rules on the basis of which a database was designed. Here are the types of rules for which they should be looking.

Read it all. (Please comment there, not here)






Sunday, August 24, 2014

NEW: Paper revisions available




Please see the PAPERS page for the current version of this paper, when it becomes available.

Friday, August 15, 2014

Weekly Update




1. Quote of the Week
Invoice number is the key (key is not unique in this table). Although structure or key constraints doesn't enforce uniqueness of rows, the assumption is there will not be duplicate rows. --LinkedIn.com

2. To Laugh or Cry?
Relational Queries by Reference

3. Online Debunkings
What is Weak Subtype

5. And now for something completely different
Tim's Vermeer
Fascinating

@The PostWest

Sunday, August 10, 2014

The 'Real World' and Database Design




Conveying data fundamentals to practitioners without losing either rigor or the audience is a difficult task. There are many experienced professionals with tool expertise but poor foundation knowledge, for which there is little regard. Even in academia, education has been replaced by training, which is not the same thing.

One dilemma faced by an educator is the tension between the simplicity of examples effective for conveying general concepts or principles and the complexity of the reality to be represented in databases. The latter requires integration of many concepts and principles, as well as thorough business knowledge. This is part of the reason why I normally refrain from specific online modeling/design advice and limit myself to the general principles that must be adhered to in the process.

Wednesday, July 30, 2014

Big Data & Analytics: Table Interpretations



My July Post @All Analytics

When, for analytical purposes, you combine data extracted from database tables whose real-world meaning you don't know, you're asking for trouble. The meaning, after all, isn't in the tables. Rather, the meaning is contained in the business rules on the basis of which the tables were designed and which are approximated in the database with integrity constraints. Interpreting results on the basis of just visual inspection of the tables therefore involves guesswork and is almost certain to be wrong.
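As a minimal sketch of a business rule approximated by an integrity constraint, the following uses Python's standard sqlite3 module. The employee table and the rule itself (a salary must be positive and no more than 200,000) are hypothetical and chosen only for illustration. The point is that the rule lives in the constraint declaration, not in the rows an analyst sees.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical business rule: a salary is positive and does not exceed 200000.
# The CHECK constraint is the database's approximation of that rule.
conn.execute(
    """
    CREATE TABLE employee (
        emp_id INTEGER PRIMARY KEY,
        salary INTEGER NOT NULL CHECK (salary > 0 AND salary <= 200000)
    )
    """
)

conn.execute("INSERT INTO employee VALUES (1, 90000)")  # consistent with the rule

try:
    conn.execute("INSERT INTO employee VALUES (2, -5)")  # violates the rule
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]
print(rejected, row_count)
```

Inspecting the stored rows alone would show a salary column of integers; that the values are bounded, and why, is knowable only from the rule.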

Read it all. (And please comment there, not here)



 



Sunday, July 27, 2014

Weekly Update UPDATED




1. Quote of the Week
Q: In a nutshell, what does RDF based Linked Data facilitate?
A: The ability to find and describe stuff using attributes (relations, properties, features, fields, characteristics). --LinkedIn.com

2. To Laugh or Cry?
Data Model now offers Relationship Modeling

3. Online Debunking
Data Vaults - Why Or Why Not

4. Elsewhere
 
5. And now for something completely different

In its decline, the West is becoming impotent and increasingly irrelevant in world affairs, e.g., Russia, Syria, Iraq, Afghanistan, China, Libya, you name it. It is hypocritically taking out its frustration on Israel and the Jews--its classic scapegoating during crises--reinforced by fear of internal Islam.

I am restarting my PostWest blog and will link regularly to posts in this section.


Critical comments that
  • fail to show my facts to be false; or, if they are true, that my conclusions do not follow;
  • are posted anonymously
will not be published and addressed.







Sunday, July 6, 2014

Weekly Update




1. Quote of the Week
Don't confuse Data(base) Modeling with Business Modeling. All DBA are correct when they are talking about Database Modeling. If you want to ensure unique record on Business level, just add a unique composite index. (not as Key). But far to often, a unique record on business level is not ALWAYS unique (only most of the time) --LinkedIn.com

2. To Laugh or Cry?
Create database vs schema

3. Online Debunkings

4. Interesting Elsewhere

5. And now for something completely different
 And if you like what they're doing to San Francisco, you'll love what they'll do to other cities:
Google Exec Rises Ire in Portland
 Symptoms of societal malaise.



Monday, June 30, 2014

Big Data, Normalization & Analytics: Meaning & Constraints



My June post @All Analytics

Combining data extracts from databases for analytical purposes without knowing what the source database tables mean -- what exactly in the real world they represent -- can produce wrong results.

Read it all. (And please comment there, not here)



 




Sunday, June 29, 2014

Denormalization: Database Bias, Integrity Tradeoff and Complexity




The common and entrenched misconception about normalization was recently visible yet again in a LinkedIn exchange.
R: Unless the need is for ACID compliant transactions, denormalization is generally not considered logically, physically or whatever-ally-–so essentially a thoroughly normalized mode is relevant for a write-infrequently consumption of data and data integrity can be guaranteed by design.

Sunday, June 22, 2014

Weekly Update




I have corrected a mistake in For Codd's Sake: a mathematical relation is not the Cartesian product (CP) of the domains over which it is defined. Two readers correctly pointed out what I actually wrote myself in my business modeling paper:
Mathematically, a relation on domains—which are sets of values of a type—is a subset of the Cartesian product of the domains.
Note that the whole CP is also a subset, so it is also a relation, which happens to have useful applicability to business modeling and database design. In the database context, it can be pictured as the pool of all possible rows--past, present and future--for an R-tablevar defined by the domains' types. A database R-table is the set of actual rows at any point in time that is consistent with the set of all integrity constraints to which the R-table is subject (see Business Modeling for Database Design).
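The subset relationship can be sketched in a few lines of Python. The two domains, the works_in rows and the key constraint below are hypothetical, made up only to illustrate the point: the Cartesian product is the pool of all possible rows, and an actual table is a subset of it that also satisfies the constraints.

```python
from itertools import product

# Hypothetical domains (sets of values of a type).
EMP_IDS = {1, 2, 3}
DEPTS = {"Sales", "R&D"}

# The Cartesian product: every possible (emp_id, dept) row.
cartesian_product = set(product(EMP_IDS, DEPTS))

# An actual table value at some point in time.
works_in = {(1, "Sales"), (2, "R&D"), (3, "R&D")}

def satisfies_key_constraint(rows):
    """Hypothetical integrity constraint: emp_id is a key (no id appears twice)."""
    ids = [emp_id for emp_id, _ in rows]
    return len(ids) == len(set(ids))

is_relation = works_in <= cartesian_product              # a subset of the CP
cp_is_relation = cartesian_product <= cartesian_product  # the CP is a subset of itself

print(len(cartesian_product), is_relation, cp_is_relation)
print(satisfies_key_constraint(works_in), satisfies_key_constraint(cartesian_product))
```

Here the whole CP is a relation mathematically, but it violates the key constraint, so it could not be a valid value of this particular table.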


1. Quote of the Week
NoSQL usant correct m'y indeed totof n'y most of the dev ans devops who clearly thing nosql Means they will ne a le to do whatever they wants ans still have answers to their twisted query in a correct time. Those people see nosql as the mean to get ris of DBAs. And il not kiddin since it's happening right now un many companies i know of. --LinkedIn.com

2. To Laugh or Cry?
Architecting IMS for Big Data - a symbiotic relationship.

3. Online 

4. Interesting Elsewhere
IEEE Computer Issue on CAP Theorem
H/t Erwin Smout.


5. And now for something completely different

The PostWest.



Sunday, June 8, 2014

Weekly Update




1. Quote of the Week
Logical design is where the Architect defines entities (which will become tables in a database), attributes (which will become columns in a database), etc. This is typically the level that SMEs are most comfortable. I think that a Logical design may deal with data types and keys, but it does not cater to any specific platform or engine.
Physical design is where the Architect translates the logical design into tables, columns, datatype specifics like INT versus NUMERIC, indexes, partitions, etc. This is where "the rubber meets the road" and the logical design gets mapped into a form that can exist and be tested on a database server.
While I'm sure that someone will object to this link on religious grounds, the discussion  does a pretty good job of making the distinctions that concern me. --LinkedIn.com
2. To Laugh or Cry?
MyBatis Schema Migration System
H/t Ben Samuel, who adds:
"From the department of "we haven't really thought this feature through" comes this gem, one of several schema migration systems that allow "reverse migrations" or "downward migrations". Whereas a forward migration creates tables, columns, etc., a reverse migration drops them. The video proudly shows them "reverse migrating" their database until all tables are dropped. Another vendor patiently explains why they don't offer this feature."

3. Online Debunkings

4. Interesting Elsewhere
The Death Of Expertise

5. And now for something completely different

The "productive" business and tech elite:
God's given gift to humanity and pillars of society.










Sunday, June 1, 2014

Big Data, Normalization & Analytics



May Post @All Analytics

What you need to know for the purposes of this discussion is that tables that bundle multiple entity classes have certain drawbacks. Normalization is a design repair procedure that unbundles the classes -- the columns representing attributes pertaining to each class -- each into its own table. This is possible if and only if there is no data lost or made spurious in the process -- that is, when a bundling of table A is mathematically equivalent to the joins of its unbundled projection tables B and C.
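The equivalence condition can be checked mechanically. The sketch below is a minimal illustration in plain Python, not a description of any DBMS: the table A, its attributes (emp, dept, city) and its rows are hypothetical, and project and natural_join are hand-written helpers. Projecting A on {emp, dept} and {dept, city} loses nothing, while a poorly chosen pair of projections produces spurious rows when joined back.

```python
def make_rows(dicts):
    """A table is a set of rows; each row is a frozenset of (attribute, value) pairs."""
    return {frozenset(d.items()) for d in dicts}

def project(rows, attrs):
    """Keep only the named attributes; duplicates collapse because the result is a set."""
    return {frozenset((a, v) for a, v in row if a in attrs) for row in rows}

def natural_join(left, right):
    """Combine every pair of rows that agree on all attributes they share."""
    joined = set()
    for l_row in left:
        for r_row in right:
            l, r = dict(l_row), dict(r_row)
            if all(l[a] == r[a] for a in l.keys() & r.keys()):
                joined.add(frozenset({**l, **r}.items()))
    return joined

# Hypothetical bundled table A: employees bundled with department facts.
A = make_rows([
    {"emp": "Ann", "dept": "Sales", "city": "Paris"},
    {"emp": "Bob", "dept": "Sales", "city": "Paris"},
    {"emp": "Cy", "dept": "R&D", "city": "Paris"},
])

# Unbundling along the department: B and C join back to exactly A.
B = project(A, {"emp", "dept"})
C = project(A, {"dept", "city"})
lossless = natural_join(B, C) == A

# A poor choice of projections: joining on city alone invents rows.
B_bad = project(A, {"emp", "city"})
bad_join = natural_join(B_bad, C)
spurious = bad_join - A

print(lossless, len(bad_join), len(spurious))
```

In the second decomposition no row of A is lost, yet the join contains rows such as Ann in R&D that were never in A, which is what "made spurious" means here.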

Read it all. (And please comment there, not here)



 



Tuesday, May 27, 2014

Weekly Update





1. Quote of the Week
Codd's relational model is based on set theory, and set theory simply doesn't work for database systems. It can't, for example, model a gum ball machine. Gum balls, you see, have only one attribute, which is color (gum balls don't have names, serial numbers, bar codes, or URLs). If you put 200 gum balls in a gum ball machine, the gum ball machines contains 200 gum balls. If you try to put 200 gum balls in a gum ball relation, you get a relation of 5 gum balls (the number of colors) and 195 duplicate errors. If you then take 5 gum balls out of the gum ball machine, it still contains 195 gum balls. If you take 5 gum balls out of the gum ball relation, it goes empty. --Jim Starkey, LinkedIn.com

2. To Laugh or Cry?
How to store and document large data models

3. Online

4. Interesting elsewhere
Software engineers think they're amazingly great 

5. And now for something completely different
God keeps missing.




Sunday, May 18, 2014

For Codd's Sake -- UPDATED




UPDATE: Correction on 6/8/14


This is a response to comments by a reader on one of my posts.
L: I realize that you have taken much further what Codd wrote on the first page of his 1970 paper but it's still remarkable how many people in the data business are not able to refer to, let alone talk productively, about his "natural structure of data". And many treat RT as a fait accompli when it is still evolving, not to mention those who, as you've pointed out many times, treat SQL gizmos such as outer join as if they come from RT when they don't.

Sunday, May 11, 2014

Weekly Update




1. I will give the following presentation

Big Data, Analytics and Normalization
"Big Data may offer analytical insights, but with almost certainty will produce really big lies from 100% correct data", particularly when data are from external sources. This presentation will demonstrate
  • Why and how
  • How to protect yourself
Wednesday, 5/14, 7:00pm
Microsoft San Francisco office
835 Market St.

For more information contact MGinnebaugh@designmind.com.


2. Quote of the Week
Q: How do we do data modeling in NoSQL DB and Big Data??? 
A: Define the schema hierarchically so that the tables in the schema including ER form a forest using a parent relationship i.e. each table has at most one parent key. Now the data retrieval and storage is done using these parent or ancestor keys. Look for google datastore documentation for more details. --LinkedIn.com

3. To Laugh or Cry? and Online

David McGoveran's comments posted last week are a response to the following LinkedIn exchange initiated by Jim Starkey of Rdb and Interbase fame:
Is the Relational Data Model Spent?
Given who Jim is, my instinct is to cry rather than laugh. This is also the Online item, as I participated in the exchange. Jim did not respond publicly to my challenges and claimed in private that I was trolling. You decide, but if I am a troll, so is David.


4. Interesting elsewhere
Do graph databases deprecate relational databases?
H/t Erwin Smout.


5. And Now for Something Completely Different
No comment.



Sunday, April 27, 2014

UPDATE 2: David McGoveran: Comments on Jim Starkey's "Is the Relational Data Model Spent?"




UPDATE 1: I have added Jim Starkey's reply to David's initial response (with my brief comments) and David's reply to it below.

UPDATE 2: I have made a few minor corrections and fixed end-note formatting problems.


David McGoveran's First Response  
© 2014 David McGoveran – All Rights Reserved
Jim Starkey's opinions in Is the relational model spent?, a LinkedIn exchange he initiated, reflect those of many professionals who have used and even developed SQL DBMSs and their predecessors. While the concerns with so-called "commercial relational database systems" expressed by Jim are valid, they have nothing to do with the relational (data) model. They are the result of DBMS implementations by those who borrowed something from the relational model, but never understood it and so did not know how to take advantage of it to solve application problems.

Weekly Update - UPDATED




1. I will give the following presentation

Big Data, Analytics and Normalization
"Big Data may offer analytical insights, but with almost certainty will produce really big lies from 100% correct data", particularly when data are from external sources. This presentation will demonstrate
  • Why and how
  • How to protect yourself
Wednesday, 5/14, 7:00pm
Microsoft San Francisco office
835 Market St., 7th Floor
San Francisco

For more information contact MGinnebaugh@designmind.com.


2. My April column @All Analytics:
Missing Data, Databases & Analytics

3. Quote of the Week
Q: Is it necessary to follow standards during SQL programming? 
 
A: Standards and Best Practices usually come from common sense. I want to point out that it is God given potential which one must realise and be conscious to utilize it for His glory. --LinkedIn.com

4. To Laugh or Cry?
A data model of the SAP Bill of Material Explosion tables

5. Online 


6. Interesting elsewhere
Big Data, Little Happiness
(requires free registration)


7. And now for something completely different
The Death Of Expertise
Today being the anniversary of the Holocaust, I decided to add the following:
Berlusconi's holocaust jibe provokes German outrage
The irony of Italians badmouthing the Germans about the extermination of Jews. But this time the former spoke the truth: the latter cannot have it both ways.




Sunday, April 20, 2014

Forward to the Past: From Codd to SQL to NoSQL




As told by C. J. Date, shortly after the introduction of SQL DBMSs in the industry, when non-relational products (e.g., hierarchic and network) reigned and the relational idea was a very hard sell, he and Michael Stonebraker (the author of Ingres and at the time a professor of Computer Science at University of California Berkeley) participated in a panel at a technical conference. The following is the (paraphrased) exchange between them:
CJD: The reality is that most practitioners are too set in their non-relational ways and we cannot expect them to understand and appreciate the relational model. Rather, we must focus on the young generation of practitioners, who learn the relational model at university.

MS: Chris, you don't understand. I am teaching those youths: they were not around when we struggled with the huge problems of the pre-relational systems and they are reinventing all of them!

Friday, April 11, 2014

Weekly Update




1. Quote of the Week
Now that we have seen a lot of information about NoSQL databases, it is interesting to drop back and look around at how much NoSQL stuff we already have in our organizations. I had never thought of a file system as a database, but it is. The comparison is fascinating. File systems don’t impose any structure on the data that is stored in any given file. There is a key value relationship to each file. There is little control over concurrency beyond file locking. This is very similar to NoSQL, with locking only at the aggregate level. File systems are cheap; everyone has one and they hold huge amounts of data on multiple nodes. --Book review, NoCOUG Journal

2. To Laugh or Cry?
Find GUID in Database

3. Online exchanges

4. Interesting elsewhere (corrected first link and added a second):

5. And now for something completely different
Joe Biden wants to nominate Obama for sainthood
The VP of the world's "superpower".
Stanford opening new lab to study bad science
And tomorrow we'll need a lab to study bad science in the study of bad science.










Tuesday, April 1, 2014

Analytics = Manipulation of Data Structure



In What the $&@#^ is Applied Big Data, venture capitalist Greg Sands raises an issue about which I as a scientist (and not just a "data scientist") have often expressed concern. 
"The world is awash in data. Figuring out what to do with it is the problem. The press is littered with reports about Big Data. Many CIOs report that their CEOs have come to them and said, “We need some of that Big Data.” That often means make sure we’re collecting all the available data, often deploying a new Hadoop-based infrastructure to store and analyze it. After this elaborate process and extensive investment, they’ll start mining to figure out if there are critical insights that come out of the data. We see many entrepreneurs that start the same way. Aggregate data and look for a problem."

Monday, March 24, 2014

Simplicity: Forgotten, Misunderstood, Underrated Relational Objective



In a LinkedIn exchange I argued that an optimal generality-to-simplicity ratio (ability to represent a maximal range of reality with minimal complexity) and a 1:1 correspondence between informal business modeling constructs and formal logical database constructs are beneficial. And I claimed that insofar as data models that are formally defined are concerned, the relational model scores best on both.

One of the responses I got was
GE: Though I might come up with slightly different lists, in general, I agree with your expression of criteria for selecting a primary key and of generality and simplicity, but disagree with your conclusion that "RM scores better than any other modeling scheme."
Let me take his points one by one.

Sunday, March 23, 2014

Weekly Update




Note: I usually receive notifications of comments to my posts, which I then publish if they are not spam. Today, however, I noticed a whole bunch of old comments that were waiting for moderation, some from 2013, of which I was not notified. I have just published them and replied. Apologies to the commenters.

1. Quote of the Week
Schemaless describes the storage engine, not the data. Data has schema. No Data is ever Schemaless. Schemaless DBs merely describe a feature of themselves, not the data they store. Namely that they don't store and enforce this schema in addition to the data.
One advantage of this Is that you can change the schema "easily" - helps with up time. Now if you don't evolve the old data with each schema change you can end up with multiple schemas stored in your backend and no way of knowing which data is of which schema without some form of analysis of the data. Show me the Front end that can deal with evolving schemas without knowing about them ;) Point being Schema's always there wether any tier deals with it explicitly or not. Something has to manage it. --LinkedIn.com
2. To Laugh or Cry?
Big data means the reign of the relational database is over
3. Online
What is the best way to explain Normalization 1NF,2NF and 3NF
4. And now for something completely different
Samsung’s entire leadership team is paid less than individual executives at Google, Apple





Tuesday, March 18, 2014

Science, Religion, EAV and the Relational Model




Note: This is an 11/06/17 revision. Thanks to Erwin Smout for reviewing a draft and suggesting improvements.

The claims that (1) the relational data model (RDM) is old and, by implication, obsolete -- the industry has purportedly "progressed" -- and (2) promoting it as a superior alternative to NoSQL, Hadoop and other "modern" data management technologies is "religious" in character are routine. They have popped up again in a LinkedIn exchange and I responded as I usually do, by asking why the promotion of a scientific approach is deemed religious, while pushing ad-hoc alternatives is not.

Sunday, March 9, 2014

Weekly Update




1. My presentation at SQLSaturday:
The Last NULL in the Coffin: A Relational 2VL Solution to Missing Data
March 15, 2014 11:15am
Microsoft Technology Center, 1065 La Avenida, Building One, Mountain View, CA, 94043
2. Descriptions of all available courses are now posted on the SEMINARS page. Contact me if you are interested in public or private sessions, with possible customization for particular needs.

3. Quote of the Week
1: - a picture is only one representation of a data model: What is the adequate data structure to store a data model: Picture, Text (UML), formal (Gellish), Database) - is there an API to access, create and manipulate data models - do you handle hundreds of types (Entity, Relationship, Attribute): - do you handle one conceptual data model and derived consistent (external) submodels - document : which meta-attributes must be maintained to describe the elements of the data model. How to create different views of the data model (Pictures with different views, detailed printed documents)
2: - How to check the data model with instance data and queries based on the requirements.
3. - Use the ER-Model for Instance data (instead of the relational model) and thereby avoid the impedance mismatch:
4: A query language to access instance data which is based on the Entity-Relationship-Model and NOT on the relational model" --LinkedIn.com
Ugh!

4. To Laugh or Cry?
mysql - additional information on normalization 
5. Online
Do you use Composite Primary Keys to design a good, solid data model?
6. Elsewhere
http://image-store.slidesharecdn.com/a606a81e-9bd4-11e3-8016-22000a9aa8cc-large.png
The database field.

7. And now for something completely different
Zombie Studies Gain Ground on College Campuses








Wednesday, February 26, 2014

Anatomy of a Data Management Project: Distribution Independence



The term "distributed" is thrown around a lot these days. Hype notwithstanding, just as with analytics and data science, distribution in data management is nothing new. In fact, SQL vendors (IBM, Sybase, Ingres, Oracle) -- frequently criticized today for non-scalability -- tackled distribution decades ago. The non-relational systems preceding SQL were not amenable to it, and SQL is the closest to the relational model the industry allows you to get.

Sunday, February 23, 2014

Thinking Logically: SQL, NoSQL and the Relational Model




I very much doubt that somebody who does not think logically can be a fully competent database professional. From a LinkedIn exchange, an attempt to "address some of my points by calling them out":
JL: You said: "... much of the underlying motivation of NoSQL stuff is anti-relational ..." Okay, so? Data management in graph/document/columnar dbs is not possible because they are "anti-relational"?
The point I made was that, name notwithstanding, NoSQL vendors/proponents are not just anti-SQL, they are actually anti-relational, an important difference.
  • What exactly does this have to do with whether graph/documents/columnar database systems "are possible or not"?
  • Columnar DBMS's can be relational and are not considered NoSQL products.
  • At issue is not the possibility of graph/document systems, but what they are appropriate for.

Sunday, February 16, 2014

Weekly Update




1. Quote of the Week
A view is a logical table based on one or more tables or another view.View can be thought as a virtual table which takes the output of a query and stores it ... A View can be based on a table or another view.--dwhlaureate.blogspot.in

2. To Laugh or Cry?
Will physical modelling and normalization make sense in next decade?

3. Online
Database design patterns

4. Elsewhere

I was reading the following
How Edward Snowden went from loyal NSA contractor to whistleblower
which is interesting in itself, when I came across this:
In mid-2006, Snowden landed a job in IT at the CIA. He was rapidly learning that his exceptional IT skills opened all kinds of interesting government doors. "First off, the degree thing is crap, at least domestically. If you 'really' have 10 years of solid, provable IT experience… you CAN get a very well-paying IT job," he wrote online in July 2006.
Should be familiar to my readers.


5. And now for something completely different  
Who says Congress is not representative?

Martin Kramer is, IMO, possibly the most astute analyst of the Middle East:





Monday, February 10, 2014

"Denormalization for Performance": Don't Blame the Relational Model



 REVISED: 10/18/16

Many common misconceptions are excellent indicators of a poor grasp, if any, of the relational data model (RDM). One of the most entrenched is the notion of "denormalization for performance".

I will not get into the first four of the 5 Claims About SQL, Explained. I do not disagree with the facts, except to point out that the problems are not due to the relational nature of SQL and its implementations. Quite the opposite: they are due to their poor relational fidelity. I will focus on the fifth, "Should everything be normalized?"
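A minimal sketch of what is at stake in that question, using Python's built-in `sqlite3` module. The tables (`orders_denorm`, `customers`, `orders`) and their rows are hypothetical and not taken from the article under discussion. It only illustrates the redundancy a denormalized table carries and the inconsistency that follows when nothing controls it; it says nothing about the performance of any product.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Denormalized: the customer's city is repeated on every order row.
    CREATE TABLE orders_denorm (
        order_id INTEGER PRIMARY KEY,
        customer TEXT NOT NULL,
        city     TEXT NOT NULL
    );
    INSERT INTO orders_denorm VALUES
        (1, 'Acme', 'Boston'),
        (2, 'Acme', 'Boston');

    -- Normalized: the city is stated once, in the customers table.
    CREATE TABLE customers (
        customer TEXT PRIMARY KEY,
        city     TEXT NOT NULL
    );
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        customer TEXT NOT NULL REFERENCES customers(customer)
    );
    INSERT INTO customers VALUES ('Acme', 'Boston');
    INSERT INTO orders VALUES (1, 'Acme'), (2, 'Acme');
""")

# Updating only one of the redundant rows leaves the denormalized
# table asserting two different cities for the same customer.
conn.execute("UPDATE orders_denorm SET city = 'Chicago' WHERE order_id = 1")
denorm_cities = conn.execute(
    "SELECT DISTINCT city FROM orders_denorm "
    "WHERE customer = 'Acme' ORDER BY city"
).fetchall()

# The same change in the normalized design is a single-row update,
# so every order sees one consistent city.
conn.execute("UPDATE customers SET city = 'Chicago' WHERE customer = 'Acme'")
norm_cities = conn.execute(
    "SELECT DISTINCT c.city FROM orders o "
    "JOIN customers c USING (customer) ORDER BY c.city"
).fetchall()

print(denorm_cities)  # two contradictory cities for one customer
print(norm_cities)    # a single city
```

Keeping the denormalized table consistent would take an additional constraint or application code, a cost that has to be weighed against whatever is gained by avoiding the join.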

Sunday, February 2, 2014

Weekly Update (Revised 2/3)




1. Quote of the Week
Observe the trend of NoSQL growth, revenue will trail along irrespective. In the industry, only with respect to non-OLTP applications, RDBMS is in "keep the lights on" state by necessity, it is either awaiting obsolescence/end of life, or replacement with NoSQL solution; no longer a "workhorse" - this was the discussion point. --LinkedIn.com

2. To Laugh or Cry?
Why semantic models like RDFOWL, TMDM, are not sufficient for the web of linked data

3. Online
Why don’t RDBMS products support sub-typing?

4. Elsewhere
Cisco unveils 'fog computing' to bridge clouds and the Internet of Things
Could not have thought of a better name!


5. And now for something completely different 

Friday, January 24, 2014

Causality, Uncertainty & Actionability in Analytics



In analytics, it's much easier to "fish" for correlations and try to explain them post-hoc than to develop mutually exclusive hypotheses up-front and test empirically which one holds. Only the second approach is scientific, though, hence my skepticism about the hype of business analytics as "data science."
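A small simulation can make the risk of "fishing" concrete. The sketch below is an illustration under stated assumptions, not something from the post: every variable is independent random noise, the sizes (40 variables, 20 observations each) are arbitrary, and the 0.444 cutoff is roughly the conventional 5% two-sided threshold for a correlation computed on 20 observations. Any pair that clears the cutoff is therefore spurious by construction.

```python
import itertools
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def fish_for_correlations(n_vars=40, n_obs=20, threshold=0.444, seed=0):
    """Return (i, j, r) for every pair of noise variables with |r| > threshold."""
    rng = random.Random(seed)
    # Each variable is independent Gaussian noise: no real relationships exist.
    data = [[rng.gauss(0, 1) for _ in range(n_obs)] for _ in range(n_vars)]
    hits = []
    for i, j in itertools.combinations(range(n_vars), 2):
        r = pearson(data[i], data[j])
        if abs(r) > threshold:
            hits.append((i, j, r))
    return hits


hits = fish_for_correlations()
total_pairs = math.comb(40, 2)
print(f"{len(hits)} of {total_pairs} pure-noise pairs look 'significant'")
```

With 780 pairs to search, some are expected to pass the cutoff by chance alone, and each could be "explained" after the fact. A hypothesis stated before looking at the data tests one pre-chosen pair instead of the best of hundreds.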

Thursday, January 16, 2014

Weekly Update




1. Quote of the Week
I do not understand your first point. Even a database designed with only 1NF can have integrity if other methods are used to ensure that integrity. [Higher n]ormal forms can guarantee the absence of various integrity issues, but the lack of a normal form does not guarantee the presence of the integrity issue. I remember writing pages of code to do just that in the early days of RDBMS products before normal forms (i.e., referential integrity) was strictly enforced by the DBMS. --LinkedIn.com
Note: My (first) point was that the minimal relational mandate is 1NF, but that full normalization (5NF) is desirable for practical reasons.


2. To Laugh or Cry?
What is difference between storing data in traditional and modern way in database?

3. Online
What is Surrogate Key, why it is used, is it Primary Key?
Apropos my just published paper on keys.


4. Data management is important, after all.
Point-of-sale malware infecting Target found hiding in plain sight
(Note the last sentence in the article).

So is database design
Poor Data Management Blinded Chase to Madoff Fraud

5. And now for something completely different
Looks like a pattern.


