Sunday, August 31, 2014

Weekly Update




1. Quote of the Week
I use the word grain in the same sense as Kimball although I use it across Facts and Dimensions. I like the term over things like Uniqueness, Constraint, Key because it is a term that business readily understand, and can be used prior to the formal identification of a final key. In Dimensional Modelling it is a best practice for the Fact tables to have such a grain (a composite key across associated Dimensions) and is also necessary for many Dimensions (to assist with updates, type-2 logic etc.) - to the point where it is a best practice to identify the grain (Row Natural Key, Source Id, etc.) of the Dimension. The use of a surrogate key on either Dimensions or Facts must be backed by this level of rigor if data integrity is to be maintained. It also forces modeller consideration of source system issues such as multi-source key uniqueness, reuse of keys, deletions, etc. To clarify a little on Dimensions, the grain of an example type-1 customer dimension would be 'a customer id', the grain of an example type 2 customer dimension would be 'a customer id + as at time'. So 'grain' means the defined uniqueness for a row in the table. Generally, this also has the advantage of calling out poorly designed structures that have not established their relational uniqueness correctly - the cause of the irritating duplicate row issue in a Surrogate Key-oriented Fact table.--LinkedIn.com
Got that?

2. To Laugh or Cry?

3. Online Debunkings

4. And now for something completely different
Inside The Mind Of Leonardo
Fascinating.

Two books that every American should read (but won't).

@The PostWest




Tuesday, August 26, 2014

Data Analysts: Know Your Business Rules



My August Post @All Analytics

To ensure that data operations make sense and that results correspond to the real world and are interpreted correctly, analysts need to know the business rules on the basis of which a database was designed. Here are the types of rules they should look for.

Read it all. (Please comment there, not here)






Sunday, August 24, 2014

NEW: Paper revisions available




Please see the PAPERS page for the current version of this paper, when it becomes available.

Friday, August 15, 2014

Weekly Update




1. Quote of the Week
Invoice number is the key (key is not unique in this table). Although structure or key constraints doesn't enforce uniqueness of rows, the assumption is there will not be duplicate rows. --LinkedIn.com

2. To Laugh or Cry?
Relational Queries by Reference

3. Online Debunkings
What is Weak Subtype

4. And now for something completely different
Tim's Vermeer
Fascinating.

@The PostWest

Sunday, August 10, 2014

The 'Real World' and Database Design




Conveying data fundamentals to practitioners without losing either rigor or the audience is a difficult task. There are many experienced professionals with tool expertise but poor foundation knowledge, for which there is little regard. Even in academia, education has been replaced by training, which is not the same thing.

One dilemma faced by an educator is the tension between the simplicity of examples effective for conveying general concepts or principles and the complexity of the reality to be represented in databases. The latter requires integration of many concepts and principles, as well as thorough business knowledge. This is part of the reason why I normally refrain from specific online modeling/design advice and limit myself to the general principles that must be adhered to in the process.

Wednesday, July 30, 2014

Big Data & Analytics: Table Interpretations



My July Post @All Analytics

When, for analytical purposes, you combine data extracted from database tables whose real-world meaning you don't know, you're asking for trouble. The meaning, after all, isn't in the tables. Rather, the meaning is contained in the business rules on the basis of which the tables were designed and which are approximated in the database with integrity constraints. Interpreting results on the basis of just visual inspection of the tables therefore involves guesswork and is almost certain to be wrong.
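
As a rough illustration of rules being approximated by integrity constraints, here is a minimal sketch using Python's standard sqlite3 module. The invoice table, its columns, and both business rules ("every invoice is identified by its invoice number" and "an invoice amount is greater than zero") are hypothetical, invented for this example only and not taken from the post.

```python
import sqlite3

def build_invoice_table(conn):
    # Hypothetical business rules, declared as integrity constraints:
    #   1. Every invoice is identified by its invoice number (PRIMARY KEY).
    #   2. An invoice amount is greater than zero (CHECK).
    conn.execute("""
        CREATE TABLE invoice (
            invoice_no  INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL,
            amount      NUMERIC NOT NULL CHECK (amount > 0)
        )
    """)

def try_insert(conn, row):
    # Returns True if the DBMS accepts the row, False if a declared
    # constraint rejects it.
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO invoice VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

conn = sqlite3.connect(":memory:")
build_invoice_table(conn)

results = [
    try_insert(conn, (1, 100, 250.0)),  # consistent with both rules
    try_insert(conn, (1, 100, 250.0)),  # duplicate invoice number
    try_insert(conn, (2, 101, -5.0)),   # non-positive amount
]
print(results)  # [True, False, False]
```

Looking only at the single row that ends up in the table tells an analyst nothing about either rule. The rules are visible only in the declared constraints, and only to the extent the designer chose to declare them.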

Read it all. (And please comment there, not here)



 


