Combining data extracts from databases for analytical purposes without knowing what the source database tables mean -- what exactly in the real world they represent -- can produce wrong results.
Monday, June 30, 2014
Big Data, Normalization & Analytics: Meaning & Constraints
My June post @All Analytics
Combining data extracts from databases for analytical purposes without knowing what the source database tables mean -- what exactly in the real world they represent -- can produce wrong results.
Combining data extracts from databases for analytical purposes without knowing what the source database tables mean -- what exactly in the real world they represent -- can produce wrong results.
Sunday, June 29, 2014
Denormalization: Database Bias, Integrity Tradeoff and Complexity
The common and entrenched misconception about normalization was recently visible yet again in a LinkedIn exchange.
R: Unless the need is for ACID compliant transactions, denormalization is generally not considered logically, physically or whatever-ally-–so essentially a thoroughly normalized mode is relevant for a write-infrequently consumption of data and data integrity can be guaranteed by design.
Sunday, June 22, 2014
Weekly Update
I have corrected a mistake in For Codd's Sake: a mathematical relation is not the Cartesian product (CP) of the domains over which it is defined. Two readers correctly pointed out what I actually wrote myself in my business modeling paper:
Mathematically, a relation on domains—which are sets of values of a type—is a subset of the Cartesian product of the domains.Note that the whole CP is also a subset, so it is also a relation, which happens to have useful applicability to business modeling and database design. In the database context, it can be pictured as the pool of all possible rows--past, present and future--for a R-tablevar defined by the domains' types. A database R-table is the set of actual rows at any point in time that is consistent with the set of all integrity constraints to which the R-table is subject (see Business Modeling for Database Design).
1. Quote of the Week
NoSQL usant correct m'y indeed totof n'y most of the dev ans devops who clearly thing nosql Means they will ne a le to do whatever they wants ans still have answers to their twisted query in a correct time. Those people see nosql as the mean to get ris of DBAs. And il not kiddin since it's happening right now un many companies i know of. --LinkedIn.com
2. To Laugh or Cry?
Architecting IMS for Big Data - a symbiotic relationship.
3. Online
- Database Design Patterns
- Will Pattern Discovery Systems Trump Analyst's Rules-Based Models
- How do we do data modeling in NoSQL DB and Big Data
4. Interesting Elsewhere
IEEE Computer Issue on CAP TheoremH/t Erwin Smout.
5. And now for something completely different
- Beyonce and Miley Cyrus studies offered at US colleges
- Better than Scientology: The new Zuckerberg-based religion
The PostWest.
Sunday, June 15, 2014
Sunday, June 8, 2014
Weekly Update
1. Quote of the Week
Logical design is where the Architect defines entities (which will become tables in a database), attributes (which will become columns in a database), etc. This is typically the level that SMEs are most comfortable. I think that a Logical design may deal with data types and keys, but it does not cater to any specific platform or engine.2. To Laugh or Cry?
Physical design is where the Architect translates the logical design into tables, columns, datatype specifics like INT versus NUMERIC, indexes, partitions, etc. This is where "the rubber meets the road" and the logical design gets mapped into a form that can exist and be tested on a database server.
While I'm sure that someone will object to this link on religious grounds, the discussion does a pretty good job of making the distinctions that concern me. --LinkedIn.com
MyBatis Schema Migration SystemH/t Ben Samuel, who adds:
"From the department of "we haven't really thought this feature through" comes this gem, one of several schema migration systems that allow "reverse migrations" or "downward migrations". Whereas a forward migration creates tables, columns, etc., a reverse migration drops them. The video proudly shows them "reverse migrating" their database until all tables are dropped. Another vendor patiently explains why they don't offer this feature."
3. Online Debunkings
- Is the Relational Data Model Spent?
- 4 Table Types you Need to Scale a Relational Database
- A data model of the SAP Bill of Material Explosion tables
- What are people's opinions on nosql
- Is Data Science a combination of Data Architecture and System Engineering?
4. Interesting Elsewhere
The Death Of Expertise
5. And now for something completely different
The "productive" business and tech elite:
- Snapchat CEO Evan Spiegel 'mortified and embarrassed' over leaked emails
- Police-friendly tech CEO charged with hit-and-run
- Billionaire claims he owns the road, the beach and the tides
- Mystery millionaire hides $100 bills around US city ...
- Perkins On 'Nazi' Quote Inequality
Sunday, June 1, 2014
Big Data, Normalization & Analytics
May Post @All Analytics
What you need to know for the purposes of this discussion is that tables that bundle multiple entity classes have certain drawbacks. Normalization is a design repair procedure that unbundles the classes -- the columns representing attributes pertaining to each class -- each into its own table. This is possible if and only if there is no data lost or made spurious in the process -- that is, when a bundling of table A is mathematically equivalent to the joins of its unbundled projection tables B and C.
Read it all. (And please comment there, not here)
What you need to know for the purposes of this discussion is that tables that bundle multiple entity classes have certain drawbacks. Normalization is a design repair procedure that unbundles the classes -- the columns representing attributes pertaining to each class -- each into its own table. This is possible if and only if there is no data lost or made spurious in the process -- that is, when a bundling of table A is mathematically equivalent to the joins of its unbundled projection tables B and C.
Read it all. (And please comment there, not here)