Thursday, July 9, 2015

The First Half of Database Science for Analysts



My July post @All Analytics:

One would expect “data scientists” to be keen on the dual scientific foundation of database management -- the relational data model (RDM) -- but they know little beyond “related tables” and, in fact, complain that more often than not data “do not fit” into them. Much of that is the result of poor education and an almost exclusive focus on software tool training. Even the analyst intent on acquiring foundation knowledge is more likely to be misled than enlightened by published information.

Read it all. (Please comment there, not here)

 


 

Sunday, July 5, 2015

The SQL and NoSQL Effects: Will They Ever Learn? UPDATED



UPDATE: I refer readers to Apache Cassandra … What Happened Next. Note that this was an optimal use case for NoSQL. Read it with a focus on the simplicity of the data model and, particularly, on physical data independence relative to RDM.

In Oracle and the NoSQL Effect, Robin Schumacher (RS), a former "data god" DBA and MySQL executive now working for a NoSQL vendor, claims that Oracle’s recent fiscal Q4 miss--a fraction of what's to come--is due to its failure to recognize that
"... web apps ushered in a new model for development and distributed systems that ... [r]elational databases are fundamentally ill suited to handle ... Their master-slave architectures, methods for writing and reading data, and data distribution mechanisms simply cannot meet the key requirements of modern web, mobile and IoT applications. I tell you that not as an employee of a NoSQL company, but as a guy who has worked with RDBMS’s for over twenty-five years. In short, you simply can’t get there from here where relational technology is concerned, and that’s why NoSQL must be used for the applications we’re talking about."

Sunday, June 28, 2015

Weekly Update



1. Quote of the Week
My feeling is that the field of NoSQL was created EXACTLY so the data should not be normalized like in relational databases--which has the disadvantages that data needed for real time/online applications needed to be joined at runtime before being used by the application. Under the time constraints of an online system, this is unacceptable. Hence, application developers want to store persistently the data EXACTLY in the way application see it: pre-aggregated, potentially inconsistent, and potentially replicated. Bottom line, there is no "rule" of how you should store the data. Just look at your application needs. Not everyone has the same requirements as iTunes or Netflix, so you don't need to copy their design.
...
If this is a question for you... maybe you shouldn't be using a NoSQL database in the first place !? Why do you think you need one and good old relational databases aren't good for you? Just because it's "fashionable" ? My point is: if you knew exactly WHY you need a NoSQL database, you knew EXACTLY how to structure your data for it.
--LinkedIn.com
With consistency gone, whatever is left?

2. To Laugh or Cry?

Data Modeling in NoSQL
3. Online Debunkings 
4. Elsewhere 
5. Added to LINKS page:
6. And now for something completely different
 

Friday, June 19, 2015

Database Fundamentals for Analysts



My June post @All Analytics. 

This may prove to be a trend, and while it will ease data analysts’ work, it also requires them to know and understand databases better, rather than rely on IT staff. Since in my writings here and elsewhere I demonstrate that even database professionals do not have a sufficient grasp of data and database fundamentals -- those “invisible” aspects that are not in any DBMS manual, or that you cannot get from just working with a tool -- maybe this is a good time and place for database education for analysts.

Read it all. (Please comment there, not here)







Sunday, June 14, 2015

The Cookbook Approach to Data Management




Fifteen years ago I posted The Myth of Market Based Education @the old dbdebunk.com. Last week I deplored the substitution of tool training for education, and the increasingly young age at which that substitution begins, which prevents independent and critical thinking rather than instilling it:
... a systemic problem that perpetuates itself without a solution and worsens rather than improves, particularly with Google, Facebook, Twitter and Microsoft getting involved in the school and academic systems.
Shortly thereafter
...the San Francisco School Board unanimously voted Tuesday to ensure every student in the district gets a computer science education, with coursework offered in every grade from preschool through high school, a first for a public school district. Tech companies, including Salesforce.com, as well as foundations and community groups, are expected to pitch in funding and other technical support to create the new coursework, equip schools and train staff to teach it.
Basic computer literacy, perhaps, but computer science for pre-schoolers? Tech companies have a unique notion of the "science"--witness "data science"--they want to impart to young children. This week's quote is a description of it by one of my readers as experienced by his son:

1. Quote of the Week

My son, who is a sophomore in high school, had a class in Microsoft Excel and Access this semester. This "class" was created and delivered online, in the classroom, by Microsoft for the school systems. His "instructor" is a baseball coach. Anyway, he asked me for some help with a portion of the Access module on queries. The "lesson," a set of step by step instructions with no explanations, instructs the student to use the "find duplicates" query wizard. Directly following that was the "find unmatched" (meaning in their terms rows in one table that should also be in another table but are not) query wizard. This is yet another example proving your point.
I rest my case. 
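
For readers who have not seen what those two wizards produce, here is a minimal sketch using Python's built-in sqlite3 module. The tables (customers, orders) and their rows are hypothetical, invented only for illustration, and the SQL is my approximation of the kind of query such wizards generate, not Access's actual output.

```python
import sqlite3

# Hypothetical tables, deliberately declared WITHOUT keys or references,
# which is the situation in which duplicates and unmatched rows can arise.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER, name TEXT);
    CREATE TABLE orders (order_id INTEGER, customer_id INTEGER);
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob'), (2, 'Bob');
    INSERT INTO orders VALUES (10, 1), (11, 3);
""")

# "Find duplicates": rows that appear more than once in the same table.
duplicates = conn.execute("""
    SELECT id, name, COUNT(*) AS copies
    FROM customers
    GROUP BY id, name
    HAVING COUNT(*) > 1
""").fetchall()

# "Find unmatched": orders whose customer_id has no row in customers.
unmatched = conn.execute("""
    SELECT o.order_id, o.customer_id
    FROM orders AS o
    LEFT JOIN customers AS c ON c.id = o.customer_id
    WHERE c.id IS NULL
""").fetchall()

print(duplicates)
print(unmatched)
```

Both queries report damage after the fact. Had the tables declared a key and a referential constraint, the DBMS would have rejected the offending rows in the first place, and that is precisely the sort of explanation a step-by-step lesson leaves out.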

2. To Laugh or Cry?
Small Data - Too many relationships spoil the model...
3. Online Debunkings
Something doesn't make sense
4. Interesting Elsewhere
How to interview an Oracle DBA candidate (NOT)
5. And now for something completely different

Sunday, June 7, 2015

Forward to the Past: Sounds Familiar?



Working on a book of 2000-2006 material from the old dbdebunk.com, I came across the following 10/29/04 exchange. MySQL has probably improved since then, although adding features post hoc to products that were not explicitly designed for such upgrading is always problematic: the result is more complex and more limited than necessary. However, education and foundation knowledge have become worse and, from a foundational perspective, so have products and practices.
JG: I fell asleep dreaming of column constraints. I woke up thinking of foreign keys. I've been married to MySQL for so long that I had no idea all of these other things were possible!

Using a database and not knowing about foreign keys? My immediate reaction was to be astounded. However, he just happens to have begun with the least-robust database product on the market, and his learning is (evidently) confined to whatever product he happens to be using.
Astounded? Nah, standard operating procedure.
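
For those in JG's position, a minimal sketch of the two features he mentions, again using Python's built-in sqlite3 module. The tables (departments, employees) and the helper try_insert are hypothetical, made up for illustration. Note that SQLite enforces foreign keys only when the foreign_keys pragma is switched on for the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite leaves foreign key enforcement off unless it is enabled per connection.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE departments (
        dept_id INTEGER PRIMARY KEY,
        name    TEXT NOT NULL
    );
    CREATE TABLE employees (
        emp_id  INTEGER PRIMARY KEY,
        salary  INTEGER NOT NULL CHECK (salary > 0),                 -- column constraint
        dept_id INTEGER NOT NULL REFERENCES departments (dept_id)    -- foreign key
    );
    INSERT INTO departments VALUES (1, 'Sales');
    INSERT INTO employees VALUES (100, 50000, 1);
""")

def try_insert(row):
    """Hypothetical helper: report whether the DBMS accepts an employee row."""
    try:
        conn.execute("INSERT INTO employees VALUES (?, ?, ?)", row)
        return "accepted"
    except sqlite3.IntegrityError:
        return "rejected"

results = {
    "negative salary": try_insert((101, -5, 1)),         # violates the CHECK constraint
    "unknown department": try_insert((102, 40000, 9)),   # violates the foreign key
    "valid row": try_insert((103, 40000, 1)),
}
print(results)
```

The point of the sketch is that the constraints are declared once, to the DBMS, which then enforces them for every application, rather than being reimplemented (or forgotten) in each application's code.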

Saturday, May 30, 2015

Weekly Update




1. Quote of the Week

In this paper we briefly review some of these issues and then concentrate on the problem of generalizing the formal framework of the relational data model to include null values. A basic problem with null values is that they have many plausible interpretations.
--Database Relations with Null Values, Bell Labs, 1983
No, that's not the basic problem.

2. To Laugh or Cry?

Relational table naming convention
3. Online Debunkings

4. Interesting Elsewhere

5. And now for something completely different
