Wednesday, October 10, 2012

To Laugh or Cry?



UPDATE: By mistake I posted a link to a piece that I intended to debunk, which is why you could not see any comments. Here is the correct one.



The Death of the Relational Database,
whydoeseverythingsuck.com

An oldie, but goodie. Particularly some of the comments.



Quote of the Week



SQL Server and all relational databases aren't much more than glorified file handlers. If you accept them as that, then you'll see the falacy [sic] of trying to do things like inheritence [sic] and the wad of other stuff people think it should do. SQL is not a full fledged language nor should it be. It's a simple file handling language. Tables are really nothing more than files with lines (rows) and fields (columns). It's a very simple thing... why do people keep trying to make it more complex?
--Relational Model Advantages,SQLServer Central

Sunday, October 7, 2012

Data Fundamentals, Fads and "Big Data"



WS writes:
It is fairly clear to me that the "NoSQL" movement (or "big data" as some people seem to be calling it nowadays) has thrown out the relational baby with the SQL bathwater.
That is exactly the point I was making in my several posts on NoSQL. Since SQL is practically the only commercially implemented data language with any relational characteristics, most practitioners confuse it with the relational model. I quoted with approval David McGoveran that the NoSQL movement has its roots in the anti-relational sentiment expressed by object and XML proponents. Underlying it is the failure to appreciate the distinction between the fads that characterize the industry's mode of operation and relational technology. This failre is due to disregard for and lack of knowledge of data fundamentals. In fact, the relational model has been treated for a time as just another fad.
In particular I wanted to say that I agree very strongly with Eric Kaun's comments about the supposed inflexibility of relational schemas. In my experience it is actually very easy to change the schema, but often fiendishly difficult to assess the impact of these changes on all the applications that use the database.
For a while I just could not understand what the claims of "fixed relational schema" meant, because it is obvious that the relational approach increased schema flexibility via several types of data independence relative to products that preceded it. But recently it occurred to me that what complainers mean by it is not that the schemas are fixed, but rather that inflexibility means exclusive reliance on tables. Why not change the structural basis of the schema "as needed"? This certainly is an explicit fundamental rejection of the relational model (they cannot express even their complaint correctly).
In spite of the apparent sophistication of many developer environments, none of them, as far as I am aware, do anything as simple as representing the relation between application language expressions and database attributes. This would mean a simple query could find all the dependencies between applications and the database. As is often the case application programmers are ignoring the failings of their own version and dependency management and blaming the DBMS instead.
It depends on what the meaning of "sophisticated" is. It may apply to programming languages, techniques and tools, but not necessarily to data fundamentals such as logic, the different data models with their pros and cons, their practical implications and the history of the field. That explains why old technologies are reinvented and relabeled as new and the constant stream of fads. As to the application-DBMS functional distinction, many developers are not even aware of it and see nothing wrong with having to undertake database functions in applications. Hence the increasing number of "application-specific databases".
Regarding NoSQL, I suspect there is an element of cargo-cult thinking going on here. Google have been very successful (at least from a business perspective) using distributed processing so if you use the same methods as Google then you are bound to be successful too.
Obviously if you lack foundation knowledge you will opt for emulating others without a real understanding to what extent their success--to the extent that it is not a short-lived one--is applicable to your circumstances (see SQL/PostSQL/NoSQL). As an example, from my experience with Google's Blogger in particular and their other products in general (see my previous post) I would not be surprised if their poor ability to fix problems and provide expedient support (if at all!) is probably due to some extent to their reliance on ad-hoc, schema-less products. These problems have the potential to reach the same walls as those reached by pre-Codd non-relational technologies. Facebook, Amazon and other "Big Data" companies reveal similar problems.

The size of the data has nothing to do with the data model. Distribution has nothing to do with the data model. Consider, again, David McGoveran's list of circumstances which may justify a NoSQL product:
First, when discovery of relationships is more important than consistent processing and specific data results.
Second, if the data processing is meant to be inductive (e.g., suggestive) rather than deductive (i.e., precise).
Third, when the application is changing very fast, data complexity is great (variety or amount).
Fourth, if physical issues, like big data or a high degree of parallelism, are more crucial than data integrity. You must be willing to throw away data consistency in favor of performance and scalability.
Fifth, if you have a mission-critical one-off application for which a fixed data organization is ideal, in which case the costs and risks may be lower than licensing a vendor’s RDBMS [read: SQL] or trying to force an open-source RDBMS [read: SQL] to fit the need.
With knowledge of data fundamentals one is tempted to conclude that the intention here was to demonstrate that the set of applicable circumstances is very close to being the empty set. In the absence of such knowledge, the temptation to use these tools in order to avoid hard thinking and design effort upfront, while expecting is just too strong to resist.

Thursday, October 4, 2012

Weekly News




It's been months since a problem was discovered in Blogger's new interface: pages' URLs change when they are updated, which is a disaster for SEO traffic. Amazingly, Google initially asked for details experienced by users, lots were provided, but there was no response from Google or any information that would indicate when a solution would be available, except a comment very recently that they are working on a solution that might be available within a couple of weeks. Worse, it switched users to the new interface before it provided a solution (see Page URLs Change, Though Link References Do Not Change), although it looks like the update introduced the problem to the old interface too (my gut tells me it might be a data management problem).

There's more. As I wrote earlier, I was planning to drop the old dbdebunk.com site and point the this blog to that domain, so that links to the old site would at least reach the blog, if not the original page. Google provides instructions how to use a custom domain with Blogger, so I canceled the old site, but when I followed the instructions, I got an error. Then I was alerted to yet another problem with Blogger:
Known Issue: Custom domains in Google Sites

We are aware of an issue with custom domains in Google Sites. The functionality to add a new custom domain is currently unavailable - however, existing custom domains are working. We are working to resolve the issue as quickly as possible and expect a fix within the next few weeks
It was posted on September 19th and note the timeframe of a solution!
That a company like Google cannot resolve such problems for months (and is unresponsive to boot) despite the damage it causes its users and so many complaints is a a phenomenon on the rise. Companies like Facebook and Google have, on the one hand, huge institutional market power and on the other hand hugely complex software infrastructures that are based on proprietary, ad-hoc software (all those "BigData", NoSQL or even non-database applications goodies) that are extremely difficult to manage, maintain, progress and optimize. This creates the combination of difficulty to satisfy user needs with lack of responsiveness.

Luckily, my domain registrar was able to provide a temporary solution by forwarding dbdebunk.com links to dbdebunk.blogspot.ca without users seeing the .ca.

To Laugh or Cry?



Database design and terminology, dbForums.com

Quote of the Week



If you have a wide existing table containing lots of frequently null columns, it is likely you can make it more efficient if you re-arrange the physical table so that most of the nulls are contiguous to the right hand side of the table. If the table is over 254 columns in certain RDBMSes, you may profit from breaking the table into two pieces such that the overflow is less often referenced and when you need the whole it is via a view slicing the two physical tables together one-for-one.
--Database tables and NULLs, stackexchange.com

Monday, October 1, 2012

Normalization, Further Normalization, Ease of Use, Integrity and Performance



Revised: 10/15/16
"Normalization was invented in the 70's as a way to put some structure around how developers were storing data in a database, in addition to trying to save disk space. You need to remember this was a time when 1MB was billions of dollars and a comput er needing 1GB of data storage was inconceivable. It was a way to squeeze as much data into as small a space as possible." --Tom Phillips, social.technet.microsoft.com
Perhaps the lack of understanding of the relational model was so acute at the time when it was first published (1969-70), that it would not surprise me if a belief existed then that normalization would save storage space, even if I don't understand in what this belief was grounded. But there is no justification for such a belief to persist in 2012, no matter what else one thinks of normalization.

For the multiple advantages from full normalization (5NF) -- chief among them semantic correctness of query results (i.e., no anomalous side-effects)--see the just published THE DBDEBUNK GUIDE TO MISCONCEPTIONS ABOUT DATA FUNDAMENTALS, available via the BOOKS page).
View My Stats