Wednesday, September 19, 2012

Data Models and Usefulness



In my 3-part article on dynamic schema, NoSQL and the relational model I admitted that I did not know much about NoSQL products in general and MongoDB in particular. Nevertheless, it was not difficult to figure out what problems would be faced using a docubase for database management purposes.

Then Matt Rogish alerted me to Why I Migrated Away From MongoDB. I suggest you read it in the context of my article and see how close my suspicions were to reality. Here are some quotes worth pondering (emphasis mine):
Alas, not being aware of the mathematics behind relational algebra, I could not see clearly the trap I was falling into - document databases are remarkably hard to run aggregations on and aggregating the data and presenting meaningrful statistics on your receipts is one of the core features of digiDoc. Without the powerful aggregation features that we take for granted in RDBMSs, I would constantly be fighting with unweildy map-reduce constructs when all I want is SUM(amount) FROM receipts WHERE <foo> GROUP BY <bar>. 
People keep complaining that JOINs make your data hard to scale. Well, the converse is also true - Not having JOINs makes your data an intractable lump of mud. 
...when I last looked something as simple as case-insensitive search did not exist. The recommended solution was to have a field in the model with all your search data in it in lower case [FP: heh, heh]... And if I add a new field to the model? Time to regenerate all the search strings. I can only come to the conclusion that mongodb is a well-funded and elaborate troll.
...somewhere along the stack of mongodb, mongoid and mongoid-map-reduce, somewhere there, type information was being lost.
Then it might have been the lack of an enforced schema? Thinking about it though, schemas are wonderful. They take all the constraints about your data and put it in one place. Without a schema, this constraint checking would be spread all over my application. A document added a month ago and a document added yesterday could look completely different and I’d have no way of knowing. Such fuzzy schemaless data models encourage loose thinking and undisciplined object orientation.
Anybody with foundation knowledge would expect these problems. As I argued so many times: a data structure determines manipulation and, therefore, usefulness for a given informational purpose.

Oh, and in this context, see my AllAnalytics posts and the exchanges with readers: 

Knowing What a Database Is
Unstructured Data

Tuesday, September 18, 2012

Object Orientation, Logic and Database Management



WS writes:
On the subject of object orientation I often feel inclined to quote Leslie Lamport's comment in the introduction to his book "Specifying Systems": "If exposure to C++ hasn't completely destroyed your ability to think logically, you should have no trouble filling in the gaps in your mathematics education".
I tried for many years to understand OO, but without success. It didn't occur to me until I encountered the material on dbdebunk that the problem might not lie with me. OO's lack of a formal definition leads, I think inevitably, to what I have come to call PALC (Pointless Additional Layers of Complexity). Every addition to OO leads to new problems that require ever more complicated solutions, that in turn create a whole new set of problems.

Monday, September 17, 2012

To Laugh or Cry?




Is it really necessary to normalize your databases in third normal form?, social.technet.microsoft.com

Quote of the Week



The products commonly known as Oracle, Exadata, DB2, Sybase, SQL Server, Teradata, Sybase IQ, Netezza, Vertica, Greenplum, Aster, Infobright, SAND, ParAccel, Exasol, Kognitio et al. all either are or incorporate relational database management systems, aka RDBMS or relational DBMS.

2. In principle, there can be difficulties in judging whether or not a DBMS is “relational”. In practice, those difficulties don’t arise — yet. Every significant DBMS still falls into one of two categories:

    Relational:
        Was designed to do relational stuff* from the get-go, even if it now does other things too.
        Supports a lot of SQL.
    Non-relational:
        Was designed primarily to do non-relational things.*
        Doesn’t support all that much SQL.

*I expect the distinction to get more confusing soon, at which point I’ll adopt terms more precise than “relational things” and “relational stuff”.
--Curt Monash

Weekly DBDebunk News



1. I have added
  • a "Link to this blog" button and code at the top right
  • URL, HTML and BB/Forum link codes at the end of each post
If you like the blog or posts, please use them to link to them. This being a new blog, it will help with visibility. Thanks.

2. Shortly I will switch the blog to the old domain, dbdebunk.com. You don't need to do anything as the blogspot will simply forward you to that domain.

3. For those interested in databases and gaming, there is a forum thread at

https://forums.eveonline.com/default.aspx?g=posts&m=1919633

about problems with EDK, an older 'massive multiplayer online' (MMO) game based on mySQL, the latter supposedly being broken by the application/user load. The discussion is about the new DBMS to be considered and the choice between SQL and NoSQL (MongoDB about which I wrote in my three-part article, is mentioned).

I do not presume to know the technicalities and terminology of gaming systems and I am not in a position to judge whether current SQL systems can adequately satisfy gaming in general and EDK in particular. Perhaps readers who have knowledge in this area can enlighten us.

But if you read the exchange carefully you can detect the familiar sources of the (misleading) attraction of NoSQL: lack of familiarity with history and foundation knowledge, developers not transcending their application view of databases, the logical-physical confusion, the failure by SQL and its implementations to adhere to the relational model and so on. There is one developer with foundation knowledge, though, who is readily discernible.

We are regressing decades. I may not be around when somebody will have to "discover" the relational model in order to address the very same problems that Codd  thought he did 40+ years ago.

4. Matt Rogish has submitted this link: 

Why I Migrated Away From MongoDB
which I may comment on later.

Friday, September 7, 2012

Forward to the Past: "Out-clevering" the DBMS



When I see a title like "Software Engineering: What are some clever programming techniques that aren't used often enough?" my antennae come up. More often than not (1) programmers consider certain things clever only because they are not familiar with history (2) in its quest for solving problems engineering sometimes tries to out-clever science, which usually spells trouble.

Wednesday, September 5, 2012

Quote of the Week



Relational databases ... had poisoned whole generation of programmers thinking (incl. me). With no clue we approached everything through relational prism that we studied in university. MySQL was the only OSS alternative that cut enough corners to make relational DBMS fit the problems we tried to solve. For which the relational paradigm wasn't a good fit anyways.

So. then comes along the NoSQL stuff and you find out that you have much better tools than relational for these kinds of problems (simple websites, blogging sites, feed sites, social sites, etc.). Along the way you might see all the other problems that really need realational (ERP systems etc.). But now you need real relational not some half-baked all corners cut MySQL stuff. Yeah that also goes for InnoDB which might have become something if Oracle hadn't freezed it.

Then you try out Postgres (because it's free) and when it works for big workloads (yes it does in skilled hands) then you are hooked. If by chance you then check out Oracle ... it feels like a twelve ton dinosaur ("erm. no boolean type? whaat - implicit commit with DDL?"). If by any miracle you don't have the 1% of projects that need something special - Oracle RAC or Spatial or whatnot. Then Postgre is the most mainstream option available - there just is no other as popular, as functional and as cheap solution for these kinds of problem domains that really require relational DBMS.

PS: been using MySQL (both MyISAM/InnoDB) over 10 years, PostgreSQL over 5 years and Oracle over 5 years. --Hipsters Hacking on PosgreSQL, theRegister.com
View My Stats