Note: To demonstrate the correctness and stability that a sound theoretical foundation provides, in contrast to the industry's fad-driven "cookbook" practices, I am re-publishing as "Oldies But Goodies" material from the old DBDebunk.com (2000-06), so that you can judge for yourself how well my arguments hold up and whether the industry has progressed beyond the misconceptions those arguments were intended to dispel. I may revise, break into parts, and/or add comments and/or references.
Skyscrapers with Shack Foundations
(originally posted 06/04/2000)
“Well, it's really a judgment call and I think a lot of experience comes into it. It's a little bit like building a shack. Say you want to build a skyscraper, and you started out building a shack and you just keep trying to add onto it. After a while you have this severe structural problem ... So there is a fallacy to the build-upon-a-simple structure approach. Sometimes you get up to three stories and you have to do some major structural changes, and I just accept that.”
--Wayne Ratliff, developer of dBase
“Client Servers were a tremendous mistake. And we are sorry that we sold it to you. Instead of applications running on the desktop and data sitting on the server, everything will be Internet based. The only things running on the desktop will be a browser and a word processor. What people want is simple, inexpensive hardware that functions as a window on to the Net. The PC was ludicrously complex with stacks of manuals, helplines and IT support needed to make it function. Client server was supposed to alleviate this problem, but it was a step in the wrong direction. We are paying through the nose to be ignorant.”
--Larry Ellison, CEO, Oracle Corp.
The IT industry -- its database sector in particular -- operates like the fashion industry: it is driven by fads. And more often than not it profits from the accelerated obsolescence which characterizes fads. DBMS vendors (Oracle's CEO in particular) were "wrong" more than once before. But it's the users, not the vendors, who pay through the nose when the industry, with help from the trade media, hypes and obscures unsound products and practices -- "new" frequently proves to be nothing but the old that we already discarded, under new labels. The Internet and the browser are just the latest fad in a long series and are as much a panacea for information management as the PC, SQL, client/server, object orientation, "universal" and multidimensional DBMSs, data warehouses, and data mining were before them, all of which were preached with equal fervor.
The fact is that sound database technology and practices are prerequisites for effective and efficient information management, whether Internet-based or not. Sadly, however, the database field is in disarray, with Internet practices being not lesser, but actually worse offenders than preceding panaceas. While this is, to a degree, true of computing in general, in the database field the problems are so acute that -- claims to the contrary notwithstanding -- knowledge and, therefore, technology are actually regressing!
Even a cursory inspection of problems encountered in database practice reveals that most are due to the persistent failure by both DBMS vendors and users (DBAs and application developers) to educate themselves and rely on sound foundations. Abandonment of proper education makes fads and accelerating obsolescence possible in the first place! As Date explains in the Foreword to my book:
“SQL [DBMS] deficiencies are, it seems to me, directly due to the widespread lack of understanding (not least on the part of vendors), of fundamental database principles. Certainly it is undeniable that they flout those principles in numerous ways. And the practical consequences are all too obvious: First, users must understand where the deficiencies lie; second, they have to understand just why they are deficiencies; third, they have to understand how to work around them; and fourth, they have to devote time and effort in persuading the vendors to remedy them. The trouble is, of course, users too tend to be unaware of those same fundamental principles and, hence, find themselves unable to carry out their side of the "contract" (a "contract" that should not have been allowed, or agreed to in the first place, of course). It's a vicious cycle. What is more, this sad state of affairs is not likely to change, given the apparent lack of interest on the part of the trade press--itself ignorant of those same principles--in trying to improve matters.”
Consider the following two examples, one from a novice:
“I need to store 40 pieces of unrelated information. Is it better to create [one] table w[ith one] record [and] 40 fields, or create [one] table w[ith] 40 records [and one] field?”
The other is from a consultant assessing a database constructed, supposedly, by experienced professionals:
“... finished testing a--gasp, choke--COBOL program for a software company whose main product is a well-known government contract accounting system ... Now th[e expletive deleted] database ... is replete with repeating groups, redundant fields, etc. On top of all that, because it is one of the central files to the entire system, there are literally hundreds of rules and relationships, all of which must be enforced by the dozens of subprograms that access it. I found so many violations of so many of these rules in this new subprogram, that I filled five single-spaced pages with comments and suggestions. And I probably missed [the more obscure problems]. Several [such problems], perhaps.”
They are not exceptions, but the rule. What should be obvious is that:
- The problems involve database (not application!) issues, and fundamental issues at that;
- They are common to any and all DBMSs and databases;
- Their consequences are hardly theoretical and are quite severe;
- No amount of experience with DBMS products and development tools on any hardware platforms can, in itself, address them.
Yet it is practically impossible to get the attention of vendors, practitioners, or the trade media to anything other than product-specific cookbooks. Examples:
“I polled our [user group] membership last night about future topics. For the foreseeable future, we prefer to focus on Microsoft SQL Server 2000 topics exclusively.”
“I don't disagree with your statement that the "lack of attention to data fundamentals can cause horrendous problems" ... The major problem, as I see it, is that [database orient]ation ... is not what the user group is about. Yes, database design and use is definitely a part of our world, but our focus is on Sybase's development tools, such as PowerBuilder, PowerJ, Enterprise Application Server, etc.”
Education vs. Training
The fad-driven, tool-focused cookbook approach to data management is due in large part to the business culture in general, and the way in which IT practitioners (distinct from professionals!) are inducted into the field in particular. A vast majority are self-taught and start their database practice by using some specific DBMS software (e.g. Oracle, Access, SQL Server) and tools (frequently imposed on them by their employer). Having not been exposed to fundamentals (principles, methods, theories), practitioners are either unaware of them, assume that they are acquired implicitly or, most commonly, deem them "just theory" rather than sound foundations. These fallacious perceptions are exacerbated by a growing generation of Internet practitioners who know little beyond HTML, Java and XML (not even DBMS or tool software), and who, therefore, think that's all there is to know.
But nothing different should be expected. The technical qualification for practically all database positions is mainly programming and, at best, some experience with DBMS and development tools on specific platforms (hardware and operating systems), nothing else. Examples:
“Senior Database Architect
Qualifications: Minimum of 3 years with Oracle on Solaris. Working knowledge of Tuxedo. Use of database design tools such as ER/Win. Perl and scripting. Familiarity with Oracle 8, Oracle Parallel Server, Sun Clusters, C. At least 3 years of relevant experience.”
“Database Analyst III
Experience: Five to nine years developing applications using a major industry-standard relational database system (e.g., Oracle, Sybase, Ingres). Necessary Skills: Oracle DBMS Server and Oracle Application (Web) Server on Windows NT Server; Designer 2000; Developer 2000; Oracle Reports; Oracle Graphics; and PL/SQL. Also a plus: experience with UNIX, VMS, SQR, HTML, JAMA, or JavaScript.”
Not only isn't foundation knowledge -- as distinct from sheer experience with tools -- a requirement, but it is often a liability. Conceptual modeling and database design are bundled together with database administration, application development and physical implementation and assigned to the (mythical) position of "programmer/analyst", without realizing that they require fundamentally different skills and knowledge sets and orientation, which are rarely found in one person, and inherently interfere with one another (particularly with currently flawed DBMS products). If you wanted to build a house, would you hire a building contractor to design it?
Note: Actually, programming experience can be a liability when it comes to database management.
Under industry pressure there is little database education to be had -- tool training and experience reign supreme and even academic computer science programs are becoming increasingly vocational in character. This is the response I got when offering to teach data fundamentals:
“We are very interested in additional Oracle instructors, if that is something you can teach.”
“Does (the course) cover accessing a database via CGI, i.e. VB, Java, Perl, C++ access to SQL Server or Access DB? We're a CS dept, so not so interested in the user-developer side of things.”
An analogy can serve to drive the perils of this state of affairs home. Suppose you must select a personal physician and have two candidates: one educated in, among other things, some anatomy, biology and chemistry, and one trained in identifying symptoms from a list and matching treatments from another. Chances are you will opt with the majority for the former rather than the latter, and for a very good reason: in the absence of knowledge and understanding of some health fundamentals, serious problems can be expected. This is generally clear in all applied fields with a scientific foundation except, it seems, database management.
Is there any wonder that practitioners, seasoned ones included, can't offer a useful definition of a database? That neither DBMS designers, nor technically proficient users have heard of crucial concepts such as data independence? That many believe that not only should duplicates not be prohibited, but that they are actually essential?
The consequences are visible all over the business -- and Internet -- world: databases and practically all DBMS products are riddled with flaws and unnecessary complications, but almost nobody can or does associate them with lack of foundation knowledge. Examples:
“You might ask what is wrong? Well, it is a client/server application, using a Sybase database (SQL Anywhere). The database server has a single login user DBA -- using the default password. Every application user connects to the database via this login level, and security is handled by the front end -- despite the fact that any semi-aware user could use MS Access to destroy any data. There are also about 300 tables in the database, with no indexes! Agreed there are primary key indexes created automatically by the database, but still... The front end is Visual Basic, which for me is OK, but there are at least three different data access methodologies, from ODBC-API to the latest ADO. But what is killing me, is that I seem to have been hired as a "bug-fixer", to me different than an engineer. They are in a position where release schedules are forcing a continual maintenance mode, rather than an admittedly necessary rebuild of some components.”
“In the short term you have two options: a) disable referential integrity checking and make the change (not recommended unless you're willing to assume total responsibility for the data consistency checking yourself; and you have to ensure you have exclusive access to the database when you're doing this); b) use our [DBMS's] triggers and stored procedures to implement the referential integrity procedurally.”
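To make concrete what is at stake in the second quote, here is a minimal sketch of what happens when referential integrity checking is switched off. It assumes SQLite via Python's standard sqlite3 module purely for illustration; the customer and invoice tables, their columns, and the orphan_accepted helper are hypothetical and do not come from the system discussed in the quote.

```python
import sqlite3

# Hypothetical two-table schema: every invoice must reference an existing customer.
DDL = """
CREATE TABLE customer (
    cust_no INTEGER PRIMARY KEY,
    name    TEXT NOT NULL
);
CREATE TABLE invoice (
    inv_no  INTEGER PRIMARY KEY,
    cust_no INTEGER NOT NULL REFERENCES customer (cust_no)
);
"""

def orphan_accepted(enforce: bool) -> bool:
    """Return True if an invoice for a nonexistent customer gets recorded."""
    conn = sqlite3.connect(":memory:")
    # SQLite enforces declared foreign keys only when this pragma is ON.
    conn.execute(f"PRAGMA foreign_keys = {'ON' if enforce else 'OFF'}")
    conn.executescript(DDL)
    conn.execute("INSERT INTO customer VALUES (1, 'Acme')")
    try:
        # There is no customer 42, so this row asserts a false fact.
        conn.execute("INSERT INTO invoice VALUES (100, 42)")
        return True
    except sqlite3.IntegrityError:
        return False
    finally:
        conn.close()

if __name__ == "__main__":
    print("checking disabled, orphan accepted:", orphan_accepted(False))
    print("checking enabled, orphan accepted:", orphan_accepted(True))
```

With checking disabled, the inconsistent row is recorded silently, and keeping it out becomes the job of every program that touches the table. With the constraint declared once and enforced by the DBMS, no application code is involved at all.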
A Vicious Cycle
Correcting this sad state of affairs is a nontrivial proposition, because it is a deep-seated, cultural and systemic vicious cycle that is extremely difficult to break. It is much easier (and more profitable) to go with the flow, rather than uphill against it. Trade media, books, web sites, conferences, education programs and consultants ignore fundamentals, rely exclusively on vendor sources and focus completely on product-specific "recipes", reinforcing, rather than combating, the cookbook approach.
Database users need correct answers from databases. Yet the vast majority are unaware that, as Hugh Darwen states:
- A database is a set of axioms;
- The response to a query is a theorem;
- The process of deriving the theorem from the axioms is a proof;
- A proof is made by manipulating symbols according to agreed mathematical rules;
- A DBMS with a database is a deductive logic system: it derives facts (theorems) from database facts (axioms). The theorems are true (query results are correct) iff (1) the axioms (database facts) are true and (2) the derivation is logically valid and semantically consistent.
Nor are most practitioners aware that the truth of the database facts must be ensured jointly by the users authorized to record them, by the database designer, and by the DBMS's semantic integrity enforcement, while the truth of the derived facts must be ensured by the database manipulative function.
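The axioms-and-theorems view can be illustrated with a minimal sketch. It assumes SQLite through Python's standard sqlite3 module, which is only a rough stand-in given that SQL is not truly relational; the employee table, its columns, the positive-salary rule, and the assert_fact helper are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (
    emp_no  INTEGER PRIMARY KEY,                 -- one fact per employee
    name    TEXT    NOT NULL,
    dept    TEXT    NOT NULL,
    salary  INTEGER NOT NULL CHECK (salary > 0)  -- hypothetical business rule
);
""")

# Axioms: facts asserted to be true by the users authorized to record them.
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?)",
    [(1, "Ann", "Sales", 5000), (2, "Bob", "Sales", 4000), (3, "Cy", "Research", 6000)],
)

def assert_fact(row):
    """Try to record a fact; return False if integrity enforcement rejects it."""
    try:
        conn.execute("INSERT INTO employee VALUES (?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

# A second, conflicting fact about employee 1 violates the key constraint.
contradiction_accepted = assert_fact((1, "Ann", "Research", 5000))
# A negative salary violates the declared business rule.
nonsense_accepted = assert_fact((4, "Dee", "Sales", -100))

# Theorems: facts derived from the axioms by the query.
theorems = conn.execute(
    "SELECT dept, SUM(salary) FROM employee GROUP BY dept ORDER BY dept"
).fetchall()

if __name__ == "__main__":
    print(contradiction_accepted, nonsense_accepted, theorems)
```

The derived totals are only as trustworthy as the recorded rows. Had the two rejected rows been accepted, the query would still have run without complaint and returned a wrong result.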
Because they are socialized into (and rewarded for) disregard for fundamentals as "theory" without practical value, practitioners are largely unaware that their tools and practices fail at these tasks. The result, as I have amply demonstrated and documented in my writings, is that a lot of what is being said, written, and especially done in the database management field -- whatever is left of it -- is increasingly confused, irrelevant, misleading, or outright wrong.
The above editorial evoked the following email to the editor and exchange:
“I just finished reading your article Skyscrapers With Shack Foundations and I just wanted to express my admiration. It's refreshing to read an article which addresses some of the strange attitudes I, for one, find in the computer world, especially the seemingly absolute lack of understanding at a theory level. I tend to feel like the kid in the old fable "The Emperor's New Clothes" but just bite my lip because I don't want to be labeled a 'crank'.
I am one of 'them', a self-taught DB manager. I make no objection to your characterization of self-taught DB workers though, I agree completely. I came into the DB world with an extensive background in logic and set theory, so I was one of the lucky ones. At the risk of sounding like I'm bragging, I very quickly understood the concepts behind RDBMSs in general to a degree which a lot of experienced professionals seem to lack. Suffice to say that the two horror stories you mention in the article (the new designer with '40 pieces of unrelated data' and the experienced designer trying to clean up a pre-existing mess) are very familiar to me.
I don't want to be tacky by going into personal stuff too much, but I think you will see the relevance to your article here ... I'm searching for work now, after quite a few years working with a FoxPro-based database system. As the system was implemented ... well, 'implemented' might be too kind a word ... 'fragmented' comes closer ... as the company's system existed, there were parts in FoxPro, Excel, Access, older 'complete business systems', etc. What I quickly figured out was that the basic structure was, in all cases, the same. Furthermore, by building up my part of it and relying on SQL as much as possible, it was quite possible to build tools which would be portable across the then-irreducibly-separate parts of the system. I mean, all of them are just simply implementations of the SQL standard, which is basically just an implementation of set theory. They all work the same as far as actually relating data goes, they differ mostly in the GUI gadgets. (I know, from your point of view this is a crude simplification, but from where I was this was basically a revelation of the simplicity of the underlying logic.) [Editor Note: A very crude simplification and more, indeed]. Anyway, from that point it was fairly simple for me to integrate different parts of the system to my own, whether I needed to use Visual Basic, FoxPro scripting, Access, or whatever.
So I find myself looking for work, and potential employers are looking at my resume, seeing 'Foxpro' etc., and saying, "well, we don't use FoxPro databases, we use Microsoft SQL Server databases. Sorry." And I find myself trying to explain to them, as nicely as possible, because I'm trying to get a job from them and you're supposed to be nice to people you're trying to get a job from, "they all work basically the same, you see? The interface and tools vary, true, but the underlying dynamic remains the same. Tables of data with relations based on unique fields, etc." This was often met with a blank stare; in one case the response was "well ... yeah ... I guess ... if you've got ODBC drivers or something ... "
Anyway. I don't know if this will leave you nodding your head in agreement, or shaking your head in horror at my hackerish approach, but I was glad to see my own opinions mirrored in your article.”
Of course, SQL is not really relational, not all SQL DBMSs work the same, and so on, but I agreed with his main thrust regarding the IT industry and replied as follows:
“You may also want to check out my Against the Grain contrarian column.
[The admiration is well] deserved because [there is a horrendous] price to pay for calling a spade a spade -- not done much in this society -- and even your experience can't touch that [price]. We're in a very small minority, though.
You may be self-taught in db, but your background saved the day. Most practitioners don't have that. And, in fact, it's not their fault: this is an anti-intellectual society that not only does not reward independent, critical thinking, but actually punishes it! And it's pretty clear why: do you think there would have been acceptance, for example, of the result of the so-called [Bush-Gore] "election" if people were educated, informed, and could think critically and independently and reason? And would they have bought [what] corporations push?
I have many more relevant cases than [is required as evidence and that] I can handle. I wouldn't be saying all the things I say without lots of evidence.
That's why I [stopped looking for db work a long time ago]. And even if I had not, I wouldn't have gotten it. They don't want [thinkers -- that's "not practical" -- and certainly not critical thinkers -- that's dangerous] ... I refuse to fool myself and to believe that there are many db jobs around that will be [done right].
Clearly I'm in agreement. But it does not solve the problem, does it?”
Note on re-publication: Wayne Ratliff "accepted this", but did he ask tenants in the building if they do?