The term "distributed" is thrown around a lot these days. Hype
notwithstanding, just as with analytics and data science, distribution
in data management is nothing new. In fact, SQL vendors (IBM, Sybase, Ingres, Oracle) -- frequently
criticized today for non-scalability -- tackled distribution decades
ago. The non-relational systems preceding SQL were not amenable to it,
and SQL is the closest to the relational model the industry allows you
to get.
In Anatomy of a Data Management Project, I mentioned the Diaspora project,
which claimed to be a distributed alternative to Facebook. Developer
Sarah Mei writes, "Once you log in, Diaspora’s interface looks
structurally similar to Facebook’s... The main technical difference
between Diaspora and Facebook [which, she says, runs "on a single
logical server"], is invisible to end users: it's the 'distributed'
part."
I suspect the author means physical server. Remember my warning
against confusing levels of representation? A clear distinction between
physical and logical is particularly important insofar as distribution
is concerned.
What is of practical consequence is whether the details of the physical
distribution across multiple computers are transparent to users and
applications, that is, whether they can work without referring to those
details explicitly. Besides making development, maintenance, and data
access simpler, transparency means that when the distribution changes
(e.g., computers are added or data is redistributed), existing queries
and applications continue to work unchanged. For such distribution independence,
all the necessary functionality -- transactions, consistency, recovery,
integrity, concurrency control, security, performance optimization, and
so on -- must be encapsulated in the DBMS, not left to developers in
each and every application.
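A minimal Python sketch can make distribution independence concrete. It is an illustration under stated assumptions, not a description of how any real DBMS (or Diaspora) works: the classes `CentralDB` and `DistributedDB`, the `accounts` relation, and the placement rule are all hypothetical. The application function `total_balance` names only the logical relation, so it runs unchanged against a single-site database, a fragmented one, and the same one after its rows are redistributed across more sites; all knowledge of placement stays inside the DBMS.

```python
from typing import Dict, List, Tuple

Row = Tuple[str, int]  # (account_id, balance) -- hypothetical schema


class CentralDB:
    """Non-distributed scheme: one site holds every relation."""

    def __init__(self, relations: Dict[str, List[Row]]):
        self._relations = relations

    def scan(self, name: str) -> List[Row]:
        # Sorted only to make results deterministic; relations are unordered.
        return sorted(self._relations[name])


class DistributedDB:
    """Hypothetical distributed DBMS: each relation is horizontally
    fragmented across sites, and only the DBMS knows the placement."""

    def __init__(self, num_sites: int):
        # _sites[i] maps relation name -> rows stored at site i
        self._sites: List[Dict[str, List[Row]]] = [{} for _ in range(num_sites)]

    def _site_for(self, key: str) -> int:
        # Placement rule (hypothetical): sum of the key's character codes,
        # modulo the number of sites.
        return sum(map(ord, key)) % len(self._sites)

    def insert(self, name: str, row: Row) -> None:
        site = self._sites[self._site_for(row[0])]
        site.setdefault(name, []).append(row)

    def scan(self, name: str) -> List[Row]:
        # Reassemble the relation from all fragments; callers never see sites.
        rows: List[Row] = []
        for site in self._sites:
            rows.extend(site.get(name, []))
        return sorted(rows)

    def redistribute(self, num_sites: int) -> None:
        # Change the physical distribution (e.g., more computers are added).
        old = self._sites
        self._sites = [{} for _ in range(num_sites)]
        for site in old:
            for name, rows in site.items():
                for row in rows:
                    self.insert(name, row)


def total_balance(db) -> int:
    # Application code: refers only to the logical relation "accounts".
    return sum(balance for _, balance in db.scan("accounts"))


rows = [("A1", 100), ("B7", 250), ("C3", 75)]
central = CentralDB({"accounts": rows})
dist = DistributedDB(num_sites=2)
for r in rows:
    dist.insert("accounts", r)

print(total_balance(central), total_balance(dist))  # 425 425
dist.redistribute(num_sites=5)
print(total_balance(dist))  # 425
```

Because the placement rule and the `redistribute` step live in the DBMS class, adding sites requires no change to application code.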
More than a decade ago, C. J. Date specified that DBMS functionality in 12 Objectives for Distributed Database Systems.
Insofar as users and applications are concerned, they all boil down to a
distributed DBMS behaving in all respects exactly like a
non-distributed DBMS. Alas, it would be an understatement to say that
this is a non-trivial task.
Relative to a non-distributed scheme, where the DBMS, the database, and
applications all reside on the same physical server, distributed schemes
involve either the database, or the database and the DBMS, operating
across several servers.
Satisfying the 12 objectives means that the local components are
treated as databases and DBMSs in their own right, and are also
integrated dynamically, through communication and cooperation, into
various combinations as user tasks require. There is both central and
local database management. The more transparent the scheme, the more
demanding it is of DBMS designers, but the more flexible and easier it
is for users.
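The dual role of the local components can be sketched in Python as a hypothetical illustration, not any product's design: `LocalDBMS`, `GlobalDBMS`, the `emp` relation, and the site names are assumptions. Each site manages and queries its own data directly, while a central catalog records which sites hold fragments of which relation and combines their answers for global queries.

```python
from typing import Callable, Dict, List, Tuple

Row = Tuple[str, str, int]  # (emp_id, city, salary) -- hypothetical schema
Pred = Callable[[Row], bool]


class LocalDBMS:
    """A site that is a DBMS in its own right: it stores and queries its own data."""

    def __init__(self, name: str):
        self.name = name
        self._tables: Dict[str, List[Row]] = {}

    def insert(self, table: str, row: Row) -> None:
        self._tables.setdefault(table, []).append(row)

    def select(self, table: str, pred: Pred) -> List[Row]:
        return [r for r in self._tables.get(table, []) if pred(r)]


class GlobalDBMS:
    """Central layer: a catalog of which sites hold fragments of which
    table, plus query processing that combines the sites' answers."""

    def __init__(self):
        self._catalog: Dict[str, List[LocalDBMS]] = {}

    def register(self, table: str, site: LocalDBMS) -> None:
        self._catalog.setdefault(table, []).append(site)

    def select(self, table: str, pred: Pred) -> List[Row]:
        # Ship the predicate to every site holding a fragment; union the results.
        result: List[Row] = []
        for site in self._catalog.get(table, []):
            result.extend(site.select(table, pred))
        return sorted(result)


def high_paid(r: Row) -> bool:
    return r[2] > 45_000


london, paris = LocalDBMS("london"), LocalDBMS("paris")
london.insert("emp", ("E1", "London", 50_000))
paris.insert("emp", ("E2", "Paris", 65_000))
paris.insert("emp", ("E3", "Paris", 40_000))

g = GlobalDBMS()
for site in (london, paris):
    g.register("emp", site)

print(paris.select("emp", high_paid))  # [('E2', 'Paris', 65000)]
print(g.select("emp", high_paid))      # [('E1', 'London', 50000), ('E2', 'Paris', 65000)]
```

Shipping the predicate to each site, rather than fetching whole fragments and filtering centrally, is one common design choice. The sketch ignores transactions, concurrency control, and recovery across sites, which is where most of the difficulty lies.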
What this entails for data consistency, concurrency control, performance
optimization, security, data administration, and other database
functions is beyond the scope of this post. My aim is just to alert you
that claims of "distribution" that do not specify exactly what is
distributed and which of the 12 objectives are satisfied lack the
information necessary to assess a product and its usefulness.
The claim that Diaspora's distribution is "invisible," therefore, is not
enough. Can you determine from the description above whether that claim
is valid?