
Sunday, April 30, 2023

RELATIONSHIPS AND THE RDM V2 Part 3: SEMANTIC CONSTRAINTS



Note: This is a multipart re-write of a previous series that, when completed, is intended to replace it.

In Part 1 we documented the differences between mathematical and database relations (see table in Part 1). We attributed the fallacy that the RDM can express only one type of relationship -- between relations using FKs -- to the industry being unaware of the adaptation of math relations for database management. We intimated that some of the additional features of database relations express relationships other than between relations.

In Part 2 we identified the intra-group c-relationships (and the corresponding within-relation l-relationships) in our approach to conceptual modeling:

  • Properties-entities relationships
      - general dependencies
  • Properties relationships
  • Entities relationships
      - entity uniqueness
      - functional dependencies (FD)
      - entity supertype-subtypes relationships

and used a simple conceptual model (CM) of five entity groups to illustrate them:

Customers (cID, cname, FICO, discount)
Products (pID, pname, price)
Salesmen (sID, sname, sales, salary, commission)
Orders (oID, pID, cID, sID, date, amount)
Order Items (oID, iID, pID, quantity)

Database design is the use of a data model (DM) -- here, the RDM -- to formalize conceptual models (CM), including c-relationships, as logical models (LM) for database representation. It must, therefore, be able to convert the business rules (BR) that express those relationships in specialized natural language at the conceptual level into formal constraints in a FOPL-based data sublanguage at the logical level.
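To make this concrete, here is a minimal sketch of such a conversion for the Customers group, using SQL merely as a stand-in for a FOPL-based data sublanguage (SQL is not truly relational, and the column types and rule values below are assumptions for illustration, not part of the CM):

-- Illustration only: two kinds of business rules declared as constraints.
CREATE TABLE customers (
  cid      INTEGER NOT NULL,
  cname    VARCHAR(50) NOT NULL,
  fico     INTEGER NOT NULL,
  discount DECIMAL(4,2) NOT NULL,
  CONSTRAINT customers_uniqueness PRIMARY KEY (cid),          -- entity uniqueness
  CONSTRAINT fico_range CHECK (fico BETWEEN 300 AND 850),     -- a property rule (range assumed)
  CONSTRAINT discount_range CHECK (discount BETWEEN 0 AND 1)  -- a property rule (range assumed)
);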

Our intention is to demonstrate that the RDM can express all these c-relationships, but we face a difficulty.

Sunday, April 16, 2023

RELATIONSHIPS & THE RDM V2 PART 2: INTRA-GROUP RELATIONSHIPS



 

Note: This is a multipart re-write of a previous series that, when completed, is intended to replace it.

In Part 1 we attributed the fallacy that the RDM can express only one type of relationship -- between relations, using FKs -- to practitioners being unaware of the adaptation of math relations to database management and missing the additional features of database relations. We documented the differences in features between math and database relations (see the table in Part 1) and intimated that some of the additional features express relationships other than those between relations using FKs (which we leave out in this discussion).

In this Part 2 we identify the c-relationships and use a simple conceptual model (CM) of five entity groups:

Customers (cID, cname, FICO, discount)
Products (pID, pname, price)
Salesmen (sID, sname, sales, salary, commission)
Orders (oID, pID, cID, sID, date, amount)
Order Items (oID, iID, pID, quantity)

to illustrate them (to recall, we prefix 'relationship' with c- and l- when we use the term at the conceptual and logical levels, respectively).

Friday, December 2, 2022

NOBODY UNDERSTANDS FURTHER NORMALIZATION 4 (sms)



Note: In "Setting Matters Straight" posts I debunk online pronouncements that involve fundamentals which I first post on LinkedIn. The purpose is to induce practitioners to test their foundation knowledge against our debunking, where we explain what is correct and what is fallacious. For in-depth treatments check out the POSTS and our PAPERS, LINKS and BOOKS (or organize one of our on-site/online SEMINARS, which can be customized to specific needs). Questions and comments are welcome here and on LinkedIn.

In Part 3 we set the matter straight about normalization to 1NF. In this part we do the same with respect to further normalization to 5NF. Non-1NF relations (i.e., those with relation-valued attributes) are no longer part of industry practice, so we focus on 2NF-5NF violations. The term further normalization originates with Codd, who initially thought 1NF sufficient; 2NF-5NF were discovered later (hence, further = beyond 1NF). The industry lumps both under normalization, but the two are distinct (e.g., only further normalization involves redundancy).
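As a reminder of the kind of repair further normalization performs (a hypothetical sketch using the Products/Order Items entities from other posts on this page, not an example from the quoted source): a table bundling two functional dependencies is projected losslessly into two tables, eliminating the redundancy.

-- Hypothetical 2NF violation: pname and price depend on pid alone, not on the
-- full key {oid, pid}, so each product's name and price repeat per order item.
CREATE TABLE order_items_bundled (
  oid      INTEGER NOT NULL,
  pid      INTEGER NOT NULL,
  quantity INTEGER NOT NULL,
  pname    VARCHAR(50) NOT NULL,
  price    DECIMAL(10,2) NOT NULL,
  PRIMARY KEY (oid, pid)
);

-- Further normalization projects it into two tables, each representing facts
-- about a single entity type, with no redundancy.
CREATE TABLE products (
  pid   INTEGER PRIMARY KEY,
  pname VARCHAR(50) NOT NULL,
  price DECIMAL(10,2) NOT NULL
);

CREATE TABLE order_items (
  oid      INTEGER NOT NULL,
  pid      INTEGER NOT NULL REFERENCES products(pid),
  quantity INTEGER NOT NULL,
  PRIMARY KEY (oid, pid)
);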

What's right/wrong with the following?

“So, what is this theory of normal forms? It deals with the mathematical construct of relations (which are a little bit different from relational database tables). First, second, and third normal forms are the basic normal forms in database normalization. Normalization in relational databases is a design process that minimizes data redundancy and avoids update anomalies. Basically, you want each piece of information to be stored exactly once; if the information changes, you only have to update it in one place. The normalization process consists of modifying the design through different stages, going from an unnormalized set of relations (tables), to the first normal form, then to the second normal form, and then to the third normal form.”
--Vertabelo.com

Sunday, October 23, 2022

NOBODY UNDERSTANDS NORMALIZATION 2 (sms)



Note: In "Setting Matters Straight" posts I debunk online pronouncements that involve fundamentals which I first post on LinkedIn. The purpose is to induce practitioners to test their foundation knowledge against our debunking, where we explain what is correct and what is fallacious. For in-depth treatments check out the POSTS and our PAPERS, LINKS and BOOKS (or organize one of our on-site/online SEMINARS, which can be customized to specific needs). Questions and comments are welcome here and on LinkedIn.

(Continued from Part 1)

What's right/wrong about this database picture?

“So, what is this theory of normal forms? It deals with the mathematical construct of relations (which are a little bit different from relational database tables). The normalization process consists of modifying the design through different stages, going from an unnormalized set of relations (tables), to the first normal form, then to the second normal form, and then to the third normal form.”
--Vertabelo.com

Misconceptions

  • All database relations are, mathematically, relations, but not all mathematical relations are database relations.
  • The tabular structure plays practically no role in the RDM.
  • In practice there is no normalization (to 1NF) and there should not be further normalization (to 5NF).
  • Further normalization does not go from 2NF sequentially through 3NF and 4NF to 5NF.

Saturday, October 8, 2022

NOBODY UNDERSTANDS NORMALIZATION 1 (sms)



Note: In "Setting Matters Straight" posts I debunk online pronouncements that involve fundamentals which I first post on LinkedIn. The purpose is to induce practitioners to test their foundation knowledge against our debunking, where we explain what is correct and what is fallacious. For in-depth treatments check out the POSTS and our PAPERS, LINKS and BOOKS (or organize one of our on-site/online SEMINARS, which can be customized to specific needs). Questions and comments are welcome here and on LinkedIn.

What's right/wrong with this database picture?

“Normalization in relational databases is a design process that minimizes data redundancy and avoids update anomalies. Basically, you want each piece of information to be stored exactly once; if the information changes, you only have to update it in one place. The theory of normal forms gives rigorous meaning to these informal concepts. There are many normal forms. In this article, we’ll review the most basic:
First normal form (1NF)
Second normal form (2NF)
Third normal form (3NF)
There are normal forms higher than 3NF, but in practice you usually normalize your database to the third normal form or to the Boyce-Codd normal form, which we won’t cover here.”

--Vertabelo.com

Monday, September 12, 2022

DATABASE DESIGN: THE STATE OF KNOWLEDGE IN THE INDUSTRY



Can you identify all the fallacies and misconceptions in the following online exchange? What is the elephant in the room?
Q: “I have done data normalization on dummy data and would like to know if I did it correctly. If it is done correctly, I would also like to ask two things below, because it is about 3NF.

1NF: This table should be 1NF. 

2NF: I selected composite key (userID and Doors) as they represent minimal candidate key and got three tables applying FD rule.

 

3NF: Applying the rule of transitive dependency on 1st table in 2NF, I got out 4 tables (showing only first two, because the last two remain unchanged).

Questions: Is this database normalisation correct? If not could you point me where I did mistake? If answer on first question is True: Should the last table in 3NF be transformed into two tables, given it is not in correct Third normal form. Two non-key atributes have FD keycode -> accessGroup.”

Sunday, August 28, 2022

NOBODY UNDERSTANDS DATABASE DESIGN 1 (sms)



Note: In "Setting Matters Straight" posts I debunk online pronouncements that involve fundamentals which I first post on LinkedIn. The purpose is to induce practitioners to test their foundation knowledge against our debunking, where we explain what is correct and what is fallacious. For in-depth treatments check out the POSTS and our PAPERS, LINKS and BOOKS (or organize one of our on-site/online SEMINARS, which can be customized to specific needs). Questions and comments are welcome here and on LinkedIn.

In a previous SMS post I debunked an attempt to express something important about database practice that was handicapped by lack of foundation knowledge. Here is another example.

“This Codd guy might have been onto something. Unfortunately, normalization is usually taught in a somewhat backwards, overly technical way. If you start with concepts, connections between them and details about them, you usually are already at a fairly high normal form without going through any formal normalization steps.”
--LinkedIn.com

Sunday, September 19, 2021

TYFK: Calculated Attributes -- Redundancy, Full Normalization and Relational Theory



Note: Each "Test Your Foundation Knowledge" post presents one or more misconceptions about data fundamentals. To test your knowledge, first try to detect them, then proceed to read our debunking, reflecting the current understanding of the RDM, distinct from whatever has passed for it in the industry to date. If there isn't a match, you can review references -- reflecting the current understanding of the RDM, distinct from whatever has passed for it in the industry to date -- which explain and correct the misconceptions. You can acquire further knowledge by checking out our POSTS, BOOKS, PAPERS, LINKS (or, better, organize one of our on-site SEMINARS, which can be customized to specific needs).

“If you have shopping cart, you probably have some field "TOTAL" somewhere that stores the final amount due for the customer. It so happens that such a thing violates relational theory...”

“Having a "TOTAL" field in your "order" table *might* violate relational theory, but if you make it so that only a trigger can update it based on what's in your "order_item" table, then I think it's fine. You still get data integrity and that is what matters.”

“I still fail to see what you mean by the "calculated TOTALS field" (attribute, really) violates the Relational Model.”

“The result of having the field ... is what is called a DELETE ANOMALY.”

“Most denormalizing means adding columns to tables that provide values you would otherwise have to calculate as needed.”

“There are four practical problems with a fully normalized database, three of which I have listed before. I will list them all here for completeness:
* No calculated values. Calculated values are a fact of life for all applications, but a normalized database lacks them. The burden of providing calculated values must be taken up by somebody somehow. Denormalization is one approach to this, though there are others.
--Database Programmer blog

“...I'm now working with IT to normalize part of the database to remove calculated fields...:
`lineitems`.`extended total` = `lineitems`.`units` * `biditems`.`price`.
`jobs`.`jobvalue` = the sum of related `lineitems`.`extended total` records
`orders`.`ordervalue` = the sum of related `jobs`.`jobvalue` records.”
--mySQL.com

Do calculated attributes (not fields!) violate relational theory, and must they be "normalized" away? Determining that requires foundation knowledge that is scarce in the industry, which has a poor and outdated understanding of the RDM.
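For concreteness, here is a minimal sketch -- with hypothetical table and column names loosely following the mySQL.com quote -- of the alternative the exchange argues about: deriving the totals on demand rather than storing them.

-- Hypothetical schema for illustration; not the quoted poster's actual design.
CREATE TABLE lineitems (
  oid   INTEGER NOT NULL,
  iid   INTEGER NOT NULL,
  units INTEGER NOT NULL,
  price DECIMAL(10,2) NOT NULL,
  PRIMARY KEY (oid, iid)
);

-- The per-order total derived on demand, instead of being stored in a column
-- that triggers must keep consistent with the line items.
CREATE VIEW order_totals AS
SELECT oid, SUM(units * price) AS order_total
FROM lineitems
GROUP BY oid;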

Tuesday, August 31, 2021

TYFK: Normalized, Fully Normalized, Non-Normalized, Denormalized -- Clearing the Mess



Note: Each "Test Your Foundation Knowledge" post presents one or more misconceptions about data fundamentals. To test your knowledge, first try to detect them, then proceed to read our debunking, reflecting the current understanding of the RDM, distinct from whatever has passed for it in the industry to date. If there isn't a match, you can review references -- reflecting the current understanding of the RDM, distinct from whatever has passed for it in the industry to date -- which explain and correct the misconceptions. You can acquire further knowledge by checking out our POSTS, BOOKS, PAPERS, LINKS (or, better, organize one of our on-site SEMINARS, which can be customized to specific needs).

“A non-normalized database is a disorganized one, where nobody has bothered to work out where the facts should be stored. It is like a stack of paper files that has been tossed down the stairs. We are not interested in non-normalized databases.

A normalized database has been organized so that each fact is stored in exactly one place (2nf and greater) and no more than one fact is stored in each place (1nf). In a normalized database there is a place for everything and everything is in its place.

A denormalized database is a normalized database that has had redundancies deliberately re-introduced for some practical gain. Most denormalizing means adding columns to tables that provide values you would otherwise have to calculate as needed. Values are copied from table to table, calculations are made within a row, and totals, averages and other aggregrations are made between child and parent tables.”
--database-programmer.blogspot.com

Thursday, June 10, 2021

RE-WRITE



See: https://www.dbdebunk.com/2023/08/entities-properties-and-codds-sleight.html

Thursday, August 20, 2020

TYFK: Relations, Tables, Domains and Normalization



Each "Test Your Foundation Knowledge" post presents one or more misconceptions about data fundamentals. To test your knowledge, first try to detect them, then proceed to read our debunking, which is based on the current understanding of the RDM, distinct from whatever has passed for it in the industry to date. If there isn't a match, you can acquire the knowledge by checking out our POSTS, BOOKS, PAPERS, LINKS (or, better, organize one of our on-site SEMINARS, which can be customized to specific needs).

“The most popular data model in DBMS is the Relational Model. It is more scientific a model than others. This model is based on first-order predicate logic and defines a table as an n-ary relation. The main highlights of this model are:

  • Data is stored in tables called relations.
  • Relations can be normalized. In normalized relations, values saved are atomic values.
  • Each row in a relation contains a unique value.
  • Each column in a relation contains values from a same domain.”

Monday, July 20, 2020

OBG: Data Independence and "Physical Denormalization"




Note: I am re-publishing some of the articles and reader exchanges from the old DBDebunk (2000-06). How well do they hold up -- have industry knowledge and practice progressed? Judge for yourself and appreciate the difference between a sound foundation and the fad-driven cookbook approach.


January 2, 2001

  "... one of the "4 great lies" is "I denormalize for performance." You state that normalization is a logical concept and, since performance is a physical concept, denormalization for performance reasons is impossible (i.e., it doesn't make sense). What term would you use to describe changing the physical database design to be different from the logical design to enhance performance? Because normalization is a logical concept, you imply that this is not called denormalization."

Saturday, November 30, 2019

TYFK: 5NF, Association Relations and Join





Assume a conceptual model of a multigroup consisting of two related entity groups, Customers and Orders, where a customer can issue multiple orders. The conventional logical database design is:
CUSTOMERS
===============================================
| CID | NAME     | AGE | ADDRESS   | SALARY   |
-=====-----------------------------------------
|   1 | Ramesh   |  32 | Ahmedabad |  2000.00 |
|   2 | Khilan   |  25 | Delhi     |  1500.00 |
|   3 | Kaushik  |  23 | Kota      |  2000.00 |
|   4 | Chaitali |  25 | Mumbai    |  6500.00 |
|   5 | Hardik   |  27 | Bhopal    |  8500.00 |
|   6 | Komal    |  22 | MP        |  4500.00 |
|   7 | Muffy    |  24 | Indore    | 10000.00 |
-----------------------------------------------

ORDERS
===================================
| OID | DATE       | CID | AMOUNT |
-=====-----------------------------
| 102 | 2009-10-08 |   3 |   3000 |
| 100 | 2009-10-08 |   3 |   1500 |
| 101 | 2009-11-20 |   2 |   1560 |
| 103 | 2008-05-20 |   4 |   2060 |
-----------------------------------
where ORDERS.CID is an "embedded" foreign key (FK) referencing the primary key (PK) CUSTOMERS.CID.
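For reference, a minimal SQL sketch of this conventional design (the column types are assumptions, and SQL is used only because it is the industry's data sublanguage, not because SQL tables are relations):

CREATE TABLE customers (
  cid     INTEGER PRIMARY KEY,
  name    VARCHAR(50) NOT NULL,
  age     INTEGER NOT NULL,
  address VARCHAR(50) NOT NULL,
  salary  DECIMAL(10,2) NOT NULL
);

CREATE TABLE orders (
  oid    INTEGER PRIMARY KEY,
  date   DATE NOT NULL,  -- some DBMSs require quoting this column name
  cid    INTEGER NOT NULL REFERENCES customers(cid),  -- the "embedded" FK
  amount DECIMAL(10,2) NOT NULL
);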

Consider the query "For all orders, find the CID, name, OID, amount, and date" that applies a join of the two relations on CID. In SQL:

SELECT c.cid,c.name,o.oid,o.amount,o.date
FROM customers c
INNER JOIN orders o
ON c.cid = o.cid;
with the result displayed by the table:
====================================================
| C.CID | C.NAME   | O.OID | O.AMOUNT | O.DATE     |
-=======------------=======-------------------------
|     2 | Khilan   |   101 |     1560 | 2009-11-20 |
|     3 | Kaushik  |   102 |     3000 | 2009-10-08 |
|     3 | Kaushik  |   100 |     1500 | 2009-10-08 |
|     4 | Chaitali |   103 |     2060 | 2008-05-20 |
----------------------------------------------------
Note: A table is just a tabular display of a relation and the two should not be confused[1,2]. Bear in mind that SQL tables are not relations.

It may surprise you to know that both the design and the result are problematic from a relational standpoint.

Sunday, August 25, 2019

Meaning Criteria and Entity Supertype-Subtypes Relationships




Note: This is a re-write of a previous post.
"I have a database for a school ... [with] numerous tables obviously, but consider these:
CONTACT - all contacts (students, faculty): has fields such as LAST, FIRST, ADDR, CITY, STATE, ZIP, EMAIL;
FACULTY - hire info, login/password, foreign key to CONTACT;
STUDENT - medical comments, current grade, foreign key to CONTACT."
"Do you think it is a good idea to have a single table hold such info? Or, would you have had the tables FACULTY and STUDENT store LAST, FIRST, ADDR and other fields? At what point do you denormalize for the sake of being more practical? What would you do when you want to close out one year and start a new year? If you had stand-alone student and faculty tables then you could archive them easily, have a school semester and year attached to them. However, as you go from one year to the next information about a student or faculty may change. Like their address and phone for example. The database model now is not very good because it doesn’t maintain a history. If Student A was in school last year as well but lived somewhere else would you have 2 contact rows? 2 student rows?  Or do you have just one of each and have a change log. Which is best?"
How would somebody who "does not know past, or new requirements, modeling, and database design" and messes with a working database just because "he heard something about (insert your favorite fad here)" tell correct answers from bad ones? Particularly if the answers suffer from the same lack of foundation knowledge as the question?

Friday, May 31, 2019

Normalization and Further Normalization Part 1: Databases Representing ... What?




Note: This is a re-write of older posts (which now link here), to bring them into line with the McGoveran formalization, re-interpretation, and extension[1] of Codd's RDM.
“A particular bug-bear and a mistake that +90% of "data modelers" make, is analyzing "point in time" views of the business data and "normalizing" those values hence failing to consider change over time and the need to reproduce historic viewpoints. Let’s say we start with this list of data-items for a Sales-Invoice (completely omitting details of what’s been sold):
SALES-INVOICE
 {Invoice-Date,
  Customer-Account-ID,
  Customer Name,
  Invoice-Address-Line-1,
  Invoice-Address-Line-2,
  Invoice-Address-Line-3,
  Invoice-Address-Line-4,
  Invoice-Address-Postcode,
  Net-Amount,
  VAT,
  Total-Amount
 };
Nearly every time, through the blind application of normalization we get this ... there’s even a term for it -- it’s called "over-normalization":
SALES-INVOICE
 {Invoice-Date,
  Customer-Account-Id
   REFERENCES Customer-Account,
  Net-Amount,
  VAT,
  Total-Amount
 };

CUSTOMER-ACCOUNT
 {Customer-Account-Id,
  Customer-Name,
  Invoice-Address
   REFERENCES Address
 };

ADDRESS
 {Address-Line-1,
  Address-Line-2,
  Address-Line-3,
  Address-Line-4,
  Postcode
 };”
One measure of the scarcity of foundation knowledge in the industry is the number of attempts to correct a plethora of common misconceptions[2] that themselves suffer from the very misconceptions they aim to correct. One of the most common fallacies is confusion of levels of representation[3], which takes two forms[4]. We have written extensively about the logical-physical confusion (LPC)[5,6,7,8] underlying "denormalization for performance"[9], and about the conceptual-logical conflation (CLC) that lumps conceptual modeling with data modeling[10,11,12], inhibiting the understanding that the latter is the formalization of the former.

Sunday, April 28, 2019

Understanding Data Modeling Part 3: OO/UML, and "Graph Data Models"




In Part 1 we presented some foundation knowledge with which to debunk misconceptions lurking in the industry's "data modeling" mess that Friesendal has tried to catalog. In Part 2 we applied this knowledge to the first two modeling approaches considered by Friesendal, the E/RM and RDM. We apply it here to the other two, OO/UML and "GDM".


Object Orientation and Unified Modeling Language


“A "counter revolution" against the relational movement was attempted in the 90’s. Graphical user interfaces came to dominate and they required advanced programming environments. Functionality like inheritance, sub-typing and instantiation helped programmers combat the complexities of highly interactive user dialogs. The corresponding Data Modeling tool is the Unified Modeling Language ...”

Saturday, April 20, 2019

Understanding Data Modeling Part 2: "E/RM" and "RDM"




In Part 1 we presented some foundation knowledge with which to debunk misconceptions lurking in the industry's modeling mess that Friesendal has tried to map. We now proceed to apply it to the various industry "data models" considered by Friesendal, and his understanding thereof. In this part, we apply this knowledge to the first two industry "data models" considered by Friesendal -- the E/RM and RDM.


"Entity-Relationship Model"


“One of the first formal attempts at a framework for Data Modeling was the Entity-Relationship data model paradigm proposed [in 1976] by Peter Chen. Notice that in the original Chen-style, the attributes are somewhat independent and the relationships between entities are named and carry cardinalities ("how many" participants in each end of the relationship) ... Attributes are related to their "owner" entity in what other people called "functional dependencies".”

Wednesday, January 9, 2019

Data and Meaning Part 3: Database Design




We have seen in Part 2 that the meaning of data in a database is the conceptual model that the database is intended to represent, namely (1) the three types of objects -- entities of multiple types that form entity groups that form a multigroup -- and (2) the business rules (BR) that specify their properties:
  • Properties in context (PiC) shared by entities of each type;
  • Collective group properties (i.e., relationships among entity group members);
  • Multigroup properties (i.e., inter-group relationships).
Often somebody produces one or more tables and asks if there's "anything wrong" with them,  or "if they are in some specific normal form and, if not, how to normalize them". This reflects lack of foundation knowledge. 

Tuesday, January 1, 2019

Data and Meaning Part 2: Types of Business Rules



 
Per Part 1, meaning is captured during conceptual modeling as information about objects of interest, specifically their properties (some of which are relationships), specified in business rules (BR). Because they are expressed informally in natural language, objects and BRs must be formalized into computable form. Data modeling (we prefer logical database design) uses a formal data model to formalize informal conceptual models as formal logical models for database representation: it assigns the meaning in the former to symbols and expressions in the latter[2]. Using the RDM:

  • Objects -- entities, entity groups, and multigroups -- formalize as tuples, relations, and databases, respectively;
  • Properties formalize as domains, and when associated with entities of specific types, as attributes;
  • Group and multigroup properties -- relationships among entities, and among groups[3] -- formalize as constraints on and among relations enforceable by the DBMS.
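A minimal sketch of what this mapping can look like when the resulting logical model is then expressed in SQL -- which only approximates a truly relational, FOPL-based sublanguage; the Customers/Orders fragment, types, and rule values are assumptions for illustration:

-- Property -> domain (CREATE DOMAIN is standard SQL, though not universally supported).
CREATE DOMAIN fico_score AS INTEGER CHECK (VALUE BETWEEN 300 AND 850);

-- Entity group -> relation (SQL table); each tuple (row) formalizes one entity.
CREATE TABLE customers (
  cid   INTEGER NOT NULL,                      -- property in context -> attribute
  cname VARCHAR(50) NOT NULL,
  fico  fico_score NOT NULL,
  CONSTRAINT customers_key PRIMARY KEY (cid)   -- collective group property -> constraint on the relation
);

-- Multigroup property (inter-group relationship) -> constraint among relations.
CREATE TABLE orders (
  oid INTEGER PRIMARY KEY,
  cid INTEGER NOT NULL REFERENCES customers (cid)
);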

Tuesday, September 18, 2018

Don't Conflate/Confuse Primary Keys, PK Constraints, and Indexes




“What is the difference between an index and a key? How are they related?”

“There seams to be some confusion between what a Primary Key is, and what an Index is and how they are used. The Primary Key is a logical object. By that I mean that is simply defines a set of properties on one column or a set of columns to require that the columns which make up the primary key are unique and that none of them are null. Because they are unique and not null, these values (or value if your primary key is a single column) can then be used to identify a single row in the table every time. In most if not all database platforms the Primary Key will have an index created on it. An index on the other hand doesn’t define uniqueness. An index is used to more quickly find rows in the table based on the values which are part of the index. When you create an index within the database, you are creating a physical object which is being saved to disk.”

“A primary key by default creates a clustered index. A unique constraint/key by default creates a non-clustered index.”

“An index is a (logically) ordered list of rows. For example, an index on LastName means all values are already sorted in LastName order. Usually index rows contain far fewer columns in them than the table itself (except the clustered index, which is the table). A key is a column or columns that defines the order of an index. For example, on an index ordered by (LastName,FirstName), then LastName and FirstName are the keys. Btw, a primary key is a physical object, not a logical one. The db engine needs physical rows in order to insure unique values in the index.”
--Difference between an index and a key?, SQLTeam.com
I have recently published a paper[1], and posted a multipart series[2] on relational keys. In the latter I stated as follows:
"As a relational feature, keys can only be properly understood within the formal foundation of the RDM, which is simple set theory (SST) expressible in first order predicate logic (FOPL) adapted and applied to database management. Yet that is precisely what is ignored and dismissed in the industry -- including by the authors of SQL[3]."
I have also written extensively on the widespread logical-physical confusion (LPC)[4], most recently in the key-index context specifically[5]. The replies above are examples -- if any more were needed -- that validate my repeated claim of a lack of foundation knowledge in the industry. Can you tell what is wrong with, and what is correct in, them?
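To make the distinction the quotes blur concrete, here is a minimal generic-SQL sketch (names are hypothetical and syntax details vary by DBMS): a key is a logical constraint declared on a table, while an index is a separate physical access structure that a DBMS may -- or may not -- use to enforce it.

-- Logical: a key is a uniqueness constraint on the columns of a table.
CREATE TABLE employees (
  emp_id     INTEGER NOT NULL,
  last_name  VARCHAR(50) NOT NULL,
  first_name VARCHAR(50) NOT NULL,
  CONSTRAINT employees_pk PRIMARY KEY (emp_id)
);

-- Physical: an index is an access path; whether one backs the key above, and
-- whether it is clustered, is a DBMS implementation detail, not part of the model.
CREATE INDEX employees_name_ix ON employees (last_name, first_name);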