Showing posts with label FK. Show all posts
Showing posts with label FK. Show all posts

Sunday, March 26, 2023

RELATIONSHIPS AND THE RDM V2 PART 1: RELATIONS & DATABASE RELATIONS



with David McGoveran

 

Note: This a multi-part re-write of a previous series intended, when completed, to replace it. In the meantime you can consult the old version -- there is nothing wrong with it.

No matter how much I demonstrate the absence of foundation knowledge in the IT industry, the mountain of evidence and reasoning is dismissed -- how can everybody be wrong and only Fabian Pascal be right?

Well, consider the following (from a college prof no less):
“... we are not modeling objects/entities/attribute ... at all in the relational model, [but] a bunch of relationships ... hence perhaps Codd was correct in calling it a "relation", a bunch of relationships ... Interesting that most people think of relationships as being the distinguishing characteristic of a relational model and it is not ... [it] has no relationships since Codd decreed that all relationships must be represented by foreign keys, which are exactly the same as "attributes ... What [other] type(s) of relationships can be explicitly and formally defined in a relational data model? Of course there are many other relationships which can be inferred, such as between an attribute and an entity identifier. Please give me a precise reference to where Codd spoke of relationships [differently than i]n his 1985 piece published in ComputerWorld, [where] he said that the only way to represent a relationship (between relations) was through explicitly stored values (i.e., attributes, foreign keys) ... In my personal understanding, a relation is defined as a set of tuples. Then ... "in the relational model every relation represents a relationship". And then a quote from Chen: "each tuple of entities ... is a relationship". If I use the first and the second statements - I can say that a relationship is a set of tuples. The third statement says that a relationship is a tuple. So far, is a relationship a set of an element of a set? (Or may be a set of sets?) ... I argue that there is essentially no difference between relationships between entities of distinct classes and between properties of the same class. They both represent relationships. A property can represent a relationship between entities of distinct classes. If such relationships are represented by foreign keys and the relations representing the classes must be in 1NF, then relational databases can represent only M:1 relationships, a very unnecessary limitation when modeling some reality of interest ... The entity-relationship model is essentially a directed graph model, where relationships are prominent residents. Not so in the relational model (despite the name), where relationships (between relations, mind you) are not visible and in the SQL implementations is reduced to constraints. Relationships are about structure, which is as important as meaning (the semantics of the terms used in the universe being modeled).” --LinkedIn.com

The amount of nonsense squeezed into this rambling is mind boggling. No understanding of the RDM, confusion, abysmal reasoning and misuse of terms -- debunking it in its entirety would be practically impossible (believe me, I tried). Instead, I focus on a critical aspect of the RDM of which there is little grasp in the industry: I convey the fundamentals and leave it to the motivated reader to try their own debunking -- the most effective way I know to learn.

Saturday, February 4, 2023

CONCEPTUAL MODELING, LOGICAL DATABASE DESIGN AND PHYSICAL IMPLEMENTATION (sms)



Note: In "Setting Matters Straight" posts I debunk online pronouncements that involve fundamentals which I first post on LinkedIn. The purpose is to induce practitioners to test their foundation knowledge against our debunking, where we explain what is correct and what is fallacious. For in-depth treatments check out the POSTS and our PAPERS, LINKS and BOOKS (or organize one of our on-site/online SEMINARS, which can be customized to specific needs). Questions and comments are welcome here and on LinkedIn.

“A conceptual data model usually just includes the main concepts (entities) required to store information and the relationships that exist between these entities. We don’t usually include any details about each piece of information. We can consider the conceptual stage as an initial model, without all the details required to create a database.

A logical data model is probably the most-used data model. It goes beyond the conceptual model; it includes entities, relationships, details on entities’ different attributes, and unique ways to identify entities (primary keys) and establish the relationships between them (foreign keys).

A physical data model is usually derived from a logical data model for a particular relational database management system (RDBMS), thus taking into account all technology-specific details. One big difference between logical and physical data models is that we now need to use table and column names rather than specifying entity and attribute names. This allows us to adapt to the limits and conventions of the desired database engine. We also provide the actual data types and constraints that allows us to store the desired information.”
--Vertabelo.com

Monday, May 10, 2021

TYFK: What Domains Are and Are Not



Note: Each "Test Your Foundation Knowledge" post presents one or more misconceptions about data fundamentals. To test your knowledge, first try to detect them, then proceed to read our debunking, reflecting the current understanding of the RDM, distinct from whatever has passed for it in the industry to date. If there isn't a match, you can review references -- reflecting the current understanding of the RDM, distinct from whatever has passed for it in the industry to date -- which explain and correct the misconceptions. You can acquire further knowledge by checking out our POSTS, BOOKS, PAPERS, LINKS (or, better, organize one of our on-site SEMINARS, which can be customized to specific needs).

“...a Data Domain refers to all the valid values which a data element (column) may contain. The rule for determining the domain boundary may be as simple as a data type with a list of possible values. For example, a database table that has information about people, with one record per person, might have an "age" column. This gender column might be declared as a SMALLINT data type, and allowed to have a value between 0 and 120. The data domain for the age column is hence 0 - 120. In a normalized data model, the reference domain is typically specified in a reference table. Following the previous example, the age reference table could have exactly 120 records, one per allowed value. Reference tables are formally related to other tables in a database by the use of foreign keys. A better way would be to enforce the data domain through a check constraint. For example, the age column would require positive numeric values between 0 and 120. I have found that the best way to figure out all of your data domains and constraints is to spend some time designing and normalizing all of your tables.”
--Quora.com

Misconceptions

  • There are no tables and, thus, no columns in relational databases;
  • Domains are not (programming) data types;
  • It is not the data model that is normalized;
  • A referenced relation does not reference domains;
  • A SQL CHECK constraint is not "better enforcement" of a referential constraint;
  • Constraints are not determined BY logical design;
  • Logical database design does not involve explicit normalization (to 1NF) or further normalization to 5NF.

Fundamentals

  • Relational databases consist of relations with attributes defined on domains; tables with columns visualize relations with attributes, but play no part in the RDM.
  • A relational domain represents a real world property and is a database object under DBMS control and, thus, is distinct from a programming data type which is an application object under programmer control that may not represent anything in the real world.
  • 1NF (normalization) and 5NF (full normalization) are properties of relations (which comprise logical models), not of the data model (i.e., the RDM).
  • An attribute which is a foreign key in a referencing relation references a primary key which is an attribute  in a referenced relation.
  • A constraint can be expressed in syntactically different ways by a data sublanguage. The CHECK constraint is a syntactic alternative in SQL to declare referential constraints.
  • Database relations are semantically constrained to be consistent with (i.e., represent faithfully) the corresponding conceptual model. Properties and properties in context (i.e., of specific entity types) are identified during conceptual modeling. Domain and attribute constraints respectively are specified during logical design to ensure consistency with the properties and properties in context they represent in the database.
  • Database design that adheres the three principles mandated by the RDM produces 1NF and 5NF databases that do not require explicit normalization and further normalization.

Note: The difference between relational domains and programming data types are specified in Codd's RM/V2 book. SQL tables are not relations and SQL data types are not relational domains.


Recommended reading

Domains: The Database Glue

Understanding Domains and Attributes

The Interpretation and Representation of Database Relations





Sunday, May 10, 2020

TYFK: What Is A Database Relationship?



Note: This is a re-write of an earlier post. About TYFK posts (Test Your Foundation Knowledge) see the post insert below.

“Here two or more table[s] are related with each other. This is Database relationship. Database relationship is used a lot ... [in] relational database management systems ... shortly called RDBMS. Here is Join_data [sic] table and Interview_data table. For creating a relational database management system both of the table[s] must have a common field. Here Employee_ID is a common field ... Database relationship types: One-To-One relation, One-To-many relation, Many-to-many relation. Minimum one common field is essential in all the tables. The data type of common field and field size will be same in all the tables.”
First try to detect the misconceptions, then check against our debunking. If there isn't a match, you can acquire the necessary foundation knowledge in our POSTS, BOOKS, PAPERS, LINKS or, better, organize one of our on-site SEMINARS, which can be customized to specific needs.

Saturday, November 30, 2019

TYFK: 5NF, Association Relations and Join





Assume a conceptual model of a multigroup consisting of two related entity groups, Customers and Orders, where a customer can issue multiple orders. The conventional logical database design is:
CUSTOMERS
===============================================
| CID | NAME     | AGE | ADDRESS   | SALARY   |
-=====-----------------------------------------
|   1 | Ramesh   |  32 | Ahmedabad |  2000.00 |
|   2 | Khilan   |  25 | Delhi     |  1500.00 |
|   3 | Kaushik  |  23 | Kota      |  2000.00 |
|   4 | Chaitali |  25 | Mumbai    |  6500.00 |
|   5 | Hardik   |  27 | Bhopal    |  8500.00 |
|   6 | Komal    |  22 | MP        |  4500.00 |
|   7 | Muffy    |  24 | Indore    | 10000.00 |
-----------------------------------------------

ORDERS
===================================
| OID | DATE       | CID | AMOUNT |
-=====-----------------------------
| 102 | 2009-10-08 |   3 |   3000 |
| 100 | 2009-10-08 |   3 |   1500 |
| 101 | 2009-11-20 |   2 |   1560 |
| 103 | 2008-05-20 |   4 |   2060 |
-----------------------------------
where ORDERS.CID is an "embedded" foreign key (FK) referencing the primary key (PK) CUSTOMERS.CID.

Consider the query "For all orders, find the CID, name, OID, amount, and date" that applies a join of the two relations on CID. In SQL:

SELECT c.cid,c.name,o.oid,o.amount,o.date
FROM customers c
INNER JOIN orders o
ON c.cid = o.cid;
with the result displayed by the table:
====================================================
| C.CID | C.NAME   | O.OID | O.AMOUNT | O.DATE     |
-=======------------=======-------------------------
|     2 | Khilan   |   101 |     1560 | 2009-11-20 |
|     3 | Kaushik  |   102 |     3000 | 2009-10-08 |
|     3 | Kaushik  |   100 |     1500 | 2009-10-08 |
|     4 | Chaitali |   103 |     2060 | 2008-05-20 |
----------------------------------------------------
Note: A table is just a tabular display of a relation and the two should not be confused[1,2]. Bear in mind that SQL tables are not relations.

It may surprise you to know that both the design and the result are problematic from a relational standpoint.

Sunday, April 28, 2019

Understanding Data Modeling Part 3: OO/UML, and "Graph Data Models"




In Part 1 we presented some foundation knowledge with which to debunk misconceptions lurking in the industry's "data modeling" mess that Friesendal has tried to catalog. In Part 2 we applied this knowledge to the first two modeling approaches considered by Friesendal, the E/RM and RDM. We apply it here to other two, OO/UML and "GDM".


Object Orientation and Unified Modeling Language


“A "counter revolution" against the relational movement was attempted in the 90’s. Graphical user interfaces came to dominate and they required advanced programming environments. Functionality like inheritance, sub-typing and instantiation helped programmers combat the complexities of highly interactive user dialogs. The corresponding Data Modeling tool is the Unified Modeling Language ...”

Saturday, April 20, 2019

Understanding Data Modeling Part 2: "E/RM" and "RDM"




In Part 1 we presented some foundation knowledge with which to debunk misconceptions lurking in the industry's modeling mess that Friesendal has tried to map. We now proceed to apply it to the various industry "data models" considered by Friesendal, and his understanding thereof. In this part, we apply this knowledge to the first two industry "data models" considered by Friesendal -- the E/RM and RDM.


"Entity-Relationship Model"


“One of the first formal attempts at a framework for Data Modeling was the Entity-Relationship data model paradigm proposed [in 1976] by Peter Chen. Notice that in the original Chen-style, the attributes are somewhat independent and the relationships between entities are named and carry cardinalities ("how many" participants in each end of the relationship) ... Attributes are related to their "owner" entity" in what other people called "functional dependencies".”

Saturday, March 9, 2019

Fourth Order Properties Part 2: Association Relations in Database Design - An Example




Part 1 outlined fundamentals of fourth order properties (4OP) of a multigroup arising from relationships among its group members due to 1:1, M:1, M1:M2, and, generally, M1:M2:M3:...:Mn relationships among the groups' entity members. Fundamentals are commonly missing from database practice, as reflected in the exchange:

“We have Building, Room, and Bed entities. Logically, if this is in the scope of some hypothetical hotel, then each one of those entities is dependent on their parent to exist ... you cannot have a bed without a room. Also, that room wouldn't exist without its parent, Building. So, why have I rarely seen this identifying relationship introduced? When I was learning databases, everything was apparently "non-identifying". When is this type of relationship necessary, if at all? I see the issue arises when that BED can exist without a BUILDING. If you were to INSERT into the BED table, you are constraint [sic] to provide a building_id, as the building_id is part of that BED's primary key. Couldn't you avoid an identifying relationship by giving each table its own surrogate primary key? Is this the correct representation  of an identifying relationship? I could avoid that by just giving each table its own ID. At the end of the day, this is about IDENTIFYING relationships, not their existence, which is how I've been logically determining if something is an "identifying relationship" If that were the case, then any 1:N relationship could be "identifying" but that's not how you define identifying or non-identifying.”
“Interesting -- I’d never heard this term before. I’ve heard it referred to as a cached ID though, as that 2nd ID isn’t required, but may be beneficial for performance purposes. For this example with 3 levels it’s not a huge joint statement, but for some systems with 12 tables the joins get unpleasant. I’ve never started a system with this additional id, but I have added one later on once the need was there and the profiling led to this being the best solution for our specific situation. Usually though, just creating a view that does the joins for me has been easier. I’ll be curious what has led others to use this approach.”
It's not really introduced because it's way more towards academic than functional.”
--Reddit.com
Knowledge of the fundamentals would have obviated the question, the ad-hoc terminology, and the answers. Note in particular how -- notwithstanding the conceptual and logical nature of the question -- the first answer typically delves directly into implementation[1]. The second answer does not merit attention, except as indicator of the sad state of the industry.

Given a conceptual model, we shall now compare the database design proposed in the question with (1) conventional industry practice that includes "embedded foreign keys", and (2) assuming a true RDBMS, the unified representation using association relations of not just M1:M2, or, generally, M1:M2:M3:...:Mn 4OP relationships, but also the 1:1 and M:1 special cases[2].


Saturday, March 2, 2019

Fourth Order Properties Part 1: Association Relations vs. Foreign Keys




 “We have Building, Room, and Bed entities. Logically, if this is in the scope of some hypothetical hotel, then each one of those entities is dependent on their parent to exist ... you cannot have a bed without a room. Also, that room wouldn't exist without its parent, Building. So, why have I rarely seen this identifying relationship introduced? When I was learning databases, everything was apparently "non-identifying". When is this type of relationship necessary, if at all? I see the issue arises when that BED can exist without a BUILDING. If you were to INSERT into the BED table, you are constraint [sic] to provide a building_id, as the building_id is part of that BED's primary key. Couldn't you avoid an identifying relationship by giving each table its own surrogate primary key? Is this the correct representation  of an identifying relationship? I could avoid that by just giving each table its own ID. At the end of the day, this is about IDENTIFYING relationships, not their existence, which is how I've been logically determining if something is an "identifying relationship" If that were the case, then any 1:N relationship could be "identifying" but that's not how you define identifying or non-identifying.”

“Interesting -- I’d never heard this term before. I’ve hears it referred to as a cached ID though, as that 2nd ID isn’t required, but may be beneficial for performance purposes. For this example with 3 levels it’s not a huge joint statement, but for some systems with 12 tables the joins get unpleasant. I’ve never started a system with this additional id, but I have added one later on once the need was there and the profiling led to this being the best solution for our specific situation. Usually though, just creating a view that does the joins for me has been easier. I’ll be curious what has led others to use this approach.”

“It's not really introduced because it's way more towards academic than functional.”
--Reddit.com

Such questions, and ad-hoc terms like "identifying relationships"[1] come up because practice is driven by intuition and experience (if any), without the benefit of foundation knowledge[2]. Whether practitioners know/like it or not, a database is a formal computable representation of an informal conceptual model[3] and, therefore, data modeling (i.e., logical database design)[4] is impossible without (1) a well-defined and complete conceptual model and (2) a formal data model with which to formalize it as a logical model[5]and the two should not be confused[6]. Otherwise all bets are off.

Here's how foundation knowledge should have informed modeling and design.

Sunday, August 5, 2018

No Such Thing As "Primary Key Tuning"




“The choice of good InnoDB primary keys is a critical performance tuning decision. This post will guide you through the steps of choosing the best primary key depending on your workload ... You would be surprised how many times I had to explain the importance of primary keys and how many debates I had around the topic as often people have preconceived ideas that translate into doing things a certain way without further thinking.”
--Yves Trudeau, Principal architect, Percona.com

I will be labeled "pedantic" and a "purist" for saying this, but there is no such thing as "PK tuning". "Choice of good PK for performance" reflects logical-physical confusion (LPC), which, in turn, is rooted in lack of foundation knowledge, and failure to grasp the RDM -- not just by database practitioners, but even by DBMS designers.

Sunday, June 17, 2018

Foreign Keys Part 2: Beware of Misconceptions




Note: This is the second part of a multipart re-write of several older posts to bring them into line with the McGoveran formalization and re-interpretation of Codd's real RDM, including revisions, refinements, and extesions of his own[1].

(Continued from Part 1)

Part 1 started with an online exchange triggered by the question “Do I Have to Use Foreign Keys? If I am already manipulating data properly, are foreign keys required? Do they have another purpose that I’m just not aware of?” Both the question and the replies exhibit misconceptions about FKs (there are misconceptions about almost everything in the RDM[2]) rooted in lack of foundation knowledge, so we provided some FK fundamentals. We are now in a position to debunk the replies.


Sunday, June 10, 2018

Foreign Keys Part 1: Understanding the Fundamentals




Note: This is the first part of a two-part re-write of several earlier posts, to bring them into line with McGoveran's formalization and re-interpretation of Codd's true RDM, which includes his own corrections, refinements and extensions[1]. For a more in-depth treatment see the series of papers available here.
“Do I Have to Use Foreign Keys? If I am already manipulating data properly, are foreign keys required? Do they have another purpose that I’m just not aware of? I appreciate the guidance!”
“... [we] wish to make a point. There is something which is bad design/good design/mandatory/optional. Please stop insisting that Primary and Foreign keys are mandatory. They are good design habits but by no means mandatory. However, life is much more complex than a Normalized DB structure. This includes tables serving as event logs; tables, serving as User maintained materialized query tables, tables, serving as supporting structures, reflecting state of complex transactional databases; persistent tables serving as Result Set or Session keepers. And I personally believe that if they were truly mandatory, Sybase, Oracle, SQL Server, Ingres, DB2, etc. would require them. Oh, sorry, forgot the SQL standard itself. This is not the relational model we're talking about. These are commercially available RDBMSs which, not surprisingly, DO tend to listen to their customers. If they didn't, they wouldn't be in business!! Since Sybrand is unlikely to get FKs required by the SQL standard or the major RDBMS vendors, it seems that mandatory means that his answer to the question "Do I have to use foreign keys?" is "You would if you worked in my shop!". I'm inclined to agree with that.”
“Databases can work with or without primary keys and foreign keys. The choice is yours... However ... enforcing referential integrity can be done by many methods ... TMTOWTDT = There is more than one way to do this ... It all depends on your approach... In the last ten years... every one is enforcing referential integrity with help of primary and foreign keys but before this ... a lot of applications were working without primary and foreign keys to enforce referential integrity and to avoid orphaned rows/avoid duplicate records.”
“We don't have every possible logical relationship enforced by the database. Sometimes you have to compromise for performance reasons, as too many foreign key validations can slow down high volume inserts. Other times you have to create breakpoints just to keep the web of relationships from becoming too tangled and connecting hundreds or thousands of tables.”
“I think it is preferable to have FK constraints as an additional security layer and they can be disable[d] during loading if required; however, you need to be 100% certain your ETL is enforcing the constraints. It is best to do both - have the ETL reject records which fail FK checks and report on these whilst also enforcing FK intergrity on insert/update, if appropriate. The only additional thing I can add is - when you delete from a FK enabled DB, make sure you do it in the correct order.”
“I think, you have to learn about data structures and logical data design (not only database, which is nowadays interpreted mainly as only RDBMS), to be clear about usage primary, alternate, and foreign keys, normal forms, data integrity-and database integrity, because your database will work suboptimally without these knowledge if it will work at all.”
Data practitioners have high levels of tool knowledge, but lack a good grasp of fundamentals, for which reason they cannot be considered data professionals. Now, do not get me wrong: I do not mean that good knowledge of tools is unimportant -- if you work with them you gotta know them real well -- but the ability to fully assess them, use them optimally, and compensate for any shortcomings is limited in the absence of foundation knowledge. So let's have some before tackling the exchange.

Sunday, March 19, 2017

New Paper: The Interpretation and Representation of Database Relations



The data management field cannot and will not progress without educated and informed users. Recently I announced UNDERSTANDING THE REAL RDM, a new series of papers that will
  • Offer to the data practitioner an accessible informal preview of David's work.
  • Contrast it with the the current common interpretation that emerged after EFC's passing and to demonstrate the practical implications of the differences.

Monday, January 16, 2017

Don't Design Databases Without Foundation Knowledge and Conceptual Models




"I have two tables, one is product which is a parent table with one primary key and I have another child table of product, which is a product_details table. But the child table is linking with parent table(product) with logical data instead of foreign key,as we are doing this relationship with the help of java code in the coding side, instead of depending on the data base, which make it as tight couple. To avoid tight coupling between the tables we are storing the primary key value in the child table.
CREATE TABLE `tbl_product` (
  `product_id` varchar(200) NOT NULL,
  `product_details_id` varchar(200) DEFAULT NULL,
  `currency` varchar(20) DEFAULT NULL,
  `lead_time` varchar(20) DEFAULT NULL,
  `brand_id` varchar(20) DEFAULT NULL,
  `manufacturer_id` varchar(150) DEFAULT NULL,
  `category_id` varchar(200) DEFAULT NULL,
  `units` varchar(20) DEFAULT NULL,
  `transit_time` varchar(20) DEFAULT NULL,
  `delivery_terms` varchar(20) DEFAULT NULL,
  `payment_terms` varchar(20) DEFAULT NULL,
  PRIMARY KEY (`product_id`));

CREATE TABLE `tbl_product_details` (
  `product_details_id` varchar(200) NOT NULL,
  `product_id` varchar(200) DEFAULT NULL,
  `product_name` varchar(50) DEFAULT NULL,
  `landingPageImage` varchar(100) DEFAULT NULL,
  `product_description_brief` text CHARACTER SET latin1,
  `product_description_short` text CHARACTER SET latin1,
  `product_price_range` varchar(50) DEFAULT NULL,
  `product_discount_price` varchar(20) DEFAULT NULL,
  `production_Type` varchar(20) DEFAULT NULL,
  PRIMARY KEY (`product_details_id`),
  UNIQUE KEY `product_id` (`product_id`));
Please suggest the Pros and Cons of the design, we are following this kind of relationship in my company, as the manager is saying it will give [us flexibility]. I know that if we lose the data from the table, we can't know the relationship between the two tables."--StackExchange.com

View My Stats