Saturday, February 25, 2017

This Week



1. What's wrong with this picture

"A quick-and-dirty definition for a relational database might be: a system whose users view data as a collection of tables related to each other through common data values.The whole basis for the relational model follows this train of thought: data is stored in tables, which are composed of rows and columns. Tables of independent data can be linked, or related, to one another if they each have columns of data that represent the same data value, called keys. This concept is so common as to seem trivial; however, it was not so long ago that achieving and programming a system capable of sustaining the relational model was considered a longshot with limited usefulness." --Kevin Kline, SQLBlog.com

Sunday, February 19, 2017

Simple Domains and Value Atomicity



09/19/23: For the latest on this subject see: FIRST NORMAL FORM - A DEFINITIVE GUIDE

 

11/09/22: Revised


Here's what's wrong with last week's picture, namely:

Q: "I'm currently trying to design a database and I'm not too sure about the best way to approach a dynamically sized array field of one of my objects. My first thought is to use a column in my object to store an array of integers. However the more I read, the more I think this isn't the best option. Concrete example wise, I have a player object that stores 0 to many items, which are represented by an integer. What is the best way to represent this?" 
A: "If a collection of values is atomic, store them together. Meaning, if you always care about the entire group, if you never search for nested values and never sort by nested values, then they should be stored together as a single field value. If not, they should be stored in a separate table, each value bring a row, each assigned the parent ID (foreign key) of a record on the other table that "owns" them as a group. For more info, search on the term "database normalization".

Some databases, support an array as a data type. For example, Postgres allows you to define a column as a one-dimension array, or even a two dimension array. If your database does not support array as a type of column definition, transform you data collection into an XML or JSON support if your database your database supports that type. For example, Postgres has basic support for storing, retrieving, and non-indexed searching of XML using XPath. And Postgres offers excellent industry-leading support for JSON as a data type including indexed support on nested values. Going this XML/JSON route can be an exception to the normalization rules I mentioned above." --StackOverflow.com

Focus on physical implementation ("dynamically sized array field") without well-defined conceptual and logical features it is supposed to represent ("a player object" is hardly enough) and confusion of levels of representation (a real world object does not "store" anything) are always a red flag, an indication of poor grasp of foundation knowledge. So let's introduce some.

Sunday, February 12, 2017

This Week



1. What's wrong with this picture

"If a collection of values is atomic, store them together. Meaning, if you always care about the entire group, if you never search for nested values and never sort by nested values, then they should be stored together as a single field value. If not, they should be stored in a separate table, each value bring a row, each assigned the parent ID (foreign key) of a record on the other table that "owns" them as a group. For more info, search on the term "database normalization".

Some databases, support an array as a data type. For example, Postgres allows you to define a column as a one-dimension array, or even a two dimension array. If your database does not support array as a type of column definition, transform you data collection into an XML or JSON support if your database your database supports that type. For example, Postgres has basic support for storing, retrieving, and non-indexed searching of XML using XPath. And Postgres offers excellent industry-leading support for JSON as a data type including indexed support on nested values. Going this XML/JSON route can be an exception to the normalization rules I mentioned above." --Response to the Quote of the Week listed next, StackOverflow.com


Tuesday, January 31, 2017

Outsmarting the DBMS: Analysts Should Beware



Revised 5/4/2020.


Last month I alerted you to the failure by data professionals to appreciate the importance, for a variety of critical reasons, of reliance on the DBMS rather than application code for integrity enforcement and data manipulation. The following is an example of the consequences: 
"If you have multiple boolean fields in a record, consider combining them into a single Integer field. For instance in a User record create a single UserType field instead of 6 separate field for IsTrainee, IsManager, IsTrainer, IsHR, IsSupplier, IsSupport. By assigning 1,2,4,8 and 16, 32 as "yes" values for these then we can say that a value of 3 in this UserType field tell us that they are both Trainee and a Manager; 36 that they are the Trainer, and they are responsible for Support. The advantage of combining these into one field is that is another type can be added (e.g., IsFirstAider=64) without adding a field."
Note: "File, "record," and "field" are physical implementation concepts. The logical design concepts are relation (visualizable as R-table), tuple (visualizable as row) and attribute (visualizable as column). By using the proper terms there is less likelihood of confusion of levels of representation rampant in the industry, which has deleterious consequences[1].

Sunday, January 29, 2017

This Week



1. What's Wrong with This Database Picture?

"I have a database for a school ... [with] are numerous tables obviously but consider these:
CONTACT - all contacts (students, faculty) has fields such as LAST, FIRST, MI, ADDR, CITY, STATE, ZIP, EMAIL;
FACULTY - hire info, login/password for electronic timesheet login, foreign key to CONTACT;
STUDENT - medical comments, current grade, foreign key to CONTACT.
Do you think it is a good idea to have a single table hold such info? Or, would you have had the tables FACULTY and STUDENT store LAST, FIRST, ADDR and other fields? At what point do you denormalize for the sake of being more practical?What would you do when you want to close out one year and start a new year? If you had stand-alone student and faculty tables then you could archive them easily, have a school semester and year attached to them. However, as you go from one year to the next information about a student or faculty may change. Like their address and phone for example. The database model now is not very good because it doesn’t maintain a history. If Student A was in school last year as well but lived somewhere else would you have 2 contact rows? 2 student rows?  Or do you have just one of each and have a change log. Which is best?" --comp.databases.theory

Sunday, January 22, 2017

Are You a Thinking Data Professional?



Note: The following was intended as a comment to my post Don't Design Databases without Foundation Knowledge and Conceptual Models  by Todd Everett. He is a reader I deem a "thinking data professional" -- always the qualitative rather than quantitative target of my writings and teachings. It merits to be a post in its own right to benefit others.

View My Stats