Data Modeling
Goals
Learn how to plan out databases
Discuss types of relationships between tables
Getting Started
Definition
Data modeling is the process of creating a representation (usually a diagram) of what data will be stored and how it will relate to other data.
General Process
figure out, broadly, what data is needed for the application to function correctly
separate that data out into groups that will eventually be tables
solidify the columns, data types, and relationships
How Much Detail?
Often times, you have to make decisions about how much detail the application needs—but it’s usually right to first model it out on paper in total. Start big and pare down as needed.
Normalization
Planning out a database to reduce repetition and simplify the data for understanding and maintenance. We can’t just store everything in one table.
Piling all of your data into a single table would be like stuffing all of your clothes into one dresser drawer. You want a swimsuit? Too bad, here is that shirt you’ve had since 8th grade.
The Problem
We need to store contact information in our database
Problem — some people have many phone numbers
Solution 1: Many Rows
user_id |
fname |
lname |
age |
phone1 |
---|---|---|---|---|
1 |
Jane |
Jay |
26 |
415 555 1212 |
2 |
Jane |
Jay |
26 |
415 555 1213 |
3 |
Nick |
Nest |
43 |
415 555 2000 |
Solution 2: Many Columns
user_id |
fname |
lname |
age |
phone1 |
phone2 |
---|---|---|---|---|---|
1 |
Jane |
Jay |
26 |
415 555 1212 |
415 555 1213 |
2 |
Nick |
Nest |
43 |
415 555 2000 |
<null> |
Two Primary Sins of Data Modeling
Repeating dependencies (for example, storing Jane’s name and age twice)
Representing multiple data for the same category in columns (phone1, phone2, …, phoneInfinity)
This lecture will address how to avoid these sins!
Normalization: In Short
“Every piece of information lives in exactly one place
and relates directly to the key.”
Relationships
Relationship Types
One-to-many
Many-to-many
One-to-one
Foreign Keys
In order to link tables to each other, we need to use values from one table inside the other. When we do so, that value is called a foreign key.
For the tables below, book_id in the printings table is a foreign key that references the books table book_id.
CREATE TABLE books (
book_id SERIAL PRIMARY KEY,
title VARCHAR(50)
);
CREATE TABLE printings (
printing_id SERIAL PRIMARY KEY,
book_id INTEGER NOT NULL REFERENCES books(book_id),
print_date TIMESTAMP
);
One-to-many
In one to many relationships, one row on table A can have many related rows on table B. And rows on table B can only relate to one row on table A. One user can have many posts. One product can have many reviews. One course can have multiple sessions.
The Problem
We have a database of books
Problem — we want to record each printing of a book
Can a book have many printings?
Can a printing have many books?
Diagram
One book, many printings
In Code
Using SQL, we can create one to many relationships. In the example below, each book should only be in the books table once, but its id can be referenced as many times as we want from the printings table.
CREATE TABLE books (
book_id SERIAL PRIMARY KEY,
title VARCHAR(50)
);
CREATE TABLE printings (
printing_id SERIAL PRIMARY KEY,
book_id INTEGER NOT NULL REFERENCES books(book_id),
print_date TIMESTAMP
);
Many-to-many
In many to many relationships, a row on table A can relate to many rows on table B. And table B can reference many rows on table A. Students can sign up for many classes and classes have many students in them. Recipes can contain many ingredients and ingredients can be used in many recipes. Stores can sell many products and products can be sold at many different stores.
The Problem
In our books database, users can comment on books
Problem —
Can a user comment on many books?
Can a book be commented on by many users?
Users can have many books, books can have many users…?
Many-to-many: A Lie?
Many-to-many is kind of a lie
There’s no actual relationship between books and users
You need a middle table (a comments table) to connect them
In reality, there are two one-to-many relationships
A book has many comments
A user has many comments
Comments Diagram
A book can have many comments (by many users)
A user can have many comments (on many books)
In Code
Using SQL, we can create many to many relationships. In the example below, books and users are two unrelated tables. The comments table references both of the others to bring information about them together.
CREATE TABLE books (
book_id SERIAL PRIMARY KEY,
title VARCHAR(50)
);
CREATE TABLE users (
user_id SERIAL PRIMARY KEY,
email VARCHAR(50)
);
CREATE TABLE comments (
comment_id SERIAL PRIMARY KEY,
book_id INTEGER NOT NULL REFERENCES books(book_id),
user_id INTEGER NOT NULL REFERENCES users(user_id),
body TEXT
);
Uniqueness of Many to Many Relationship
There is the possibility of the same user making multiple comments on one book, with each comment a separate row in the comments table. If you want to limit a user to a single comment on a book, you can impose a uniqueness condition on the combined fields user_id and book_id in the comments table. This is easily done in SQL.
Many-to-many: Genres
Another many-to-many example
The New Problem
In our books database, books have genres
Problem —
Can a book have many genres?
Can a genre have many books?
What do we call the middle table?
No obvious name like “comments”
Often use combination of table names: BookGenre
BookGenre Diagram
Middle Table vs. Association Table
Middle table (example: Comment)
Middle tables have meaningful fields
Association table (example: BookGenre)
No meaningful fields
Simply the “glue” between books and genres
Database can’t tell the difference between association and middle tables
Design decision
In Code
To create an association table, we use a lot of the same syntax. Notice that books_genres does not have any unique information. It just stores the relationships between books and genres.
CREATE TABLE books (
book_id SERIAL PRIMARY KEY,
title VARCHAR(50)
);
CREATE TABLE genres (
genre_id SERIAL PRIMARY KEY,
genre VARCHAR(50)
);
CREATE TABLE books_genres (
book_genre_id SERIAL PRIMARY KEY,
book_id INTEGER NOT NULL REFERENCES books(book_id),
genre_id INTEGER NOT NULL REFERENCES genres(genre_id)
);
One-to-one
In one to one relationships, a row on table A can relate to only one row on table B and the same from B to A. One person can have one SSN and one SSN can be assigned only to one person. One country can have one capitol city and each capitol city belongs to exactly one country. One car can have a single VIN number and that VIN can only be assigned to one car.
The Problem
In our book app, let’s say we have reader accounts and author accounts
We need to keep track of some user information for every account, no matter the type…
name
password
…and some additional information for the author accounts
bio
most popular book
We could create one user table with all the needed columns, but then there would be lots of null values on the reader accounts, which isn’t great. Instead, we can create two tables.
Diagram
In the author table, user_id references the user table, and we can use the UNIQUE keyword to make sure authors aren’t attached to more than one user.
When to Use One-to-one
One-to-one relationships are useful for:
Categories — “this is a special case of…”
Authors are a special type of users
Augment a table with additional data
There’s more to know about authors
Analogous to subclasses in object orientation
In Code
CREATE TABLE users (
user_id SERIAL PRIMARY KEY,
email VARCHAR(100),
name VARCHAR(75),
password VARCHAR(40)
);
CREATE TABLE authors (
author_id SERIAL PRIMARY KEY,
user_id INTEGER REFERENCES users(user_id),
bio TEXT,
popular_book VARCHAR(100)
);