Data Modeling

Goals

  • Learn how to plan out databases

  • Discuss types of relationships between tables

Getting Started

Definition

Data modeling is the process of creating a representation (usually a diagram) of what data will be stored and how it will relate to other data.

General Process

  • figure out, broadly, what data is needed for the application to function correctly

  • separate that data out into groups that will eventually be tables

  • solidify the columns, data types, and relationships

How Much Detail?

_images/scary.png

Often times, you have to make decisions about how much detail the application needs—but it’s usually right to first model it out on paper in total. Start big and pare down as needed.

Normalization

Planning out a database to reduce repetition and simplify the data for understanding and maintenance. We can’t just store everything in one table.

Piling all of your data into a single table would be like stuffing all of your clothes into one dresser drawer. You want a swimsuit? Too bad, here is that shirt you’ve had since 8th grade.

The Problem

We need to store contact information in our database

Problem — some people have many phone numbers

Solution 1: Many Rows

user_id

fname

lname

age

phone1

1

Jane

Jay

26

415 555 1212

2

Jane

Jay

26

415 555 1213

3

Nick

Nest

43

415 555 2000

Solution 2: Many Columns

user_id

fname

lname

age

phone1

phone2

1

Jane

Jay

26

415 555 1212

415 555 1213

2

Nick

Nest

43

415 555 2000

<null>

Two Primary Sins of Data Modeling

  1. Repeating dependencies (for example, storing Jane’s name and age twice)

  2. Representing multiple data for the same category in columns (phone1, phone2, …, phoneInfinity)

This lecture will address how to avoid these sins!

Normalization: In Short

“Every piece of information lives in exactly one place
and relates directly to the key.”

Relationships

Relationship Types

  • One-to-many

  • Many-to-many

  • One-to-one

Foreign Keys

In order to link tables to each other, we need to use values from one table inside the other. When we do so, that value is called a foreign key.

For the tables below, book_id in the printings table is a foreign key that references the books table book_id.

CREATE TABLE books (
    book_id SERIAL PRIMARY KEY,
    title VARCHAR(50)
);

CREATE TABLE printings (
    printing_id SERIAL PRIMARY KEY,
    book_id INTEGER NOT NULL REFERENCES books(book_id),
    print_date TIMESTAMP
);

One-to-many

In one to many relationships, one row on table A can have many related rows on table B. And rows on table B can only relate to one row on table A. One user can have many posts. One product can have many reviews. One course can have multiple sessions.

The Problem

We have a database of books

Problem — we want to record each printing of a book

  • Can a book have many printings?

  • Can a printing have many books?

Diagram

One book, many printings

_images/one-to-many.png

In Code

Using SQL, we can create one to many relationships. In the example below, each book should only be in the books table once, but its id can be referenced as many times as we want from the printings table.

CREATE TABLE books (
    book_id SERIAL PRIMARY KEY,
    title VARCHAR(50)
);

CREATE TABLE printings (
    printing_id SERIAL PRIMARY KEY,
    book_id INTEGER NOT NULL REFERENCES books(book_id),
    print_date TIMESTAMP
);

Many-to-many

In many to many relationships, a row on table A can relate to many rows on table B. And table B can reference many rows on table A. Students can sign up for many classes and classes have many students in them. Recipes can contain many ingredients and ingredients can be used in many recipes. Stores can sell many products and products can be sold at many different stores.

The Problem

In our books database, users can comment on books

Problem —

  • Can a user comment on many books?

  • Can a book be commented on by many users?

Users can have many books, books can have many users…?

_images/many-to-many.png

Many-to-many: A Lie?

Many-to-many is kind of a lie

  • There’s no actual relationship between books and users

  • You need a middle table (a comments table) to connect them

  • In reality, there are two one-to-many relationships

    • A book has many comments

    • A user has many comments

Comments Diagram

A book can have many comments (by many users)

A user can have many comments (on many books)

_images/comments.png

In Code

Using SQL, we can create many to many relationships. In the example below, books and users are two unrelated tables. The comments table references both of the others to bring information about them together.

CREATE TABLE books (
    book_id SERIAL PRIMARY KEY,
    title VARCHAR(50)
);

CREATE TABLE users (
    user_id SERIAL PRIMARY KEY,
    email VARCHAR(50)
);

CREATE TABLE comments (
    comment_id SERIAL PRIMARY KEY,
    book_id INTEGER NOT NULL REFERENCES books(book_id),
    user_id INTEGER NOT NULL REFERENCES users(user_id),
    body TEXT
);

Uniqueness of Many to Many Relationship

There is the possibility of the same user making multiple comments on one book, with each comment a separate row in the comments table. If you want to limit a user to a single comment on a book, you can impose a uniqueness condition on the combined fields user_id and book_id in the comments table. This is easily done in SQL.

Many-to-many: Genres

Another many-to-many example

The New Problem

In our books database, books have genres

Problem —

  • Can a book have many genres?

  • Can a genre have many books?

What do we call the middle table?

  • No obvious name like “comments”

  • Often use combination of table names: BookGenre

BookGenre Diagram

_images/bookgenre.png

Middle Table vs. Association Table

  • Middle table (example: Comment)

    • Middle tables have meaningful fields

  • Association table (example: BookGenre)

    • No meaningful fields

    • Simply the “glue” between books and genres

  • Database can’t tell the difference between association and middle tables

    • Design decision

In Code

To create an association table, we use a lot of the same syntax. Notice that books_genres does not have any unique information. It just stores the relationships between books and genres.

CREATE TABLE books (
    book_id SERIAL PRIMARY KEY,
    title VARCHAR(50)
);

CREATE TABLE genres (
    genre_id SERIAL PRIMARY KEY,
    genre VARCHAR(50)
);

CREATE TABLE books_genres (
    book_genre_id SERIAL PRIMARY KEY,
    book_id INTEGER NOT NULL REFERENCES books(book_id),
    genre_id INTEGER NOT NULL REFERENCES genres(genre_id)
);

One-to-one

In one to one relationships, a row on table A can relate to only one row on table B and the same from B to A. One person can have one SSN and one SSN can be assigned only to one person. One country can have one capitol city and each capitol city belongs to exactly one country. One car can have a single VIN number and that VIN can only be assigned to one car.

The Problem

In our book app, let’s say we have reader accounts and author accounts

We need to keep track of some user information for every account, no matter the type…

  • email

  • name

  • password

…and some additional information for the author accounts

  • bio

  • most popular book

We could create one user table with all the needed columns, but then there would be lots of null values on the reader accounts, which isn’t great. Instead, we can create two tables.

Diagram

_images/one-to-one.png

In the author table, user_id references the user table, and we can use the UNIQUE keyword to make sure authors aren’t attached to more than one user.

When to Use One-to-one

One-to-one relationships are useful for:

  • Categories — “this is a special case of…”

    • Authors are a special type of users

  • Augment a table with additional data

    • There’s more to know about authors

  • Analogous to subclasses in object orientation

In Code

CREATE TABLE users (
    user_id SERIAL PRIMARY KEY,
    email VARCHAR(100),
    name VARCHAR(75),
    password VARCHAR(40)
  );

CREATE TABLE authors (
    author_id SERIAL PRIMARY KEY,
    user_id INTEGER REFERENCES users(user_id),
    bio TEXT,
    popular_book VARCHAR(100)
  );

THE END