Note: This document is a work in progress. You can help improve it.

Introduction to joins

Relational Databases and Data Normalization

In this lecture we are going to develop a database to keep information about movies. Along the way we will need to track information about the movie that isn't singular information. With a single table we store information in columns that is singular. We also want to avoid repeating information. For instance, the movie's title is a single piece of information while the list of actors in the cast are multiple pieces of information. Similarly, the rating (G, PG, etc) would be repeated information per row. We will learn how to create relations to store this information in separate tables and join / relate it back to the movies.

Primary Keys

Since we will be relating information between multiple tables we need a way to uniquely identify a row of data in a table. This is known as the table's PRIMARY KEY and is a special attribute of the column that says that it uniquely identifies the row and also cannot repeat.

You might have an existing column that you feel uniquely identify the row. For instance, you might think that the movie's title would uniquely identify the movie. However, we know that sometimes a movie's title changes during the production, or even uses the name of a movie that has existed in the past. So if we used the title to uniquely identify it we would run into many issues.

Fortunately databases provide their own way of supplying a unique value for each row in the database. We call this SERIAL column.

So when creating our Movie table we will use this schema:

CREATE TABLE movies (
  id               SERIAL PRIMARY KEY,
  title            TEXT NOT NULL,
  primary_director TEXT,
  year_released    INT,
  genre            TEXT
);

We now have a column id that the database will ensure is unique and identifies the row (PRIMARY KEY) and is automatically generated sequentially by the database (SERIAL)

Lets add some movies to our database:
INSERT INTO movies (title,  primary_director, year_released, genre)
VALUES ('The Lost World', 'Steven Spielberg', 1997, 'sci-fi');

INSERT INTO movies (title,  primary_director, year_released, genre)
VALUES ('Pirates of the Caribbean: The Curse of the Black Pearl', 'Gore Verbinski', 2003, 'fantasy');

INSERT INTO movies (title,  primary_director, year_released, genre)
VALUES ('Harry Potter and Goblet of Fire', 'Mike Newell', 2005, 'fantasy');

INSERT INTO movies (title,  primary_director, year_released, genre)
VALUES ('The Hobbit: An Unexpected Journey', 'Peter Jackson', 2012, 'fantasy');

INSERT INTO movies (title,  primary_director, year_released, genre)
VALUES ('The Hobbit: The Desolation of Smaug', 'Peter Jackson', 2013, 'fantasy');

INSERT INTO movies (title,  primary_director, year_released, genre)
VALUES ('The Hobbit: The Battle of the Five Armies', 'Peter Jackson', 2014, 'fantasy');

INSERT INTO movies (title,  primary_director, year_released, genre)
VALUES ('The Lord of the Rings: The Return of the King', 'Peter Jackson', 2003, 'fantasy');

INSERT INTO movies (title,  primary_director, year_released, genre)
VALUES ('The Lord of the Rings: The Fellowship of the Ring', 'Peter Jackson', 2001, 'fantasy');

INSERT INTO movies (title,  primary_director, year_released, genre)
VALUES ('The Lord of the Rings: The Two Towers', 'Peter Jackson', 2002, 'fantasy');

INSERT INTO movies (title,  primary_director, year_released, genre)
VALUES ('Hitchhikers Guide to the Galaxy', 'Garth Jennings', 2005, 'sci-Fi');

INSERT INTO movies (title,  primary_director, year_released, genre)
VALUES ('Cujo', 'Lewis Teague', 1983, 'horror');

INSERT INTO movies (title,  primary_director, year_released, genre)
VALUES ('It', 'Andrés Muschietti', 2017, 'horror');

INSERT INTO movies (title,  primary_director, year_released, genre)
VALUES ('It', 'Tommy Lee Wallace', 1990, 'horror');

INSERT INTO movies (title,  primary_director, year_released, genre)
VALUES ('Howls Moving Castle', 'Hayao Miyazaki', 2005, 'fantasy');

Foreign Keys

In order to keep track of the rating for any given movie we will add a single table, named ratings that will store the name of the rating. Since we also want to uniquely identify the ratings, we'll ensure this table also has a serial primary key.

CREATE TABLE ratings (
  id SERIAL PRIMARY KEY,
  rating TEXT
);
Lets insert some ratings:
INSERT INTO ratings (rating) VALUES ('G');
INSERT INTO ratings (rating) VALUES ('PG');
INSERT INTO ratings (rating) VALUES ('PG-13');
INSERT INTO ratings (rating) VALUES ('R');

Lets also add a table to keep information about our actors. For this table we want to know the full name of the actor and their birthday. We'll also create an id that is a PRIMARY KEY and is SERIAL

CREATE TABLE actors (
  id        SERIAL PRIMARY KEY,
  full_name TEXT NOT NULL,
  birthday  DATE
);
Lets also insert some actors
INSERT INTO actors (full_name,  birthday)
VALUES ('Orlando Bloom', '1977-01-13');

INSERT INTO actors (full_name,  birthday)
VALUES ('Warwick Davis', '1970-02-03');

INSERT INTO actors (full_name,  birthday)
VALUES ('Martin Freeman', '1971-09-08');

So now we have a way to identify each movie, rating, and actor. Next we will talk about how to relate the three sets of information together.

One to Many

When we have a relationship such as the ratings for a movie, we say that this is a "One to Many" relationship. That is, a movie has one rating (e.g. the movie Bambi is rated G), but a rating applies to many movies (e.g. there are many movies with a G rating)

The ERD of this looks like:

                                                             +-----------------------+
             +----------------------------+                  |        RATINGS        |
             |         MOVIES             |                  |                       |
             |                            |                  |   id        SERIAL    |
             | id                  SERIAL +------------------+   rating    TEXT      |
             | title               TEXT   |  many        one |                       |
             | primary_director    TEXT   |                  +-----------------------+
             | year_released       INT    |
             | genre               TEXT   |
             +----------------------------+

Lets add a new column to our movies to indicate WHICH rating is associated to each row representing a movie.

The column we are adding is a rating_id that is an integer since this is the same data type as a SERIAL which we are going to relate to. The NULL indicates that we are allowed to have no value and the value to place in the column when no data is present is NULL. Next we indicate that this is a foreign key (we are relating this table) to the ratings table. We also specify the column in the other table, in this case id in ratings, we mean to match.

ALTER TABLE movies ADD COLUMN rating_id INTEGER NULL REFERENCES ratings (id);

Now we can specify the rating_id associated to each movie when we insert the movie.

UPDATE movies SET rating_id = 1 WHERE id in (8,9,10);

Joining tables

So now that we have these two tables, how do we join them together so that we can retrieve information about movies and their ratings or get information about a rating and the associated movies.

Query movies and also get their rating
SELECT *
FROM movies
JOIN ratings ON movies.rating_id = ratings.id;
+------+---------------------------------------------------+--------------------+-----------------+---------+-------------+------+----------+
| id   | title                                             | primary_director   | year_released   | genre   | rating_id   | id   | rating   |
|------+---------------------------------------------------+--------------------+-----------------+---------+-------------+------+----------|
| 8    | The Lord of the Rings: The Fellowship of the Ring | Peter Jackson      | 2001            | fantasy | 1           | 1    | G        |
| 9    | The Lord of the Rings: The Two Towers             | Peter Jackson      | 2002            | fantasy | 1           | 1    | G        |
| 10   | Hitchhikers Guide to the Galaxy                   | Garth Jennings     | 2005            | sci-Fi  | 1           | 1    | G        |
+------+---------------------------------------------------+--------------------+-----------------+---------+-------------+------+----------+
SELECT 3

This query will give us movies and their ratings, but only for movies that have a rating_id that matches an id from the ratings table. That is, any movie with a null value for rating_id (or a value that doesn't match an id) will not be in the results.

This default kind of join is called an INNER JOIN and can be seen in the diagram below

See this blog post for a more detailed description of the various joins.

If we want to see ALL the movies and include the ratings table information when there is a match we modify the query to use LEFT JOIN

Query all the movies and include ratings when possible
SELECT *
FROM movies
LEFT JOIN ratings ON movies.rating_id = ratings.id;
+------+--------------------------------------------------------+--------------------+-----------------+---------+-------------+--------+----------+
| id   | title                                                  | primary_director   | year_released   | genre   | rating_id   | id     | rating   |
|------+--------------------------------------------------------+--------------------+-----------------+---------+-------------+--------+----------|
| 1    | The Lost World                                         | Steven Spielberg   | 1997            | sci-fi  | <null>      | <null> | <null>   |
| 2    | Pirates of the Caribbean: The Curse of the Black Pearl | Gore Verbinski     | 2003            | fantasy | <null>      | <null> | <null>   |
| 3    | Harry Potter and Goblet of Fire                        | Mike Newell        | 2005            | fantasy | <null>      | <null> | <null>   |
| 4    | The Hobbit: An Unexpected Journey                      | Peter Jackson      | 2012            | fantasy | <null>      | <null> | <null>   |
| 5    | The Hobbit: The Desolation of Smaug                    | Peter Jackson      | 2013            | fantasy | <null>      | <null> | <null>   |
| 6    | The Hobbit: The Battle of the Five Armies              | Peter Jackson      | 2014            | fantasy | <null>      | <null> | <null>   |
| 7    | The Lord of the Rings: The Return of the King          | Peter Jackson      | 2003            | fantasy | <null>      | <null> | <null>   |
| 11   | Cujo                                                   | Lewis Teague       | 1983            | horror  | <null>      | <null> | <null>   |
| 12   | It                                                     | Andrés Muschietti  | 2017            | horror  | <null>      | <null> | <null>   |
| 13   | It                                                     | Tommy Lee Wallace  | 1990            | horror  | <null>      | <null> | <null>   |
| 14   | Howls Moving Castle                                    | Hayao Miyazaki     | 2005            | fantasy | <null>      | <null> | <null>   |
| 8    | The Lord of the Rings: The Fellowship of the Ring      | Peter Jackson      | 2001            | fantasy | 1           | 1      | G        |
| 9    | The Lord of the Rings: The Two Towers                  | Peter Jackson      | 2002            | fantasy | 1           | 1      | G        |
| 10   | Hitchhikers Guide to the Galaxy                        | Garth Jennings     | 2005            | sci-Fi  | 1           | 1      | G        |
+------+--------------------------------------------------------+--------------------+-----------------+---------+-------------+--------+----------+

The left join indicates that we want everything from the left table, in this case movies, and matches from the ratings table. For movies that have no matching ratings row the information will be null

Many to Many

For the list of actors in the cast we might say "A movie has many cast members" but also "A cast member has appeared in many movies." In this case we need to relate these two tables in a different way.

The ERD of this looks like:

      +--------------------------------+           +---------------------------+
      |            MOVIES              |           |         RATINGS           |
      |                                |           |                           |
      |    id                  SERIAL  |           |     id        SERIAL      |
      |    title               TEXT    |many    one|     rating    TEXT        |
      |    primary_director    TEXT    +-----------+                           |
      |    year_released       INT     |           +---------------------------+
      |    genre               TEXT    |
      |                                |
      +------------+-------------------+
                   |
                   | many
                   |
                   |
                   |                   +-------------------------+
                   |                   |        ACTORS           |
                   |                   |                         |
                   |              many |    id          SERIAL   |
                   +-------------------+    full_name   TEXT     |
                                       |    birthday    DATE     |
                                       |                         |
                                       +-------------------------+

In the case of a many-to-many relationship we cannot place the foreign keys on either of the tables. In this case we need a third table, commonly referred to as a join table to store the relationships. In this table, we will place two foreign keys, one to the left (movies) and the other to the right (to the actor.) We attempt to name this table based on the relationship between the two tables.

In this case we are trying to represent the relationship between a movie and the actors. We could call this relationship cast_members

CREATE TABLE cast_members (
  id       SERIAL PRIMARY KEY,
  movie_id  INTEGER REFERENCES movies (id),
  actor_id  INTEGER REFERENCES actors (id)
);
      +--------------------------------+           +---------------------------+
      |            MOVIES              |           |         RATINGS           |
      |                                |           |                           |
      |    id                  SERIAL  |           |     id        SERIAL      |
      |    title               TEXT    | many   one|     rating    TEXT        |
      |    primary_director    TEXT    +-----------+                           |
      |    year_released       INT     |           +---------------------------+
      |    genre               TEXT    |
      |                                |
      +-------------+------------------+
                    | one
                    |
                    |
                    |
                    |
                    | many
            +-------+---------------+               +-------------------------+
            |     CAST MEMBERS      |               |          ACTORS         |
            |                       | many      one |                         |
            |   id       SERIAL     +---------------+    id          SERIAL   |
            |                       |               |    full_name   TEXT     |
            |                       |               |    birthday    DATE     |
            |                       |               |                         |
            +-----------------------+               +-------------------------+
Update Orlando Bloom for Pirates & LOTR. The three LOTR movies are id 7, 8 and 9. The Pirates movie is id 2. Orlando Bloom's id is 1
INSERT INTO cast_members (movie_id, actor_id) VALUES (2,1);
INSERT INTO cast_members (movie_id, actor_id) VALUES (7,1);
INSERT INTO cast_members (movie_id, actor_id) VALUES (8,1);
INSERT INTO cast_members (movie_id, actor_id) VALUES (9,1);
Update Warick Davis for Harry Potter and Hitchhikers. Harry Potter's movie id is 3, and Warrik's actor id is 2
INSERT INTO cast_members (movie_id, actor_id) VALUES (3,2);
Update Martin Freeman (actor id is 3) for the Hobbit (movie ids are 4, 5, and 6) & Hitchhikers (movie id 10)
INSERT INTO cast_members (movie_id, actor_id) VALUES (4,3);
INSERT INTO cast_members (movie_id, actor_id) VALUES (5,3);
INSERT INTO cast_members (movie_id, actor_id) VALUES (6,3);
INSERT INTO cast_members (movie_id, actor_id) VALUES (10,3);

Query for the casts and actors

In order to tie movies to actors we need to join the movies first to the cast_members and then join the cast_members to the actors. Since the cast_members table has relations to each of these tables, it acts as the link in the chain between the two tables.

Notice for the actors entry Orlando Bloom there are TWO entries in cast_members since he has appeared in two of our movies

SELECT movies.title, actors.full_name
FROM movies
LEFT JOIN cast_members ON cast_members.movie_id = movies.id
LEFT JOIN actors on actors.id = cast_members.actor_id;
+--------------------------------------------------------+----------------+
| title                                                  | full_name      |
|--------------------------------------------------------+----------------|
| Pirates of the Caribbean: The Curse of the Black Pearl | Orlando Bloom  |
| The Lord of the Rings: The Return of the King          | Orlando Bloom  |
| The Lord of the Rings: The Fellowship of the Ring      | Orlando Bloom  |
| The Lord of the Rings: The Two Towers                  | Orlando Bloom  |
| Harry Potter and Goblet of Fire                        | Warwick Davis  |
| The Hobbit: An Unexpected Journey                      | Martin Freeman |
| The Hobbit: The Desolation of Smaug                    | Martin Freeman |
| The Hobbit: The Battle of the Five Armies              | Martin Freeman |
| Hitchhikers Guide to the Galaxy                        | Martin Freeman |
| The Hobbit: An Unexpected Journey                      | Martin Freeman |
| The Hobbit: The Desolation of Smaug                    | Martin Freeman |
| The Hobbit: The Battle of the Five Armies              | Martin Freeman |
| Hitchhikers Guide to the Galaxy                        | Martin Freeman |
| Cujo                                                   | <null>         |
| It                                                     | <null>         |
| It                                                     | <null>         |
| The Lost World                                         | <null>         |
| Howls Moving Castle                                    | <null>         |
+--------------------------------------------------------+----------------+

Resources