Have you ever visited imdb.com?
The site is full of movies and almost every imaginable piece of information about them: the actors, the directors, the crew, the release date, the runtime, the ratings, the reviews, and more!
Have you ever wondered how all that data about the movies is stored? Turns out that it’s stored in databases! Databases allow us to store and manipulate data, and SQL is the language of databases.
SQL stands for Structured Query Language. Basically, creating databases through SQL allows us to create meaningful data structures: we can group data together that are part of a larger whole, and we can form relationships between data in different groups.
Want to know how to create a database in SQL? Read on, my friend!
Let’s run with the movie example from earlier. From the command line, the below code will create the database “movie_library”.
Then we need to add a table into our database. What is stored in a movie library? Movies, of course! What kind of information do we want to store about movies? Well, movies have a title, a director, the year it was released, the genre, etc. So our SQL commands to create a movie table might be:
CREATE TABLE movies ( id SERIAL PRIMARY KEY, title TEXT NOT NULL, primary_director TEXT, year_released INT, genre TEXT );
What’s that “id” and “SERIAL PRIMARY KEY” all about? Well, we want one unique piece of information about each movie in our table so that we can target it specifically when we want to update it or delete it. You might think the title would be unique enough, but no! Movies can have the same title. Also, a movie’s title might change after the database is created. That’d be bad. So in the majority of cases, developers use “id” as a unique identifier for each row in the table. The “SERIAL PRIMARY KEY” identifies what kind of data the id is. The “SERIAL” word tells SQL to generate the id for us for each new entry; we don’t have to do it manually! That’s convenient! The “PRIMARY KEY” part is what tells SQL “this piece of data is what uniquely identifies this row.”
The title is defined as “TEXT NOT NULL”. The “TEXT” word tells what kind of data the title is. It’s just text. “NOT NULL” tells SQL that there must be a title entry in the row. Normally it’s okay to leave a blank cell in a row — it’ll just be filled in as null. But in this case, the title cannot be null. You’ll notice that the year_released is an “INT”. That just means that it’s an integer — a whole number.
Phew! That was the setup for the table (the columns). Now let’s insert some actual movies (rows):
INSERT INTO movies (title, primary_director, year_released, genre) VALUES ('The Lost World', 'Steven Spielberg', 1997, 'sci-fi'); INSERT INTO movies (title, primary_director, year_released, genre) VALUES ('Pirates of the Caribbean: The Curse of the Black Pearl', 'Gore Verbinski', 2003, 'fantasy'); INSERT INTO movies (title, primary_director, year_released, genre) VALUES ('Harry Potter and Goblet of Fire', 'Mike Newell', 2005, 'fantasy'); INSERT INTO movies (title, primary_director, year_released, genre) VALUES ('The Hobbit: An Unexpected Journey', 'Peter Jackson', 2012, 'fantasy'); INSERT INTO movies (title, primary_director, year_released, genre) VALUES ('The Hobbit: The Desolation of Smaug', 'Peter Jackson', 2013, 'fantasy'); INSERT INTO movies (title, primary_director, year_released, genre) VALUES ('The Hobbit: The Battle of the Five Armies', 'Peter Jackson', 2014, 'fantasy'); INSERT INTO movies (title, primary_director, year_released, genre) VALUES ('The Lord of the Rings: The Return of the King', 'Peter Jackson', 2003, 'fantasy'); INSERT INTO movies (title, primary_director, year_released, genre) VALUES ('The Lord of the Rings: The Fellowship of the Ring', 'Peter Jackson', 2001, 'fantasy'); INSERT INTO movies (title, primary_director, year_released, genre) VALUES ('The Lord of the Rings: The Two Towers', 'Peter Jackson', 2002, 'fantasy');
The INSERT INTO keywords tell SQL that we’re going to put a row into our table. Then we give SQL the name of our table, “movies”. Then we tell SQL which columns we’d like to fill out (title, primarydirector, yearreleased, genre). Finally, we provide the values for those columns: VALUES ('The Lord of the Rings: The Two Towers', 'Peter Jackson', 2002, 'fantasy');.
The table will end up looking like this:
What if we wanted to give each of our movies a rating? Well, ratings are kind of general information; they don’t belong to any specific movie, but are related to many movies. Why don’t we create a separate table for the ratings, and then connect that table to our movies table? That’s basically a “One to Many” relationship. A movie can only have one rating, but a ratings can be held by many movies.
Okay — let’s create our ratings table to get started:
CREATE TABLE ratings ( id SERIAL PRIMARY KEY, rating TEXT );
Now let’s add some ratings to our table:
INSERT INTO ratings (rating) VALUES ('G'); INSERT INTO ratings (rating) VALUES ('PG'); INSERT INTO ratings (rating) VALUES ('PG-13'); INSERT INTO ratings (rating) VALUES ('R');
Our ratings table looks like this:
Now we need to connect the movies table to the ratings table. For this, we need what is called a “Foreign Key.” The Foreign Key is what relates one table to another. In this case, our foreign key will be the column “ratingid” in our movies table. Why call it “ratingid”? Well, we want to indicate that we’re connecting to the ratings table! And we want to match movies to ratings based on the rating’s id. So if a movie has a rating_id of 2, it’d have a rating of PG.
Let’s add the rating_id column to the movies table:
ALTER TABLE movies ADD COLUMN rating_id INTEGER NULL REFERENCES ratings (id);
You’ll notice the “NULL” above. That just means that we don’t absolutely need a ratingid. Maybe the movie isn’t rated yet. That’s okay. That field can just be null. “REFERENCES” is what connects the ratingid column to the ratings table. And finally, “ratings (id)” tells SQL that we want our rating_id column to match the id column of the ratings table.
So now our movies table looks like this:
Let’s add some ratings! For the sake of ease, movies 1–3 will be rated G. Movies 4–6 will be rated PG. Movies 7–9 will be rated PG-13.
UPDATE movies SET rating_id = 1 WHERE id in (1,2,3); UPDATE movies SET rating_id = 2 WHERE id in (4,5,6); UPDATE movies SET rating_id = 3 WHERE id in (7,8,9);
Now our movies table has rating_ids:
Finally! The Foreign Key is all set up! Now we can get to the good part: connecting the tables. When we connect tables in SQL, it’s called “JOIN”ing them. In the below code, we specify that we want to JOIN the tables ON the movies.rating_id column and the ratings.id column:
SELECT * FROM movies JOIN ratings ON movies.rating_id = ratings.id;
Notice how the movies.rating_id column and ratings.id column contain the same values:
What if we only want the movie title and its rating? Easy! Just select only those two columns and join the tables:
SELECT movies.title, ratings.rating FROM movies JOIN ratings ON movies.rating_id = ratings.id;
That’s that! We successfully created a movie library database. Then we created a table for the movies and a table for the ratings. Finally, we connected the movies to their corresponding ratings by joining the two tables.