BookSearch: Seeding
Mix.install([
{:jason, "~> 1.4"},
{:kino, "~> 0.9", override: true},
{:youtube, github: "brooklinjazz/youtube"},
{:hidden_cell, github: "brooklinjazz/hidden_cell"},
{:ecto, "~> 3.9.5"},
{:faker, "~> 0.17.0"}
])
Review Questions
- How do we create and run seed files?
- What are some benefits of seeding data?
- What is idempotency and why is it useful to have idempotent seed files?
Overview
Seeding
Database seeding refers to the process of populating a database with initial data. This is often done when a new database is created, or when a database needs to be reset to a known state for testing or development purposes.
Seeding a database with initial data can be useful in a number of situations. Here are a few examples:
- When setting up a new application: Seeding the database can help to ensure that the application has the necessary data it needs to function properly. This can be particularly useful when working with a team, as it ensures that everyone has the same initial data to work with.
- When testing: Seeding a database with known data can make it easier to test the functionality of an application. For example, you might seed the database with data that represents different edge cases or error conditions, so you can verify that the application handles these situations correctly.
- When testing incomplete features: Seeding a database allows us to create data before we’ve added the necessary UI so that we can test portions of a feature in isolation.
- When developing: Seeding a database with initial data can save time during development by eliminating the need to manually enter data each time you reset the database.
- When deploying: Seeding the database with initial data can be useful when deploying an application to a new environment, as it ensures that the application has the necessary data it needs to function properly in the new environment.
Overall, seeding a database with initial data can help to improve the reliability and ease of use of an application.
Seeding With Phoenix And Ecto
By default, Phoenix creates a priv/repo/seeds.exs file which will seed our database.
priv is a directory that keeps all resources that are necessary in production but are not directly part of your source code. We typically keep database scripts in the priv folder.
# Script for populating the database. You can run it as:
#
#     mix run priv/repo/seeds.exs
#
# Inside the script, you can read and write to any of your
# repositories directly:
#
#     MyApp.Repo.insert!(%MyApp.SomeSchema{})
#
# We recommend using the bang functions (`insert!`, `update!`
# and so on) as they will fail if something goes wrong.
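The recommendation to use bang functions can be illustrated without a database. The following is a minimal sketch, assuming a hypothetical BangSketch module whose create/1 stands in for a context function that returns {:ok, record} or {:error, reason}. It is not part of Phoenix, Ecto, or BookSearch.

```elixir
defmodule BangSketch do
  # Hypothetical stand-in for a context function such as create_author/1.
  def create(%{name: name}) when is_binary(name) and name != "", do: {:ok, %{name: name}}
  def create(_attrs), do: {:error, :invalid}

  # Bang variant: returns the record itself or raises.
  def create!(attrs) do
    case create(attrs) do
      {:ok, record} -> record
      {:error, reason} -> raise ArgumentError, "could not create record: #{inspect(reason)}"
    end
  end
end

# The non-bang call returns an error tuple that a script can silently ignore.
IO.inspect(BangSketch.create(%{name: ""}))

# The bang call raises, which would stop a seed script at the failing line.
try do
  BangSketch.create!(%{name: ""})
rescue
  error in ArgumentError -> IO.puts("raised: " <> Exception.message(error))
end
```

In a seed file, a similar effect can be had from a non-bang function by pattern matching on {:ok, record}, since a failed match also raises.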
We can run this file from our project folder with the following command. This is the same way we run any .exs file in our mix project.
mix run priv/repo/seeds.exs
This file is automatically executed whenever we run mix ecto.setup or mix ecto.reset.
We can see this in the mix aliases defined in mix.exs.
defp aliases do
  [
    setup: ["deps.get", "ecto.setup"],
    "ecto.setup": ["ecto.create", "ecto.migrate", "run priv/repo/seeds.exs"],
    "ecto.reset": ["ecto.drop", "ecto.setup"],
    test: ["ecto.create --quiet", "ecto.migrate --quiet", "test"],
    "assets.deploy": ["esbuild default --minify", "phx.digest"]
  ]
end
We’ll be using mix ecto.reset frequently to re-run our seed file with a clean database.
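To see why mix ecto.reset ends up running the seed file, it can help to expand the aliases by hand. The following is a rough sketch only: AliasSketch is a hypothetical module that mimics the idea of alias expansion, and it is not how Mix resolves aliases internally.

```elixir
defmodule AliasSketch do
  # Illustrative only: a copy of the two aliases from mix.exs that matter for seeding.
  @aliases %{
    "ecto.setup" => ["ecto.create", "ecto.migrate", "run priv/repo/seeds.exs"],
    "ecto.reset" => ["ecto.drop", "ecto.setup"]
  }

  # Recursively replace each alias name with the tasks it stands for.
  def expand(task) do
    case Map.fetch(@aliases, task) do
      {:ok, tasks} -> Enum.flat_map(tasks, &expand/1)
      :error -> [task]
    end
  end
end

IO.inspect(AliasSketch.expand("ecto.reset"))
# Prints ["ecto.drop", "ecto.create", "ecto.migrate", "run priv/repo/seeds.exs"]
```

The last entry in the expanded list is the seed script, which is why a reset always leaves us with freshly seeded data.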
Faker
Faker is an Elixir library that generates fake data. It can be used to generate fake data for testing, development, or demonstrations of an application.
Faker provides a simple and intuitive interface for generating fake data in Elixir. It includes a wide range of functions for generating fake data, such as names, addresses, phone numbers, and more.
Here’s an example of how you might use the Faker library to generate some fake data in Elixir:
iex> Faker.Person.name()
"John Smith"
iex> Faker.Phone.EnUs.phone()
"555-555-5555"
iex> Faker.Address.street_address()
"123 Main St."
The Faker library can be particularly useful when working with Elixir’s Ecto library, as it provides a way to generate fake data to seed a database with initial data.
For more, see the Faker Documentation.
BookSearch: Seeding
To learn more about seeding the database, we’re going to seed some data for our existing BookSearch application.
If you need clarification during this reading, you can reference the completed BookSearch/seeding branch of the DockYard Academy example BookSearch project.
Run The BookSearch Project
Ensure your BookSearch project is up to date with BookSearch: Books from the previous lesson. If not, you can clone the BookSearch/books project.
If you cloned the project, check out the books branch.
git checkout books
All tests should pass.
mix test
Reset to ensure we have a clean database for this exercise.
mix ecto.reset
If you encounter issues with your database in your tests, you can drop and recreate the database using the MIX_ENV environment variable to run commands in the test environment.
MIX_ENV=test mix ecto.drop
After this lesson, it will no longer be safe to run mix ecto.reset for the test environment because, by default, mix ecto.reset seeds our database, which we typically don’t want for our tests.
Create An Author
We’re going to create an author when we seed our database. We can alias the Authors context inside of our seed file and use the context functions normally.
Add the following to priv/repo/seeds.exs.
alias BookSearch.Authors
# Create An Author Without Any Books.
Authors.create_author(%{name: "Andrew Rowe"})
Reset the database to clean our database and run the seed file.
mix ecto.reset
Open the project in IEx.
iex -S mix
View the list of authors to see the author our seed file created.
iex> BookSearch.Authors.list_authors()
[
%BookSearch.Authors.Author{
__meta__: #Ecto.Schema.Metadata<:loaded, "authors">,
id: 1,
name: "Andrew Rowe",
books: #Ecto.Association.NotLoaded,
inserted_at: ~N[2022-12-18 05:15:28],
updated_at: ~N[2022-12-18 05:15:28]
}
]
Create A Book
Add the following to seeds.exs to create a book when we seed the database.
# Add With Your Existing Aliases
alias BookSearch.Books
# Create A Book Without An Author.
Books.create_book(%{title: "Beowulf"})
Reset the database to clean our database and run the seed file.
mix ecto.reset
View the list of books to see the book our seed file created.
iex> BookSearch.Books.list_books()
[
%BookSearch.Books.Book{
__meta__: #Ecto.Schema.Metadata<:loaded, "books">,
id: 1,
title: "Beowulf",
author_id: nil,
author: #Ecto.Association.NotLoaded,
inserted_at: ~N[2022-12-18 06:08:54],
updated_at: ~N[2022-12-18 06:08:54]
}
]
Create A Book And An Associated Author
Now we’re going to create a book with an associated author.
Neither our Authors context nor our Books context allows us to create associated records yet.
For now, we can manually create the associated data by creating a changeset and adding the association using Ecto.Changeset.put_assoc/4.
# Add With Your Existing Aliases
alias BookSearch.Books.Book
alias BookSearch.Repo
# Create An Author And A Book.
{:ok, author} = Authors.create_author(%{name: "Patrick Rothfus"})
%Book{}
|> Book.changeset(%{title: "Name of the Wind"})
|> Ecto.Changeset.put_assoc(:author, author)
|> Repo.insert!()
Reset the database to clean our database and run the seed file.
mix ecto.reset
In the IEx shell, query the books table and preload the author to see the book and the associated author.
$ iex -S mix
iex> BookSearch.Repo.all(BookSearch.Books.Book) |> BookSearch.Repo.preload([:author])
[
%BookSearch.Books.Book{
__meta__: #Ecto.Schema.Metadata<:loaded, "books">,
id: 1,
title: "Beowulf",
author_id: nil,
author: nil,
inserted_at: ~N[2022-12-18 06:28:37],
updated_at: ~N[2022-12-18 06:28:37]
},
%BookSearch.Books.Book{
__meta__: #Ecto.Schema.Metadata<:loaded, "books">,
id: 2,
title: "Name of the Wind",
author_id: 2,
author: %BookSearch.Authors.Author{
__meta__: #Ecto.Schema.Metadata<:loaded, "authors">,
id: 2,
name: "Patrick Rothfus",
books: #Ecto.Association.NotLoaded,
inserted_at: ~N[2022-12-18 06:28:37],
updated_at: ~N[2022-12-18 06:28:37]
},
inserted_at: ~N[2022-12-18 06:28:37],
updated_at: ~N[2022-12-18 06:28:37]
}
]
We can also query our authors table to test our association from the author’s perspective.
iex> BookSearch.Repo.all(BookSearch.Authors.Author) |> BookSearch.Repo.preload([:books])
[
%BookSearch.Authors.Author{
__meta__: #Ecto.Schema.Metadata<:loaded, "authors">,
id: 1,
name: "Andrew Rowe",
books: [],
inserted_at: ~N[2022-12-18 06:28:37],
updated_at: ~N[2022-12-18 06:28:37]
},
%BookSearch.Authors.Author{
__meta__: #Ecto.Schema.Metadata<:loaded, "authors">,
id: 2,
name: "Patrick Rothfus",
books: [
%BookSearch.Books.Book{
__meta__: #Ecto.Schema.Metadata<:loaded, "books">,
id: 2,
title: "Name of the Wind",
author_id: 2,
author: #Ecto.Association.NotLoaded,
inserted_at: ~N[2022-12-18 06:28:37],
updated_at: ~N[2022-12-18 06:28:37]
}
],
inserted_at: ~N[2022-12-18 06:28:37],
updated_at: ~N[2022-12-18 06:28:37]
}
]
Using Seeds For Manual Testing
Now that we have an author, a book, and an associated book/author, let’s use them to test our application.
Start the Phoenix server if it’s not already running.
$ mix phx.server
Visit http://localhost:4000/books to view the list of books.
Listing Books With Authors
It would be nice if our list of books included the author as well. Let’s add that feature to demonstrate the benefit of our seeded data.
Modify our index.html.heex file in the books folder to include an author column.
<h1>Listing Books</h1>
<table>
  <thead>
    <tr>
      <th>Author</th>
      <th>Title</th>
      <th></th>
    </tr>
  </thead>
  <tbody>
<%= for book <- @books do %>
    <tr>
      <td><%= book.author.name %></td>
      <td><%= book.title %></td>
      <td>
        <span><%= link "Show", to: Routes.book_path(@conn, :show, book) %></span>
        <span><%= link "Edit", to: Routes.book_path(@conn, :edit, book) %></span>
        <span><%= link "Delete", to: Routes.book_path(@conn, :delete, book), method: :delete, data: [confirm: "Are you sure?"] %></span>
      </td>
    </tr>
<% end %>
  </tbody>
</table>
<span><%= link "New Book", to: Routes.book_path(@conn, :new) %></span>
The page will crash with the error: key :name not found in: #Ecto.Association.NotLoaded.
That’s because we haven’t loaded the author association for this page. To load the association, we can preload the authors in our BookController.
def index(conn, _params) do
  books = Books.list_books() |> BookSearch.Repo.preload([:author])
  render(conn, "index.html", books: books)
end
Now our page will crash with a different error: key :name not found in: nil. If you are using the dot syntax, such as map.field, make sure the left-hand side of the dot is a map. Progress!
This is where having a varied set of seed data is useful. We get this error because not all of our books have authors. It would have been easy to miss this error if we only had a book with an author, but we caught it since we have a book without an author.
To resolve this issue, we can check if the author exists. Replace line 15, where we display the book’s author’s name, with the following:
<td><%= if book.author, do: book.author.name %></td>
Visit http://localhost:4000/books to view the list of books with their authors.
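The nil check above can also be expressed with pattern matching. The following is a minimal sketch, assuming a hypothetical AuthorDisplay helper and plain maps standing in for preloaded Book structs. BookSearch does not define this module.

```elixir
defmodule AuthorDisplay do
  # Hypothetical helper for illustration; BookSearch does not define it.

  # A book whose preloaded author has a name.
  def author_name(%{author: %{name: name}}), do: name

  # A book with no author: `author` is nil after preloading.
  def author_name(%{author: nil}), do: ""
end

# Plain maps stand in for the two kinds of seeded books.
books = [
  %{title: "Beowulf", author: nil},
  %{title: "Name of the Wind", author: %{name: "Patrick Rothfus"}}
]

Enum.each(books, fn book ->
  IO.puts("#{book.title}: #{AuthorDisplay.author_name(book)}")
end)
```

Because the seed data contains both kinds of books, both clauses are exercised as soon as the page renders.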
Show Books With Authors
Click on the Show button for a book, or visit http://localhost:4000/books/1 to view the show page for a book.
It would also be nice to display the book’s author here. We’ll anticipate the errors this time and make sure to preload our author in the BookController in book_controller.ex.
def show(conn, %{"id" => id}) do
  book = Books.get_book!(id) |> BookSearch.Repo.preload([:author])
  render(conn, "show.html", book: book)
end
We’ll also check if the author exists in the template.
<h1>Show Book</h1>
<ul>
  <%= if @book.author do %>
  <li>
    <strong>Author:</strong>
    <%= @book.author.name %>
  </li>
  <% end %>
  <li>
    <strong>Title:</strong>
    <%= @book.title %>
  </li>
</ul>
<span><%= link "Edit", to: Routes.book_path(@conn, :edit, @book) %></span> |
<span><%= link "Back", to: Routes.book_path(@conn, :index) %></span>
Visit http://localhost:4000/books/1 to see Beowulf without an author, and http://localhost:4000/books/2 to see Name of the Wind with an author.
If you have any issues, here’s what the final seeds.exs file should look like.
alias BookSearch.Authors
alias BookSearch.Books
alias BookSearch.Books.Book
alias BookSearch.Repo
# Create An Author Without Any Books
Authors.create_author(%{name: "Andrew Rowe"})
# Create A Book Without An Author.
Books.create_book(%{title: "Beowulf"})
# Create An Author With A Book.
{:ok, author} = Authors.create_author(%{name: "Patrick Rothfus"})
%Book{}
|> Book.changeset(%{title: "Name of the Wind"})
|> Ecto.Changeset.put_assoc(:author, author)
|> Repo.insert!()
Idempotency
Idempotency refers to the property of certain operations in which they can be applied multiple times without changing the result beyond the initial application. An operation is idempotent if applying it multiple times has the same effect as applying it once.
In the context of database operations, idempotency is often used to describe the behavior of database scripts. A database script is considered idempotent if running it multiple times has the same effect as running it once. This means that the script should be designed in such a way that it can be safely re-run, even if it has already been applied to the database.
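The definition can be checked on a small scale with in-memory data. The following is a minimal sketch that uses a MapSet and a list in place of a database table. The anonymous functions add_once and add_always are names chosen only for this illustration.

```elixir
# Adding to a MapSet is idempotent: repeating the call changes nothing.
add_once = fn set -> MapSet.put(set, "Andrew Rowe") end

# Prepending to a list is not: every call adds another copy.
add_always = fn list -> ["Andrew Rowe" | list] end

set_once = add_once.(MapSet.new())
set_twice = add_once.(set_once)
IO.inspect(set_once == set_twice, label: "MapSet.put idempotent?")

list_once = add_always.([])
list_twice = add_always.(list_once)
IO.inspect(list_once == list_twice, label: "list prepend idempotent?")
```

A seed script that always inserts behaves like the list version, while one that checks for existing data first behaves like the MapSet version.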
Proving Our Script Is Not Idempotent
Our seeds.exs script is not idempotent. We’ve been hiding this issue by always resetting the database, but now we’ll prove it. In your terminal, reset the database and run the seed file. This seeds our database twice because mix ecto.reset also runs the seeds.exs script.
$ mix ecto.reset
$ mix run priv/repo/seeds.exs
In your IEx shell, view the list of authors. You’ll notice duplicates because we ran the seed file twice.
iex> BookSearch.Authors.list_authors()
[
%BookSearch.Authors.Author{
__meta__: #Ecto.Schema.Metadata<:loaded, "authors">,
id: 1,
name: "Andrew Rowe",
books: #Ecto.Association.NotLoaded,
inserted_at: ~N[2022-12-18 07:23:11],
updated_at: ~N[2022-12-18 07:23:11]
},
%BookSearch.Authors.Author{
__meta__: #Ecto.Schema.Metadata<:loaded, "authors">,
id: 2,
name: "Patrick Rothfus",
books: #Ecto.Association.NotLoaded,
inserted_at: ~N[2022-12-18 07:23:11],
updated_at: ~N[2022-12-18 07:23:11]
},
%BookSearch.Authors.Author{
__meta__: #Ecto.Schema.Metadata<:loaded, "authors">,
id: 3,
name: "Andrew Rowe",
books: #Ecto.Association.NotLoaded,
inserted_at: ~N[2022-12-18 07:24:38],
updated_at: ~N[2022-12-18 07:24:38]
},
%BookSearch.Authors.Author{
__meta__: #Ecto.Schema.Metadata<:loaded, "authors">,
id: 4,
name: "Patrick Rothfus",
books: #Ecto.Association.NotLoaded,
inserted_at: ~N[2022-12-18 07:24:38],
updated_at: ~N[2022-12-18 07:24:38]
}
]
Refactoring Our Seeds To Be Idempotent
If we want to achieve idempotency, which often makes our scripts more reliable, we can first check whether the data already exists. Replace your existing seeds.exs file with this idempotent version.
alias BookSearch.Authors
alias BookSearch.Authors.Author
alias BookSearch.Books
alias BookSearch.Books.Book
alias BookSearch.Repo
# Create An Author Without Any Books
case Repo.get_by(Author, name: "Andrew Rowe") do
  %Author{} = author ->
    IO.inspect(author.name, label: "Author Already Created")
  nil ->
    Authors.create_author(%{name: "Andrew Rowe"})
end
# Create A Book Without An Author.
case Repo.get_by(Book, title: "Beowulf") do
  %Book{} = book ->
    IO.inspect(book.title, label: "Book Already Created")
  nil ->
    Books.create_book(%{title: "Beowulf"})
end
# Create An Author With A Book.
{:ok, author} =
  case Repo.get_by(Author, name: "Patrick Rothfuss") do
    %Author{} = author ->
      IO.inspect(author.name, label: "Author Already Created")
      {:ok, author}
    nil ->
      Authors.create_author(%{name: "Patrick Rothfuss"})
  end
case Repo.get_by(Book, title: "Name of the Wind") do
  %Book{} = book ->
    IO.inspect(book.title, label: "Book Already Created")
  nil ->
    %Book{}
    |> Book.changeset(%{title: "Name of the Wind"})
    |> Ecto.Changeset.put_assoc(:author, author)
    |> Repo.insert!()
end
Reset the database.
$ mix ecto.reset
Run the seeds.exs file again so we can confirm it’s idempotent.
$ mix run priv/repo/seeds.exs
Display the list of authors in the IEx shell to confirm there are no duplicates.
$ iex -S mix
iex> BookSearch.Authors.list_authors()
[
%BookSearch.Authors.Author{
__meta__: #Ecto.Schema.Metadata<:loaded, "authors">,
id: 1,
name: "Andrew Rowe",
books: #Ecto.Association.NotLoaded,
inserted_at: ~N[2022-12-18 07:39:20],
updated_at: ~N[2022-12-18 07:39:20]
},
%BookSearch.Authors.Author{
__meta__: #Ecto.Schema.Metadata<:loaded, "authors">,
id: 2,
name: "Patrick Rothfuss",
books: #Ecto.Association.NotLoaded,
inserted_at: ~N[2022-12-18 07:39:20],
updated_at: ~N[2022-12-18 07:39:20]
}
]
Idempotency is not always a requirement of a seed file, but it’s useful to be aware of for your future projects.
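The check-then-create pattern used in the idempotent seed file can be factored into a small helper. The following is a rough sketch, assuming a hypothetical SeedSketch.find_or_create/2 function and an Agent holding a list as a stand-in for the database. Neither is part of BookSearch or Ecto.

```elixir
defmodule SeedSketch do
  # Hypothetical helper, not part of BookSearch or Ecto.
  # `find` returns an existing record or nil; `create` returns {:ok, record}.
  def find_or_create(find, create) when is_function(find, 0) and is_function(create, 0) do
    case find.() do
      nil -> create.()
      record -> {:ok, record}
    end
  end
end

# An Agent holding a list stands in for the authors table.
{:ok, store} = Agent.start_link(fn -> [] end)

seed_author = fn name ->
  SeedSketch.find_or_create(
    fn -> Agent.get(store, fn authors -> Enum.find(authors, &(&1.name == name)) end) end,
    fn ->
      author = %{name: name}
      Agent.update(store, fn authors -> [author | authors] end)
      {:ok, author}
    end
  )
end

# Seeding the same author twice stores it only once.
{:ok, first} = seed_author.("Andrew Rowe")
{:ok, second} = seed_author.("Andrew Rowe")
IO.inspect(first == second, label: "same record returned")
IO.inspect(Agent.get(store, &length/1), label: "authors stored")
```

In a real seed file, the two functions could wrap Repo.get_by and Authors.create_author from the script above, though that wiring is an assumption and has not been tested against the BookSearch project.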
Case Specific Seed Files
Sometimes we want to create special seed files to reproduce specific situations. For the sake of example, we’re going to make a seed file that will seed our database with large amounts of data.
Install Faker
To make adding fake data easier, we’re going to install Faker. Add the following to your list of dependencies in mix.exs. As the latest Faker version may change, check the version on Hex.pm/faker.
{:faker, "~> 0.17.0"}
Install dependencies.
mix deps.get
Open the IEx shell and confirm you can use the Faker module.
$ iex -S mix
iex> Faker.Person.name()
"Lester McKenzie"
We’re going to use Faker to create names of authors.
Faker.Person.name()
We can also use it to generate a sentence.
Faker.Lorem.sentence()
We can also specify a certain number of words for our sentence.
Faker.Lorem.sentence(10)
Create A New Seed File
There is nothing special about the priv/repo/seeds.exs file other than being run by default in our mix ecto.setup command.
Create a new priv/repo/seed_large_dataset.exs file which is going to create a large amount of data. We’re not going to make this script idempotent since it could be useful to re-run to add even more data.
alias BookSearch.Authors
alias BookSearch.Books
alias BookSearch.Books.Book
alias BookSearch.Repo
# Authors Without Books
Enum.each(1..10, fn _ ->
  Authors.create_author(%{name: Faker.Person.name()})
end)
# Books Without Authors
Enum.each(1..10, fn _ ->
  Books.create_book(%{title: Faker.Lorem.sentence()})
end)
# Authors With Books
Enum.each(1..10, fn _ ->
  {:ok, author} = Authors.create_author(%{name: Faker.Person.name()})
  Enum.each(1..10, fn _ ->
    %Book{}
    |> Book.changeset(%{title: Faker.Lorem.sentence()})
    |> Ecto.Changeset.put_assoc(:author, author)
    |> Repo.insert!()
  end)
end)
Reset the database.
$ mix ecto.reset
Run the seed_large_dataset.exs file.
$ mix run priv/repo/seed_large_dataset.exs
Now we have large amounts of data we can use to test our application if desired.
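To get a feel for how much data one run of the script adds, we can build the same shape in memory and count it. The following is a minimal sketch using plain maps instead of database records. The counts cover this script alone and do not include anything seeds.exs already created during the reset.

```elixir
# Build the same shape of data in memory (plain maps, no database).
authors_without_books = for _ <- 1..10, do: %{books: []}
books_without_authors = for _ <- 1..10, do: %{author: nil}

# Ten authors, each with ten books, mirroring the nested Enum.each.
authors_with_books =
  for _ <- 1..10 do
    %{books: for(_ <- 1..10, do: %{})}
  end

author_count = length(authors_without_books) + length(authors_with_books)

book_count =
  length(books_without_authors) +
    Enum.sum(Enum.map(authors_with_books, &length(&1.books)))

IO.inspect(author_count, label: "authors")
IO.inspect(book_count, label: "books")
```

Each run adds 20 authors and 110 books, so re-running the script a few times quickly produces enough rows to exercise a long listing page.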
Large Text
Often we find bugs in our UI when we use large amounts of data. We can use the Faker.Lorem module for generating specific amounts of text.
Let’s create a new priv/repo/seed_large_text.exs file which adds an author and a book with large amounts of text for their name and title.
alias BookSearch.Authors
alias BookSearch.Books
alias BookSearch.Books.Book
alias BookSearch.Repo
# Author Without Books
Authors.create_author(%{name: Faker.Lorem.sentence(10)})
# Book Without Author
Books.create_book(%{title: Faker.Lorem.sentence(10)})
# Author With A Book
{:ok, author} = Authors.create_author(%{name: Faker.Lorem.sentence(10)})
Enum.each(1..10, fn _ ->
  %Book{}
  |> Book.changeset(%{title: Faker.Lorem.sentence(10)})
  |> Ecto.Changeset.put_assoc(:author, author)
  |> Repo.insert!()
end)
Reset the database.
$ mix ecto.reset
Run the seed file.
$ mix run priv/repo/seed_large_text.exs
Visit http://localhost:4000/books to see how our UI handles large text content.
Looking good!
Push To GitHub
Ensure all of your tests continue to pass.
mix test
Only if you cloned the book_search project: you’ll have to re-initialize it as a git project so you have ownership over the project. The following commands remove the .git folder and re-initialize the repository.
$ rm -rf .git
$ git init
Create a GitHub Repository and follow the instructions to connect your local book_search project.
Then stage, commit, and push your changes to GitHub from the book_search folder.
git add .
git commit -m "create seed files"
git push
Commit Your Progress
DockYard Academy now recommends you use the latest Release rather than forking or cloning our repository.
Run git status to ensure there are no undesirable changes.
Then run the following in your command line from the curriculum folder to commit your progress.
$ git add .
$ git commit -m "finish BookSearch: Seeding reading"
$ git push