Two proven tricks to keep your database up to date across environments
Control your database structure and data with Liquibase
Published on 18/03/2021 by igor.kolomiyets in Technical Tips

Imagine yourself in the middle of a busy IT company that develops a software system. In a large number of cases, such a system will be using some form of database at the backend.


This means that you will likely face a major problem during development, testing and subsequent production use: how do you keep the database uniform and up to date across the various environments?

Another big issue that follows from the above is how to provide developers with a disposable copy of the database that is preloaded with some test data.

If you struggle to solve either of the above issues you’re in the right place. I will explain how to deal with both of these problems using a combination of two unlikely tools: Liquibase and Docker.

Liquibase allows you to apply DDL and DML statements to a database instance in a particular order and ensures that each of them is applied only once: every applied changeset is recorded in a DATABASECHANGELOG table inside the target database. This means that a given set of such scripts brings the target database to a specific end state, and if the same set of scripts is applied to a number of environments, those environments are guaranteed to end up exactly the same.
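
For example, once an update has run, you can see exactly what has been applied by querying that tracking table:

SELECT id, author, filename, dateexecuted FROM databasechangelog;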

Liquibase also allows you to selectively apply particular context-related scripts to some environments and skip them in others. This lets us easily preload the development database with test data, so developers and testers can use it immediately without having to create test data themselves.

Let’s have a look at this approach with the following simple example.

Our test database has the following structure:

CREATE TABLE department (
    id integer PRIMARY KEY,
    name varchar(40) NOT NULL
);

CREATE TABLE employee (
    id integer PRIMARY KEY,
    department_id integer NOT NULL,
    first_name varchar(40),
    last_name varchar(40) NOT NULL
);

ALTER TABLE employee ADD CONSTRAINT employee_department_fk
    FOREIGN KEY (department_id) REFERENCES department (id);

For test purposes, we also want to insert the following data into the database:

INSERT INTO department (id, name) VALUES (1, 'Marketing');
INSERT INTO department (id, name) VALUES (2, 'Sales');

INSERT INTO employee (id, department_id, first_name, last_name) VALUES (1, 1, 'John', 'Smith');
INSERT INTO employee (id, department_id, first_name, last_name) VALUES (2, 1, 'Diana', 'Lee');
INSERT INTO employee (id, department_id, first_name, last_name) VALUES (3, 2, 'Dave', 'Mead');
INSERT INTO employee (id, department_id, first_name, last_name) VALUES (4, 2, 'Jane', 'Kean');

In order to wrap these two scripts into what Liquibase terminology calls changesets, all you need to do is add the following header to each file:

--liquibase formatted sql

--changeset <name>:<id>

We need to make sure that the combination of the changeset name and id is unique across the project.

For situations where we want some scripts to be applied only in specific scenarios, Liquibase uses so-called contexts: in essence, another identifier that marks a changeset and is then used as a parameter when we call Liquibase to make changes in the target database.

In order to make a particular changeset applicable only in a specific context, we have to modify the second line of the header, adding context:<context_id> at the end of it.

So, in our example we modify the above files so they will look like this:

--liquibase formatted sql

--changeset igor.kolomiyets:create_schema
CREATE TABLE department (
    id integer PRIMARY KEY,
    name varchar(40) NOT NULL
);

CREATE TABLE employee (
    id integer PRIMARY KEY,
    department_id integer NOT NULL,
    first_name varchar(40),
    last_name varchar(40) NOT NULL
);

ALTER TABLE employee ADD FOREIGN KEY (department_id) REFERENCES department (id);

and the test data script like this:

--liquibase formatted sql

--changeset igor.kolomiyets:test_data context:test_data
INSERT INTO department (id, name) VALUES (1, 'Marketing');
INSERT INTO department (id, name) VALUES (2, 'Sales');

INSERT INTO employee (id, department_id, first_name, last_name) VALUES (1, 1, 'John', 'Smith');
INSERT INTO employee (id, department_id, first_name, last_name) VALUES (2, 1, 'Diana', 'Lee');
INSERT INTO employee (id, department_id, first_name, last_name) VALUES (3, 2, 'Dave', 'Mead');
INSERT INTO employee (id, department_id, first_name, last_name) VALUES (4, 2, 'Jane', 'Kean');

In essence, the first script will always be applied since it does not specify a context. The second script will only be applied when the test_data context is specified during the Liquibase update.

The last step is to create the Liquibase changelog file, which references the above changesets. The changelog is an XML file that in our example looks like the following:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<databaseChangeLog xmlns="http://www.liquibase.org/xml/ns/dbchangelog"
                   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                   xsi:schemaLocation="http://www.liquibase.org/xml/ns/dbchangelog 
                   http://www.liquibase.org/xml/ns/dbchangelog/dbchangelog-3.4.xsd">

    <include file="scripts/schema.sql" relativeToChangelogFile="true" />
    <include file="scripts/test-data.sql" relativeToChangelogFile="true" />
</databaseChangeLog>

Note that the changesets will be applied in the order they are listed in the changelog file.

To finish up, two more files are required: a liquibase.properties file and the JDBC driver. Let’s say we want to apply the changes to a PostgreSQL database, so we need the driver, which can be downloaded from https://jdbc.postgresql.org/.
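
For example, with curl (the same download URL pattern the Dockerfile below relies on):

curl -LO https://jdbc.postgresql.org/download/postgresql-42.2.2.jar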

Then, we need to create the liquibase.properties file with the following content:

changeLogFile=deploy.xml
classpath=postgresql-42.2.2.jar
driver=org.postgresql.Driver
url=jdbc:postgresql://localhost/test
username=test
password=Password1
contexts=init

This assumes that the PostgreSQL JDBC driver is in the same directory. Bear in mind that the database and the user must be created before you can apply the changes using Liquibase.

Also note that the default context specified in the properties file is init, while the only context we defined was test_data. This was done deliberately: Liquibase will apply only those changesets that either have the init context specified in the header or have no context specified at all, so the test data changeset is skipped.

In order to apply the changelog just run the following command in the command prompt:

/<path_to_liquibase>/liquibase update
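
To apply the context-specific changesets as well, the contexts value from liquibase.properties can be overridden on the command line; for example, to load the test data:

/<path_to_liquibase>/liquibase --contexts=test_data update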

Now let’s wrap this up into Docker containers. We are going to use a multi-stage build so we can create two different images from one set of files: one to provide a developer’s database preloaded with the test data, and another to use as the database update container for the other environments.

The Dockerfile for this setup will have three stages:

The first stage, deploy, copies all the necessary files into an image that then allows running liquibase update in a container. An ancillary script, used as the default entrypoint for this container, uses environment variables to set up the liquibase.properties file so that it points at the correct environment.

FROM postgres:13.1 AS deploy

ENV POSTGRES_DRIVER_VERSION 42.2.18
ENV LIQUIBASE_VERSION 4.2.0

RUN apt-get -y update \
 && apt-get -y install default-jre

ADD https://github.com/liquibase/liquibase/releases/download/v${LIQUIBASE_VERSION}/liquibase-${LIQUIBASE_VERSION}.tar.gz /liquibase/

RUN cd /liquibase \
 && gzip -dc liquibase-${LIQUIBASE_VERSION}.tar.gz | tar xvf - \
 && rm liquibase-${LIQUIBASE_VERSION}.tar.gz \
 && mkdir /app \
 && chown postgres /app

ADD entrypoint.sh /app/
ADD https://jdbc.postgresql.org/download/postgresql-${POSTGRES_DRIVER_VERSION}.jar /app/postgresql.jar
ADD scripts /app/scripts
ADD deploy.xml /app/deploy.xml

WORKDIR /app

RUN chmod 755 entrypoint.sh

ENTRYPOINT ["./entrypoint.sh"]

CMD ["/liquibase/liquibase", "update"]
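
The entrypoint.sh script itself is not shown in this post. A minimal sketch of it, assuming all it has to do is generate liquibase.properties from the URL, USERNAME and PASSWORD variables described below and then run whatever command the container was given, could look like this:

#!/bin/bash
# Hypothetical sketch of entrypoint.sh: fail fast if any variable is missing,
# generate liquibase.properties from the environment, then run the container command.
set -e

: "${URL:?URL must be set}"
: "${USERNAME:?USERNAME must be set}"
: "${PASSWORD:?PASSWORD must be set}"

cat > /app/liquibase.properties <<EOF
changeLogFile=deploy.xml
classpath=postgresql.jar
driver=org.postgresql.Driver
url=${URL}
username=${USERNAME}
password=${PASSWORD}
EOF

exec "$@"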

The second stage, build, uses the image created by deploy, starts the database server, initialises the database and user, and applies the Liquibase changesets to the database inside the container. In this case it uses the test_data context, to preload the data. Then it backs up the database using the pg_dumpall tool and stops the server.

FROM deploy AS build

ENV POSTGRES_PASSWORD Password1

ADD init.sh /app/
ADD liquibase.properties /app/
RUN chmod 755 /app/init.sh /app/postgresql.jar

USER postgres

RUN /app/init.sh postgres \
    && cd /app \
    && /liquibase/liquibase --contexts=test_data update \
    && pg_dumpall > /app/backup.sql \
    && pg_ctl stop
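
The init.sh script is not shown either. A minimal sketch, assuming its single argument (postgres in the RUN instruction above) is the maintenance database to connect to, and that it has to create the test database and user that liquibase.properties points at:

#!/bin/bash
# Hypothetical sketch of init.sh: initialise a fresh data directory, start the
# server, and create the user and database that liquibase.properties points at.
# $PGDATA is set by the base postgres image.
set -e

initdb -D "$PGDATA"
pg_ctl -D "$PGDATA" -w start

psql -d "$1" -c "CREATE USER test PASSWORD 'Password1'"
psql -d "$1" -c "CREATE DATABASE test OWNER test"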

The third stage uses the vanilla postgres image and the backup file from the second stage. It starts the server again and restores the data from the backup.

FROM postgres:13.1

ENV POSTGRES_PASSWORD Password1

COPY --from=build /app/backup.sql /
ADD restore.sh /
RUN chmod 755 /restore.sh

USER postgres

RUN /restore.sh

USER root

RUN rm /restore.sh /backup.sql

USER postgres
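
Again, restore.sh is not part of the post; a minimal sketch under the same assumptions could look like this:

#!/bin/bash
# Hypothetical sketch of restore.sh: initialise the data directory, start the
# server, replay the pg_dumpall backup, and stop the server cleanly so the
# restored data ends up baked into the image layer.
set -e

initdb -D "$PGDATA"
pg_ctl -D "$PGDATA" -w start
psql -f /backup.sql postgres
pg_ctl -D "$PGDATA" -m fast -w stop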

It is important that, when all the stages are invoked during the build, the docker build command runs with DOCKER_BUILDKIT enabled! Otherwise the restored data does not make it into the final image. The likely culprit is that the postgres image declares its data directory as a volume: the classic builder discards changes made to volume paths during RUN steps, while BuildKit preserves them.
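
BuildKit can be enabled for a single invocation by setting the variable on the command line:

DOCKER_BUILDKIT=1 docker build -t test-db .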

We will use the above Dockerfile to build two images here:

  • one that we can use to apply database changes to other environments
  • another with the test database that we can just start and use immediately

To do this, first run the docker build stopping at the deploy stage:

docker build -t test-db-deployment --target deploy .

Then build the image using all three stages:

docker build -t test-db .

To apply the database changes to a specific environment run the test-db-deployment image. It requires three environment variables to be set:

  • URL — Target database URL
  • USERNAME and PASSWORD — Target Database credentials

Bear in mind that none of these variables has a default value, hence if they are not set the container will simply fail.

For example, you can run the database update using the following docker command (provided that the database host is db and the database is testdb):

docker run --name database-update -e URL=jdbc:postgresql://db/testdb -e USERNAME=test -e PASSWORD=password test-db-deployment

The container starts, runs the Liquibase update and quits. You can check the logs to confirm that the update succeeded, and then just remove the container.
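
For example:

docker logs database-update
docker rm database-update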

The test database can simply be started; you just have to follow the vendor’s instructions and provide the necessary environment variables (if any are required). For example, to run postgres you have to set the POSTGRES_PASSWORD variable, which defines the password for the postgres user.

To start the test database run the following command:

docker run --name test-database -e POSTGRES_PASSWORD=password -d test-db

What’s next?

The above example shows a handy technique for bringing a database server to a specific state using disposable containers. They can be started manually or by a scheduler, or, if you use Kubernetes, the above image can be used to create a Job pod that applies the changesets to a particular instance of the database server.
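
A minimal sketch of such a Job manifest, assuming the test-db-deployment image has been pushed to a registry the cluster can reach and that the target database is reachable as db (all names here are illustrative):

apiVersion: batch/v1
kind: Job
metadata:
  name: database-update
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: database-update
          image: test-db-deployment
          env:
            - name: URL
              value: jdbc:postgresql://db/testdb
            - name: USERNAME
              value: test
            - name: PASSWORD
              value: password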

The trick here is knowing which image to use with which environment. This can be achieved using various tools: for example, workflow artifacts in GitHub Actions, or tools like artifactz.io. We will talk about the various options another time.

Full source code is available in the GitHub Repository.