How-to: Containerize an application with Singularity | Computational Approaches to Biological Problems

Fri 19 October 2018
How-to
J. Lucas Boatwright
#bioinformatics, #education, #singularity, #containers

In this article I'm going to cover the benefits of using containerized applications. Specifically, I'm going to focus on Singularity, a more secure alternative to Docker (which many people have heard of).

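Not only is Singularity more secure (containers run as the invoking user rather than through a root-owned daemon), it was also designed with high-performance computing in mind. This means that containers you make locally may also be executed on your HPC system. This is generally not the case for Docker containers, since most HPC centers do not allow Docker.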

So, why should you use Singularity/containers?

Containers are for those of us who have ever:

1. installed a program only to find that there is a stream of dependency issues

2. tried to reproduce someone else's research from deprecated software

3. had to start up a virtual machine every time we needed to run specialized software

4. wanted a portable environment

As a bioinformatician, I have run into all of these problems at some point, so I highly recommend containers for resolving them.

To start, you'll need to install Singularity. It's not complicated, but it's not a simple 'apt-get install' either, so I recommend following their installation guide, as the steps vary depending on your operating system.

Singularity containers are built from a Singularity Recipe. These have the following format (slightly simplified):

Header - describes the core operating system

Sections - identified below

    %help - prints helpful information

    %setup - commands executed on the host system outside of the container after the base OS has been installed

    %files - copies files from your host system into the container (files are copied before any %post commands are run)

    %labels - stores metadata with your container (stored at /.singularity.d/labels.json)

    %environment - environment variables which are sourced at runtime and not at build time

    %post - commands executed within the container after the base OS has been installed at build time

    %runscript - commands executed at runtime

    %test - commands run at the very end of the build process for build validation

Honestly, I rarely use all of them. You can largely get away with just using %post and %runscript along with the required Header.
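
As a rough illustration of that minimal layout, the sketch below writes a skeleton recipe containing only the Header, %post, and %runscript. The file name Singularity.minimal and the curl package are placeholders chosen for this example; nothing else in this article depends on them.

```shell
# Write a minimal recipe: the Header plus the two sections used most often.
# "Singularity.minimal" and the curl package are placeholders for illustration.
cat > Singularity.minimal <<'EOF'
Bootstrap: docker
From: debian:latest

%post
    # Runs inside the container at build time
    apt-get update && apt-get install -y curl

%runscript
    # Runs when the container is executed; "$@" forwards the user's arguments
    exec curl "$@"
EOF

# List the section markers the recipe defines
grep '^%' Singularity.minimal
```

The remaining sections (%help, %labels, %environment, and so on) can be added to the same file later without changing these two.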

Now, I realize that all of this information is available in more detail on Singularity's webpage. So, rather than reiterate it, I'm going to provide some working examples here (you can find more here, and I also recommend BioContainers):

How to containerize a simple Python package via pip:

Bootstrap: docker
From: python:2

%post
    apt-get update --fix-missing && apt-get install -y python-pip
    pip install multiqc

%runscript
    exec multiqc "$@"
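
A recipe like this is then built into an image and run. The sketch below assumes the recipe above was saved as Singularity in the current directory and that the image should be called multiqc.simg (both names are my assumptions), and that building requires root, as it typically does. It skips the build when Singularity or the recipe is missing.

```shell
recipe="Singularity"     # assumed file name for the recipe above
image="multiqc.simg"     # assumed name for the resulting image

if command -v singularity >/dev/null 2>&1 && [ -f "$recipe" ]; then
    # Build the image from the recipe (root is typically needed to build)
    sudo singularity build "$image" "$recipe"
    # Arguments after the image name are passed to %runscript, i.e. to multiqc
    singularity run "$image" --help
    status="built"
else
    echo "Singularity or $recipe not found; skipping build" >&2
    status="skipped"
fi
echo "$status"
```

The same two commands apply to the other recipes in this article; only the recipe and image names change.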

How to containerize a GitHub application:

Bootstrap: docker
From: debian:latest

%post
    apt-get update --fix-missing && apt-get install -y git make g++ libz-dev
    git clone https://github.com/gpertea/stringtie
    cd stringtie
    make release

%runscript
    exec /stringtie/stringtie "$@"

How to containerize a miniconda environment (environment.yml can be from an existing environment or made from scratch):

Bootstrap: docker
From: continuumio/miniconda3

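%files
    environment.yml /environment.yml

%environment
    # Assumes the first line of environment.yml is "name: <env_name>".
    # The absolute path keeps this working from any directory at runtime.
    PATH=/opt/conda/envs/$(head -1 /environment.yml | cut -d' ' -f2)/bin:$PATH

%post
    echo ". /opt/conda/etc/profile.d/conda.sh" >> ~/.bashrc
    # Append (>>) so the line above is not overwritten
    echo "source activate $(head -1 /environment.yml | cut -d' ' -f2)" >> ~/.bashrc
    /opt/conda/bin/conda env create -f /environment.yml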

%runscript
    exec "$@"
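
The head and cut pipeline in this recipe relies on the first line of environment.yml being "name: <environment>". Here is a small sketch of what it extracts, using a made-up environment called rnaseq and a throwaway file:

```shell
# Create a throwaway environment.yml; the name and packages are made up.
# (For an existing environment, `conda env export > environment.yml` produces one.)
workdir=$(mktemp -d)
cat > "$workdir/environment.yml" <<'EOF'
name: rnaseq
channels:
  - bioconda
dependencies:
  - multiqc
EOF

# Take the first line ("name: rnaseq") and keep the second space-separated field
env_name=$(head -1 "$workdir/environment.yml" | cut -d' ' -f2)
echo "/opt/conda/envs/$env_name/bin"   # prints /opt/conda/envs/rnaseq/bin
```

If the name is not on the first line of your file, the pipeline returns something else, so it is worth checking before building.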

The way I set up my containers, they typically run much like the installed program itself would. For example:

# Print the help information for FastQC
singularity run fastqc.simg -h
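
This works because each %runscript above ends in exec followed by the program and "$@", which hands every argument given after the image name to the program unchanged. Here is a small stand-in, using a shell function in place of a real container (the function name and arguments are invented for the example):

```shell
# Stand-in for a %runscript body: receive the arguments exactly as a
# containerized program would via "$@", printing one per line in brackets.
runscript() {
    printf '[%s]\n' "$@"
}

# Quoted arguments with spaces stay intact because "$@" is double-quoted
forwarded=$(runscript --outdir "my results" sample.fastq)
echo "$forwarded"
```

Each bracketed line is one argument as the program would receive it, so "my results" arrives as a single argument and is not split in two.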

You can also put executables on the PATH variable (as in my conda example above) or use the Singularity %apps option to provide multiple subcommands (as in the bioscripts example under the SingularityRecipes repo on my GitHub page).
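
As a sketch of the %apps approach, the recipe below defines two subcommands with %apprun. The app names hello and count and the file names Singularity.apps and apps.simg are hypothetical; none of this is taken from the bioscripts recipe.

```shell
# Write a recipe with two apps; each %apprun section acts as its own runscript.
cat > Singularity.apps <<'EOF'
Bootstrap: docker
From: debian:latest

%apprun hello
    exec echo "hello" "$@"

%apprun count
    exec wc -l "$@"
EOF

# Each app is selected at runtime with --app, for example:
#   singularity run --app hello apps.simg
#   singularity run --app count apps.simg reads.txt
grep -c '^%apprun' Singularity.apps   # prints 2
```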

One last recommendation: keeping one program per container is much more modular for long-term maintenance. I hope this helps get you started with your own Singularity containers!