- Fri 19 October 2018
- How-to
- J. Lucas Boatwright
- #bioinformatics, #education, #singularity, #containers
In this article I'm going to cover the benefits of using containerized applications. Specifically, I'm going to focus on Singularity, which is the more secure alternative to Docker (which many people have heard of).
Not only is Singularity more secure, it was designed with high-performance computing in mind. This means that containers you make locally may also be executed on your HPC system. This is generally not the case for Docker containers.
So, why should you use Singularity/containers?
Containers are for those of us who have ever:
1. installed a program only to run into a stream of dependency issues
2. tried to reproduce someone else's research that relied on deprecated software
3. had to start up a virtual machine every time we needed to run specialized software
4. wanted a portable environment
As a bioinformatician, I have run into every one of these problems at some point, so I highly recommend containers as a way to resolve them.
To start, you'll need to install Singularity. It's not complicated, but it's not a simple 'apt-get install' either. I recommend you follow their installation guide, as the steps vary depending on your operating system.
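Once the installation guide has been followed, a quick sanity check can confirm that the executable is reachable. The snippet below is a minimal sketch; the `status` variable and the messages are only there for illustration.

```shell
# Check whether the singularity executable is on the PATH and report its version.
if command -v singularity >/dev/null 2>&1; then
    status="installed"
    singularity --version
else
    status="missing"
    echo "singularity was not found on the PATH; revisit the installation guide"
fi
```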
Singularity containers are built from a Singularity Recipe. These have the following format (slightly simplified):
*Header* - describes the core operating system
Sections - identified below
%help - prints help information for the container
%setup - commands executed on the host system outside of the container after the base OS has been installed
%files - copy files from your host system into the container (files are copied before any %post commands are run)
%labels - metadata stored with your container (at /.singularity.d/labels.json)
%environment - environment variables which are sourced at runtime and not at build time
%post - commands executed within the container after the base OS has been installed at build time
%runscript - commands executed at runtime
%test - commands run at the very end of the build process for build validation
Honestly, I rarely use all of them. You can largely get away with just using %post and %runscript along with the required Header.
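For reference, here is a rough sketch of a recipe that uses several of the optional sections listed above. Everything in it is a placeholder chosen for illustration (the file name `Singularity.demo`, the label values, the `/data` directory, the greeting), and it is written from the shell with a heredoc so it can be pasted into a terminal as-is.

```shell
# Write a hypothetical demo recipe to the current directory.
recipe="Singularity.demo"

cat > "$recipe" <<'EOF'
Bootstrap: docker
From: debian:latest

%help
    Demo container. Usage: singularity run demo.simg [ARGS]

%labels
    Maintainer Jane Doe
    Version 0.1

%environment
    export LC_ALL=C

%post
    # Runs inside the container at build time
    mkdir -p /data

%runscript
    exec echo "Hello from the container:" "$@"

%test
    # Runs at the very end of the build
    test -d /data
EOF

# Building requires Singularity and root privileges, for example:
#   sudo singularity build demo.simg Singularity.demo
# Afterwards the labels can be viewed with:
#   singularity inspect demo.simg
```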
Now, I realize that all of this information is available on Singularity's webpage, and in more detail. So, rather than reiterate it, I'm going to provide some working examples here (you can find more here, and I also recommend BioContainers):
How to containerize a simple python package via pip:
Bootstrap: docker
From: python:2
%post
apt-get update --fix-missing && apt-get install -y python-pip
pip install multiqc
%runscript
exec multiqc "$@"
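With a recipe like this saved to a file, building and running the container might look like the sketch below. The file names `Singularity.multiqc` and `multiqc.simg` are assumptions made for this example, and the guards simply let the script exit cleanly on a machine without Singularity or without the recipe.

```shell
# Hypothetical file names: adjust to wherever the recipe was saved.
recipe="Singularity.multiqc"
image="multiqc.simg"

if ! command -v singularity >/dev/null 2>&1; then
    echo "singularity is not installed; skipping build"
elif [ ! -f "$recipe" ]; then
    echo "recipe $recipe not found; skipping build"
else
    # Building from a recipe needs root privileges
    sudo singularity build "$image" "$recipe"
    # Arguments after the image name are passed to %runscript ("$@")
    singularity run "$image" --help
fi
```

Because the %runscript ends in `"$@"`, anything placed after the image name is handed straight to multiqc.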
How to containerize a GitHub application:
Bootstrap: docker
From: debian:latest
%post
apt-get update --fix-missing && apt-get install -y git make g++ libz-dev
git clone https://github.com/gpertea/stringtie /stringtie
cd /stringtie
make release
%runscript
exec /stringtie/stringtie "$@"
How to containerize a miniconda environment (environment.yml can be from an existing environment or made from scratch):
Bootstrap: docker
From: continuumio/miniconda3
%files
environment.yml /environment.yml
%environment
PATH=/opt/conda/envs/$(head -1 /environment.yml | cut -d' ' -f2)/bin:$PATH
%post
echo ". /opt/conda/etc/profile.d/conda.sh" >> ~/.bashrc
echo "source activate $(head -1 /environment.yml | cut -d' ' -f2)" >> ~/.bashrc
/opt/conda/bin/conda env create -f /environment.yml
%runscript
exec "$@"
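The `head -1 ... | cut -d' ' -f2` expression in this recipe pulls the environment name out of the first line of environment.yml, so it assumes that the file starts with a `name:` line. A small sketch of what it does, using a made-up environment file (the name `rnaseq` and the package list are placeholders) written to a temporary directory:

```shell
# Create a hypothetical environment.yml in a scratch directory.
workdir=$(mktemp -d)
cat > "$workdir/environment.yml" <<'EOF'
name: rnaseq
channels:
  - bioconda
  - conda-forge
dependencies:
  - samtools
  - multiqc
EOF

# First line is "name: rnaseq"; the second space-separated field is the name.
env_name=$(head -1 "$workdir/environment.yml" | cut -d' ' -f2)
echo "$env_name"                        # prints: rnaseq
echo "/opt/conda/envs/${env_name}/bin"  # prints: /opt/conda/envs/rnaseq/bin
```

An existing environment can be written to such a file with `conda env export > environment.yml`, which places the `name:` line first.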
I set up my containers so that they run much like the installed program itself would. For example:
# Print the help information for FASTQC
singularity run fastqc.simg -h
One may also put executables on the PATH variable (as in my conda example above) or use the Singularity %apps option to have multiple subcommands (as in the bioscripts example under the SingularityRecipes repo on my GitHub page).
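As a rough illustration of the %apps idea (this is not the bioscripts recipe itself), a recipe can define several named entry points with `%apprun`. The app names `hello` and `goodbye`, and the file name `Singularity.apps`, are invented for this sketch.

```shell
# Write a hypothetical recipe with two apps to the current directory.
apps_recipe="Singularity.apps"

cat > "$apps_recipe" <<'EOF'
Bootstrap: docker
From: debian:latest

%apprun hello
    exec echo "hello" "$@"

%apprun goodbye
    exec echo "goodbye" "$@"
EOF

# After building (for example: sudo singularity build apps.simg Singularity.apps),
# each app is selected with --app:
#   singularity run --app hello apps.simg
#   singularity run --app goodbye apps.simg
```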
One last recommendation: keeping one program per container is much more modular and easier to maintain in the long term. I hope this helps get you started with your own Singularity containers!