Bio::ToolBox - Installation

Advanced Installation

This advanced guide walks through a complete installation.

TLDR Brief guide

This is a no-nonsense, quick guide for those who already know what they’re doing on an established Linux system with a modern Perl installation, and know how to adjust accordingly for their system. If that doesn’t describe you, skip ahead to the Detailed guide.

Detailed guide

This assumes installation on a Linux workstation with standard compilation tools available. Installation on MacOS (x86_64) is also possible with Xcode Command Line Tools installed; see MacOS Notes for additional guidance.

Perl installations and locations

As a Perl package, BioToolBox needs to be installed under a Perl installation. It is not dependent on a specific Perl release version, although later releases (5.16 or newer) are preferred. Nearly every unix-like OS (Linux, MacOS) includes a system Perl installation. If not, one can be installed, either from the OS package manager or from source.
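A quick way to see which Perl comes first in your PATH, and whether it meets the preferred 5.16 minimum, is sketched below. The function name perl_meets_minimum is made up for this illustration and is not part of BioToolBox.

```shell
# Hypothetical helper: succeed when a version string such as 5.30.1
# is 5.16 or newer (the preferred minimum mentioned above).
perl_meets_minimum() {
    major=${1%%.*}
    rest=${1#*.}
    minor=${rest%%.*}
    [ "$major" -gt 5 ] || { [ "$major" -eq 5 ] && [ "$minor" -ge 16 ]; }
}

# Report the first perl found in PATH, if any, and check its version.
if command -v perl >/dev/null 2>&1; then
    found=$(perl -e 'printf "%vd", $^V')
    if perl_meets_minimum "$found"; then
        echo "$(command -v perl) is version $found: OK"
    else
        echo "$(command -v perl) is version $found: consider a newer Perl"
    fi
else
    echo "no perl found in PATH"
fi
```

If the reported path is the system Perl and you lack write permission, the Home library option below is the usual route.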

Follow one of these options.

Home library

When you want to use the system-installed Perl (often /usr/bin/perl), but do not have write permissions to the system, you can install packages in your home directory. To do this, first install local::lib, which sets up a perl5 directory for local (home) module installations. The path is set by adding a statement to your home .profile or equivalent file, as described in the local::lib documentation. This can also be used for targeted, standalone installations; adjust accordingly. For example, the following command installs local::lib and the CPAN Minus application, appends the setup line to ~/.profile, and reloads it:

curl -L https://cpanmin.us | perl - -l $HOME/perl5 local::lib App::cpanminus \
&& echo 'eval "$(perl -I$HOME/perl5/lib/perl5 -Mlocal::lib)"' >> ~/.profile \
&& . ~/.profile
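After reloading the profile, you may want to confirm that the local::lib environment is actually in effect. The sketch below only inspects environment variables that local::lib exports; the function name local_lib_status is hypothetical.

```shell
# Hypothetical helper: report whether local::lib's environment is active.
# PERL_LOCAL_LIB_ROOT is one of the variables that local::lib exports.
local_lib_status() {
    if [ -n "${PERL_LOCAL_LIB_ROOT:-}" ]; then
        echo "active: $PERL_LOCAL_LIB_ROOT"
    else
        echo "inactive"
    fi
}

local_lib_status
# PERL_MM_OPT carries the install base that module installers will use.
echo "PERL_MM_OPT=${PERL_MM_OPT:-unset}"
```

If this prints "inactive" in a new terminal, your shell is probably not reading ~/.profile; add the eval line to whichever startup file your shell does read.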

Custom installation

When the system Perl is old (many vendor OS Perl installations are sadly out of date), or you simply want a newer Perl without overwriting the system one, install a separate Perl version. It can go anywhere you have read/write access, including your home directory. While a new Perl version can be manually downloaded and installed from the main Perl site, there are easier ways.

An alternate package manager may be used to install a Perl version in a generally available location. For example, MacOS users can easily install a modern Perl using Homebrew. Similarly, Linux (and evidently Microsoft Windows Subsystem for Linux) users can use Linuxbrew. These typically install the latest production release with a single command.

To install a Perl in your home directory (or other location) with a simple, but powerful, tool, use the excellent PerlBrew. This tool can painlessly compile, install, and manage one or more Perl release versions side-by-side, allowing you to easily switch between releases with a simple command. It also manages multiple local::lib installations, in case you want to isolate packages.

BioToolBox does not utilize threading (it uses forks for parallel execution), so if you have a choice, compile a non-threaded Perl for a slight performance gain.
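To find out whether an existing Perl was compiled with thread support, you can query its build configuration. This is a minimal sketch; the helper name is_threaded_config is made up here, and it simply inspects the text that perl -V:useithreads prints.

```shell
# Hypothetical helper: succeed when the text printed by
# `perl -V:useithreads` (for example  useithreads='define';  )
# indicates a threaded build.
is_threaded_config() {
    case "$1" in
        *"'define'"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Query the perl in PATH, if there is one.
if command -v perl >/dev/null 2>&1; then
    config=$(perl -V:useithreads)
    if is_threaded_config "$config"; then
        echo "threaded Perl ($config)"
    else
        echo "non-threaded Perl ($config)"
    fi
fi
```

A threaded Perl still works fine with BioToolBox; the check only matters if you are choosing build options for a new installation.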

System installation

For privileged installations (requiring root access or sudo privilege) you probably already know what to do. You can use the --sudo or -S option to cpanm. Note that installing lots of packages in the OS vendor system perl is generally not recommended, as it could interfere with other vital OS functions or programs that expect certain versions or modules to be present. It’s best to use one of the other two methods.

External libraries

There are two external C libraries that are required for reading Bam and BigWig files. These are commonly used bioinformatics tools maintained by separate organizations, and the Perl modules only provide the XS bindings to these libraries. As such, it’s best to install these up front separately before attempting the Perl module installation. Note that both Perl modules Bio::DB::HTS and Bio::DB::Big include INSTALL.pl scripts within their bundles that can compile these external libraries for you in a semi-automated fashion. Proceed here if you wish to have more control over what and where these are installed.
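Before installing the Perl modules, it can help to confirm that both libraries are where you expect them. The sketch below assumes a conventional layout (headers in PREFIX/include, libraries in PREFIX/lib) and the usual file names for HTSlib and libBigWig; the function name check_external_libs is hypothetical, and /usr/local is just the prefix used in the cpanm example later in this guide.

```shell
# Hypothetical helper: look for HTSlib and libBigWig files under a prefix.
# Assumes headers in PREFIX/include and libraries in PREFIX/lib.
check_external_libs() {
    prefix=$1
    rc=0
    if [ -f "$prefix/include/htslib/hts.h" ] \
        && ls "$prefix"/lib/libhts.* >/dev/null 2>&1; then
        echo "HTSlib: found under $prefix"
    else
        echo "HTSlib: not found under $prefix"
        rc=1
    fi
    if [ -f "$prefix/include/bigWig.h" ] \
        && ls "$prefix"/lib/libBigWig.* >/dev/null 2>&1; then
        echo "libBigWig: found under $prefix"
    else
        echo "libBigWig: not found under $prefix"
        rc=1
    fi
    return $rc
}

check_external_libs /usr/local \
    || echo "install the missing library, or check a different prefix"
```

Some systems place libraries in lib64 rather than lib; adjust the paths in the function if that applies to you.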

Perl modules

Using a simple CPAN package installer such as CPAN Minus, i.e. cpanm, is highly recommended for ease and simplicity in installing modules from CPAN. It can install directly from CPAN or take a URL or downloaded archive file. Follow the link to find out how to install cpanm. Other CPAN package managers are available too, if that’s your preference.

The following Perl packages should be explicitly installed. Most of these will bring along a number of dependencies (which in turn bring along more dependencies). In the end you will have installed dozens of packages.

An example of installing these Perl modules with cpanm is below. This assumes that you have local::lib or a writable Perl installation in your $PATH. Adjust accordingly.

curl -O -L https://github.com/tjparnell/bioperl-live/releases/download/minimal-v1.7.8/Minimal-BioPerl-1.7.8.tar.gz
cpanm Minimal-BioPerl-1.7.8.tar.gz
cpanm --configure-args="--htslib /usr/local" Bio::DB::HTS
cpanm Template::Tiny
curl -o bio-db-big-master.tar.gz -L https://github.com/Ensembl/Bio-DB-Big/tarball/master
cpanm --configure-args="--libbigwig /usr/local" bio-db-big-master.tar.gz
cpanm Parallel::ForkManager Set::IntervalTree Set::IntSpan::Fast Bio::ToolBox
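Once the installs finish, a simple load test confirms that each module can be found by your Perl. This is only a sketch: the function name check_perl_modules and the PERL override variable are made up for illustration, and loading a module does not prove that its external library works on real data.

```shell
# Hypothetical helper: try to load each named module and report the result.
# Set PERL to test an interpreter other than the first perl in PATH.
check_perl_modules() {
    rc=0
    for module in "$@"; do
        if "${PERL:-perl}" -M"$module" -e 1 >/dev/null 2>&1; then
            echo "ok: $module"
        else
            echo "missing: $module"
            rc=1
        fi
    done
    return $rc
}

check_perl_modules Bio::DB::HTS Bio::DB::Big Template::Tiny \
    Parallel::ForkManager Set::IntervalTree Set::IntSpan::Fast Bio::ToolBox \
    || echo "one or more modules failed to load"
```

For a module reported as missing, rerun its cpanm command and read the build log that cpanm points to.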

External applications

Some programs, for example bam2wig.pl and data2wig, require external utilities for converting text formats to binary formats, for example wig files to bigWig. External utilities are preferred because they’re more efficient and spread the load on modern multi-CPU environments. You may download these from the UCSC Genome Browser utilities section for either Linux or macOS. Copy them to a bin directory in your PATH, for example $HOME/bin, $HOME/perl5/bin, or /usr/local/bin. Be sure to make them executable by running chmod +x on each file.

An example for downloading on Linux:

for name in wigToBigWig bedGraphToBigWig bigWigToWig bigWigToBedGraph bedToBigBed bigBedToBed; \
do curl -o $HOME/bin/$name http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/$name \
&& chmod +x $HOME/bin/$name; done;
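After downloading, you can check that every utility is reachable through your PATH. The function name missing_ucsc_utils below is hypothetical; it only tests that each name resolves to a command, not that the binary matches your platform.

```shell
# Hypothetical helper: print the name of each utility that cannot be
# found through PATH; succeed only when none are missing.
missing_ucsc_utils() {
    rc=0
    for name in wigToBigWig bedGraphToBigWig bigWigToWig \
        bigWigToBedGraph bedToBigBed bigBedToBed; do
        if ! command -v "$name" >/dev/null 2>&1; then
            echo "$name"
            rc=1
        fi
    done
    return $rc
}

missing_ucsc_utils || echo "the utilities listed above were not found in PATH"
```

If names are reported even though the files exist, check that the target directory (for example $HOME/bin) is in your PATH and that the files are executable.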

NOTE Current versions of these utilities do not support directly piping data into the utility using the stdin file name. You will need to either find an older binary version, compile your own from older source code (see below), or update BioToolBox; version 2.02 now supports a work-around.

Legacy Perl modules

These are additional legacy Perl modules that are supported (for example, if you still have a GBrowse installation), but are either not required or have been superseded by other modules.

Some notes are below for anyone who may need to install these.

Database support

The Bio::DB::SeqFeature::Store is a convenient SeqFeature annotation database backed by a SQL engine. It used to be part of the BioPerl distribution prior to release 1.7.3, but is now split into its own distribution. If you wish to use annotation databases, you will need a SQL driver, such as DBD::SQLite (recommended for individuals) or DBD::mysql (for fancy multi-user installations).

Sam library

The Bio::DB::Sam library only works with the legacy Samtools version, which included the C libraries, headers, and executables in one package; use version 0.1.19 for best results. You will need to compile the Samtools code, but you do not have to install it (the library is not linked). Before compiling, edit the Makefile to add the CFLAGS -fPIC and (most likely) -m64 for a 64-bit OS. Export the SAMTOOLS environment variable to the path of the Samtools build directory, and then proceed to build the Perl module; it should find the necessary files using that variable. You may obtain the latest source from GitHub or by downloading a tarball. Note that this repository bundles multiple Perl adapters, so it cannot be installed directly with cpanm, for example.
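The Makefile edit can be scripted rather than done by hand. The sketch below assumes the Samtools Makefile has a single line beginning with "CFLAGS=", which should be verified against your copy; the function name add_samtools_cflags is made up for this example. Drop -m64 from the sed expression if your compiler does not accept it.

```shell
# Hypothetical helper: append -fPIC and -m64 to the CFLAGS line of a
# Makefile. Assumes one line beginning with "CFLAGS=" and leaves the
# file untouched when -fPIC is already present on that line.
add_samtools_cflags() {
    makefile=$1
    if grep -q '^CFLAGS.*-fPIC' "$makefile"; then
        return 0
    fi
    sed 's/^\(CFLAGS[[:space:]]*=.*\)$/\1 -fPIC -m64/' "$makefile" \
        > "$makefile.tmp" && mv "$makefile.tmp" "$makefile"
}

# Typical use, from inside the unpacked Samtools 0.1.19 source directory:
#   add_samtools_cflags Makefile && make && export SAMTOOLS=$PWD
```

With SAMTOOLS exported in the same shell session, the Bio::DB::Sam build should then be able to locate the headers and the compiled library.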

UCSC BigFile library

The Bio::DB::BigWig and Bio::DB::BigBed modules are part of the same distribution, Bio-BigFile. Only use the code from the GitHub repository, as it should be compatible with recent UCSC libraries, whereas the distribution on CPAN is out of date.

NOTE The UCSC library, when it encounters an error, will immediately terminate the Perl process, with no chance of trapping the error. The newer libBigWig C library used with Bio::DB::Big (detailed above) does not exhibit this behavior, plus it’s considerably easier to install. Encountering errors rarely happens, however, because all bioinformatic data is always perfectly formatted and well behaved, right?

You will need the UCSC source code; the userApps source code is sufficient, rather than the entire browser code. Versions 375 and 398, at the time of this writing, work successfully, but more recent versions appear to have increasing problems compiling; YMMV.

NOTE If you are compiling the command line utilities, such as wigToBigWig, be aware that in version 439 and later, these utilities no longer accept stdin as a file input. The Bio::ToolBox::big_helper module uses this feature for convenience in applications such as bam2wig. You can compile your own following these steps, but you do not need to install Bio::DB::BigWig.

This requires at least OpenSSL and libpng libraries to compile the required library. For the command line utilities, if desired, you will also need MySQL libraries; MariaDB, for now, seems adequate as far as I can tell.

On Linux, this is mostly not a problem, as these libraries and their development files are readily available through the package manager. If you’re building on macOS, see the macOS notes page.

For purposes of installing the Perl adapter, only the library needs to be compiled. It does not need to be installed, as nothing is linked. So, you do not need to run the full make command. In the userApps folder, run

cd path/to/userApps
make installEnvironment

This will generate kent/src/inc/localEnvironment.mk for your local machine. Edit this file to add at the end

CFLAGS = -fPIC
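If you prefer to script this edit, the line can be appended from the shell. The sketch below adds the line only when it is not already present, so running it twice does no harm; the function name add_fpic_flag is hypothetical.

```shell
# Hypothetical helper: append the CFLAGS line to the given file unless an
# identical line is already there, so re-running does not duplicate it.
add_fpic_flag() {
    mkfile=$1
    if ! grep -qxF 'CFLAGS = -fPIC' "$mkfile" 2>/dev/null; then
        echo 'CFLAGS = -fPIC' >> "$mkfile"
    fi
}

# Typical use, from the userApps directory after "make installEnvironment":
#   add_fpic_flag kent/src/inc/localEnvironment.mk
```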

If you have odd or non-standard locations for some libraries, for example in a computing cluster where the development files are brought in using environment modules, you may be able to set additional paths in this localEnvironment.mk file or by directly hacking kent/src/inc/common.mk. NOTE Be careful about setting a generic path to the libraries, particularly if you also have htslib installed, since the UCSC userApp provides its own (presumably modified) htslib library, which will conflict with a system available library.

To compile the library, follow the steps below. Note the exported KENT_SRC environment variable, which helps the Perl adapter build find the source tree.

cd path/to/kent/src
export KENT_SRC=$(pwd)
cd htslib
make
cd ../lib
make
cd
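Before moving on to the Perl adapter, it is worth confirming that the library archive was actually produced. The sketch below assumes the archive is named jkweb.a and sits somewhere below the lib directory of KENT_SRC; both the function name find_kent_library and that file name are assumptions to verify against your build output.

```shell
# Hypothetical helper: look for the compiled kent library archive below
# the lib directory of the given source tree (default: $KENT_SRC).
# The archive name jkweb.a is an assumption; adjust if yours differs.
find_kent_library() {
    src=${1:-${KENT_SRC:-.}}
    find "$src/lib" -type f -name 'jkweb.a' 2>/dev/null | head -n 1
}

if lib=$(find_kent_library) && [ -n "$lib" ]; then
    echo "found $lib"
else
    echo "jkweb.a not found; check the make output and KENT_SRC"
fi
```

Run this in the same shell session in which KENT_SRC was exported, since the variable does not carry over to a new terminal.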

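To install the Perl adapter, download the tarball from GitHub (same source as Bio::DB::Sam above) and follow the steps below. The directory name created by unpacking the tarball may differ from the one shown; adjust the cd command accordingly.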

curl -o GBrowse-Adaptors.tar.gz -L https://github.com/GMOD/GBrowse-Adaptors/tarball/master
tar -xf GBrowse-Adaptors.tar.gz
cd GBrowse-Adaptors-master-85c29de/Bio-BigFile
export MACHTYPE=local
perl Build.PL
./Build
./Build test
./Build install

If the environment variables have been set correctly and the library compiled successfully, then the Perl build process should proceed smoothly. There may be various warnings emitted during the build process, which can usually be ignored.

Once you have compiled the main kent/src/lib library, you can proceed with compiling the command line utilities, if you desire. You can build just the ones you want by going into each utility subdirectory within kent/src/utils/ and issuing make. For example:

cd /path/to/userApps
mkdir bin
cd kent/src/utils/wigToBigWig
make

The compiled executable should be copied into userApps/bin, and you can move it from there to wherever.