Bio::ToolBox - bam2wig

bam2wig.pl

A program to convert Bam alignments into a wig representation file.

SYNOPSIS

bam2wig.pl [--options...] <file.bam>

bam2wig.pl --extend --rpm --mean --out file --bw file1.bam file2.bam

Required options:
 -i --in <filename.bam>        repeat if multiple bams, or comma-delimited list

Reporting options (pick one):
 -s --start                    record at 5' position
 -d --mid                      record at midpoint of alignment or pair
 -a --span                     record across entire alignment or pair
 -e --extend                   extend alignment (record predicted fragment)
 --cspan                       record a span centered on midpoint
 --smartcov                    record paired coverage without overlaps, splices
 --ends                        record paired endpoints
 --coverage                    raw alignment coverage

Alignment reporting options:
 -l --splice                   split alignment at N splices
 -t --strand                   record separate strands as two wig files
 --flip                        flip the strands for convenience
 
Paired-end alignments:
 -p --pe                       process paired-end alignments, both are checked
 -P --fastpe                   process paired-end alignments, only F are checked
 --minsize <integer>           minimum allowed insertion size (30)
 --maxsize <integer>           maximum allowed insertion size (600)
 --first                       only process paired first read (0x40) as single-end
 --second                      only process paired second read (0x80) as single-end
 
Alignment filtering options:
 -K --chrskip <regex>          regular expression to skip chromosomes
 -B --blacklist <file>         interval file of regions to skip (bed, gff, txt)
 -q --qual <integer>           minimum mapping quality (0)          
 -S --nosecondary              ignore secondary (0x100) alignments (false)
 -D --noduplicate              ignore duplicate (0x400) alignments (false)
 -U --nosupplementary          ignore supplementary (0x800) alignments (false)
 --intron <integer>            maximum allowed gap (intron) size in bp (none)
 
Shift options:
 -I --shift                    shift reads in the 3' direction
 -x --extval <integer>         explicit extension size in bp (default is to calculate)
 -H --shiftval <integer>       explicit shift value in bp (default is to calculate) 
 --chrom <integer>             number of chromosomes to sample (4)
 --minr <float>                minimum pearson correlation to calculate shift (0.5)
 --zmin <float>                minimum z-score from average to test peak for shift (3)
 --zmax <float>                maximum z-score from average to test peak for shift (10)
 -M --model                    write peak shift model file for graphing
 
Score options:
 -r --rpm                      scale depth to Reads Per Million mapped
 -m --mean                     average multiple bams (default is addition)
 --scale <float>               explicit scaling factor, repeat for each bam file
 --fraction                    assign fractional counts to multi-mapped alignments
 --splfrac                     assign fractional count to each spliced segment
 --format <integer>            number of decimal positions (4)
 --chrnorm <float>             use chromosome-specific normalization factor
 --chrapply <regex>            regular expression to apply chromosome-specific factor

Wig format:
 --bin <integer>               bin size for span or extend mode (10)
 --bdg                         bedGraph, default for span and extend at bin 1
 --fix                         fixedStep, default for bin > 1
 --var                         varStep, default for start, mid
 --nozero                      do not write zero score intervals in bedGraph
 
Output options:
 -o --out <filename>           output file name, default is bam file basename
 -b --bw                       convert to bigWig format (supports bdg, fix, var)
 --bwapp /path/to/wigToBigWig  path to external converter (default searches $PATH)
 -z --gz                       gzip compress text output 
 
General options:
 -c --cpu <integer>            number of parallel processes (4)
 --temp <directory>            directory to write temporary files (output path)
 -V --verbose                  report additional information
 -v --version                  print version information
 -h --help                     show full documentation

OPTIONS

The command line flags and descriptions:

Input

Reporting Options

Alignment reporting options

Paired-end alignments

Alignment filtering options

Shift options

Score Options

Wig format

Output Options

General options

DESCRIPTION

This program will enumerate aligned sequence tags and generate a wig file or, optionally, a bigWig file. Alignments may be counted and recorded in several different ways. Strict enumeration may be performed and recorded at either the alignment's start or midpoint position. Alternatively, either the alignment or the fragment may be recorded across its span. Finally, a basic unstranded, unshifted, and non-transformed alignment coverage may be generated.
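The recording modes described above can be illustrated with a minimal Python sketch. This is not the program's implementation: the `record` function, the tuple layout, and the midpoint rounding are assumptions made for illustration only. It counts each alignment at its 5' position, at its midpoint, or across its entire span.

```python
from collections import Counter

def record(alignments, mode):
    """Count alignments per position (hypothetical helper, not bam2wig code).

    alignments: list of (start, end, strand), 1-based inclusive coordinates,
    with strand given as '+' or '-'.
    """
    counts = Counter()
    for start, end, strand in alignments:
        if mode == "start":
            # The 5' end is the leftmost base on '+' and the rightmost on '-'.
            counts[start if strand == "+" else end] += 1
        elif mode == "mid":
            # Integer midpoint; the rounding direction is an assumption.
            counts[(start + end) // 2] += 1
        elif mode == "span":
            # Every base covered by the alignment receives one count.
            for pos in range(start, end + 1):
                counts[pos] += 1
        else:
            raise ValueError(f"unknown mode: {mode}")
    return dict(counts)

alignments = [(101, 150, "+"), (121, 170, "-")]
print(record(alignments, "start"))  # {101: 1, 170: 1}
print(record(alignments, "mid"))    # {125: 1, 145: 1}
```

In span mode the two example alignments overlap from 121 to 150, so those positions receive a count of 2 and the flanking positions a count of 1.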

Both paired-end and single-end alignments may be counted. Alignments with splices (e.g. RNA-Seq) may be counted singly or separately. Alignment counts may be separated by strand, facilitating analysis of RNA-Seq experiments.
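As a rough illustration of counting spliced alignments separately, the sketch below splits an alignment into reference segments at each `N` operation of its CIGAR string. The function name `split_at_splices` and the coordinate convention (1-based, inclusive) are assumptions for this example; the program itself may handle CIGAR strings differently.

```python
import re

# One CIGAR operation: a length followed by an operation letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def split_at_splices(start, cigar):
    """Return the reference segments of an alignment, split at N operations.

    start: 1-based leftmost reference position of the alignment.
    Returns a list of (segment_start, segment_end), inclusive.
    """
    segments = []
    seg_start = start
    pos = start
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op == "N":
            # Close the current segment, then skip over the intron.
            if pos > seg_start:
                segments.append((seg_start, pos - 1))
            pos += length
            seg_start = pos
        elif op in "MD=X":
            # These operations consume reference bases within a segment.
            pos += length
        # I, S, H and P do not consume reference bases.
    if pos > seg_start:
        segments.append((seg_start, pos - 1))
    return segments

print(split_at_splices(100, "30M500N20M"))  # [(100, 129), (630, 649)]
```

An alignment without any `N` operation comes back as a single segment, which corresponds to counting it singly.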

For ChIP-Seq experiments, the alignment position may be shifted in the 3' direction. This effectively merges the separate peaks (representing the ends of the enriched fragments) on each strand into a single peak centered over the target locus. Alternatively, the entire predicted fragment may be recorded across its span. This extended method of recording infers the mean size of the library fragments, thereby emulating the coverage of paired-end sequencing using single-end sequence data. The shift value is either determined empirically from the sequencing data or provided by the user. If requested, the shift model profile may be written to file.
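The geometry of shifting and extending can be sketched in a few lines of Python. The helper names and the example numbers are hypothetical, and the sketch assumes a shift of half the fragment length; it does not show how the program estimates the shift from the data, and it does not clip coordinates to chromosome bounds.

```python
def shifted_position(start, end, strand, shift):
    """Move the 5' position `shift` bp in the 3' direction (illustrative only)."""
    if strand == "+":
        return start + shift   # 3' is to the right on the forward strand
    return end - shift         # 3' is to the left on the reverse strand

def extended_span(start, end, strand, extension):
    """Predicted fragment of `extension` bp, anchored at the 5' end."""
    if strand == "+":
        return (start, start + extension - 1)
    return (end - extension + 1, end)

# A hypothetical 200 bp fragment at 1001..1200, sequenced from either end.
forward = (1001, 1050, "+")
reverse = (1151, 1200, "-")

print(shifted_position(*forward, 100))  # 1101
print(shifted_position(*reverse, 100))  # 1100
print(extended_span(*forward, 200))     # (1001, 1200)
print(extended_span(*reverse, 200))     # (1001, 1200)
```

With the shift applied, the forward and reverse reads land next to each other at the fragment center, and with extension both reads reproduce the same 200 bp fragment.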

The output wig file may be in variableStep, fixedStep, or bedGraph format. The wig file may be further converted into a compressed, indexed, binary bigWig format, depending on the availability of the appropriate conversion utilities.
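For readers unfamiliar with the three text formats, the following Python sketch writes small examples of each. The function names are hypothetical and the layouts follow the general UCSC wig and bedGraph conventions (1-based positions for variableStep and fixedStep, 0-based half-open intervals for bedGraph); the exact header fields the program emits may differ.

```python
def to_variable_step(chrom, points):
    """points: list of (1-based position, value), sorted by position."""
    lines = [f"variableStep chrom={chrom}"]
    lines += [f"{pos} {val}" for pos, val in points]
    return "\n".join(lines)

def to_fixed_step(chrom, start, step, values):
    """One value per bin of `step` bp, beginning at 1-based `start`."""
    lines = [f"fixedStep chrom={chrom} start={start} step={step} span={step}"]
    lines += [str(val) for val in values]
    return "\n".join(lines)

def to_bedgraph(chrom, intervals):
    """intervals: list of (0-based start, exclusive end, value)."""
    return "\n".join(f"{chrom}\t{s}\t{e}\t{v}" for s, e, v in intervals)

print(to_variable_step("chr1", [(101, 1), (170, 1)]))
print(to_fixed_step("chr1", 1, 10, [0, 2, 5]))
print(to_bedgraph("chr1", [(0, 10, 0), (10, 20, 2), (20, 30, 5)]))
```

The fixedStep and bedGraph calls above describe the same three 10 bp bins; bedGraph states each interval explicitly, while fixedStep implies the coordinates from the header line.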

The type of wig file to generate for your Bam sequencing file can vary depending on your particular experimental application. Here are a few common sequencing applications and my recommended settings for generating the wig or bigWig file.

TEXT REPRESENTATION OF RECORDING ALIGNMENTS

To help users visualize how this program records alignments in a wig file, drawn below are 10 alignments, five forward and five reverse. They may be interpreted as either single-end or paired-end. Drawn below are the numbers that would be recorded in a wig file for various parameter settings. Note that alignments are not drawn to scale and are drawn for visualization purposes only. Values of X represent 10.

AUTHOR

Timothy J. Parnell, PhD
Huntsman Cancer Institute
University of Utah
Salt Lake City, UT, 84112

This package is free software; you can redistribute it and/or modify it under the terms of the Artistic License 2.0.