Bio::ToolBox

data2wig.pl

A program to convert a generic data file into a wig file.

SYNOPSIS

data2wig.pl [--options...] <filename>

File options:
-i --in <filename>                    input file: txt, gff, bed, vcf, etc
-o --out <filename>                   output file name
-H --noheader                         input file has no header row
-0 --zero                             file is in 0-based coordinate system

Column indices:
-a --ask                              interactive selection of columns
-s --score <index>                    score column, may be comma list
-c --chr <index>                      chromosome column
-b --begin --start <index>            start coordinate column
-e --end --stop <index>               stop coordinate column
--attrib <name>                       GFF or VCF attribute name of score

Wig options:
-p --step [fixed|variable|bed]        type of wig file 
--bed --bdg                           alternative shortcut for bedGraph
--size <integer>                      step size for fixedStep
--span <integer>                      span size for fixed and variable

Conversion options:
-f --fast                             fast mode, no error checking
--name <text>                         optional track name
--(no)track                           generate a track line
--mid                                 use the midpoint of feature intervals
--format <integer>                    format decimal points of scores
-m --method [mean|median|sum|max]     combine multiple score columns

BigWig options:
-B --bw --bigwig                      generate a bigWig file
-d --db <database>                    database to collect chromosome lengths
--chromof <filename>                  specify a chromosome file
--bwapp </path/to/wigToBigWig>        specify path to wigToBigWig

General options:
-z --gz                               compress output text files
-v --version                          print version and exit
-h --help                             show extended documentation

OPTIONS

The command line flags and descriptions:

File options

Column indices

Wig options

Conversion options

BigWig options

General options

DESCRIPTION

This program converts any tab-delimited text data file into a wiggle-formatted text file. The file must contain not only the scores but also chromosomal coordinates, i.e. chromosome, start, and (optionally) stop. The program should automatically detect these columns if they are appropriately labeled; otherwise they can be specified. An option exists to use the midpoint of a region, e.g. a microarray probe.
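As an illustration of the coordinate handling described above, here is a minimal Python sketch, not the program's own Perl code. The function name parse_row, its parameters, and the integer rounding of the midpoint are assumptions made for this example.

```python
def parse_row(line, chrom_col=0, start_col=1, stop_col=None, score_col=3,
              use_mid=False):
    """Turn one tab-delimited line into (chromosome, position, score).

    Column indices here are 0-based Python indices chosen for the
    example; they are not the program's own index convention.
    """
    fields = line.rstrip("\n").split("\t")
    start = int(fields[start_col])
    # The stop column is optional; without it the feature is one position.
    stop = int(fields[stop_col]) if stop_col is not None else start
    # Assumption: the midpoint is rounded down to an integer.
    position = (start + stop) // 2 if use_mid else start
    return fields[chrom_col], position, float(fields[score_col])
```

For a probe spanning 101 to 160, parse_row("chr1\t101\t160\t2.5", stop_col=2, use_mid=True) reports position 130, while the default reports the start, 101.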

The wig file format is specified by documentation supporting the UCSC Genome Browser and detailed here: http://genome.ucsc.edu/goldenPath/help/wiggle.html. Three formats are supported, ‘fixedStep’, ‘variableStep’, and ‘bedGraph’. The format may be requested or determined empirically from the input file metadata. Genomic bin files generated with BioToolBox scripts record the window and step values in the metadata, which are used to determine the span and step wig values, respectively. The variableStep format is otherwise generated by default. The span is, by default, 1 bp.
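To make the difference between the three formats concrete, the following Python sketch renders data lines in each layout as defined by the UCSC format documentation. The function names are hypothetical, and the sketch assumes the input coordinates are 1-based and inclusive; it is not how data2wig.pl itself is implemented.

```python
def variable_step(chrom, points, span=1):
    """points: iterable of (position, score), 1-based positions."""
    lines = [f"variableStep chrom={chrom} span={span}"]
    lines += [f"{pos} {score}" for pos, score in points]
    return lines


def fixed_step(chrom, start, step, scores, span=1):
    """Positions are implied: start, start + step, start + 2 * step, ..."""
    lines = [f"fixedStep chrom={chrom} start={start} step={step} span={span}"]
    lines += [str(score) for score in scores]
    return lines


def bedgraph(chrom, intervals):
    """intervals: iterable of (start, stop, score), 1-based inclusive.

    bedGraph uses 0-based, half-open coordinates, so only the start
    is shifted down by one.
    """
    return [f"{chrom}\t{start - 1}\t{stop}\t{score}"
            for start, stop, score in intervals]
```

A fixedStep file is the most compact because positions are implied by the declaration line, variableStep records each position explicitly, and bedGraph records an explicit interval on every line.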

Wiggle files cannot tolerate multiple data points at the same position, e.g. multiple microarray probes matching a repetitive sequence. An option exists to mathematically combine the values at such a position into one value.
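The idea of collapsing duplicate positions can be sketched in Python as follows. The method names are borrowed from the --method choices listed in the synopsis; the function name, the data layout, and the output ordering are assumptions for illustration only.

```python
from collections import defaultdict
from statistics import mean, median

# Combining functions keyed by the method names in the synopsis.
METHODS = {"mean": mean, "median": median, "sum": sum, "max": max}


def combine_duplicates(points, method="mean"):
    """points: iterable of (chromosome, position, score).

    Returns one (chromosome, position, combined_score) per unique
    position, sorted by chromosome name and then position.
    """
    combine = METHODS[method]
    grouped = defaultdict(list)
    for chrom, pos, score in points:
        grouped[(chrom, pos)].append(score)
    return [(chrom, pos, combine(scores))
            for (chrom, pos), scores in sorted(grouped.items())]
```

With two probes at chr1:100 scoring 1.0 and 3.0, the "mean" method yields a single data point of 2.0, and "max" yields 3.0.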

Strand is not inherently supported in wig files. If you have stranded data, they should be split into separate files. The BioToolBox script split_data_file.pl can be used for this purpose.
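For readers who want to see what such a split amounts to, here is a small Python sketch. It is not the split_data_file.pl implementation; the function name and the decision to treat anything other than "-" as forward are assumptions.

```python
def split_by_strand(rows, strand_col):
    """rows: list of field lists; strand_col: 0-based index of the strand.

    Returns (forward_rows, reverse_rows). Assumption: only "-" marks
    the reverse strand; "+" and unstranded "." rows go to forward.
    """
    forward, reverse = [], []
    for row in rows:
        (reverse if row[strand_col] == "-" else forward).append(row)
    return forward, reverse
```

Each resulting list would then be written to its own file and converted to a separate wig file.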

A binary BigWig file may also be generated from the text wiggle file. The binary format is preferable to the text version for a variety of reasons, including fast random access and no loss in data value precision. More information can be found at http://genome.ucsc.edu/goldenPath/help/bigWig.html. Conversion requires BigWig file support, supplied by the external wigToBigWig or bedGraphToBigWig utility available from UCSC.
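The external conversion step can be sketched in Python as below. Both UCSC utilities take the input text file, a two-column chromosome sizes file, and the output file name, in that order. The function name and its parameters are hypothetical, and this sketch only builds the command; it does not show how data2wig.pl invokes the utility internally.

```python
def bigwig_command(text_file, chrom_sizes, out_file, is_bedgraph=False,
                   app_path=None):
    """Build the argument list for the UCSC conversion utility.

    app_path: optional explicit path to the utility; when omitted the
    utility is expected to be found on the PATH.
    """
    default = "bedGraphToBigWig" if is_bedgraph else "wigToBigWig"
    return [app_path or default, text_file, chrom_sizes, out_file]
```

The returned list can be passed to subprocess.run(cmd, check=True) once the utility is installed.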

AUTHOR

Timothy J. Parnell, PhD
Howard Hughes Medical Institute
Dept of Oncological Sciences
Huntsman Cancer Institute
University of Utah
Salt Lake City, UT, 84112

This package is free software; you can redistribute it and/or modify it under the terms of the Artistic License 2.0.