Bio::ToolBox - data2bed

data2bed.pl

A program to convert a data file to a bed file.

SYNOPSIS

data2bed.pl [--options...] <filename>

File Options:
-i --in <filename>                    input file: txt, gff, vcf, etc
-o --out <filename>                   output file name
-H --noheader                         input file has no header row
-0 --zero                             file is in 0-based coordinate system

Column indices:
--bed [3|4|5|6]                       type of bed to write
-a --ask                              interactive selection of columns
-c --chr <index>                      chromosome column
-b --begin --start <index>            start coordinate column
-e --end --stop <index>               stop coordinate column
-n --name <text | index>              name column or base name text
-s --score <index>                    score column
-t --strand <index>                   strand column

BigBed options:
-B --bb --bigbed                      generate a bigBed file
-d --db <database>                    database to collect chromosome lengths
--chromof <filename>                  specify a chromosome file
--bwapp </path/to/bedToBigBed>        specify path to bedToBigBed

General Options:
--sort                                sort output by genomic coordinates
-z --gz                               compress output file
-Z --bgz                              bgzip compress output file
-v --version                          print version and exit
-h --help                             show extended documentation

OPTIONS

The command line flags and descriptions:

File Options

Column indices

BigBed options

General options

DESCRIPTION

This program converts a tab-delimited data file into a BED file, following the UCSC specification at http://genome.ucsc.edu/goldenPath/help/customTrack.html#BED. Between three and six columns may be written. Thin and thick block data (columns beyond the sixth) are not written.

Column identification may be specified on the command line, chosen interactively, or automatically determined from the column headers. GFF source files should have columns automatically identified.

All lower-numbered columns must be defined before writing higher-numbered columns, as per the specification. Dummy data may be filled in for Name and/or Score if a higher column is requested.
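
The padding rule above can be sketched as follows. This is an illustration in Python, not the script's own Perl code. The function name `bed_fields` is hypothetical, and the placeholder values (`.` for Name and Strand, `0` for Score) are assumptions that may differ from the dummy data data2bed.pl actually writes.

```python
def bed_fields(chrom, start, end, name=None, score=None, strand=None, width=6):
    """Return the first `width` BED fields, padding any undefined ones.

    BED columns are positional, so column 6 (strand) cannot be written
    unless columns 4 (name) and 5 (score) are also present.
    """
    if width not in (3, 4, 5, 6):
        raise ValueError("BED width must be 3, 4, 5, or 6")
    # Placeholder values are assumptions for illustration only.
    full = [
        chrom,
        str(start),
        str(end),
        name if name is not None else ".",
        str(score) if score is not None else "0",
        strand if strand is not None else ".",
    ]
    return full[:width]


# Strand is requested but Name and Score are undefined, so both are padded.
print("\t".join(bed_fields("chr1", 99, 200, strand="+")))
```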

Browser and Track lines are not written.

Following the specification, all coordinates are written as interbase (0-based) coordinates. Base (1-based) coordinates, the BioPerl standard, are converted.
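
As a minimal sketch of that conversion, assuming the input interval is 1-based and inclusive at both ends: only the start shifts down by one, because a BED end is exclusive. The function name `to_bed_interval` and its `zero_based` parameter are hypothetical. The parameter is meant to mirror the effect described for `--zero`, not to reproduce the script's implementation.

```python
def to_bed_interval(start, end, zero_based=False):
    """Convert an interval to BED's 0-based, half-open form.

    A 1-based inclusive interval [start, end] covers the same bases as the
    0-based half-open interval [start - 1, end).
    """
    if zero_based:
        # Input already uses interbase coordinates, so nothing changes.
        return start, end
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end for 1-based input")
    return start - 1, end


bed_start, bed_end = to_bed_interval(100, 200)
# The interval length is preserved: 200 - 100 + 1 == bed_end - bed_start
print(bed_start, bed_end)
```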

Score values should be integers within the range 0..1000, as given in the UCSC BED description. Score values are not converted by this script. However, the biotoolbox script manipulate_datasets.pl has tools to do this if required.
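
For readers who want to see what such a conversion might look like, here is a hedged sketch of linear min-max rescaling to integers. It is independent of manipulate_datasets.pl and does not describe that script's methods. The function name `scale_scores` is hypothetical, and the default target range follows the 0 to 1000 range in the UCSC BED description.

```python
def scale_scores(values, lo=0, hi=1000):
    """Linearly rescale numeric values to integers in the range lo..hi."""
    if not values:
        return []
    vmin, vmax = min(values), max(values)
    if vmax == vmin:
        # All values identical: mapping them to `hi` is an arbitrary choice.
        return [hi for _ in values]
    span = vmax - vmin
    return [round(lo + (v - vmin) * (hi - lo) / span) for v in values]


print(scale_scores([2, 4, 6]))
```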

An option exists to further convert the BED file to an indexed, binary BigBed format. Jim Kent’s bedToBigBed conversion utility must be available, and either a chromosome definition file or access to a Bio::DB database is required.
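
The two inputs to that step can be sketched as below. The sketch assumes the chromosome definition file uses the common two-column, tab-delimited "chrom.sizes" layout (name, then length), and that bedToBigBed is called with its usual positional arguments: input BED, chromosome sizes, output file. The helper names, file names, and toy chromosome lengths are all hypothetical. The command is only built here, never run.

```python
def chrom_sizes_text(sizes):
    """Format a {chromosome: length} mapping as chrom.sizes text."""
    return "".join(f"{name}\t{length}\n" for name, length in sizes.items())


def bigbed_command(bed_path, sizes_path, out_path, app="bedToBigBed"):
    """Build the argument list for the converter; `app` may be a full path."""
    return [app, bed_path, sizes_path, out_path]


# Toy lengths for illustration only, not taken from a real genome build.
sizes = {"chrI": 1000, "chrII": 2000}
print(chrom_sizes_text(sizes), end="")
print(" ".join(bigbed_command("regions.bed", "chrom.sizes", "regions.bb")))
```

bedToBigBed expects its input to be sorted by chromosome and then by start position, so an unsorted BED file needs sorting before this step.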

AUTHOR

Timothy J. Parnell, PhD
Howard Hughes Medical Institute
Dept of Oncological Sciences
Huntsman Cancer Institute
University of Utah
Salt Lake City, UT, 84112

This package is free software; you can redistribute it and/or modify it under the terms of the Artistic License 2.0.