Bio::ToolBox

get_relative_data.pl

A program to collect data in bins around a relative position.

SYNOPSIS

get_relative_data.pl [--options] --in <filename> --out <filename>

get_relative_data.pl [--options] -i <filename> <data1> <data2...>

Options for data files:
-i --in <filename>                  input file: txt bed gff gtf refFlat ucsc
-o --out <filename>                 optional output file, default overwrite 

Options for new files:
-d --db <name>                      annotation database: mysql sqlite
-f --feature <type>                 one or more feature types from db or gff

Options for data collection:
-D --ddb <name|file>                data or BigWigSet database
-a --data <dataset|filename>        data from which to collect: bw bam etc
-m --method [mean|median|stddev|    statistical method for collecting data
          min|max|range|sum|count|   default mean
          pcount|ncount]
-t --strand [all|sense|antisense]   strand of data relative to feature (all)
--force_strand                      use the specified strand in input file
--avoid                             avoid neighboring features
--avtype [type,type,...]            alternative types of feature to avoid
--long                              collect each window independently
-r --format <integer>               number of decimal places for numbers

Bin specification:
-w --win <integer>                  size of windows, default 50 bp
-n --num <integer>                  number of windows flanking reference, 20
--up <integer>                        or number of windows upstream
--down <integer>                      and number of windows downstream
-p --pos [5|m|3|p]                  reference position, default 5'

Post-processing:
-U --sum                            generate summary file
--smooth                            smoothen sparse data

General Options:
-g --groups                         write columns group index file for plotting
-z --gz                             compress output file
-c --cpu <integer>                  number of threads, default 4
--noparse                           do not parse input file into SeqFeatures
-v --version                        print version and exit
-h --help                           show extended documentation

OPTIONS

The command line flags and descriptions:

Options for data files

Options for new files

Options for data collection

Bin specification

Post-processing

General options

DESCRIPTION

This program collects data around a relative coordinate of a genomic feature or region. The data is collected in a series of windows flanking the feature start (5’ position for stranded features), end (3’ position), or midpoint. The number and size of windows are specified via command-line arguments; by default the program uses 20 windows of 50 bp on each side of the relative position (40 windows total), covering 2 kb (+/- 1 kb). Windows without a value may be interpolated (smoothed) from neighboring values, if available.
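The window arithmetic described above can be illustrated with a short Python sketch. This is not the program's own code: the function name `relative_windows`, the half-open offset convention, and the way `up` and `down` override `num` are assumptions made for illustration. The parameters are named after the --win, --num, --up, and --down options.

```python
def relative_windows(win=50, num=20, up=None, down=None):
    """Return a list of (start, stop) offsets relative to the reference point.

    Offsets are half-open intervals [start, stop); this convention is an
    assumption for illustration, not necessarily what the program reports.
    """
    # Assumption: --up and --down, when given, replace --num on their side.
    up = num if up is None else up
    down = num if down is None else down
    windows = []
    for i in range(-up, down):
        start = i * win
        windows.append((start, start + win))
    return windows


windows = relative_windows()  # defaults: 20 windows of 50 bp on each side
total_span = windows[-1][1] - windows[0][0]
```

With the defaults this yields 40 windows running from offset -1000 to +1000, matching the 2 kb total stated above.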

Stranded data may be collected. If the feature does not have an inherent strand, one may be specified to enforce stranded collection or a particular orientation.
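How strand affects the orientation of the windows can be sketched as follows. This is a hypothetical illustration, not the program's implementation: the function names are invented, the intervals are half-open, and only the 5', midpoint, and 3' reference positions are handled, since the `p` value of --pos is not described here. The sketch relies on the usual convention that the 5' end of a reverse-strand feature is its highest coordinate.

```python
def reference_position(start, end, strand, pos="5"):
    """Pick the reference coordinate of a feature (hypothetical helper)."""
    if pos == "m":
        return (start + end) // 2
    forward = strand != "-"  # unstranded features are treated as forward
    if pos == "5":
        return start if forward else end
    if pos == "3":
        return end if forward else start
    raise ValueError(f"unsupported reference position: {pos}")


def genomic_window(ref, rel_start, rel_stop, strand):
    """Map a relative half-open window onto genomic coordinates."""
    if strand == "-":
        # On the reverse strand, upstream (negative offsets) lies at
        # higher genomic coordinates, so the interval is mirrored.
        return (ref - rel_stop, ref - rel_start)
    return (ref + rel_start, ref + rel_stop)


ref_fwd = reference_position(1000, 2000, "+")
ref_rev = reference_position(1000, 2000, "-")
```

For a feature spanning 1000 to 2000, the most upstream default window (-1000 to -950) maps to 0-50 on the forward strand and to 2950-3000 on the reverse strand.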

When features overlap, or the collection windows of one feature overlap another feature, the data in those windows may be ignored and not collected (--avoid).
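One way to picture the avoidance of neighboring features is as an overlap test between each window and the neighbors. The sketch below is only an assumption about the general idea; the program's actual rules for --avoid may differ, and the function names and half-open intervals are hypothetical.

```python
def overlaps(a, b):
    """True if two half-open intervals (start, stop) share any position."""
    return a[0] < b[1] and b[0] < a[1]


def mask_windows(windows, neighbors):
    """Replace each window that overlaps a neighboring feature with None."""
    return [
        None if any(overlaps(w, n) for n in neighbors) else w
        for w in windows
    ]


# Three adjacent 50 bp windows and one neighboring feature at 75-120.
masked = mask_windows([(0, 50), (50, 100), (100, 150)], [(75, 120)])
```

Here the first window is kept, while the two windows that touch the neighboring feature are marked as not collected.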

EXAMPLES

These are examples of common scenarios for collecting data.

AUTHOR

Timothy J. Parnell, PhD
Howard Hughes Medical Institute
Dept of Oncological Sciences
Huntsman Cancer Institute
University of Utah
Salt Lake City, UT, 84112

This package is free software; you can redistribute it and/or modify it under the terms of the Artistic License 2.0.