Bio::ToolBox

get_binned_data.pl

A program to collect data in bins across a list of features.

SYNOPSIS

get_binned_data.pl [--options] --in <filename> --out <filename>
 
get_binned_data.pl [--options] -i <filename> <data1> <data2...>
 
 Options for data files:
 -i --in <filename>                  input file: txt bed gff gtf refFlat ucsc
 -o --out <filename>                 optional output file, default overwrite 
 
 Options for new files:
 -d --db <name>                      annotation database: mysql sqlite
 -f --feature <type>                 one or more feature types from db or gff
 
 Options for data collection:
 -D --ddb <name|file>                data or BigWigSet database
 -a --data <dataset|filename>        data from which to collect: bw bam etc
 -m --method [mean|median|stddev|    statistical method for collecting data
       min|max|range|sum|count|      default mean
       pcount|ncount]
 -t --strand [all|sense|antisense]   strand of data relative to feature (all)
 -u --subfeature [exon|cds|          collect over gene subfeatures 
       5p_utr|3p_utr] 
 --long                              collect each window independently
 -r --format <integer>               number of decimal places for numbers
 
 Bin specification:
 -b --bins <integer>                 number of bins to divide feature into (10)
 -x --ext <integer>                  number of extended bins outside feature
 -X --extsize <integer>              size of extended bins
 --min <integer>                     minimum size of feature to divide
 
 Post-processing:
 -U --sum                            generate summary file
 --smooth                            smoothen sparse data
 
 General options:
 -g --groups                         write columns group index file for plotting
 -z --gz                             compress output file
 -c --cpu <integer>                  number of threads, default 4
 --noparse                           do not parse input file into SeqFeatures
 -v --version                        print version and exit
 -h --help                           show extended documentation

OPTIONS

The command line flags and descriptions:

Options for data files

Options for new files

Options for data collection

Bin specification

Post-processing

General options

DESCRIPTION

This program collects data across a gene or feature body in percentile bins. It is used to determine whether a dataset shows a spatial distribution preference over gene bodies. The number of bins may be specified with the --bins option (default 10). Additional bins may be extended on either side of the gene with the --ext option (default 0 on either side). The size of each bin within the gene is determined as a percentage of gene length.
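
The binning described above can be illustrated with a short Python sketch. It is not the program's own code: the function name percentile_bins, the 1-based inclusive coordinates, and the rounding of bin edges are assumptions made for illustration, and strand is ignored. The bins, ext, and extsize parameters mirror the --bins, --ext, and --extsize options.

```python
def percentile_bins(start, end, bins=10, ext=0, extsize=0):
    """Return (bin_start, bin_end) windows for one feature.

    Hypothetical helper: coordinates are assumed to be 1-based and
    inclusive, and the feature is assumed to be on the forward strand.
    """
    if bins < 1:
        raise ValueError("bins must be at least 1")
    if ext > 0 and extsize < 1:
        raise ValueError("extsize must be positive when ext is used")

    length = end - start + 1
    step = length / bins  # each bin spans 100/bins percent of the feature

    windows = []

    # Extended bins upstream of the feature, farthest first.
    for i in range(ext, 0, -1):
        windows.append((start - i * extsize, start - (i - 1) * extsize - 1))

    # Percentile bins across the feature body. Both edges are rounded
    # from the same fractional positions, so the bins tile the feature
    # without gaps or overlaps.
    for i in range(bins):
        bin_start = start + round(i * step)
        bin_end = start + round((i + 1) * step) - 1
        windows.append((bin_start, bin_end))

    # Extended bins downstream of the feature, nearest first.
    for i in range(ext):
        windows.append((end + 1 + i * extsize, end + (i + 1) * extsize))

    return windows


# A 1000 bp feature split into 10 bins with one 100 bp extended bin per side.
windows = percentile_bins(1001, 2000, bins=10, ext=1, extsize=100)
print(windows[0])   # (901, 1000), the upstream extended bin
print(windows[1])   # (1001, 1100), the first 10% of the feature
print(windows[-1])  # (2001, 2100), the downstream extended bin
```

In this sketch a 500 bp feature and a 5000 bp feature both yield 10 body bins, of 50 bp and 500 bp respectively, while the extended bins keep the fixed size given by extsize.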

EXAMPLES

These are examples of common scenarios for collecting data.

AUTHOR

Timothy J. Parnell, PhD
Howard Hughes Medical Institute
Dept of Oncological Sciences
Huntsman Cancer Institute
University of Utah
Salt Lake City, UT, 84112

This package is free software; you can redistribute it and/or modify it under the terms of the Artistic License 2.0.