Bio::ToolBox

get_features.pl

A program to collect and filter annotated features from source files.

SYNOPSIS

get_features.pl --in <filename> --out <filename>

get_features.pl --db <name> --out <filename>

Source data:
-d --db <name | filename>     database: name, file.db, or file.sqlite
-i --in <filename>            input annotation: GFF3, GTF, genePred, etc

Selection:
-f --feature <type>           feature: gene, mRNA, transcript, etc
-u --sub                      include subfeatures (true if gff, gtf, refFlat)

Filter features:
-l --list <filename>          file of feature IDs to keep
-K --chrskip <regex>          skip features from certain chromosomes
-x --exclude <tag=value>      exclude features with specific attribute value
-n --include <tag=value>      include features with specific attribute value
--biotype <regex>             include only specific transcript biotype
--tsl [best|best1|best2|      specify minimum transcript support level
       best3|best4|best5|     (primarily for Ensembl annotation)
       1|2|3|4|5|NA]
--gencode                     include only GENCODE tagged genes

Adjustments:
-b --start=<integer>          modify start positions
-e --stop=<integer>           modify stop positions
-p --pos [5|m|3|53|p]         relative position from which to modify
--collapse                    collapse subfeatures from alt transcripts

Report format options:
-B --bed                      write BED6 (no --sub) or BED12 (--sub) format
-G --gff                      write GFF3 format
-g --gtf                      write GTF format
-r --refflat                  write UCSC refFlat format
-t --tag <text>               include specific GFF attributes in text output
--coord                       include coordinates in text output

General options:
-o --out <filename>           output file name
--sort                        sort output by genomic coordinates
-z --gz                       compress output
-Z --bgz                      bgzip compress output
-v --version                  print version and exit
-h --help                     show full documentation

OPTIONS

The command line flags and descriptions:

Source data

Selection

Filter features

Adjustments

Report format options

General options

DESCRIPTION

This program will extract a list of features from a database or input annotation file and write them out to a file. Features may be selected using their feature type (the 3rd column in a GFF or GTF file). When selecting features from a database, types may be selected interactively from a presented list. Features may be filtered based on various GFF attributes typically found in Ensembl annotation, including transcript_biotype, transcript_support_level, and GENCODE basic tags. Features may also be filtered by chromosome.
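
The selection and filtering steps described above can be pictured with a small sketch. This is not the program's implementation, only an illustration in Python of filtering GFF3 lines by feature type (third column), by a chromosome-skip regular expression, and by a `transcript_biotype` attribute. The function names `parse_attributes` and `select_features`, and the sample records, are hypothetical.

```python
import re

def parse_attributes(column9):
    """Split a GFF3 attribute column (tag=value;tag=value) into a dict."""
    attrs = {}
    for pair in column9.strip().split(";"):
        if "=" in pair:
            tag, value = pair.split("=", 1)
            attrs[tag] = value
    return attrs

def select_features(lines, feature_type, chrskip=None, biotype=None):
    """Yield GFF3 lines of the requested type that pass the optional filters.

    Hypothetical helper: chrskip and biotype are regular expressions that
    mimic the idea behind --chrskip and --biotype, not their actual code.
    """
    skip = re.compile(chrskip) if chrskip else None
    wanted = re.compile(biotype) if biotype else None
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # ignore pragmas, comments, and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9:
            continue  # not a well-formed GFF3 feature line
        seqid, ftype = fields[0], fields[2]
        attrs = parse_attributes(fields[8])
        if ftype != feature_type:
            continue
        if skip and skip.search(seqid):
            continue
        if wanted and not wanted.search(attrs.get("transcript_biotype", "")):
            continue
        yield line.rstrip("\n")

# Made-up example records
GFF = [
    "##gff-version 3\n",
    "chr1\tensembl\tmRNA\t100\t900\t.\t+\t.\tID=t1;transcript_biotype=protein_coding\n",
    "chrM\tensembl\tmRNA\t50\t400\t.\t-\t.\tID=t2;transcript_biotype=protein_coding\n",
    "chr2\tensembl\tmRNA\t10\t300\t.\t+\t.\tID=t3;transcript_biotype=lncRNA\n",
    "chr2\tensembl\tgene\t10\t300\t.\t+\t.\tID=g3\n",
]
kept = list(select_features(GFF, "mRNA", chrskip=r"chrM|random", biotype=r"protein_coding"))
```

With the sample records, only the `t1` transcript passes all three tests: `t2` is dropped by the chromosome filter, `t3` by the biotype filter, and `g3` by the feature type.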

Collected features may be written to a variety of formats, including GFF3, GTF, refFlat, BED (6-column BED6, or 12-column BED12 when subfeatures are included), or a simple text format. With GFF, GTF, and refFlat formats, subfeatures such as exons are included automatically (this may be turned off). With the simple text format, the source database or parsed input file is recorded in the header metadata for use in subsequent programs. Coordinates may optionally be included in the text file, in which case they preempt the use of parsed features in other tools.
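
When comparing GFF and BED output, keep in mind that the two formats use different coordinate conventions: GFF3 and GTF are 1-based and inclusive, while BED is 0-based and half-open. A minimal Python sketch of the conversion follows. The function name `gff_to_bed6` and the placeholder score of 0 are assumptions for illustration; the page does not describe what the program writes in the BED score column.

```python
def gff_to_bed6(seqid, start, end, name, strand, score=0):
    """Format a 1-based inclusive interval as a BED6 line.

    BED start is 0-based, so subtract 1 from the GFF start;
    BED end is exclusive, so the GFF end is used unchanged.
    The score default is a placeholder, not the program's behavior.
    """
    columns = (seqid, start - 1, end, name, score, strand)
    return "\t".join(str(c) for c in columns)

bed_line = gff_to_bed6("chr1", 100, 900, "t1", "+")
```

The feature length is unchanged by the conversion: 900 - 100 + 1 bases in GFF terms equals 900 - 99 in BED terms.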

Coordinate adjustments

Coordinates of the features may be adjusted as desired when writing to text or BED file formats. Adjustments may be made relative to the 5 prime end, the 3 prime end, both ends, or the feature midpoint. Positions are based on the feature strand. Use the following examples as a guide.

Note that positions are always in base coordinates, and the resulting regions may be 1 bp longer than expected, depending on whether the reference base is included.
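
The strand-aware arithmetic can be sketched in Python as follows. This is an illustration under stated assumptions rather than the program's code: it assumes that negative values mean upstream of the reference point relative to the feature strand, that the midpoint is computed by integer division, and that coordinates are 1-based and inclusive. The function name `adjust` is hypothetical, and the `p` position listed in the synopsis is not covered.

```python
def adjust(start, stop, strand, start_adj=0, stop_adj=0, pos="5"):
    """Return (new_start, new_stop) for a 1-based inclusive interval.

    start_adj and stop_adj mimic the idea of --start and --stop;
    pos mimics --pos with values "5", "3", "53", or "m".
    Assumption: negative adjustments point upstream on the feature strand.
    """
    forward = strand != "-"
    if pos == "53":
        # adjust each end relative to itself
        five_ref, three_ref = (start, stop) if forward else (stop, start)
    elif pos == "5":
        five_ref = three_ref = start if forward else stop
    elif pos == "3":
        five_ref = three_ref = stop if forward else start
    elif pos == "m":
        five_ref = three_ref = (start + stop) // 2
    else:
        raise ValueError(f"unsupported position: {pos}")
    if forward:
        return five_ref + start_adj, three_ref + stop_adj
    # on the reverse strand, upstream means larger coordinates
    return three_ref - stop_adj, five_ref - start_adj

upstream = adjust(1000, 2000, "+", -500, -1, "5")     # 500 bp upstream of the 5 prime end
around = adjust(1000, 2000, "+", -500, 500, "5")      # spans the reference base
upstream_rev = adjust(1000, 2000, "-", -500, -1, "5") # same request, reverse strand
```

Under these assumptions, `upstream` covers exactly 500 bases, while `around` covers 1001 bases because the reference base itself is included, which is the 1 bp difference mentioned in the note above. On the reverse strand the same request yields a region at higher coordinates than the feature.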

AUTHOR

Timothy J. Parnell, PhD
Dept of Oncological Sciences
Huntsman Cancer Institute
University of Utah
Salt Lake City, UT, 84112

This package is free software; you can redistribute it and/or modify it under the terms of the Artistic License 2.0.