Bio::ToolBox - get_feature_info
get_feature_info.pl
A program to collect feature information from a BioPerl SeqFeature::Store db.
SYNOPSIS
get_feature_info.pl <filename>
File options:
-i --in <filename> input file of list db features
-o --out <filename> optional output file
Database options:
-d --db <name> annotation database: mysql sqlite
or annotation file: gtf gff ucsc
-a --attrib <attribute1,attribute2,...> list of attributes to collect
-t --type <primary_tag> specify a feature type as needed
General options:
-z --gz compress output file
-v --version print version and exit
-h --help show extended documentation
Attributes include:
Chromosome
Start
Stop
Strand
Score
Name
Alias
Note
Type
Primary_tag
Source
Length
Midpoint
Phase
RNA_count (number of transcript subfeatures)
Exon_count (number of exon subfeatures)
Gene_length (sum of all merged, collapsed, transcript exon lengths)
Transcript_length (sum of exon lengths)
Parent (name)
Primary_ID
<tag>
OPTIONS
The command line flags and descriptions:
File options
--in <filename>
Specify an input file containing either a list of database features or genomic coordinates for which to collect data. The file should be a tab-delimited text file, one row per feature, with columns representing feature identifiers, attributes, coordinates, and/or data values. The first row should be column headers. Text files generated by other BioToolBox scripts are acceptable. Files may be gzip-compressed.
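As an illustration of the input format described above, here is a minimal Python sketch that reads such a table. It is not part of get_feature_info.pl; the function name `read_feature_table` is hypothetical, and it assumes that metadata comment lines begin with `#` and that a `.gz` suffix marks a gzip-compressed file.

```python
import csv
import gzip


def read_feature_table(path):
    """Return (header, rows) from a tab-delimited feature table.

    Assumptions (not taken from the script itself): metadata comment
    lines start with '#', and a '.gz' suffix means gzip compression.
    """
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", newline="") as handle:
        # Skip metadata comment lines before handing lines to the parser.
        lines = (line for line in handle if not line.startswith("#"))
        reader = csv.reader(lines, delimiter="\t")
        header = next(reader)  # first non-comment row holds column headers
        # One dict per feature row, keyed by column header; skip blank rows.
        rows = [dict(zip(header, fields)) for fields in reader if fields]
    return header, rows
```

Each returned row maps a column header such as Name or Type to that feature's value, which is the information the program needs to look the feature up.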
--out <filename>
Optionally specify an alternate output file name. The default is to overwrite the input file.
Database options
--db <name>
Specify the name or SQLite file of a Bio::DB::SeqFeature::Store annotation database from which the information may be derived. This may be stored in the metadata comments of the input file, or an alternative file may be provided.
Alternatively, specify an annotation file, e.g. GTF, GFF3, or UCSC gene table, that may be parsed into memory.
The input file should include Name or ID columns that match features in the provided database. If a Type column is not present, then a type should be provided with the --type option. Note that mixing and matching files and databases may not always work as well as intended.
--attrib <attribute>
Specify the attribute to collect for each feature. Multiple attributes may be given as a comma-delimited list. Standard GFF attributes may be collected, as well as values from specific group tags. These tags are found in the group (ninth) column of the source GFF file. Standard attributes include the following:
- Chromosome
- Start
- Stop
- Strand
- Score
- Name
- Alias
- Note
- Type
- Primary_tag
- Source
- Length
- Midpoint
- Phase
- RNA_count (number of transcript subfeatures)
- Exon_count (number of exon subfeatures)
- Gene_length (sum of all merged, collapsed, transcript exon lengths)
- Transcript_length (sum of exon lengths)
- Parent (name)
- Primary_ID
- <tag>
If --attrib is not specified on the command line, then an interactive list will be presented to the user for selection. This is especially useful when you can't remember the feature's tag keys in the database.
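The derived attributes in the list above (Length, Midpoint, Transcript_length, Gene_length) can be illustrated with a short Python sketch. This is not the program's own code. It assumes GFF-style 1-based inclusive coordinates, and the integer rounding used for the midpoint is an assumption, since the documentation does not state it. All function names are hypothetical.

```python
def length(start, stop):
    """Length of a feature with 1-based inclusive coordinates."""
    return stop - start + 1


def midpoint(start, stop):
    """Midpoint of a feature; flooring to an integer is an assumption."""
    return (start + stop) // 2


def merge_intervals(intervals):
    """Collapse overlapping or adjacent (start, stop) intervals."""
    merged = []
    for start, stop in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            # Overlaps or abuts the previous interval: extend it.
            merged[-1][1] = max(merged[-1][1], stop)
        else:
            merged.append([start, stop])
    return [tuple(interval) for interval in merged]


def transcript_length(exons):
    """Sum of exon lengths for one transcript."""
    return sum(length(start, stop) for start, stop in exons)


def gene_length(transcripts):
    """Sum of exon lengths after merging exons across all transcripts."""
    all_exons = [exon for exons in transcripts for exon in exons]
    return sum(length(start, stop) for start, stop in merge_intervals(all_exons))


# Hypothetical gene with two transcripts that share and overlap exons.
transcripts = [
    [(100, 200), (300, 400)],
    [(150, 250), (300, 400)],
]
print(transcript_length(transcripts[0]))  # 202
print(gene_length(transcripts))           # 252
```

Merging before summing means that exons shared between transcripts are counted once, which is why the gene length here is smaller than the sum of the two transcript lengths.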
--type <primary_tag>
When the input file does not have a type column, a type or primary_tag may be provided. This is especially useful to restrict the database search when there are multiple features with the same name.
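To show why a type helps when several features share a name, here is a minimal Python sketch of the idea. It does not reflect how the program queries a Bio::DB::SeqFeature::Store database. The in-memory feature list, its keys, and the function name `find_features` are all hypothetical.

```python
def find_features(features, name, primary_tag=None):
    """Return features matching a name, optionally restricted by type.

    `features` is a hypothetical list of dicts with 'Name' and
    'Primary_tag' keys, standing in for a real annotation database.
    """
    hits = [f for f in features if f["Name"] == name]
    if primary_tag is not None:
        # Restricting by type removes same-named features of other types.
        hits = [f for f in hits if f["Primary_tag"] == primary_tag]
    return hits


# Hypothetical annotation: a gene and its transcript share one name.
features = [
    {"Name": "YAL001C", "Primary_tag": "gene", "Start": 147594},
    {"Name": "YAL001C", "Primary_tag": "mRNA", "Start": 147594},
]
print(len(find_features(features, "YAL001C")))          # 2, ambiguous
print(len(find_features(features, "YAL001C", "gene")))  # 1
```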
General options
--gz
Indicate whether or not the output file should be compressed with gzip. If compressed, the extension '.gz' is appended to the filename. If the input file is compressed, the compression status is preserved unless specified otherwise.
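The naming rules described for --out and --gz can be summarized in a small Python sketch. This is an interpretation of the text above, not the program's implementation. The function name `resolve_output` and the choice to strip a '.gz' extension when compression is turned off are assumptions.

```python
def resolve_output(in_path, out_path=None, gz=None):
    """Return (output_path, compress) following the documented rules.

    Assumptions: with no --out the input file is overwritten, with no
    explicit gz setting the input's compression status is preserved,
    and the '.gz' extension is added or removed to match.
    """
    path = out_path or in_path
    if gz is None:
        # Preserve the compression status of the input file.
        gz = in_path.endswith(".gz")
    if gz and not path.endswith(".gz"):
        path += ".gz"
    elif not gz and path.endswith(".gz"):
        path = path[: -len(".gz")]
    return path, gz


print(resolve_output("features.txt"))                # ('features.txt', False)
print(resolve_output("features.txt", gz=True))       # ('features.txt.gz', True)
print(resolve_output("features.txt.gz", "out.txt"))  # ('out.txt.gz', True)
```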
--version
Print the version number.
--help
Display this help.
DESCRIPTION
This program will collect attributes for a list of features from the database. The attributes may be general attributes, such as chromosome, start, stop, strand, etc., or feature-specific attributes stored in the group field of the original source GFF file.
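For readers unfamiliar with the group field, the following Python sketch shows how tag keys and values can be read from the ninth column of a GFF3 line. It follows the GFF3 convention of `tag=value` pairs separated by semicolons, with comma-separated multiple values and URL-style escaping. It is an illustration only, not the parser used by the program, and the example line and function name `parse_gff3_attributes` are hypothetical.

```python
from urllib.parse import unquote


def parse_gff3_attributes(column9):
    """Parse the GFF3 group (ninth) column into a dict of tag -> values."""
    tags = {}
    for pair in column9.strip().split(";"):
        pair = pair.strip()
        if not pair:
            continue  # tolerate a trailing semicolon
        key, _, value = pair.partition("=")
        # Multiple values for one tag are comma-separated and URL-escaped.
        tags[unquote(key)] = [unquote(v) for v in value.split(",")]
    return tags


# Hypothetical GFF3 line; fields are tab-delimited, group column is last.
line = "chrI\tSGD\tgene\t335\t649\t.\t+\t.\tID=YAL069W;Name=YAL069W;Note=Dubious%20ORF"
fields = line.split("\t")
tags = parse_gff3_attributes(fields[8])
print(tags["Note"])  # ['Dubious ORF']
```

The keys of the resulting dictionary correspond to what this documentation calls `<tag>` in the attribute list.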
AUTHOR
Timothy J. Parnell, PhD
Howard Hughes Medical Institute
Dept of Oncological Sciences
Huntsman Cancer Institute
University of Utah
Salt Lake City, UT, 84112
This package is free software; you can redistribute it and/or modify it under the terms of the Artistic License 2.0.