Bio::ToolBox

pull_features.pl

A program to pull out a specific list of data rows from a data file.

SYNOPSIS

pull_features.pl --data <filename> --list <filename> --out <filename>

File options:
-d --data <filename>          Source of all data rows or features
-l --list <filename>          List of specific row or feature names
-o --out <filename>           Output file, or basename for group files

Column index options:
-x --dindex <index>           Data column index of row name to lookup
-X --lindex <index>           List column index of row name to lookup
-g --gindex <index>           Group column index for lookup

Output options:
-r --order [list | data]      Order output items as in the list or data file
-U --sum                      Generate a summary file
--sumonly                     Skip output, just make a summary file
--start <integer>             First data column to make a summary file
--stopi <integer>             Last data column to make a summary file
--log                         Summarized data is in log2 space

General options:
-v --version                  print version and exit
-h --help                     show full documentation

OPTIONS

The command line flags and descriptions:

File options

Column index options

Output options

General options

DESCRIPTION

Given a list of requested unique feature identifiers, this program will pull out those features (rows) from a data file and write them to a new file. It is comparable in function to the VLOOKUP command found in popular spreadsheet programs. The list is provided as a separate text file, either a single-column file or a multi-column tab-delimited file from which one column is selected. All rows from the source data file that match an identifier in the list will be written to the new file. The order of the features in the output file may match either the list file or the data file.
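The lookup described above can be illustrated with a minimal Python sketch. This is not the program's implementation; the function name pull_rows, its parameters, and the sample identifiers are hypothetical, and column indices here are 0-based Python indices, which may differ from the convention the program uses.

```python
def pull_rows(data_rows, wanted_ids, key_index=0, order="list"):
    """Return the rows whose key column matches an identifier in wanted_ids.

    order="data" keeps the rows in data-file order;
    order="list" returns them in the order the identifiers were listed.
    """
    if order == "data":
        wanted = set(wanted_ids)
        return [row for row in data_rows if row[key_index] in wanted]

    # Index the data rows by identifier so each lookup is a dictionary hit.
    by_id = {}
    for row in data_rows:
        by_id.setdefault(row[key_index], []).append(row)

    pulled = []
    # dict.fromkeys drops repeated identifiers while keeping list order.
    for ident in dict.fromkeys(wanted_ids):
        pulled.extend(by_id.get(ident, []))
    return pulled


# Hypothetical sample data: identifier in column 0, values after it.
data = [
    ["YAL001C", "1.2", "0.8"],
    ["YAL002W", "0.4", "0.9"],
    ["YAL003W", "2.1", "1.7"],
]
wanted = ["YAL003W", "YAL001C"]

for row in pull_rows(data, wanted, order="list"):
    print("\t".join(row))
```

Identifiers in the list that are absent from the data are simply skipped in this sketch; how the program itself reports them is not covered here.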

If the list file has a second group column, then the rows for each group will be written to separate files, with the group identifier appended to the output file name. Use the --gindex option to specify the group column.
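As a rough illustration of the grouping step, the sketch below partitions list entries by their group column and derives one output name per group. The function name split_by_group is hypothetical, and the underscore separator in the output names is an assumption; the exact naming pattern the program uses is not specified here.

```python
from collections import defaultdict


def split_by_group(list_rows, out_base, id_index=0, group_index=1):
    """Map an output name for each group to the identifiers in that group."""
    groups = defaultdict(list)
    for row in list_rows:
        groups[row[group_index]].append(row[id_index])
    # Assumed naming pattern: basename, underscore, group identifier.
    return {f"{out_base}_{group}": ids for group, ids in groups.items()}


# Hypothetical list file contents: identifier, then group.
list_rows = [
    ["geneA", "up"],
    ["geneB", "down"],
    ["geneC", "up"],
]

for name, ids in split_by_group(list_rows, "pulled").items():
    print(name, ids)
```

Each group's identifiers could then be looked up in the data file independently, producing one output file per group.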

The program will also accept a Cluster gene file (with .kgg extension) as a list file with group information, where the clusters are the groups.

The program will optionally generate a summary data file, in which the values in each of the specified data columns are averaged and written out as rows in a separate data file. Compare this function to the summary option in the biotoolbox scripts get_relative_data.pl or average_gene.pl.
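The summary step can be pictured as a column-wise mean that is transposed into rows. The sketch below is an illustration under stated assumptions, not the program's code: summarize_columns is a hypothetical name, start and stop are 0-based inclusive indices, every value is assumed to be numeric, and any special handling of log2 data is left out.

```python
from statistics import mean


def summarize_columns(header, rows, start, stop):
    """Average each data column from start to stop (inclusive, 0-based).

    Returns one (column_name, mean_value) pair per summarized column,
    so that each data column becomes a row of the summary.
    """
    summary = []
    for col in range(start, stop + 1):
        values = [float(row[col]) for row in rows]
        summary.append((header[col], mean(values)))
    return summary


# Hypothetical pulled rows: a name column followed by two data columns.
header = ["Name", "sample1", "sample2"]
rows = [
    ["geneA", "1.0", "4.0"],
    ["geneB", "3.0", "8.0"],
]

for name, value in summarize_columns(header, rows, start=1, stop=2):
    print(f"{name}\t{value}")
```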

AUTHOR

Timothy J. Parnell, PhD
Howard Hughes Medical Institute
Dept of Oncological Sciences
Huntsman Cancer Institute
University of Utah
Salt Lake City, UT, 84112

This package is free software; you can redistribute it and/or modify it under the terms of the Artistic License 2.0.