NAME

       ffe - flat file extractor

SYNOPSIS

       ffe [options]...

DESCRIPTION

       ffe is a program for extracting fields from flat file records and
       displaying them in different formats. ffe relies on a configuration
       file to define the input file structure and the output format.

OPTIONS

       ffe accepts the following options:

       -c, --configuration=file
              Read the configuration from file. The default is ~/.fferc.

       -s, --structure=STRUCTURE
              Input file is processed using the structure STRUCTURE.

       -p, --print=FORMAT
              Use  output  format  FORMAT  for  printing.  All printing can be
              suppressed using format  no.  Original  data  is  printed  using
              format raw.

       -o, --output=NAME
              Write output to NAME instead of standard output.

       -f, --field-list=LIST
              Print  only  fields  and  constants specified in comma separated
              list LIST.

       -e, --expression=EXPRESSION
              Print only those records for which the EXPRESSION  evaluates  to
              true.

       -a, --and
              Multiple expressions are combined with logical and; the default
              is logical or.

       -v, --invert-match
              Print only those records which don’t match the expression.

       -l, --loose
              An invalid input line does not cause the program to abort.

       -r, --replace=FIELD=VALUE
              Replace the contents of FIELD with VALUE in output. VALUE can
              contain the same directives as the output option data.

       -d, --debug
              All invalid input lines are written to file ffe_error_<pid>.log.

       -I, --info
              Show the structure information in the configuration file and exit.

       -?, --help
              List all available options and their meanings and exit.

       -V, --version
              Show version of program and exit.

       All remaining arguments are names of input files; if no input files are
       specified, then the standard input is read.

   Expressions (option -e, --expression)
       Expressions can be used to select specific records by comparing field
       values.

       If the value starts with the string "file:", the rest of the value is
       treated as a file name. Every line in the file is used as a value in
       the comparison, and the record is selected if one or more of the
       values evaluate true.

       Expression notation:

       field=value
              A  record  will  be  selected if the field field is equal to the
              value value.

       field^value
              A record will be selected if the field  field  starts  with  the
              value value.

       field~value
              A  record will be selected if the field field contains the value
              value.

       field!value
              A record will be selected if the field field is not equal to the
              value value.

       field?value
              A record will be selected if the field field matches the regular
              expression in value.
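       The selection rules above can be sketched in Python. This is a minimal
       illustration under stated assumptions, not ffe's own code: records are
       modelled as a dict of field name to contents, a missing field compares
       as an empty string, and Python's re module stands in for the regular
       expression library ffe actually uses. All helper names are
       hypothetical.

```python
import re

# Hypothetical sketch of ffe-style record selection (-e, -a, -v).
OPERATORS = "=^~!?"


def parse_expression(expr):
    """Split 'field<op>value' at the first operator character."""
    for i, ch in enumerate(expr):
        if ch in OPERATORS:
            return expr[:i], ch, expr[i + 1:]
    raise ValueError(f"no operator in expression: {expr!r}")


def compare(op, field, value):
    """Apply one comparison operator to a single field value."""
    if op == "=":
        return field == value               # equal
    if op == "^":
        return field.startswith(value)      # starts with
    if op == "~":
        return value in field               # contains
    if op == "!":
        return field != value               # not equal
    return re.search(value, field) is not None  # "?": regular expression


def load_values(value):
    """Expand 'file:NAME' into one comparison value per line of NAME."""
    if value.startswith("file:"):
        with open(value[len("file:"):]) as f:
            return [line.rstrip("\n") for line in f]
    return [value]


def is_selected(record, expressions, use_and=False, invert=False):
    """Return True if the record (dict of field -> text) is selected."""
    results = []
    for expr in expressions:
        name, op, value = parse_expression(expr)
        field = record.get(name, "")  # assumption: a missing field is empty
        # The expression is true if one or more of the values evaluates true.
        results.append(any(compare(op, field, v) for v in load_values(value)))
    selected = all(results) if use_and else any(results)  # -a versus default or
    return selected != invert                             # -v inverts the match
```

       For example, is_selected({"LastName": "Tiger"}, ["LastName^Ti"])
       returns True.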

FFE CONFIGURATION

       ffe uses the configuration file for extracting fields from the input
       file and for formatting the fields for output. Every line or binary
       block of the input file is considered a record. The default
       configuration file is ~/.fferc, but another file can be given with the
       ’-c’ option.

       The configuration file for ffe is a text file. The file may contain
       empty lines. Commands are case-sensitive. Comments begin with the
       #-character and end at the end of the line. The string and char
       definitions can be enclosed in double quotation marks (’"’). char is a
       single character. string and char can contain the following escape
       codes: ’\a’, ’\b’, ’\t’, ’\n’, ’\v’, ’\f’, ’\r’, ’\"’ and ’\#’. The
       character ’\’ can be escaped as ’\\’.
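       A minimal Python sketch of decoding these escape codes. The function
       name unescape is hypothetical, and leaving unlisted backslash
       sequences unchanged is an assumption; the text above does not say how
       they are handled.

```python
# Escape codes accepted in configuration strings and chars (see above).
ESCAPES = {
    "a": "\a", "b": "\b", "t": "\t", "n": "\n", "v": "\v",
    "f": "\f", "r": "\r", '"': '"', "#": "#", "\\": "\\",
}


def unescape(text):
    """Decode the listed escape codes; keep other backslashes as-is."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text) and text[i + 1] in ESCAPES:
            out.append(ESCAPES[text[i + 1]])  # known escape: translate it
            i += 2
        else:
            out.append(ch)                    # ordinary character
            i += 1
    return "".join(out)
```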

       Command substitution allows the output of a command to replace parts of
       the configuration file. The syntax for command substitution is:
       `command`
       The command is executed and `command` is replaced with the standard
       output of the command, with any trailing newlines deleted. Command
       substitutions may not be nested.

       Before executing the command, ffe sets a few environment variables:

       FFE_STRUCTURE
              The name of the structure given using -s,--structure.

       FFE_OUTPUT
              The name of the output file given using -o,--output.

       FFE_FORMAT
              The name of the output format given using -p,--print.

       FFE_FIRST_FILE
              The name of the first input file.

       FFE_FILES
              A list of all input files.

       If a variable is already set, it is not replaced.
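       Command substitution and the FFE_* variables can be sketched in Python
       as an illustration, not as ffe's implementation. Running the command
       through /bin/sh (Python's subprocess module with shell=True) is an
       assumption, and substitute is a hypothetical helper name.

```python
import os
import re
import subprocess

# Hypothetical sketch of `command` substitution in a configuration line.
BACKQUOTED = re.compile(r"`([^`]*)`")


def substitute(line, ffe_vars):
    """Replace each `command` in line with the command's standard output."""
    env = dict(os.environ)
    for name, value in ffe_vars.items():
        env.setdefault(name, value)  # an already set variable is not replaced

    def run(match):
        result = subprocess.run(match.group(1), shell=True, env=env,
                                capture_output=True, text=True)
        return result.stdout.rstrip("\n")  # delete all trailing newlines

    return BACKQUOTED.sub(run, line)
```

       For example, substitute("structure `echo $FFE_STRUCTURE` {",
       {"FFE_STRUCTURE": "personnel"}) yields "structure personnel {" when
       FFE_STRUCTURE is not already set in the environment.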

   Input file structure
       Input file structures are specified with the keyword structure:

       structure name {options...}

       Each option must end with a newline. The options are:

       type fixed|binary|separated [char] [*]
              Fields in the input are fixed length text fields, fixed length
              binary fields or text fields separated by char. If * is given,
              multiple sequential separators are considered as one. The
              default separator is a comma.

       quoted [char]
              Fields may be quoted with char; the default quotation mark is the
              double quotation mark ’"’. In input, a quotation mark is assumed
              to be escaped either as \char or by doubling it as charchar.
              Unescaped quotation marks are not preserved in output.
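              As an illustration only, Python's csv module accepts both escape
              styles described above (\char and charchar). The sketch assumes
              a backslash escape character, does not model the * option of
              type separated, and split_quoted is a hypothetical name.

```python
import csv


def split_quoted(line, separator=",", quote='"'):
    """Split one separated record, honouring \\char and charchar escapes."""
    reader = csv.reader([line], delimiter=separator, quotechar=quote,
                        doublequote=True,   # charchar means a literal char
                        escapechar="\\")    # \char means a literal char
    return next(reader)
```

              For example, the input a,"b ""x"" c","d\"e" splits into the
              three fields a, b "x" c and d"e.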

       header first|all|no
              Controls the occurrence of the header line. The default is no. If
              set to first or all, the first line of the first input file is
              considered a header line containing the names of the fields.
              first means that only the first file has a header; all means
              that all files have a header, although the names are still taken
              from the header of the first file. The header line is parsed
              according to the record definition, meaning that the name
              positions, separators, etc. are the same as for the fields.

       output name
              All records belonging to this structure are printed according to
              output format name. The default is to use the output named
              ’default’.

       record name {options...}
              Defines one record for a  structure.  A  structure  can  contain
              several record types.

   Record options:
       id position string

       rid position regexp
              Identifies a record in the input file. Records are identified by
              string (id) or by the regular expression regexp (rid) at
              position position of the input record. For fixed length and
              binary input, position is the byte position in the input record;
              for separated input, position is the ordinal number of the field
              in the input record. Positions start from one.

              Ids are required only if the input structure contains several
              record types with equal lengths or field counts. Non-printable
              characters can be escaped as \xnn, where nn is the hexadecimal
              value of the character.

              A record definition can contain several ids; in that case all ids
              must match the input line (ids are combined with logical and).

              In a multi-record binary structure every  record  must  have  at
              least one id.
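              A minimal Python sketch of id matching for fixed length and
              binary records. It assumes the record is available as bytes and
              that an id matches when the bytes starting at position equal
              string; rid matching is not shown, and the helper names are
              hypothetical.

```python
import re


def decode_id(text):
    """Turn an id string with \\xnn escapes into raw bytes."""
    return re.sub(rb"\\x([0-9a-fA-F]{2})",
                  lambda m: bytes([int(m.group(1), 16)]),
                  text.encode("latin-1"))


def fixed_record_matches(record, ids):
    """True if every (position, string) id matches; positions start at 1."""
    for position, text in ids:
        want = decode_id(text)
        start = position - 1
        if record[start:start + len(want)] != want:
            return False  # ids are combined with logical and
    return True
```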

       field name|FILLER|* [length]|* [lookup]|* [output]
              Specifies one field in a text input structure. length is
              mandatory for a fixed length input structure, except for the last
              field. If the last field of a fixed length input structure has a
              * in place of length, that field can have arbitrary length.

              length is also used when printing fields in fixed length format
              with the %D directive. The order of fields in the configuration
              file is significant: it specifies the field order in a record.

              If ’*’ is given instead of the name, the field’s name will be its
              ordinal number or, if the ’header’ option has the value ’first’
              or ’all’, the name taken from the header line (the first line of
              the input).

              If lookup is given, the field’s contents are used as the key for
              a lookup in the lookup table lookup. If length is not needed
              (separated format) but lookup is, use an asterisk (*) in place of
              the length definition.

              If output is given, the field is printed using the output format
              output. Use an asterisk in place of lookup if lookup is not
              needed.

              Naming a field FILLER prevents it from being printed in output.

       field name|FILLER|* [length]|type [lookup]|* [output]
              Specifies one field in a binary input structure. All other
              features are the same as for the text structure, except for the
              type parameter. type specifies the field data type and length and
              can have the following values:

              char Printable character.

              short Short integer having current system length and byte order.

              int Integer having current system length and byte order.

              long Long integer having current system length and byte order.

              llong  Long  long  integer having current system length and byte
              order.

              ushort Unsigned short integer having current system  length  and
              byte order.

              uint  Unsigned  integer  having  current  system length and byte
              order.

              ulong Unsigned long integer having  current  system  length  and
              byte order.

              ullong  Unsigned  long long integer having current system length
              and byte order.

              int8 8 bit integer.

              int16_be Big endian 16 bit integer.

              int32_be Big endian 32 bit integer.

              int64_be Big endian 64 bit integer.

              int16_le Little endian 16 bit integer.

              int32_le Little endian 32 bit integer.

              int64_le Little endian 64 bit integer.

              uint8 Unsigned 8 bit integer.

              uint16_be Unsigned big endian 16 bit integer.

              uint32_be Unsigned big endian 32 bit integer.

              uint64_be Unsigned big endian 64 bit integer.

              uint16_le Unsigned little endian 16 bit integer.

              uint32_le Unsigned little endian 32 bit integer.

              uint64_le Unsigned little endian 64 bit integer.

              float Float having current system length and byte order.

              float_be Float having current system length and big endian  byte
              order.

              float_le  Float  having  current system length and little endian
              byte order.

              double Double having current system length and byte order.

              double_be Double having current system  length  and  big  endian
              byte order.

              double_le  Double having current system length and little endian
              byte order.

              bcd_be_len BCD number having length len and nybbles in big
              endian order.

              bcd_le_len BCD number having length len and nybbles in little
              endian order.

              hex_be_len Hexadecimal data in big endian  order  having  length
              len.

              hex_le_len Hexadecimal data in little endian order having length
              len.

              If length is given instead of the type, the field is assumed to
              be a printable string of length length. The string is printed
              until length characters have been printed or a NUL character is
              found.

              A BCD number (bcd_be_len and bcd_le_len) is printed until len
              bytes have been read or a nybble with the hexadecimal value f is
              found. For a big endian BCD number, the nybbles of each byte are
              printed most significant nybble first and least significant
              nybble second; for a little endian BCD number, they are printed
              least significant nybble first and most significant nybble
              second. Bytes are always read in big endian order.

              Hexadecimal  data  (hex_be_len  and  hex_le_len)  is  printed as
              hexadecimal values. Big endian data is printed starting from the
              lower  address  and  little  endian data starting from the upper
              address.
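              The BCD and hexadecimal printing rules above can be sketched in
              Python. This is an illustration under assumptions: the field’s
              len bytes are already extracted, nybbles a to e are shown as
              hexadecimal digits, and hexadecimal output uses lowercase
              letters. The function names are hypothetical.

```python
def bcd_digits(data, little_endian=False):
    """Print a bcd_be_len / bcd_le_len field (data holds its len bytes)."""
    digits = []
    for byte in data:                      # bytes read in big endian order
        high, low = byte >> 4, byte & 0x0F
        order = (low, high) if little_endian else (high, low)
        for nybble in order:
            if nybble == 0x0F:             # nybble f terminates the number
                return "".join(digits)
            digits.append(format(nybble, "x"))
    return "".join(digits)


def hex_digits(data, little_endian=False):
    """Print a hex_be_len / hex_le_len field as hexadecimal values."""
    # Big endian starts from the lower address, little endian from the upper.
    ordered = reversed(data) if little_endian else data
    return "".join(format(b, "02x") for b in ordered)
```

              For the two bytes 0x12 0x34, bcd_be_2 prints 1234 while
              bcd_le_2 prints 2143.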

       field-count number
              Has the same effect as specifying field * number times. Because
              no length is specified, this works only with a separated
              structure.

       fields-from record
              Fields for this record are the same as for record record.

       output name
              This record is printed according to output format name. The
              default is to use the output format specified in the structure.

   Output definitions
       There can be several output definitions in the configuration file.
       The format can be selected with the ’-p’ option. The default format is
       named ’default’.

       output name|default {options...}
              Defines one output format. The output named ’default’ is used if
              none is given for the structure or record, or none is given with
              the ’-p’ option.

              There are two predefined output formats, no and raw: no
              suppresses all printing and raw prints the original input data.

   Output options:
       Pictures in output definitions can contain printf-style %-directives:

       %f     Name of the input file.

       %s     Name of the current structure.

       %r     Name of the current record.

       %o     Input record number in current file.

       %O     Input record number starting from the first file.

       %i     Byte  offset  of  the current record in the current file. Starts
              from zero.

       %I     Byte offset of the current record starting from the first  file.
              Starts from zero.

       %n     Field name.

       %t     Field contents, without leading and trailing whitespace.

       %d     Field  contents.  Binary  integer is printed as a decimal value.
              Floating point number is printed in the style [-]ddd.ddd,  where
              the number of digits after the decimal-point character is 6. Bcd
              number is printed as a decimal number and  hexadecimal  data  as
              consecutive hexadecimal values.

       %D     Field  contents,  right  padded  to  the  field length (requires
              length definition for the field).

       %x     Unsigned hexadecimal value of a binary integer. Other fields are
              printed using directive %d.

       %l     Value from lookup.

       %L     Value  from  lookup,  right padded to the field length (requires
              length definition for the field).

       %e     Prints nothing, but still causes the "field empty" check to be
              performed. Can be used when only the names of non-empty fields
              should be printed.

       %p     The field’s start position in a record. For a fixed structure
              this is the field’s byte position in the input line, and for a
              separated structure it is the ordinal number of the field.
              Starts from one.

       %%     Percent sign.

       file_header picture
              Picture is printed once before file contents.

       file_trailer picture
              Picture is printed once after file contents.

       header picture
              If specified, the header line describing the field names is
              printed before the records. Every field name is printed according
              to the picture, using the same separator and field length as
              defined for the fields. The picture can contain only the %n
              directive.

       data picture
              Field contents are printed according to picture.

       lookup picture
              If a field is mapped to a lookup table, this picture is used
              instead of the picture from the data option. If not given, the
              picture from data is used.

       separator string
              All fields are terminated by string, except the last field of
              the record. The default is not to print a separator.

       record_header picture
              picture is printed before the record content. The default is not
              to print a header.

       record_trailer picture
              picture is printed after the record content. Default is newline.

       justify left|right|char
              Fields are left or right justified. char justifies output
              according to the first occurrence of char in the data picture.
              The default is left.

       indent string
              Record contents are indented by string. Field contents are
              indented by two times the string. The default is not to indent.

       field-list name1,name2,...
              Only fields or constants named name1,name2,... are printed; this
              has the same effect as the ’-f’ option. The default is to print
              all the fields. Fields are printed in the same order as they are
              listed.

       no-data-print yes|no
              When set to no and field-list is given, suppresses printing of
              record_header and record_trailer when the current record
              contains none of the fields specified in field-list.

       field-empty-print yes|no
              When  set  as  no,  nothing  is printed for fields which consist
              entirely of characters from empty-chars. If none of  the  fields
              of  a  record are printed then the printing of record_trailer is
              also suppressed. Default is yes.

       empty-chars string
              string specifies a set of characters which define an "empty"
              field. The default is " \f\n\r\t\v" (space, form feed, newline,
              carriage return, horizontal tab and vertical tab).
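              A minimal Python sketch of the "empty" check and its effect with
              field-empty-print no. It assumes a data picture equivalent to
              "%n=%t\n" and a record_trailer of a single newline, and treats
              a zero-length field as empty; the function names are
              hypothetical.

```python
DEFAULT_EMPTY_CHARS = " \f\n\r\t\v"


def is_empty(field, empty_chars=DEFAULT_EMPTY_CHARS):
    """A field is empty if it consists entirely of empty-chars."""
    return all(ch in empty_chars for ch in field)


def render_record(fields, field_empty_print=True,
                  empty_chars=DEFAULT_EMPTY_CHARS, record_trailer="\n"):
    """Print name=value lines for (name, contents) pairs of one record."""
    out = []
    for name, value in fields:
        if not field_empty_print and is_empty(value, empty_chars):
            continue                       # nothing is printed for this field
        out.append(f"{name}={value.strip()}\n")
    if out or field_empty_print:
        out.append(record_trailer)         # suppressed if no field printed
    return "".join(out)
```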

   Lookup definitions
       lookup name {options...}
              Defines one lookup table.

   Lookup options:
       search exact|longest
              The search type for lookup table.

       default-value value
              value is printed if the lookup is not successful.

       pair key value
              One key/value pair for the lookup table.

       file name [separator]
              Key/value pairs are read from file name. Every line is
              considered a key/value pair separated by separator. The default
              separator is a semicolon.
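              A Python sketch of a lookup table with the default-value
              fallback. The meaning of longest is not spelled out above; this
              sketch assumes it selects the longest key that is a prefix of the
              field contents. Skipping lines without a separator is also an
              assumption, and the helper names are hypothetical.

```python
def load_pairs(lines, separator=";"):
    """Build a table from 'key<separator>value' lines (file option)."""
    table = {}
    for line in lines:
        line = line.rstrip("\n")
        if separator in line:              # assumption: skip malformed lines
            key, value = line.split(separator, 1)
            table[key] = value
    return table


def lookup(table, key, search="exact", default=None):
    """Return the mapped value, or default (default-value) on failure."""
    if search == "exact":
        return table.get(key, default)
    # search longest: assumed to be the longest key that prefixes the field
    best = None
    for candidate in table:
        if key.startswith(candidate) and (best is None
                                          or len(candidate) > len(best)):
            best = candidate
    return table[best] if best is not None else default
```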

   Constants
       In addition to input fields, constant values can be printed using
       option -f, --field-list or the output option field-list. Constants are
       printed using the data output option.

       Constants are specified as

       const name value
              When name appears in a field list, value is printed for every
              record as if name were one of the input fields.

EXAMPLES

       An example of a fixed length flat file containing the fields
       ’FirstName’, ’LastName’ and ’Age’:

       John     Ripper       23
       Scott    Tiger        45
       Mary     Moore        41

       This file can be printed in XML with the following configuration:

       structure personnel {
           type fixed
           output XML
           record person {
               field FirstName 9
               field LastName  13
               field Age 2
           }
       }

       output XML {
           file_header "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>\n"
           data "<%n>%d</%n>\n"
           record_header "<%r>\n"
           record_trailer "</%r>\n"
           indent " "
       }
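       To make the example concrete, the following Python sketch mimics what
       this configuration describes: it cuts each line at the field lengths
       9, 13 and 2 and wraps the values in XML tags, indenting records by one
       space and fields by two. It is an illustration, not captured ffe
       output. Read literally, %d prints field contents unmodified, so the
       sketch keeps the padding spaces (the %t directive would strip them).
       The helper names are hypothetical.

```python
# Layout of the 'person' record from the configuration above.
LAYOUT = [("FirstName", 9), ("LastName", 13), ("Age", 2)]


def split_fixed(line):
    """Cut a fixed length line into (name, contents) pairs."""
    fields, pos = [], 0
    for name, length in LAYOUT:
        fields.append((name, line[pos:pos + length]))
        pos += length
    return fields


def to_xml(lines, record="person", indent=" "):
    """Render lines roughly as the XML output definition describes."""
    out = ['<?xml version="1.0" encoding="ISO-8859-1"?>\n']      # file_header
    for line in lines:
        out.append(f"{indent}<{record}>\n")                      # record_header
        for name, value in split_fixed(line):
            out.append(f"{indent * 2}<{name}>{value}</{name}>\n")  # data
        out.append(f"{indent}</{record}>\n")                     # record_trailer
    return "".join(out)


persons = ["John     Ripper       23",
           "Scott    Tiger        45",
           "Mary     Moore        41"]
print(to_xml(persons), end="")
```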

SEE ALSO

       More examples can be found in the Texinfo manual. If info and ffe are
       properly installed, the command

              info ffe

       should give more information.

AUTHOR

       Timo Savinen <tjsa@iki.fi>