NAME

       file - determine file type

SYNOPSIS

       file [-dh][-M file][-m file] file ...

       file -i [-h] file ...

DESCRIPTION

       The  file  utility  shall perform a series of tests in sequence on each
       specified file in an attempt to classify it:

        1. If file does not exist, cannot be read, or its  file  status  could
           not  be  determined,  the  output  shall indicate that the file was
           processed, but that its type could not be determined.

        2. If the file  is  not  a  regular  file,  its  file  type  shall  be
           identified.  The file types directory, FIFO, socket, block special,
           and  character  special  shall  be  identified   as   such.   Other
           implementation-defined  file  types may also be identified. If file
           is a symbolic link, by default the link shall be resolved and  file
           shall  test the type of file referenced by the symbolic link.  (See
           the -h and -i options below.)

        3. If the length of file is zero, it shall be identified as  an  empty
           file.

        4. The file utility shall examine an initial segment of file and shall
           make a  guess  at  identifying  its  contents  based  on  position-
           sensitive  tests.  (The answer is not guaranteed to be correct; see
           the -d, -M, and -m options below.)

        5. The file utility shall examine file and make a guess at identifying
           its  contents based on context-sensitive default system tests. (The
           answer is not guaranteed to be correct.)

        6. The file shall be identified as a data file.

       If file does not exist, cannot be read, or its file status could not be
       determined,  the output shall indicate that the file was processed, but
       that its type could not be determined.

       If file is a symbolic link, by default the link shall be  resolved  and
       file shall test the type of file referenced by the symbolic link.
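
       The status-based steps above (1 to 3) can be sketched in Python.  This
       is an illustration only, not how any particular implementation  works:
       the  function  name  classify_by_status  and its h_option and i_option
       parameters are hypothetical, the  wording  returned  for  other  file
       types is a placeholder, and steps 4 to 6 are left to the caller.

```python
import os
import stat

# <type> strings taken from the File Utility Output Strings table.
NON_REGULAR = [
    (stat.S_ISDIR, "directory"),
    (stat.S_ISFIFO, "fifo"),
    (stat.S_ISSOCK, "socket"),
    (stat.S_ISBLK, "block special"),
    (stat.S_ISCHR, "character special"),
]


def classify_by_status(path, h_option=False, i_option=False):
    """Return a <type> string for steps 1-3, or None if content tests are needed."""
    try:
        st = os.lstat(path)
    except OSError:
        return "cannot open"  # step 1: status could not be determined

    if stat.S_ISLNK(st.st_mode):
        target = os.readlink(path)
        if h_option:
            return "symbolic link to " + target
        try:
            st = os.stat(path)  # default: resolve the link
        except OSError:
            # Dangling link: identified as a link, as if -h had been given.
            return "symbolic link to " + target

    for predicate, name in NON_REGULAR:  # step 2: file type tests
        if predicate(st.st_mode):
            return name
    if not stat.S_ISREG(st.st_mode):
        return "implementation-defined file type"  # placeholder wording

    if i_option:
        return "regular file"  # -i: no further classification
    if st.st_size == 0:
        return "empty"  # step 3
    if not os.access(path, os.R_OK):
        return "cannot open"
    return None  # steps 4-6 (content tests) would run from here
```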

OPTIONS

       The  file  utility  shall  conform  to  the  Base Definitions volume of
       IEEE Std 1003.1-2001, Section 12.2, Utility Syntax  Guidelines,  except
       that the order of the -m, -d, and -M options shall be significant.

       The following options shall be supported by the implementation:

       -d     Apply  any  position-sensitive default system tests and context-
              sensitive default system tests to the file. This is the  default
              if no -M or -m option is specified.

       -h     When  a  symbolic  link  is  encountered, identify the file as a
              symbolic link. If -h is not specified and  file  is  a  symbolic
              link  that refers to a nonexistent file, file shall identify the
              file as a symbolic link, as if -h had been specified.

       -i     If a file is a regular file, do not attempt to classify the type
              of  the  file further, but identify the file as specified in the
              STDOUT section.

       -M  file
              Specify the name of a file containing  position-sensitive  tests
              that shall be applied to a file in order to classify it (see the
              EXTENDED  DESCRIPTION).  No  position-sensitive  default  system
              tests  nor  context-sensitive  default  system  tests  shall  be
              applied unless the -d option is also specified.

       -m  file
              Specify the name of a file containing  position-sensitive  tests
              that shall be applied to a file in order to classify it (see the
              EXTENDED DESCRIPTION).

       If the -m option is specified without specifying the -d option  or  the
       -M  option,  position-sensitive  default  system tests shall be applied
       after the position-sensitive tests specified by the -m option.  If  the
       -M  option  is specified with the -d option, the -m option, or both, or
       the -m option is specified with the -d option, the concatenation of the
       position-sensitive tests specified by these options shall be applied in
       the order specified by the appearance of these options. If a -M  or  -m
       file option-argument is -, the results are unspecified.
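
       The ordering rules above can be  sketched  as  a  small  planner.  The
       function  name  plan_tests  and the labels it returns are hypothetical.
       The sketch also assumes that context-sensitive default tests run  last
       whenever  default  tests are in effect, which is one reading of Note 6
       in the STDOUT section and of the RATIONALE.

```python
def plan_tests(options):
    """Return the order of test sources for a list of (flag, argument) pairs.

    `options` holds only -d, -M and -m in command-line order, for example
    [("-m", "my.magic"), ("-d", None)].
    """
    flags = [flag for flag, _ in options]
    has_d = "-d" in flags
    has_M = "-M" in flags

    order = []
    for flag, arg in options:
        if flag in ("-M", "-m"):
            order.append("magic:" + arg)
        elif flag == "-d" and "default-position" not in order:
            order.append("default-position")

    if not has_M and not has_d:
        # No options at all, or -m alone: default position-sensitive tests
        # follow any -m tests.
        order.append("default-position")
        use_context = True
    else:
        # -M suppresses the default tests unless -d is also given.
        use_context = has_d

    if use_context:
        order.append("default-context")  # assumed to always come last
    return order
```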

OPERANDS

       The following operand shall be supported:

       file   A pathname of a file to be tested.

STDIN

       Not used.

INPUT FILES

       The file can be any file type.

ENVIRONMENT VARIABLES

       The following environment variables shall affect the execution of file:

       LANG   Provide a default value for the  internationalization  variables
              that  are  unset  or  null.  (See the Base Definitions volume of
              IEEE Std 1003.1-2001,    Section    8.2,    Internationalization
              Variables  for  the precedence of internationalization variables
              used to determine the values of locale categories.)

       LC_ALL If set to a non-empty string value, override the values  of  all
              the other internationalization variables.

       LC_CTYPE
              Determine  the  locale  for  the  interpretation of sequences of
              bytes of text data as characters (for  example,  single-byte  as
              opposed  to multi-byte characters in arguments and input files).

       LC_MESSAGES
              Determine the locale that should be used to  affect  the  format
              and  contents  of  diagnostic messages written to standard error
              and informative messages written to standard output.

       NLSPATH
              Determine the location of message catalogs for the processing of
              LC_MESSAGES.

ASYNCHRONOUS EVENTS

       Default.

STDOUT

       In  the  POSIX  locale,  the following format shall be used to identify
       each operand, file specified:

              "%s: %s\n", <file>, <type>

       The values for <type> are unspecified, except that in the POSIX locale,
       if  file  is  identified  as  one  of the types listed in the following
       table, <type> shall contain (but is not limited to)  the  corresponding
       string,  unless  the  file  is  identified by a position-sensitive test
       specified by a -M or -m option. Each space shown in the  strings  shall
       be exactly one <space>.

                         Table: File Utility Output Strings

        If file is:                            <type> shall contain     Notes
                                               the string:
        Nonexistent                            cannot open
        Block special                          block special            1
        Character special                      character special        1
        Directory                              directory                1
        FIFO                                   fifo                     1
        Socket                                 socket                   1
        Symbolic link                          symbolic link to         1
        Regular file                           regular file             1,2
        Empty regular file                     empty                    3
        Regular file that cannot be read       cannot open              3
        Executable binary                      executable               4,6
        ar archive library (see ar)            archive                  4,6
        Extended cpio format (see pax)         cpio archive             4,6
        Extended tar format (see ustar in pax) tar archive              4,6
        Shell script                           commands text            5,6
        C-language source                      c program text           5,6
        FORTRAN source                         fortran program text     5,6
        Regular file whose type cannot be      data
        determined

       Notes:

               1. This is a file type test.

               2. This test is applied only if the -i option is specified.

               3. This test is applied only if the -i option is not specified.

               4. This is a position-sensitive default system test.

               5. This is a context-sensitive default system test.

               6. Position-sensitive  default  system   tests   and   context-
                  sensitive  default  system  tests  are not applied if the -M
                  option is specified unless the -d option is also  specified.

       In  the POSIX locale, if file is identified as a symbolic link (see the
       -h option), the following alternative output format shall be used:

              "%s: %s %s\n", <file>, <type>, <contents of link>

       If the file named by the file operand does not exist, cannot  be  read,
       or the type of the file named by the file operand cannot be determined,
       this shall not be considered an error that affects the exit status.
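
       As a small illustration of the two POSIX-locale  output  formats,  the
       following Python sketch builds one output line.  The name format_result
       and its parameters are hypothetical.

```python
def format_result(file, type_text, link_contents=None):
    """Build one output line in the POSIX-locale formats shown above."""
    if link_contents is not None:
        # Alternative format used when the file is identified as a symbolic link.
        return "%s: %s %s\n" % (file, type_text, link_contents)
    return "%s: %s\n" % (file, type_text)
```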

STDERR

       The standard error shall be used only for diagnostic messages.

OUTPUT FILES

       None.

EXTENDED DESCRIPTION

       A file specified as an option-argument to the -m or  -M  options  shall
       contain one position-sensitive test per line, which shall be applied to
       the file. If the test succeeds, the message field of the line shall  be
       printed  and no further tests shall be applied, with the exception that
       tests on immediately  following  lines  beginning  with  a  single  ’>’
       character shall be applied.

       Each  line  shall  be  composed of the following four <blank>-separated
       fields:

       offset An  unsigned  number  (optionally  preceded  by  a  single   ’>’
              character)  specifying the offset, in bytes, of the value in the
              file that is to be compared against the value field of the line.
              If the file is shorter than the specified offset, the test shall
              fail.

       If the offset begins with the character ’>’ , the test contained in the
       line  shall not be applied to the file unless the test on the last line
       for which the offset did not  begin  with  a  ’>’  was  successful.  By
       default, the offset shall be interpreted as an unsigned decimal number.
       With a leading  0x  or  0X,  the  offset  shall  be  interpreted  as  a
       hexadecimal  number;  otherwise,  with a leading 0, the offset shall be
       interpreted as an octal number.
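
       A minimal Python sketch of the offset rules  above  follows.  The  names
       parse_unsigned  and  parse_offset  are hypothetical; the second returns
       whether the line is a '>' continuation together with the byte offset.

```python
def parse_unsigned(text):
    """Parse an unsigned number: 0x/0X means hex, a leading 0 means octal."""
    if not text or not text[0].isdigit():
        raise ValueError("expected an unsigned number: %r" % text)
    if text[:2] in ("0x", "0X"):
        return int(text[2:], 16)
    if len(text) > 1 and text[0] == "0":
        return int(text[1:], 8)
    return int(text, 10)


def parse_offset(field):
    """Return (is_continuation, offset_in_bytes) for an offset field."""
    continuation = field.startswith(">")
    return continuation, parse_unsigned(field[1:] if continuation else field)
```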

       type   The type of the value in the file to be tested. The  type  shall
              consist of the type specification characters c, d, f, s, and  u,
              specifying character, signed decimal,  floating  point,  string,
              and unsigned decimal, respectively.

       The  type  string  shall  be  interpreted  as  the  bytes from the file
       starting at the specified offset and including the same number of bytes
       specified  by the value field. If insufficient bytes remain in the file
       past the offset to match the value field, the test shall fail.

       The type specification characters d, f, and u can be  followed  by  an
       optional  unsigned  decimal  integer that specifies the number of bytes
       represented by the type. The type  specification  character  f  can  be
       followed  by  an  optional F, D, or L, indicating that the value is of
       type  float,  double,  or   long   double,   respectively.   The   type
       specification  characters  d and u can be followed by an optional C, S,
       I, or L, indicating that the value is of type char,  short,  int,  or
       long, respectively.

       The  default  number of bytes represented by the type specifiers d, f,
       and u shall correspond to their respective C-language types as follows.
       If   the  system  claims  conformance  to  the  C-Language  Development
       Utilities option, those specifiers  shall  correspond  to  the  default
       sizes  used  in the c99 utility.  Otherwise, the default sizes shall be
       implementation-defined.

       For the type specifier characters d and u, the default number of bytes
       shall   correspond  to  the  size  of  a  basic  integer  type  of  the
       implementation. For  these  specifier  characters,  the  implementation
       shall  support  values  of the optional number of bytes to be converted
       corresponding to the number of bytes  in  the  C-language  types  char,
       short,  int,  or  long.  These  numbers  can  also  be  specified by an
       application as the characters C, S, I, and L, respectively. The
       byte  order  used  when  interpreting numeric values is implementation-
       defined, but shall correspond to the order in which a constant  of  the
       corresponding type is stored in memory on the system.

       For the type specifier f, the default number of bytes shall correspond
       to the number of bytes in the  basic  double  precision  floating-point
       data  type  of the underlying implementation.  The implementation shall
       support values  of  the  optional  number  of  bytes  to  be  converted
       corresponding  to  the  number  of bytes in the C-language types float,
       double, and long double. These numbers can  also  be  specified  by  an
       application as the characters F, D, and L, respectively.

       All type specifiers, except for s, can be followed by a mask specifier
       of the form &number. The mask value shall be AND’ed with the  value  of
       the  input  file before the comparison with the value field of the line
       is made. By default, the mask  shall  be  interpreted  as  an  unsigned
       decimal  number. With a leading 0x or 0X, the mask shall be interpreted
       as an unsigned hexadecimal number; otherwise, with  a  leading  0,  the
       mask shall be interpreted as an unsigned octal number.

       The  strings  byte,  short, long, and string shall also be supported as
       type fields, being interpreted as dC, dS, dL, and s, respectively.
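
       The type field rules above can be sketched in Python as  follows.  The
       name  parse_type  is  hypothetical, sizes are taken from the C types of
       the machine running the sketch via ctypes, and treating  int  as  the
       "basic integer type" for the default size of d and u is an assumption.

```python
import ctypes

ALIASES = {"byte": "dC", "short": "dS", "long": "dL", "string": "s"}
INT_SIZES = {
    "C": ctypes.sizeof(ctypes.c_char),
    "S": ctypes.sizeof(ctypes.c_short),
    "I": ctypes.sizeof(ctypes.c_int),
    "L": ctypes.sizeof(ctypes.c_long),
}
FLOAT_SIZES = {
    "F": ctypes.sizeof(ctypes.c_float),
    "D": ctypes.sizeof(ctypes.c_double),
    "L": ctypes.sizeof(ctypes.c_longdouble),
}


def parse_number(text):
    """Unsigned number: 0x/0X means hex, a leading 0 means octal."""
    if text[:2] in ("0x", "0X"):
        return int(text[2:], 16)
    if len(text) > 1 and text[0] == "0":
        return int(text[1:], 8)
    return int(text, 10)


def parse_type(field):
    """Return (kind, size_in_bytes, mask); size is None for strings."""
    spec, has_mask, mask_text = field.partition("&")
    spec = ALIASES.get(spec, spec)
    kind, suffix = spec[:1], spec[1:]
    if kind not in ("c", "d", "f", "s", "u"):
        raise ValueError("unknown type specifier: %r" % field)
    if kind == "s" and has_mask:
        raise ValueError("a mask is not allowed on type s")
    mask = parse_number(mask_text) if has_mask else None

    if kind == "s":
        size = None  # the length comes from the value field instead
    elif kind == "c":
        size = 1
    elif suffix.isdigit():
        size = int(suffix)  # explicit number of bytes
    else:
        table = FLOAT_SIZES if kind == "f" else INT_SIZES
        default = "D" if kind == "f" else "I"  # "I" is an assumed default
        if (suffix or default) not in table:
            raise ValueError("unknown size suffix: %r" % field)
        size = table[suffix or default]
    return kind, size, mask
```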

       value  The value to be compared with the value from the file.

       If the specifier from the type field is s or string, then interpret the
       value as a string. Otherwise, interpret it as a number. If the value is
       a  string, then the test shall succeed only when a string value exactly
       matches the bytes from the file.

       If the value is a string, it can contain the following sequences:

       \character
              The  backslash-escape  sequences  as  specified  in   the   Base
              Definitions  volume  of  IEEE Std 1003.1-2001, Table 5-1, Escape
              Sequences and Associated Actions (’\\’, ’\a’, ’\b’, ’\f’,  ’\n’,
              ’\r’,  ’\t’,  ’\v’).  The  results  of  using  any  other
              character, other than an octal digit,  following  the  backslash
              are unspecified.

       \octal
              Octal  sequences  that  can be used to represent characters with
              specific coded values. An octal  sequence  shall  consist  of  a
              backslash followed by the longest sequence of one, two, or three
              octal-digit characters (01234567). If the size of a byte on  the
              system is greater than 9 bits, the valid escape sequence used to
              represent a byte is implementation-defined.
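
       A Python sketch of decoding a string value with these  escapes  follows.
       The  name decode_string_value is hypothetical. The sketch assumes 8-bit
       bytes, encodes ordinary characters as UTF-8, and  raises  an  error  for
       escapes whose results the text above leaves unspecified.

```python
SIMPLE_ESCAPES = {
    "\\": b"\\", "a": b"\a", "b": b"\b", "f": b"\f",
    "n": b"\n", "r": b"\r", "t": b"\t", "v": b"\v",
}
OCTAL_DIGITS = "01234567"


def decode_string_value(text):
    """Convert a magic-file string value into the bytes to compare against."""
    out = bytearray()
    i = 0
    while i < len(text):
        ch = text[i]
        if ch != "\\":
            out += ch.encode("utf-8")
            i += 1
            continue
        nxt = text[i + 1:i + 2]
        if nxt and nxt in OCTAL_DIGITS:
            # Longest run of one, two, or three octal digits.
            j = i + 1
            while j < min(i + 4, len(text)) and text[j] in OCTAL_DIGITS:
                j += 1
            code = int(text[i + 1:j], 8)
            if code > 0xFF:
                raise ValueError("octal escape does not fit in an 8-bit byte")
            out.append(code)
            i = j
        elif nxt in SIMPLE_ESCAPES:
            out += SIMPLE_ESCAPES[nxt]
            i += 2
        else:
            raise ValueError("escape with unspecified results at index %d" % i)
    return bytes(out)
```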

       By default, any value that is not a string shall be  interpreted  as  a
       signed  decimal  number. Any such value, with a leading 0x or 0X, shall
       be interpreted as an unsigned hexadecimal  number;  otherwise,  with  a
       leading  zero,  the  value  shall  be  interpreted as an unsigned octal
       number.

       If the value is not a  string,  it  can  be  preceded  by  a  character
       indicating  the  comparison to be performed. Permissible characters and
       the comparisons they specify are as follows:

       =
              The test shall succeed if the value from  the  file  equals  the
              value field.

       <
              The  test  shall succeed if the value from the file is less than
              the value field.

       >
              The test shall succeed if the value from  the  file  is  greater
              than the value field.

       &
              The test shall succeed if all of the set bits in the value field
              are set in the value from the file.

       ^
              The test shall succeed if at least one of the set  bits  in  the
              value field is not set in the value from the file.

       x
              The  test shall succeed if the file is large enough to contain a
              value of the type specified starting at the offset specified.
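
       The numeric comparisons above can be sketched in Python as follows. The
       names parse_number and numeric_test are hypothetical. Treating a  value
       with  no  comparison  character  as an equality test is an assumption,
       and a file_value of None stands for a file too short to hold the value.

```python
def parse_number(text):
    """Signed decimal by default; 0x/0X means hex, a leading 0 means octal."""
    if text[:2] in ("0x", "0X"):
        return int(text[2:], 16)
    if len(text) > 1 and text[0] == "0":
        return int(text[1:], 8)
    return int(text, 10)


COMPARISONS = {
    "=": lambda from_file, wanted: from_file == wanted,
    "<": lambda from_file, wanted: from_file < wanted,
    ">": lambda from_file, wanted: from_file > wanted,
    # All set bits of the value field are set in the value from the file.
    "&": lambda from_file, wanted: from_file & wanted == wanted,
    # At least one set bit of the value field is clear in the value from the file.
    "^": lambda from_file, wanted: from_file & wanted != wanted,
}


def numeric_test(value_field, file_value):
    """Apply a non-string value field to a number read from the file."""
    if file_value is None:
        return False  # file too short for a value of this type at this offset
    if value_field == "x":
        return True  # succeeds whenever the value could be read
    if value_field[0] in COMPARISONS:
        op, number = value_field[0], value_field[1:]
    else:
        op, number = "=", value_field  # assumed default comparison
    return COMPARISONS[op](file_value, parse_number(number))
```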

       message
              The message to be printed if  the  test  succeeds.  The  message
              shall   be   interpreted  using  the  notation  for  the  printf
              formatting specification; see printf(). The value from the  file
              shall  be the argument for the printf formatting specification:
              as a string if the value field was a string, and as  a  number
              otherwise.
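
       As a rough illustration, Python's % operator accepts much of the printf
       notation and can stand in for it here. The name render_message is
       hypothetical, and Python's formatting is not identical to printf().

```python
def render_message(message, file_value):
    """Format a message field with the value read from the file."""
    try:
        return message % (file_value,)
    except TypeError:
        # Raised when the message has no conversion to consume the argument,
        # or when the conversion does not fit the argument's type.
        return message
```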

EXIT STATUS

       The following exit values shall be returned:

        0     Successful completion.

       >0     An error occurred.

CONSEQUENCES OF ERRORS

       Default.

       The following sections are informative.

APPLICATION USAGE

       The  file  utility  can  only  be required to guess at many of the file
       types because only exhaustive testing can  determine  some  types  with
       certainty. For example, binary data on some implementations might match
       the initial segment of an executable or a tar archive.

       Note that the table indicates  that  the  output  contains  the  stated
       string.   Systems  may  add  text  before  or  after  the  string.  For
       executables, as an example, the machine architecture and various  facts
       about  how  the file was link-edited may be included. Note also that on
       systems that  recognize  shell  script  files  starting  with  "#!"  as
       executable  files,  these  may be identified as executable binary files
       rather than as shell scripts.

EXAMPLES

       Determine whether an argument is a binary executable file:

              file "$1" | grep -Fq executable &&
                  printf "%s is executable.\n" "$1"

RATIONALE

       The -f option was omitted because the same effect can (and  should)  be
       obtained using the xargs utility.

       Historical  versions  of  the  file  utility  attempt  to  identify the
       following types of files: symbolic link, directory, character  special,
       block special, socket, tar archive, cpio archive, SCCS archive, archive
       library, empty, compress output, pack output, binary  data,  C  source,
       FORTRAN source, assembler source,  nroff/troff/eqn/tbl  source,  troff
       output, shell script, C shell script, English text, ASCII text, various
       executables,  APL  workspace,  compiled  terminfo  entries,  and CURSES
       screen images. Only those types that are reasonably well  specified  in
       POSIX  or  are  directly  related  to POSIX utilities are listed in the
       table.

       Historical systems have used a "magic file" named  /etc/magic  to  help
       identify  file  types.  Because  it  is  generally useful for users and
       scripts to be able to identify special file types, the -m  flag  and  a
       portable  format  for  user-created magic files have been specified. No
       requirement is made that an implementation of file use this  method  of
       identifying  files,  only  that  users  be  permitted  to add their own
       classifying tests.

       In addition, three options have been added to historical practice.  The
       -d  flag  has been added to permit users to cause their tests to follow
       any default system tests. The -i flag has been added to permit users to
       test  portably for regular files in shell scripts. The -M flag has been
       added to permit users to ignore any default system tests.

       The IEEE Std 1003.1-2001 description of default system  tests  and  the
       interaction between the -d, -M, and -m options did not clearly indicate
       that there were two types of "default  system  tests".  The  "position-
       sensitive  tests" determine file types by looking for certain string or
       binary values at specific offsets in the  file  being  examined.  These
       position-sensitive  tests  were implemented in historical systems using
       the magic file described above. Some of these tests are now built  into
       the  file  utility  itself  on  some  implementations so the output can
       provide more detail than can be provided by magic files. For example, a
       magic file can easily identify a core file on most implementations, but
       cannot name the program file that dropped the core. A magic file  could
       produce output such as:

              /home/dwc/core: ELF 32-bit MSB core file SPARC Version 1

       but  by  building  the test into the file utility, you could get output
       such as:

              /home/dwc/core: ELF 32-bit MSB core file SPARC Version 1, from 'testprog'

       These extended built-in tests are still  to  be  treated  as  position-
       sensitive  default  system  tests  even  if  they  are  not  listed  in
       /etc/magic or any other magic file.

       The context-sensitive default system tests were always built  into  the
       file  utility. These tests looked for language constructs in text files
       trying to identify  shell  scripts,  C,  FORTRAN,  and  other  computer
       language  source files, and even plain text files. With the addition of
       the -m and -M options the distinction  between  position-sensitive  and
       context-sensitive  default  system  tests  became important because the
       order of testing is important.  The  context-sensitive  system  default
       tests  should never be applied before any position-sensitive tests even
       if the -d option is specified before a -m option or -M  option  due  to
       the  high  probability  that the context-sensitive system default tests
       will incorrectly identify arbitrary text files  as  text  files  before
       position-sensitive  tests  specified  by  the  -m or -M option would be
       applied to give a more accurate identification.

       Leaving the meaning of -M - and -m -  unspecified  allows  an  existing
       prototype  of  these  options  to  continue  to  work  in  a backwards-
       compatible manner. (In that implementation, -M - was roughly equivalent
       to -d in IEEE Std 1003.1-2001.)

       The  historical  -c  option  was  omitted as not particularly useful to
       users  or  portable  shell   scripts.   In   addition,   a   reasonable
       implementation  of  the file utility would report any errors found each
       time the magic file is read.

       The historical format of the magic file was the same as that  specified
       by  the  Rationale  in  the  ISO POSIX-2:1993  standard for the offset,
       value, and message fields; however, it used less  precise  type  fields
       than  the  format specified by the current normative text. The new type
       field values are a superset of the historical ones.

       The following is an example magic file:

              0  short     070707              cpio archive
              0  short     0143561             Byte-swapped cpio archive
              0  string    070707              ASCII cpio archive
              0  long      0177555             Very old archive
              0  short     0177545             Old archive
              0  short     017437              Old packed data
              0  string    \037\036            Packed data
              0  string    \377\037            Compacted data
              0  string    \037\235            Compressed data
              >2 byte&0x80 >0                  Block compressed
              >2 byte&0x1f x                   %d bits
              0  string    \032\001            Compiled Terminfo Entry
              0  short     0433                Curses screen image
              0  short     0434                Curses screen image
              0  string    <ar>                System V Release 1 archive
              0  string    !<arch>\n__.SYMDEF  Archive random library
              0  string    !<arch>             Archive
              0  string    ARF_BEGARF          PHIGS clear text archive
              0  long      0x137A2950          Scalable OpenFont binary
              0  long      0x137A2951          Encrypted scalable OpenFont binary
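
       To show how lines such as these are applied, the following Python sketch
       runs four of them against a byte buffer. It is deliberately limited: it
       handles only the string and byte types, decimal offsets, 0x masks,  and
       the  >  and  x  comparisons. The names MAGIC, run_line and identify are
       hypothetical, bytes are read as unsigned for  simplicity,  and  joining
       the matched messages with a space is an assumption.

```python
MAGIC = r"""
0  string    \037\235            Compressed data
>2 byte&0x80 >0                  Block compressed
>2 byte&0x1f x                   %d bits
0  string    !<arch>             Archive
"""


def decode_octal_escapes(text):
    """Turn \\ooo escapes into bytes; other characters are taken literally."""
    out = bytearray()
    i = 0
    while i < len(text):
        if text[i] == "\\" and text[i + 1:i + 2] and text[i + 1] in "01234567":
            j = i + 1
            while j < min(i + 4, len(text)) and text[j] in "01234567":
                j += 1
            out.append(int(text[i + 1:j], 8))
            i = j
        else:
            out += text[i].encode("ascii")
            i += 1
    return bytes(out)


def run_line(line, data):
    """Apply one magic line; return the rendered message, or None on failure."""
    offset_field, type_field, value_field, message = line.split(None, 3)
    offset = int(offset_field.lstrip(">"))  # decimal offsets only
    if type_field == "string":
        want = decode_octal_escapes(value_field)
        return message if data[offset:offset + len(want)] == want else None
    name, _, mask = type_field.partition("&")
    if name != "byte" or offset >= len(data):
        return None  # unsupported type, or file shorter than the offset
    value = data[offset]
    if mask:
        value &= int(mask, 16)  # the sample masks are all 0x-prefixed
    if value_field == "x":
        ok = True
    elif value_field[0] == ">":
        ok = value > int(value_field[1:])
    else:
        ok = value == int(value_field)
    if not ok:
        return None
    return message % value if "%" in message else message


def identify(magic_text, data):
    """Stop at the first matching line, then apply its '>' continuation lines."""
    parts = []
    for line in magic_text.strip().splitlines():
        if not line.startswith(">"):
            if parts:
                break  # a test already succeeded; no further tests
            message = run_line(line, data)
            if message is not None:
                parts.append(message)
        elif parts:  # the last non-'>' line succeeded
            message = run_line(line, data)
            if message is not None:
                parts.append(message)
    return " ".join(parts) if parts else "data"
```

       For instance, identify(MAGIC, b"\x1f\x9d\x90") walks the first  three
       lines and stops before the Archive line.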

       The use of  a  basic  integer  data  type  is  intended  to  allow  the
       implementation  to  choose a word size commonly used by applications on
       that architecture.

FUTURE DIRECTIONS

       None.

SEE ALSO

       ar, ls, pax

COPYRIGHT

       Portions of this text are reprinted and reproduced in  electronic  form
       from IEEE Std 1003.1, 2003 Edition, Standard for Information Technology
       -- Portable Operating System Interface (POSIX),  The  Open  Group  Base
       Specifications  Issue  6,  Copyright  (C) 2001-2003 by the Institute of
       Electrical and Electronics Engineers, Inc and The Open  Group.  In  the
       event of any discrepancy between this version and the original IEEE and
       The Open Group Standard, the original IEEE and The Open Group  Standard
       is  the  referee document. The original Standard can be obtained online
       at http://www.opengroup.org/unix/online.html .