
NAME

       israndom  -  randomness  testing using data compressors over fixed-size
       alphabets

SYNOPSIS

       israndom  [-a  alphasize]  [-c  compressor]  [-s   samplelen]   [-qhnr]
       [filename]

DESCRIPTION

       israndom tests a sequence of symbols for randomness.  israndom tries to
       determine if a given sequence of trials could reasonably be assumed  to
       be  from  a  random  uniform distribution over a fixed-size alphabet of
       2-256 symbols.

       israndom assumes that each symbol (or sample trial) is represented by
       exactly one byte.  The only exceptions to this rule are the -n and -r
       options, which ignore newlines and carriage returns, respectively
       (see below).

       israndom is based on the mathematical ideas of Shannon, Kolmogorov, and
       Cilibrasi and uses the following formula to determine an expected size
       for a sample of k trials of a uniform distribution over an
       alphasize-symbol alphabet.  Each symbol takes log2(alphasize) bits, so
       the total cost c for the ensemble of samples is k log2(alphasize) bits.
       This number is rounded up to the nearest byte and increased by one to
       arrive at the final estimate of the expected communication cost on the
       assumption of uniform randomness.
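
       The formula above can be sketched in a few lines of Python.  This is
       only an illustration of the arithmetic described in the text, taking
       the logarithm to base 2 because the cost is measured in bits; the
       function name expected_size_bytes is hypothetical and the exact
       rounding used inside israndom is assumed from the prose.

```python
import math


def expected_size_bytes(k: int, alphasize: int) -> int:
    """Expected size, in bytes, of k uniform samples (illustrative only)."""
    if not 2 <= alphasize <= 256:
        raise ValueError("alphasize must be between 2 and 256")
    # Total cost in bits: k * log2(alphasize).
    total_bits = k * math.log2(alphasize)
    # Round up to a whole number of bytes, then add one byte.
    return math.ceil(total_bits / 8) + 1


if __name__ == "__main__":
    # Default sample count with a full 256-symbol alphabet.
    print(expected_size_bytes(393216, 256))
```

       With the default 393216 samples and a 256-symbol alphabet, each sample
       costs 8 bits, so the sketch yields 393216 + 1 = 393217 bytes.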

       If the compressed size of k samples is less than c then this represents
       a randomness deficiency and the randomness test fails; israndom will
       exit with a nonzero exit status.  If israndom indicates that a source
       is nonrandom, this fact is effectively certain provided the compression
       module is correct and invertible.  If the compressed size is at least
       the threshold value c then the file appears to be random and passes the
       test, and israndom will exit with a 0 return value.  In either case, it
       will print the alphabet size, expected compressed size, sample count,
       and randomness difference before exiting with an appropriate return
       code.
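
       The pass/fail decision can be approximated with Python's standard bz2
       module.  This is a minimal sketch under stated assumptions, not the
       israndom implementation: the function name passes_randomness_test is
       hypothetical, and bz2 merely stands in for the default bzlib
       compression module.

```python
import bz2
import math
import os


def passes_randomness_test(samples: bytes, alphasize: int) -> bool:
    """Return True if the samples do not compress below the expected size."""
    k = len(samples)
    # Expected size: k * log2(alphasize) bits, rounded up to bytes, plus one.
    threshold = math.ceil(k * math.log2(alphasize) / 8) + 1
    compressed_size = len(bz2.compress(samples))
    # A compressed size below the threshold is a randomness deficiency.
    return compressed_size >= threshold


if __name__ == "__main__":
    print(passes_randomness_test(os.urandom(393216), 256))   # random bytes
    print(passes_randomness_test(b"ab" * 196608, 256))       # obvious pattern
```

       Random bytes should not compress, so the first call is expected to
       pass, while the repeating pattern compresses to a tiny fraction of the
       threshold and fails.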

       The default number of samples is 393216.  Larger sample counts should
       increase accuracy, while using too few samples will leave the method
       unable to resolve randomness in certain situations.  This is a
       theoretically unavoidable limitation of all effective randomness tests.

       If  a filename is given, it is read to find the samples to analyze.  If
       the filename "-" is given,  or  no  filename  is  given  at  all,  then
       israndom reads from standard input.

       If text files are to be used, it is important to specify one or both of
       -n and  -r  since  without  these,  end  of  line  characters  will  be
       misinterpreted as samples.
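
       To see why this matters, the following sketch mimics the effect of -n
       and -r on a small text sample.  The helper filter_samples is
       hypothetical and only illustrates the idea of dropping end-of-line
       bytes before counting samples; it is not taken from israndom itself.

```python
def filter_samples(data: bytes,
                   ignore_newlines: bool = False,
                   ignore_carriage_returns: bool = False) -> bytes:
    """Drop end-of-line bytes so they are not counted as samples."""
    drop = b""
    if ignore_newlines:
        drop += b"\n"
    if ignore_carriage_returns:
        drop += b"\r"
    # bytes.translate with a None table deletes the listed byte values.
    return data.translate(None, drop)


if __name__ == "__main__":
    text = b"0110\r\n1001\r\n"
    # Without filtering, CR and LF look like two extra symbols.
    print(len(set(text)))
    # With both options, only the symbols '0' and '1' remain.
    print(len(set(filter_samples(text, True, True))))
```

       Without filtering, a two-symbol text file would appear to use a
       four-symbol alphabet with a highly regular pattern.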

OPTIONS

       -c compressor_name
              set  compressor  explicitly  to  compressor_name  instead of the
              default,  bzlib.   For  basic   analysis,   bzlib   is   usually
              sufficient.   For  detecting  complex  or  subtle biases, a more
              powerful compression module such as lzma (lzmax) or ppmd (ppmdx)
              will  detect  more  types of non-randomness.  Because Lempel-Ziv
              types are universal,  all  effective  randomness  tests  can  be
              captured as a kind of compression discriminant function.

       -n     ignore newlines (so that text files may be used)

       -r     ignore carriage returns (so that text files may be used)

       -a alphasize
              set alphabet size to alphasize, an integer between 2 and 256.
              If you do not specify an alphabet size, it is automatically
              determined by the contents of the samples.

       -s samplecount
              Use  samplecount samples instead of the default of 393216. Using
              a number that is too small here will reduce the accuracy of  the
              test,  causing everything to appear to be random.  If 0 is used,
              it means to read until EOF.

       -q     quiet mode, with no extra status messages

       -h     print help and exit.

EXAMPLES

       First, we can verify that the cryptographically strong random number
       generator is correct:

       israndom /dev/urandom

       Next,  we  can  notice that the "od" command, without extra options, is
       not random because it prints  out  addresses  and  spaces  predictably.
       Most compressors can tell by the regular spaces that it is not random:

       od /dev/urandom | israndom -n -r

       but if we remove spaces using 'tr' then a more powerful compressor,
       lzmax, is required to demonstrate the non-randomness of the sequence:

       od /dev/urandom | tr -d ' ' | israndom -n -r -c lzmax

       Removing the addresses using the od option -An yields the expected
       result once again, that the sequence appears effectively random:

       od -An /dev/urandom | tr -d ' ' | israndom -n -r -c lzmax

       The above sequence is not actually random, because every third octal
       digit only ranges from 0 to 3, since 377 octal is the same as 255
       decimal.  This subtle pattern is detectable using 10 million samples
       and the advanced ppmdx compressor:

       od -An /dev/urandom | tr -d ' ' | israndom -n -r -c ppmdx -s 10000000
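
       The bias in the octal digits can be checked with a little arithmetic.
       The sketch below assumes each byte is written as three octal digits,
       as the text describes; the helper leading_octal_digits is
       hypothetical and exists only for this illustration.

```python
def leading_octal_digits(values=range(256)) -> set:
    """Collect the first digit of the 3-digit octal form of each byte value."""
    return {f"{v:03o}"[0] for v in values}


if __name__ == "__main__":
    # The largest byte value, 377 octal, is 255 decimal.
    print(0o377)
    # Only the digits 0 to 3 can appear in the leading position.
    print(sorted(leading_octal_digits()))
    # Three uniform octal digits would carry 9 bits, but a byte holds only 8.
    print(3 * 3, "bits assumed versus", 8, "bits actually present")
```

       Under that assumption, every three digits carry 8 bits of information
       rather than the 9 bits a uniform 8-symbol alphabet would need, which
       is the deficiency a sufficiently strong compressor can find.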

       As a sanity check, we see that even under extreme analysis as above,
       /dev/urandom still checks out okay as random, even with newlines and
       carriage returns removed for good measure:

       cat /dev/urandom | israndom -n -r -c ppmdx -s 10000000

ENVIRONMENT

       No environment variables.

BUGS

       Please report bugs to the Debian BTS.

AUTHOR

       Rudi Cilibrasi <cilibrar@cilibrar.com>

SEE ALSO

       complearn(5), ncd(1)