NAME

       djvu2hocr - DjVu to hOCR converter

SYNOPSIS

       djvu2hocr [option...] djvu-file

       djvu2hocr {--version | --help | -h}

DESCRIPTION

       djvu2hocr converts the hidden text layer of a DjVu file to the hOCR[1]
       format.

OPTIONS

   Text segmentation options
       --word-segmentation=simple
           Use the same word segmentation as found in the DjVu file.

           This is the default.

       --word-segmentation=uax29
           Use the Unicode Text Segmentation[2] algorithm to break lines into
           words, possibly fixing word segmentation found in the DjVu file.
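
       The two segmentation modes above can be selected from a script. The
       following is a minimal Python sketch, not part of djvu2hocr itself: the
       helper names build_command and convert are hypothetical, and it assumes
       that djvu2hocr is on PATH and writes the hOCR document to standard
       output, which this page does not state.

```python
import subprocess

# The two values documented for --word-segmentation.
SEGMENTATION_MODES = ("simple", "uax29")


def build_command(djvu_file, word_segmentation="simple"):
    """Build the djvu2hocr argument list (hypothetical helper)."""
    if word_segmentation not in SEGMENTATION_MODES:
        raise ValueError(f"unsupported word segmentation: {word_segmentation!r}")
    return ["djvu2hocr", f"--word-segmentation={word_segmentation}", djvu_file]


def convert(djvu_file, word_segmentation="simple"):
    """Run djvu2hocr and return whatever it writes to standard output.

    Assumption: the hOCR document goes to standard output.
    """
    result = subprocess.run(
        build_command(djvu_file, word_segmentation),
        check=True,           # raise CalledProcessError on a non-zero exit
        capture_output=True,  # collect stdout and stderr as bytes
    )
    return result.stdout
```

       Calling convert("book.djvu", "uax29") would request re-segmentation
       with the Unicode algorithm; omitting the second argument keeps the
       default, simple.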

   Other options
       --version
           Output version information and exit.

       -h, --help
           Display help and exit.

PORTABILITY

       djvu2hocr uses a custom extension to hOCR to retain characters that
       cannot be represented directly in an HTML/XML document. For example,
       the control character BEL (^G, U+0007) is converted into the following
       HTML chunk: <span class="djvu_char" title="#x07"> </span>
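
       A consumer of the output may want to map such chunks back to the
       original characters. The following is a minimal Python sketch under
       stated assumptions: the function name restore_djvu_chars is
       hypothetical, and the pattern assumes that every chunk has exactly the
       attribute order shown above and that the title is always "#x" followed
       by hexadecimal digits. Only the single BEL example is documented here.

```python
import re

# Assumption: every chunk follows the documented example exactly:
# class="djvu_char" first, then title="#x<hex digits>".
DJVU_CHAR = re.compile(
    r'<span class="djvu_char" title="#x([0-9A-Fa-f]+)">[^<]*</span>'
)


def restore_djvu_chars(hocr_text):
    """Replace each djvu_char chunk with the character its title encodes."""
    return DJVU_CHAR.sub(lambda m: chr(int(m.group(1), 16)), hocr_text)
```

       The result may contain characters that are not legal in HTML/XML, so
       this step belongs after the text has been extracted from the markup,
       not before the document is parsed.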

SEE ALSO

       djvu(1)

AUTHOR

       Jakub Wilk <jwilk@jwilk.net>
           Author.

COPYRIGHT

       Copyright © 2009, 2010 Jakub Wilk

NOTES

        1. hOCR
           http://docs.google.com/View?docid=dfxcv4vc_67g844kf

        2. Unicode Text Segmentation
           http://unicode.org/reports/tr29/