NAME

       bulk_loader - PgQ consumer that loads urlencoded records to slow
       databases

SYNOPSIS

           bulk_loader.py [switches] config.ini

DESCRIPTION

       bulk_loader is a PgQ consumer that reads urlencoded records from a
       source queue and writes them into tables according to a configuration
       file. It is targeted at slow databases that cannot handle applying each
       row as a separate statement. It was originally written for
       BizgresMPP/greenplumDB, which have very high per-statement overhead,
       but it can also be used to load a regular PostgreSQL database that
       cannot manage regular replication.

       Behaviour properties:

       -   reads urlencoded "logutriga" records.
       -   does not do partitioning, but optionally allows redirecting table
           events.
       -   does not keep event order.
       -   always loads data with COPY, either directly to the main table
           (INSERTs) or to temp tables (UPDATE/COPY), then applies from there.

       Events are usually produced by pgq.logutriga(). Logutriga adds all the
       data of the record into the event (also in case of updates and
       deletes).

QUICK-START

       Basic bulk_loader setup and usage can be summarized by the following
       steps:

        1.  pgq and logutriga must be installed in the source databases. See
           the pgqadm man page for details. The target database must also have
           the pgq_ext schema.

        2.  edit a bulk_loader configuration file, say bulk_loader_sample.ini

        3.  create source queue

               $ pgqadm.py ticker.ini create <queue>

        4.  Tune source queue to have big batches:

               $ pgqadm.py ticker.ini config <queue> ticker_max_count="10000" ticker_max_lag="10 minutes" ticker_idle_period="10 minutes"

        5.  create target database and tables in it.

        6.  launch bulk_loader in daemon mode

               $ bulk_loader.py -d bulk_loader_sample.ini

        7.  start producing events (create logutriga triggers on tables)

               CREATE OR REPLACE TRIGGER trig_bulk_replica AFTER INSERT OR UPDATE ON some_table FOR EACH ROW EXECUTE PROCEDURE pgq.logutriga(<queue>)

CONFIG

   Common configuration parameters
       job_name
           Name for the particular job the script does. The script will log
           under this name to logdb/logserver. The name is also used as the
           default PgQ consumer name. It should be unique.

       pidfile
           Location for pid file. If not given, the script is not allowed to
           daemonize.

       logfile
           Location for log file.

       loop_delay
           If the process runs continuously, how long to sleep after each
           work loop, in seconds. Default: 1.

       connection_lifetime
           Close and reconnect older database connections.

       use_skylog
           foo.

   Common PgQ consumer parameters
       pgq_queue_name
           Queue name to attach to. No default.

       pgq_consumer_id
           Consumer ID to use when registering. Default: %(job_name)s

   Config options specific to bulk_loader
       src_db
           Connect string for source database where the queue resides.

       dst_db
           Connect string for target database where the tables should be
           created.

       remap_tables
           Optional parameter for table redirection. Contains a
           comma-separated list of <oldname>:<newname> pairs. E.g.:
           oldtable1:newtable1, oldtable2:newtable2.

       load_method
           Optional parameter for load method selection. Available options:

           0   UPDATE as UPDATE from temp table. This is the default.
           1   UPDATE as DELETE+COPY from temp table.
           2   merge INSERTs with UPDATEs, then do DELETE+COPY from temp
               table.
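
       As an illustration only, the parameters above could be collected in a
       configuration file like the one embedded in the following Python
       sketch. The [bulk_loader] section name, every value, and the helper
       names parse_remap_tables and load_settings are assumptions made for
       this example and are not taken from the bulk_loader source. The sketch
       reads the file with Python's standard configparser module, not with
       the Skytools configuration classes.

```python
import configparser

# Hypothetical sample config: section name and all values are illustrative.
SAMPLE_INI = """
[bulk_loader]
job_name = bulk_loader_sample
src_db = dbname=sourcedb
dst_db = dbname=targetdb
pgq_queue_name = sample_queue
pgq_consumer_id = %(job_name)s
pidfile = pid/%(job_name)s.pid
logfile = log/%(job_name)s.log
loop_delay = 1
remap_tables = oldtable1:newtable1, oldtable2:newtable2
load_method = 0
"""


def parse_remap_tables(value):
    """Turn 'old1:new1, old2:new2' into {'old1': 'new1', 'old2': 'new2'}."""
    mapping = {}
    for pair in value.split(","):
        pair = pair.strip()
        if not pair:
            continue  # tolerate empty values and trailing commas
        old, new = pair.split(":", 1)
        mapping[old.strip()] = new.strip()
    return mapping


def load_settings(ini_text, section="bulk_loader"):
    """Read the documented keys from ini_text and validate load_method."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    cf = parser[section]
    load_method = cf.getint("load_method", fallback=0)  # 0 is the default
    if load_method not in (0, 1, 2):
        raise ValueError("load_method must be 0, 1 or 2")
    return {
        "job_name": cf["job_name"],
        "pgq_queue_name": cf["pgq_queue_name"],  # documented as having no default
        "pgq_consumer_id": cf.get("pgq_consumer_id", fallback=cf["job_name"]),
        "src_db": cf["src_db"],
        "dst_db": cf["dst_db"],
        "remap_tables": parse_remap_tables(cf.get("remap_tables", fallback="")),
        "load_method": load_method,
    }


settings = load_settings(SAMPLE_INI)
```

       In this sketch %(job_name)s is expanded by configparser's basic
       interpolation, so pgq_consumer_id resolves to the job name, and
       remap_tables becomes a dictionary from old to new table names.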

LOGUTRIGA EVENT FORMAT

       The PgQ trigger function pgq.logutriga() sends a table change event
       into the queue in the following format:

       ev_type
           (op || ":" || pkey_fields), where op is either "I", "U" or "D",
           corresponding to insert, update or delete, and pkey_fields is a
           comma-separated list of primary key fields for the table. The
           operation type is always present, but the pkey_fields list can be
           empty if the table has no primary key. Example: I:col1,col2

       ev_data
           Urlencoded record of data. It uses db-specific urlencoding where
           the presence of = is meaningful: a missing = means NULL, a present
           = means a literal value. Example:
           id=3&name=str&nullvalue&emptyvalue=

       ev_extra1
           Fully qualified table name.
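
       The format above can be decoded with a few lines of Python. This is a
       minimal sketch written for this page, not the parser used by
       bulk_loader itself: the function names parse_ev_type and parse_ev_data
       are hypothetical, and it assumes that keys and values are
       percent-encoded with "+" standing for a space, which is what
       urllib.parse.unquote_plus undoes.

```python
from urllib.parse import unquote_plus


def parse_ev_type(ev_type):
    """Split 'I:col1,col2' into the operation and the list of pkey fields."""
    op, _, pkeys = ev_type.partition(":")
    if op not in ("I", "U", "D"):
        raise ValueError("unknown operation: %r" % op)
    # The pkey list may be empty when the table has no primary key.
    pkey_fields = [name for name in pkeys.split(",") if name]
    return op, pkey_fields


def parse_ev_data(ev_data):
    """Decode a db-urlencoded record into a dict; a missing '=' means NULL."""
    row = {}
    for item in ev_data.split("&"):
        if not item:
            continue
        key, sep, value = item.partition("=")
        # sep is '' when the item had no '=', which encodes NULL (None here).
        row[unquote_plus(key)] = unquote_plus(value) if sep else None
    return row


op, pkeys = parse_ev_type("I:col1,col2")
row = parse_ev_data("id=3&name=str&nullvalue&emptyvalue=")
```

       With the examples from this section, row maps nullvalue to None and
       emptyvalue to the empty string, which keeps NULL and "" distinct.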

COMMAND LINE SWITCHES

       The following switches are common to all skytools.DBScript-based
       Python programs.

       -h, --help
           show help message and exit

       -q, --quiet
           make program silent

       -v, --verbose
           make program more verbose

       -d, --daemon
           make program go background

       The following switches are used to control an already running process.
       The pidfile is read from the config, then a signal is sent to the
       process ID specified there.

       -r, --reload
           reload config (send SIGHUP)

       -s, --stop
           stop program safely (send SIGINT)

       -k, --kill
           kill program immediately (send SIGTERM)
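
       The control switches amount to reading the pid file and sending one
       signal. The sketch below shows that mechanism under stated
       assumptions: the pid file is assumed to hold the process ID as decimal
       text, and the names SWITCH_SIGNALS, read_pid and signal_running_job
       are hypothetical, not part of skytools.DBScript. It relies on POSIX
       signals and so applies to Unix-like systems only.

```python
import os
import signal

# Signal sent by each control switch, as listed above.
SWITCH_SIGNALS = {
    "reload": signal.SIGHUP,   # -r, --reload
    "stop": signal.SIGINT,     # -s, --stop
    "kill": signal.SIGTERM,    # -k, --kill
}


def read_pid(pidfile):
    """Return the process ID stored in pidfile (assumed to be decimal text)."""
    with open(pidfile) as f:
        return int(f.read().strip())


def signal_running_job(pidfile, action, send=os.kill):
    """Send the signal for action to the process named in pidfile.

    send defaults to os.kill; passing another callable allows a dry run.
    """
    sig = SWITCH_SIGNALS[action]
    pid = read_pid(pidfile)
    send(pid, sig)
    return pid, sig
```

       For example, signal_running_job("pid/bulk_loader_sample.pid",
       "reload") would send SIGHUP to the recorded process; the path shown is
       illustrative.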

                                  09/22/2008