NAME

       slurm_checkpoint_able, slurm_checkpoint_complete,
       slurm_checkpoint_create, slurm_checkpoint_disable,
       slurm_checkpoint_enable, slurm_checkpoint_error,
       slurm_checkpoint_restart, slurm_checkpoint_tasks,
       slurm_checkpoint_vacate - Slurm checkpoint functions

SYNTAX

       #include <slurm/slurm.h>

       int slurm_checkpoint_able (
            uint32_t job_id,
            uint32_t step_id,
            time_t *start_time
       );

       int slurm_checkpoint_complete (
            uint32_t job_id,
            uint32_t step_id,
            time_t start_time,
            uint32_t error_code,
            char *error_msg
       );

       int slurm_checkpoint_create (
            uint32_t job_id,
            uint32_t step_id,
            uint16_t max_wait,
            char *image_dir
       );

       int slurm_checkpoint_disable (
            uint32_t job_id,
            uint32_t step_id
       );

       int slurm_checkpoint_enable (
            uint32_t job_id,
            uint32_t step_id
       );

       int slurm_checkpoint_error (
            uint32_t job_id,
            uint32_t step_id,
            uint32_t *error_code,
            char **error_msg
       );

       int slurm_checkpoint_restart (
            uint32_t job_id,
            uint32_t step_id,
            uint16_t stick,
            char *image_dir
       );

       int slurm_checkpoint_tasks (
            uint32_t job_id,
            uint32_t step_id,
            time_t begin_time,
            char *image_dir,
            uint16_t max_wait,
            char *nodelist
       );

       int slurm_checkpoint_vacate (
            uint32_t job_id,
            uint32_t step_id,
            uint16_t max_wait,
            char *image_dir
       );

ARGUMENTS

       begin_time
              When to begin the operation.

       error_code
              Error  code  for checkpoint operation. Only the highest value is
              preserved.

       error_msg
              Error message for checkpoint operation. Only the error_msg value
              for the highest error_code is preserved.
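
       The "highest value wins" rule for error_code and error_msg can be
       pictured with a small stand-alone model. This is only a sketch of the
       documented behavior, not Slurm's implementation: the ckpt_error
       structure and ckpt_error_record() are hypothetical names, and the
       choice to keep the earlier message when two codes are equal is an
       assumption.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical record of the preserved error; not a Slurm type. */
struct ckpt_error {
    uint32_t code;      /* highest error code reported so far */
    char msg[128];      /* message that accompanied that code */
};

/*
 * Store a report only if its code is higher than the one already held.
 * Assumption: on a tie the earlier message is kept.
 */
void ckpt_error_record(struct ckpt_error *rec, uint32_t code, const char *msg)
{
    assert(rec != NULL);
    if (code > rec->code) {
        rec->code = code;
        snprintf(rec->msg, sizeof(rec->msg), "%s", msg ? msg : "");
    }
}
```

       Starting from a zeroed record, reporting code 2 and then code 1
       leaves code 2 and its message in place; a later report with code 5
       replaces both.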

       image_dir
              Directory  specification for where the checkpoint file should be
              read from or written to. The default value is specified  by  the
              JobCheckpointDir SLURM configuration parameter.

       job_id SLURM job ID to perform the operation upon.

       max_wait
              Maximum  time to allow for the operation to complete in seconds.

       nodelist
              Nodes to which the request is sent.

       start_time
              Time at which last checkpoint operation  began  (if  one  is  in
              progress), otherwise zero.

       step_id
              SLURM  job step ID to perform the operation upon.  May be NO_VAL
              if the operation  is  to  be  performed  on  all  steps  of  the
              specified job.  Specify SLURM_BATCH_SCRIPT to checkpoint a batch
              job.

       stick  If non-zero then restart the job on the same nodes that  it  was
              checkpointed from.

DESCRIPTION

       slurm_checkpoint_able Report whether checkpoint operations can
       presently be issued for the specified job step.  If so, returns
       SLURM_SUCCESS and sets start_time if a checkpoint operation is
       presently active.  Returns ESLURM_DISABLED if checkpoint operations
       are disabled.

       slurm_checkpoint_complete Note that a  requested  checkpoint  has  been
       completed.

       slurm_checkpoint_create  Request  a  checkpoint  for the identified job
       step.  Continue its execution upon completion of the checkpoint.

       slurm_checkpoint_disable    Make    the     identified     job     step
       non-checkpointable.    This   can   be  issued  as  needed  to  prevent
       checkpointing while a job step is in a critical section  or  for  other
       reasons.

       slurm_checkpoint_enable Make the identified job step checkpointable.

       slurm_checkpoint_error  Get error information about the last checkpoint
       operation for a given job step.

       slurm_checkpoint_restart Request that  a  previously  checkpointed  job
       resume  execution.   It  may continue execution on different nodes than
       were originally used.  Execution may be delayed if  resources  are  not
       immediately available.

       slurm_checkpoint_vacate  Request  a  checkpoint  for the identified job
       step.  Terminate its execution upon completion of the checkpoint.

RETURN VALUE

       Zero is returned upon success.  On error, -1 is returned, and the Slurm
       error code is set appropriately.

ERRORS

       ESLURM_INVALID_JOB_ID  the requested job or job step id does not exist.

       ESLURM_ACCESS_DENIED the requesting user lacks  authorization  for  the
       requested  action (e.g. trying to delete or modify another user’s job).

       ESLURM_JOB_PENDING the requested job is still pending.

       ESLURM_ALREADY_DONE the requested job has already completed.

       ESLURM_DISABLED the requested operation has been disabled for this  job
       step.   This  occurs  when  a  checkpoint  is  requested  for a job step
       whose checkpoints have been disabled.

       ESLURM_NOT_SUPPORTED the requested operation is not supported  on  this
       system.

EXAMPLE

       #include <stdio.h>
       #include <stdlib.h>
       #include <slurm/slurm.h>
       #include <slurm/slurm_errno.h>

       int main (int argc, char *argv[])
       {
            uint32_t job_id, step_id;

            if (argc < 3) {
                 printf("Usage: %s job_id step_id\n", argv[0]);
                 exit(1);
            }

            job_id = atoi(argv[1]);
            step_id = atoi(argv[2]);
            if (slurm_checkpoint_disable(job_id, step_id)) {
                 slurm_perror ("slurm_checkpoint_disable");
                 exit (1);
            }
            exit (0);
       }
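
       The example above converts its arguments with atoi(), which cannot
       report malformed or out-of-range input. A stricter conversion might
       look like the following sketch; parse_id() is a hypothetical helper,
       not part of libslurm, and it accepts plain decimal digits only.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Parse a decimal job or step ID; return 0 on success, -1 on bad input. */
int parse_id(const char *text, uint32_t *out)
{
    char *end = NULL;
    unsigned long value;

    assert(out != NULL);
    /* Reject NULL, empty strings, signs and leading whitespace. */
    if (text == NULL || !isdigit((unsigned char)*text))
        return -1;
    errno = 0;
    value = strtoul(text, &end, 10);
    /* Reject overflow, trailing garbage and values that do not fit. */
    if (errno != 0 || *end != '\0' || value > UINT32_MAX)
        return -1;
    *out = (uint32_t)value;
    return 0;
}
```

       With this helper, main() could call parse_id(argv[1], &job_id) and
       print the usage message when it returns -1.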

NOTE

       These  functions  are  included  in the libslurm library, which must be
       linked into your program for use (e.g. "cc myprog.c -lslurm").

COPYING

       Copyright (C) 2004-2007 The Regents of the  University  of  California.
       Copyright (C) 2008-2009 Lawrence Livermore National Security.  Produced
       at   Lawrence   Livermore   National   Laboratory   (cf,   DISCLAIMER).
       CODE-OCEC-09-009. All rights reserved.

       This  file  is  part  of  SLURM,  a  resource  management program.  For
       details, see <https://computing.llnl.gov/linux/slurm/>.

       SLURM is free software; you can redistribute it and/or modify it  under
       the  terms  of  the GNU General Public License as published by the Free
       Software Foundation; either version 2  of  the  License,  or  (at  your
       option) any later version.

       SLURM  is  distributed  in the hope that it will be useful, but WITHOUT
       ANY WARRANTY; without even the implied warranty of  MERCHANTABILITY  or
       FITNESS  FOR  A PARTICULAR PURPOSE.  See the GNU General Public License
       for more details.

SEE ALSO

       srun(1), squeue(1), free(3), slurm.conf(5)