Thursday, 22 November 2012

Bash locking function to prevent simultaneous running of a given script.

Quite often one wants to run a periodic serial process from a Linux shell, and that process must never run concurrently with itself.

There are many ways to achieve this. One good way is to use flock, but it may not be available on every system. What we need is an atomic operation that decides whether we can grab a unique lock for our process, and mkdir is a pretty good choice: it either creates the directory or fails because the directory already exists, with nothing in between. The directory then acts as a semaphore that every invocation of our process checks.
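To make the atomicity point concrete, here is a minimal sketch of mkdir used as a lock. The helper name try_lock and the scratch directory are only illustrative assumptions; they are not part of the script further down.

```shell
#!/bin/bash
# Sketch: mkdir either creates the directory or fails, so only one caller can "win".
# try_lock is a hypothetical helper name used for this illustration only.
try_lock () {
    # Returns 0 only for the caller that actually created the directory.
    mkdir "$1" 2>/dev/null
}

demo_dir=$(mktemp -d)              # scratch area for the demonstration
lockdir="$demo_dir/demo.lock"

if try_lock "$lockdir"; then first=acquired; else first=refused; fi
if try_lock "$lockdir"; then second=acquired; else second=refused; fi

echo "first attempt:  $first"
echo "second attempt: $second"

rmdir "$lockdir"                   # release the lock
rm -rf "$demo_dir"                 # tidy up the scratch area
```

The second attempt is refused because the directory already exists, which is exactly the condition the locking function relies on.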

The script below calls mkdir and, if it successfully creates the directory, writes its own PID to a file inside it. When the script finishes and exits, it removes the directory. We also set some traps so the directory is cleaned up after a few common signals (SIGHUP, SIGINT, SIGQUIT and SIGTERM).
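The acquire-then-clean-up pattern can be tried in isolation with a rough sketch like the one below. It runs the "job" in a subshell so the EXIT trap fires without ending the demonstration; the directory names are made up for the example, and BASHPID assumes bash 4 or later.

```shell
#!/bin/bash
# Sketch: take the lock, register cleanup, and confirm the lock is gone afterwards.
demo_dir=$(mktemp -d)                  # scratch area (illustrative only)
lockdir="$demo_dir/demo.lock"

(
    mkdir "$lockdir" || exit 1         # acquire the lock or give up
    trap 'rm -rf "$lockdir"' EXIT      # runs whenever this subshell exits
    trap 'exit 143' TERM               # turn SIGTERM into a normal exit so EXIT fires
    echo "$BASHPID" > "$lockdir/PID"   # record who holds the lock (bash 4+)
    : # ... the real work would go here ...
)

if [ -d "$lockdir" ]; then state=present; else state=removed; fi
echo "lockdir after job: $state"

rm -rf "$demo_dir"
```

Converting signals into a plain exit is what lets a single EXIT trap handle cleanup for both normal and interrupted runs. SIGKILL cannot be trapped, which is why an orphaned lock directory is still possible.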

If the script cannot acquire the lock because the directory already exists, it works out why. If the recorded PID is still running, it sends an email warning that a job may be overrunning unexpectedly. If the PID no longer exists, it concludes that the job was killed quite rudely, cleans up the lock directory, and emails an alert. In both cases the current invocation exits without doing any work; once an orphan lock has been removed, the process will run at the next invocation.
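The liveness test behind that decision is kill -0, which sends no signal and only reports whether the PID could be signalled. A small sketch, with pid_is_alive as a hypothetical helper name and a background sleep standing in for the locked job:

```shell
#!/bin/bash
# Sketch: use kill -0 to tell a running lock holder from a dead one.
# pid_is_alive is a hypothetical helper name for this illustration.
pid_is_alive () {
    kill -0 "$1" 2>/dev/null
}

sleep 30 &                             # stand-in for a long-running job
job=$!

if pid_is_alive "$job"; then running=alive; else running=gone; fi

kill "$job"                            # simulate the job being killed
wait "$job" 2>/dev/null || true        # reap it so the PID really disappears

if pid_is_alive "$job"; then after=alive; else after=gone; fi

echo "while running: $running"
echo "after kill:    $after"
```

Two caveats are worth keeping in mind: kill -0 also fails when the process exists but belongs to a user you may not signal, and a PID can in principle be reused by an unrelated process, so the check is a heuristic rather than a guarantee.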

The function takes the lock directory name as its argument, so you can deliberately run several instances of your process at once, each locked under a different name. This makes it possible to have several process 'streams', say A, B and C, and make sure A is locked against all other occurrences of A, B against B, C against C and so on. For example, you may have a process that runs once per customer and should only be running once for that customer at any given moment, while several customers may run simultaneously. Just call the lock function with customerA, customerB etc. as the argument.
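The per-stream behaviour can be illustrated with a stripped-down stand-in for the lock function. Here acquire and lock_root are hypothetical names, and the stand-in does nothing beyond the mkdir test:

```shell
#!/bin/bash
# Sketch: independent locks per customer name.
# acquire is a hypothetical, minimal stand-in for the full lock function.
lock_root=$(mktemp -d)                 # scratch area holding the lock directories
acquire () { mkdir "$lock_root/$1.lock" 2>/dev/null; }

acquire customerA && a1=ok || a1=busy  # first run for customer A
acquire customerB && b1=ok || b1=busy  # customer B is independent of A
acquire customerA && a2=ok || a2=busy  # second run for A while the first holds the lock

echo "customerA: $a1, customerB: $b1, customerA again: $a2"

rm -rf "$lock_root"
```

Customers A and B both get a lock, while the second attempt for customer A is turned away until the first one releases its directory.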

This is limited to a single server, and possible improvements would include enhancing it to cope with locking processes running on multiple servers, by using some shared resource (mounted filesystems, database) for the locking semaphore.


#!/bin/bash
#
# locking function: This function must be called
# with <lockdirname> as the FIRST and only argument.
# This is for the bash file-locking mechanism
# exit codes adhere to http://tldp.org/LDP/abs/html/exitcodes.html#EXITCODESREF

lock () {

 USAGE="usage: lock <lockdirname>"
 NOOPTION="You must specify the lockdir name. Exiting"

 [ -z "$1" ] && echo "$USAGE" && echo "$NOOPTION" && exit 64
 EXECUTION=$1

 SUPPORTMAIL=root
 export APP_HOME=`dirname "$0"`
 [ -z "$APP_HOME" ] && echo Could not determine base directory - Exiting && exit 71

 LOCKDIR="${APP_HOME}/$EXECUTION.lock"

 if mkdir "$LOCKDIR"
 then

        echo >&2 "$0: successfully acquired lockdir $LOCKDIR at `date` "
        # Remove LOCKDIR when the script finishes, or when it receives a signal
        trap '{ echo Cleaning up lockdir $LOCKDIR ...; rm -rf "$LOCKDIR"; echo done, exiting; }' 0 EXIT   # remove directory when script finishes
        trap "{ echo Caught SIGHUP; exit 129; }" 1 SIGHUP   # exit with 128+n 
        trap "{ echo Caught SIGINT; exit 130; }" 2 SIGINT   # exit with 128+n 
        trap "{ echo Caught SIGQUIT; exit 131; }" 3 SIGQUIT  # exit with 128+n
        trap "{ echo Caught SIGTERM; exit 143; }" 15 SIGTERM # exit with 128+n
        # put PID of this process into the $LOCKDIR, so we can check this if the next invocation fails to run
        echo $$ > "$LOCKDIR/PID"

 else

        echo >&2 "$0: WARNING! $LOCKDIR present - aborting process - reason follows:"

        PID=$(cat "$LOCKDIR/PID")
        # kill -0 sends no signal; it only tests whether the PID can be signalled
        if  kill -0 "$PID" 2>/dev/null
        then
                # process is still running
                echo >&2 "$0: REASON - there is a lingering process, $PID"
                echo -e "$0: PID= $PID \n `ps -lyf $PID `" | mail -s "Aborting $0 - old process still running (error 1001)" $SUPPORTMAIL
                exit 1001   # note: exit statuses are 8-bit, so the caller sees 1001 mod 256 = 233
        else
                # process is not running, but lock file not deleted?
                echo >&2 "$0: REASON - orphan lockdir. Host process $PID is gone, so lockdir will now be deleted."
                echo "$0: Lockdir will be deleted. Process should run at next invocation" | mail -s "Aborting $0 - orphan lockdir (error 1002)" $SUPPORTMAIL
                rm -rf "$LOCKDIR"
                exit 1002   # note: exit statuses are 8-bit, so the caller sees 1002 mod 256 = 234
        fi

 fi
 # End
 return 0
}
