net.misc

Thursday, 22 November 2012

Bash locking function to prevent simultaneous running of a given script.

Quite often one wants to run a periodic serial process from a Linux shell. This process is not intended to run concurrently with itself.

There are many ways to achieve this. One good way is to use flock, but this may not always be available on your system of choice. We need an atomic operation in order to decide if we can grab a unique lock for our process, and mkdir is a pretty good choice. mkdir can be used to create a semaphore directory polled by all invocations of our process.
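As a minimal sketch of why mkdir suits this job: it either creates the directory and succeeds, or fails because the directory already exists, with no gap between the check and the creation. The scratch directory and the name demo.lock are assumptions made purely for illustration.

```shell
#!/bin/bash
# Sketch only: show that a second mkdir on the same path fails,
# which is what makes the directory usable as a lock.
workdir=$(mktemp -d)            # hypothetical scratch area
lockdir="$workdir/demo.lock"    # hypothetical lock name

try_lock() {
    if mkdir "$lockdir" 2>/dev/null; then
        echo "acquired"
    else
        echo "busy"
    fi
}

first=$(try_lock)     # directory does not exist yet, so mkdir succeeds
second=$(try_lock)    # directory now exists, so mkdir fails
echo "first attempt: $first, second attempt: $second"

rmdir "$lockdir"      # release the lock
third=$(try_lock)     # the lock can be taken again
echo "after release: $third"

rm -rf "$workdir"
```

Testing for the directory first and creating it afterwards would leave a window in which two invocations could both believe they hold the lock; a single mkdir call avoids that.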

The script below uses mkdir and, if it successfully creates the directory, writes its own PID inside that directory. When the script finishes and exits, it cleans up the directory. We also set some traps to clean up in case of a few interrupt conditions.

If the script finds it cannot acquire the lock because the directory already exists, it tests why. If the process whose PID is recorded there is still running, it sends an email warning that a job may be overrunning unexpectedly. If that process no longer exists, it concludes the job was killed quite rudely, cleans up the lock directory and emails an alert. Either way the current invocation aborts; after an orphan lock has been removed, the job will run at the next invocation.
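The liveness test described above can be sketched with kill -0, which sends no signal and only reports whether the PID can be signalled. The helper name pid_state and the background sleep standing in for a real job are assumptions for illustration. One caveat: kill -0 also fails for a live process owned by another user, so this sketch assumes all invocations run as the same user.

```shell
#!/bin/bash
# Sketch only: classify a PID as running or gone using signal 0.
pid_state() {
    if kill -0 "$1" 2>/dev/null; then
        echo "running"
    else
        echo "gone"
    fi
}

sleep 30 &                          # stand-in for a long-running job
job=$!
state_alive=$(pid_state "$job")

kill "$job"                         # simulate the job being killed rudely
wait "$job" 2>/dev/null || true     # reap it so the PID really disappears
state_dead=$(pid_state "$job")

echo "while running: $state_alive, after kill: $state_dead"
```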

You can invoke the script with the lock directory name, so you can in effect deliberately run multiple instances of your process, locked using different dir names. With this it is possible to have several process 'streams' as it were, say A, B and C, and make sure A is locked against all other occurrences of A, B against B, C against C and so on. That is, you may have a process that runs for each customer, and that should only be running once at any given moment. But you may want to run several customers simultaneously. Just call the lockscript with customerA, customerB etc as the argument.
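A rough sketch of the per-stream idea follows. The acquire function is a deliberately simplified stand-in for the real lock function (no PID file, traps or mail), and the lock root is a hypothetical temporary directory.

```shell
#!/bin/bash
# Sketch only: one lock directory per stream, so customerA
# excludes only other customerA runs.
lockroot=$(mktemp -d)    # hypothetical location for the lock directories

acquire() {              # simplified stand-in for lock()
    mkdir "$lockroot/$1.lock" 2>/dev/null
}

if acquire customerA; then a1=ok; else a1=busy; fi   # first A run gets the lock
if acquire customerB; then b1=ok; else b1=busy; fi   # B is independent of A
if acquire customerA; then a2=ok; else a2=busy; fi   # a second A run is refused

echo "customerA: $a1, customerB: $b1, customerA again: $a2"
rm -rf "$lockroot"
```

In a real job you would instead source the file that defines lock and call it near the top of the script, for example with customerA as the argument.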

This is limited to a single server, and possible improvements would include enhancing it to cope with locking processes running on multiple servers, by using some shared resource (mounted filesystems, database) for the locking semaphore.

#!/bin/bash
#
# locking function: This function must be called
# with <lockdirname> as the FIRST and only argument.
# This is for the bash file-locking mechanism
# exit codes adhere to http://tldp.org/LDP/abs/html/exitcodes.html#EXITCODESREF

lock () {

 USAGE="usage: lock <lockdirname>"
 NOOPTION="You must specify the lockdir name. Exiting"

 [ -z "$1" ] && echo "$USAGE" && echo "$NOOPTION" && exit 64
 EXECUTION=$1

 SUPPORTMAIL=root
 export APP_HOME="$(dirname "$0")"
 [ -z "$APP_HOME" ] && echo Could not determine base directory - Exiting && exit 71

 # quote the path so lock names or directories containing spaces still work
 LOCKDIR="${APP_HOME}/$EXECUTION.lock"

 if mkdir "$LOCKDIR"
 then

        echo >&2 "$0: successfully acquired lockdir $LOCKDIR at `date` "
        # Remove LOCKDIR when the script finishes, or when it receives a signal
        trap '{ echo Cleaning up lockdir $LOCKDIR ...; rm -rf "$LOCKDIR"; echo done, exiting; }' EXIT   # remove directory when script finishes
        trap "{ echo Caught SIGHUP; exit 129; }" SIGHUP    # exit with 128+n
        trap "{ echo Caught SIGINT; exit 130; }" SIGINT    # exit with 128+n
        trap "{ echo Caught SIGQUIT; exit 131; }" SIGQUIT  # exit with 128+n
        trap "{ echo Caught SIGTERM; exit 143; }" SIGTERM  # exit with 128+n
        # put PID of this process into the $LOCKDIR, so we can check this if the next invocation fails to run
        echo $$ > "$LOCKDIR/PID"

 else

        echo >&2 "$0: WARNING! $LOCKDIR present - aborting process - reason follows:"

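        PID=$(cat "$LOCKDIR/PID" 2>/dev/null)
        # kill -0 sends no signal; it only tests whether the process can be signalled
        if [ -n "$PID" ] && kill -0 "$PID" 2>/dev/null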
        then
                # process is still running
                echo >&2 "$0: REASON - there is a lingering process, $PID"
                echo -e "$0: PID= $PID \n `ps -lyf $PID `" | mail -s "Aborting $0 - old process still running (error 1001)" $SUPPORTMAIL
                # exit statuses are 8-bit, so the caller actually sees 233 (1001 mod 256)
                exit 1001
        else
                # process is not running, but lock file not deleted?
                echo >&2 "$0: REASON - orphan lockdir. Host process $PID is gone, so lockdir will now be deleted."
                echo "$0: Lockdir will be deleted. Process should run at next invocation" | mail -s "Aborting $0 - orphan lockdir (error 1002)" $SUPPORTMAIL
                rm -rf "$LOCKDIR"
                # exit statuses are 8-bit, so the caller actually sees 234 (1002 mod 256)
                exit 1002
        fi

 fi
 # End
 return 0
}

Wednesday, 21 November 2012

No matter how many times I have to set up an automated tomcat init daemon, my fragile memory cannot quite recall all the best bits at one time.

Kind of like trying to remember all three Starsky And Hutch theme tunes in one sitting; by the time you get to the end of humming theme number two, your grasp on theme number three, which seemed firm at the outset of humming theme number one, is already slipping inexorably through your fingers. Your brain-fingers.

So after a decade of forgetting, or making notes in forgotten wikis, I'm putting it here so I can sit back and forget it's even here.

I need to remember to set the necessary environment variables first, start/stop/check/kill the tomcat processes and set the run-levels, amongst other things.

Here it is, and if anyone finds it, I hope it helps. Any good tweaks would be appreciated.

Possible improvements include use of PID files instead of a grep/pgrep to determine the state of running processes, although pitfalls lie in wait down that path too.
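As a rough sketch of the PID-file approach, the function below reports stopped, running or stale from a PID file instead of using pgrep. The function name, the temporary PID file and the background sleep standing in for tomcat are all assumptions for illustration. Tomcat's catalina.sh can write such a file itself if the CATALINA_PID variable is set. One of the pitfalls is visible in the sketch: a PID can be reused by an unrelated process, in which case a stale file would wrongly look like a running tomcat.

```shell
#!/bin/bash
# Sketch only: decide running/stopped from a PID file rather than pgrep.
pidfile=$(mktemp)    # hypothetical; a real init script would use a fixed path

pidfile_status() {
    local pid
    # A missing or empty PID file is treated as "stopped".
    [ -s "$pidfile" ] || { echo "stopped"; return; }
    pid=$(cat "$pidfile")
    if kill -0 "$pid" 2>/dev/null; then
        echo "running"
    else
        echo "stale"     # the file exists but the process is gone
    fi
}

s_empty=$(pidfile_status)

sleep 30 &                          # stand-in for the tomcat process
daemon=$!
echo "$daemon" > "$pidfile"
s_running=$(pidfile_status)

kill "$daemon"
wait "$daemon" 2>/dev/null || true  # reap it so the PID disappears
s_stale=$(pidfile_status)

echo "$s_empty / $s_running / $s_stale"
rm -f "$pidfile"
```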

It can be modified to cope with multi-homed tomcat installs fairly trivially, by finding the relevant process-unique grep terms.

#!/bin/sh

### BEGIN INIT INFO
# Provides:          tomcat7
# Required-Start:    $local_fs $network
# Required-Stop:     $local_fs $network
# Default-Start:     2 3 4 5
# Default-Stop:      0 1 6
# Short-Description: Example initscript for tomcat 7 with single base
### END INIT INFO


JAVA_HOME=/home/youruser/opt/jre1.7.0_09
JAVA_OPTS="-Duser.language=en -Duser.region=GB"
CATALINA_HOME=/home/youruser/apache-tomcat-7.0.32
CATALINA_OPTS="-Xmx900m -Xms128m"
TOMCAT_USER=youruser

export JAVA_HOME CATALINA_HOME CATALINA_OPTS JAVA_OPTS

start_tomcat=$CATALINA_HOME/bin/startup.sh
stop_tomcat=$CATALINA_HOME/bin/shutdown.sh


tomcat_procs() {
        pgrep -u "$TOMCAT_USER" -f "$CATALINA_HOME"
}

start() {
        # check if tomcat is already running
        if [ `tomcat_procs | wc -l` -gt 0 ]; then
           # quoted so that the "?" is not treated as a glob character
           echo "Tomcat is already running? Investigate process $(tomcat_procs)"
           exit 1
        fi

        echo Starting tomcat:
        su -c ${start_tomcat} $TOMCAT_USER
        echo Tomcat is running with the following process: $(tomcat_procs)
}

stop() {
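        # check it is actually running; return rather than exit, so that
        # "restart" can still go on to start a tomcat that was already stopped
        if [ `tomcat_procs | wc -l` -eq 0 ]; then
           echo Tomcat is already stopped!
           return 0
        fi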

        echo Shutting down tomcat: 
        # run the shutdown as the tomcat user, matching how it was started
        su -c "${stop_tomcat}" "$TOMCAT_USER"

        echo "issued stop command - waiting for process  $(tomcat_procs) to end"

        # {1..60} is a bash extension and does not expand under a plain /bin/sh such as dash
        for i in $(seq 1 60)
        do
            sleep 1
            procs=`tomcat_procs | wc -l`
            if [ $procs -eq 0 ]; then
                echo && echo Stopped
                break
            elif [ $procs -eq 1 ]; then
                if [ $i -lt 55 ]; then
                    echo -n .
                else
                    echo && echo exhausted allocated wait time -  killing outstanding tomcat process... $(tomcat_procs)
                    pkill -u $TOMCAT_USER -f $CATALINA_HOME
                fi
            else
                echo "Tomcat is running more than once? Check it out! $(tomcat_procs)"
                break
            fi
        done
        echo done.
}

status() {
        procs=`tomcat_procs | wc -l`
        if [ $procs -eq 0 ]; then
           echo Tomcat is stopped
        elif [ $procs -eq 1 ]; then
           echo Tomcat is running with the following process: $(tomcat_procs)
        else
           echo "Tomcat is running more than once? Investigate! $(tomcat_procs)"
        fi
}

# See how we were called
case "$1" in

  start)
        start
        ;;
  stop)
        stop
        ;;
  status)
        status
        ;;
  restart)
        stop
        sleep 5
        start
        ;;
  *)
        echo "Usage: $0 {start|stop|restart|status}"
        # non-zero status for an unknown or missing action
        exit 2
        ;;
esac

exit 0

Edit this script to adjust the pathnames and usernames for your own circumstances. Then place it in /etc/init.d, make sure it has the execute bit set, and use the appropriate system command to install the links into the desired runlevel directories. Since this is a fairly late-starting process, something like start priority 98, stop priority 02 should be good. You can make the links by hand, but using the built-in tools is usually the safest bet. For example, on Debian derivatives such as Ubuntu:

# update-rc.d tomcat7 defaults 98 02

See the man pages for insserv or update-rc.d for useful information on the LSB dependency headers. Or, on Red Hat derivatives, something like:

# chkconfig --add tomcat7
# chkconfig tomcat7 on

Now, all you have to do is set up a Linux Container (LXC) to be running your tomcat instance inside, and you'll be spreading some service goodness all around your organisation.