Jay Taylor's notes

bash - Collect exit codes of parallel background processes (sub shells) - Unix & Linux Stack Exchange

Original source (unix.stackexchange.com)
Tags: bash shell-scripting parallelism
Clipped on: 2020-05-21

Say we have a bash script like so:

echo "x" &
echo "y" &
echo "z" &
.....
echo "Z" &
wait

Is there a way to collect the exit codes of the subshells / subprocesses? I have looked for a way to do this and can't find anything. I need to run these subshells in parallel; otherwise this would be easier.

I am looking for a generic solution (I have an unknown/dynamic number of sub processes to run in parallel).

asked Feb 12 '17 at 8:31

The answer by Alexander Mills, which uses handleJobs, gave me a great starting point, but it also gave me this error:

warning: run_pending_traps: bad value in trap_list[17]: 0x461010

which may be a bash race-condition problem.

Instead, I stored the PID of each child, then waited on each child specifically and collected its exit code. I find this cleaner when subprocesses spawn subprocesses in functions, and it avoids the risk of waiting for a parent process when I meant to wait for a child. It's clearer what happens because it doesn't use the trap.

#!/usr/bin/env bash

# The result is returned through the EXIT_CODE global. Echoing it and calling
# the function inside $() does not work well: $() runs the function in a
# subshell, and the jobs are not children of that subshell.
function wait_and_get_exit_codes() {
    children=("$@")
    EXIT_CODE=0
    for job in "${children[@]}"; do
        echo "PID => ${job}"
        CODE=0
        wait "${job}" || CODE=$?
        if [[ "${CODE}" != "0" ]]; then
            echo "At least one test failed with exit code => ${CODE}"
            EXIT_CODE=1
        fi
    done
}

DIRN=$(dirname "$0");

commands=(
    "{ echo 'a'; exit 1; }"
    "{ echo 'b'; exit 0; }"
    "{ echo 'c'; exit 2; }"
    )

children_pids=()
for i in "${!commands[@]}"; do   # iterate over the array indices
    (echo "${commands[$i]}" | bash) &   # run the command via bash in a subshell
    children_pids+=("$!")
    echo "command $i has been issued as a background job"
done
# wait; # wait for all subshells to finish - it's still valid to wait for all jobs to finish before processing any exit codes, if we wanted to
#EXIT_CODE=0;  # exit code of overall script
wait_and_get_exit_codes "${children_pids[@]}"

echo "EXIT_CODE => $EXIT_CODE"
exit "$EXIT_CODE"
# end
answered Apr 11 '18 at 7:06
arberg

Use wait with a PID, which will:

Wait until the child process specified by each process ID pid or job specification jobspec exits and return the exit status of the last command waited for.

You'll need to save the PID of each process as you go:

echo "x" & X=$!
echo "y" & Y=$!
echo "z" & Z=$!

You can also enable job control in the script with set -m and use a %n jobspec, but you almost certainly don't want to - job control has a lot of other side effects.

wait will return the same code as the process finished with. You can use wait $X at any (reasonable) later point to access the final code as $? or simply use it as true/false:

echo "x" & X=$!
echo "y" & Y=$!
...
wait $X
echo "job X returned $?"

wait will pause until the command completes if it hasn't already.

If you want to avoid stalling like that, you can set a trap on SIGCHLD, count the number of terminations, and handle all the waits at once when they've all finished. You can probably get away with using wait alone almost all the time.
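For a dynamic number of jobs, the same idea can be sketched with an array of PIDs instead of one variable per job. This is only an illustration: run_job is a hypothetical stand-in for the real work, and the exit codes 0, 3, 0, 7 are made up for the demo.

```shell
#!/usr/bin/env bash
# Sketch: start a variable number of background jobs and collect each exit code.
# run_job is a hypothetical placeholder; it simply exits with its argument.
run_job() { return "$1"; }

pids=()
for code in 0 3 0 7; do
    run_job "$code" &
    pids+=("$!")            # remember the PID of the job just started
done

statuses=()
failures=0
for pid in "${pids[@]}"; do
    if wait "$pid"; then    # wait's own status is the job's exit code
        statuses+=(0)
    else
        statuses+=("$?")    # $? still holds the status returned by wait
        failures=$((failures + 1))
    fi
done

echo "statuses: ${statuses[*]}"
echo "failures: $failures"
```

Waiting in start order means a slow early job delays the report for later ones, but every exit code is still collected.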

answered Feb 12 '17 at 9:25

If you had a good way to identify the commands, you could print their exit code to a tmp file and then access the specific file you're interested in:

#!/bin/bash

for i in `seq 1 5`; do
    ( sleep $i ; echo $? > /tmp/cmd__${i} ) &
done

wait

for i in `seq 1 5`; do # or even /tmp/cmd__*
    echo "process $i:"
    cat /tmp/cmd__${i}
done

Don't forget to remove the tmp files.
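One way to make the cleanup automatic, sketched here under the assumption that mktemp -d is available, is to write the status files into a private temporary directory and remove it from an EXIT trap. The names status_dir and results, and the deliberately failing job 2, are only for illustration.

```shell
#!/usr/bin/env bash
# Sketch: each job writes its exit code to a file in a private temp directory.
status_dir=$(mktemp -d)             # unique directory, avoids name clashes in /tmp
trap 'rm -rf "$status_dir"' EXIT    # remove the status files when the script exits

for i in 1 2 3; do
    # The test command stands in for real work; job 2 fails on purpose.
    ( [ "$i" -ne 2 ]; echo "$?" > "$status_dir/cmd_$i" ) &
done

wait    # block until every background job has written its file

results=""
for i in 1 2 3; do
    code=$(cat "$status_dir/cmd_$i")
    echo "process $i: $code"
    results+="$i=$code "
done
```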

answered Feb 14 '17 at 8:05

Use a compound command - put the statement in parentheses:

( echo "x" ; echo X: $? ) &
( true ; echo TRUE: $? ) &
( false ; echo FALSE: $? ) &

will give output like this (the order of the lines can vary, since the jobs run in parallel):

x
X: 0
TRUE: 0
FALSE: 1

A really different way to run several commands in parallel is by using GNU Parallel. Make a list of commands to run and put them in the file list:

cat > list
sleep 2 ; exit 7
sleep 3 ; exit 55
^D

Run all the commands in parallel and collect the exit codes in the file job.log:

cat list | parallel -j0 --joblog job.log
cat job.log

and the output is:

Seq     Host    Starttime       JobRuntime      Send    Receive Exitval Signal  Command
1       :       1486892487.325       1.976      0       0       7       0       sleep 2 ; exit 7
2       :       1486892487.326       3.003      0       0       55      0       sleep 3 ; exit 55
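To pull the failures out of such a log, something like the following could work. It is a sketch that assumes the joblog is tab-separated with Exitval in column 7 and the command in column 9, as in the output above. The log is written by hand here so the example runs without GNU Parallel, and the third (successful) row is invented for the demo.

```shell
#!/usr/bin/env bash
# Sketch: list the jobs whose Exitval column is non-zero in a joblog.
# In practice the file would come from `parallel --joblog job.log`.
joblog=$(mktemp)
printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
    Seq Host Starttime JobRuntime Send Receive Exitval Signal Command \
    1 : 1486892487.325 1.976 0 0 7 0 'sleep 2 ; exit 7' \
    2 : 1486892487.326 3.003 0 0 55 0 'sleep 3 ; exit 55' \
    3 : 1486892487.327 1.002 0 0 0 0 'sleep 1 ; exit 0' > "$joblog"

# Skip the header line, keep rows with a non-zero exit value.
failed=$(awk -F'\t' 'NR > 1 && $7 != 0 { print $7 ": " $9 }' "$joblog")
echo "$failed"
rm -f "$joblog"
```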

This is the generic script you're looking for. The only downside is that your commands are in quotes, which means syntax highlighting in your IDE will not really work. I have tried a couple of the other answers, and this is the best one. This answer incorporates the idea of using wait <pid> given by @Michael, but goes a step further by using the trap command, which seems to work best.

#!/usr/bin/env bash

set -m # allow for job control
EXIT_CODE=0;  # exit code of overall script

function handleJobs() {
    for job in `jobs -p`; do
        echo "PID => ${job}"
        CODE=0
        wait "${job}" || CODE=$?
        if [[ "${CODE}" != "0" ]]; then
            echo "At least one test failed with exit code => ${CODE}"
            EXIT_CODE=1
        fi
    done
}

trap 'handleJobs' CHLD  # trap command is the key part
DIRN=$(dirname "$0");

commands=(
    "{ echo 'a'; exit 1; }"
    "{ echo 'b'; exit 0; }"
    "{ echo 'c'; exit 2; }"
)

for i in "${!commands[@]}"; do   # iterate over the array indices
    (echo "${commands[$i]}" | bash) &   # run the command via bash in a subshell
    echo "command $i has been issued as a background job"
done

wait; # wait for all subshells to finish

echo "EXIT_CODE => $EXIT_CODE"
exit "$EXIT_CODE"
# end

Thanks to @michael homer for getting me on the right track, but using the trap command is the best approach AFAICT.

answered Feb 12 '17 at 11:23
  • Also "wait -n" will wait for any child and then return the exit status of that child in the $? variable. So you can print progress as each one exits. However, note that unless you use the CHLD trap, you may miss some child exits that way. – Chunko Feb 12 '17 at 13:52
  • @Chunko thanks! that is good info, could you maybe update the answer with something you think is best? – Alexander Mills Feb 12 '17 at 20:04
  • thanks @Chunko, trap works better, you're right. With wait <pid>, I got fallthrough. – Alexander Mills Feb 13 '17 at 9:37
  • Can you explain how and why you believe the version with the trap is better than the one without it?  (I believe that it’s no better, and therefore that it is worse, because it is more complex with no benefit.) – Scott Mar 29 '18 at 6:54

    Another variation of @rolf's answer:

    Another way to save the exit status would be something like

    mkdir /tmp/status_dir

    and then have each script

    script_name="${0##*/}"  ## strip path from script name
    tmpfile="/tmp/status_dir/${script_name}.$$"
    do_something            ## placeholder for the script's real work
    rc=$?
    echo "$rc" > "$tmpfile"

    This gives each status file a unique name that includes the name of the script that created it and its process ID (in case more than one instance of the same script is running), which you can save for later reference. It also puts them all in the same place, so you can just delete the whole subdirectory when you're done.

    You can even save more than one status from each script by doing something like

    tmpfile="$(/bin/mktemp -q "/tmp/status_dir/${script_name}.$$.XXXXXX")"

    which creates the file as before, but adds a unique random string to it.

    Or, you can just append more status information to the same file.
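As a rough sketch of the reading side, the parent could loop over every file in the status directory and fold the results into one overall code. The worker names build, lint and test are hypothetical, a mktemp directory stands in for /tmp/status_dir, and $BASHPID is used instead of $$ because the workers here are subshells rather than separate scripts.

```shell
#!/usr/bin/env bash
# Sketch: aggregate the status files that workers left in one directory.
status_dir=$(mktemp -d)    # stands in for /tmp/status_dir

# Hypothetical workers: each records its own status as <name>.<pid>.
for name in build lint test; do
    (
        [ "$name" != lint ]    # placeholder work; "lint" fails on purpose
        echo "$?" > "$status_dir/$name.$BASHPID"
    ) &
done
wait

overall=0
for f in "$status_dir"/*; do
    rc=$(cat "$f")
    echo "${f##*/}: $rc"              # file name without the directory part
    [ "$rc" -eq 0 ] || overall=1      # any non-zero status marks the run as failed
done
rm -rf "$status_dir"
echo "overall: $overall"
```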

    answered Feb 18 '17 at 19:38

    script1 and script2 are executed in parallel, and script3 is executed only if both of them succeed:

    ./script1 &
    process1=$!
    
    ./script2 &
    process2=$!
    
    wait $process1
    rc1=$?
    
    wait $process2
    rc2=$?
    
    if [[ $rc1 -eq 0 ]] && [[ $rc2 -eq 0 ]]; then
        ./script3
    fi