bash - Collect exit codes of parallel background processes (sub shells) - Unix & Linux Stack Exchange

Jay Taylor's notes

Original source (unix.stackexchange.com)

Tags: bash shell-scripting parallelism unix.stackexchange.com

Clipped on: 2020-05-21

Say we have a bash script like so:

echo "x" &
echo "y" &
echo "z" &
.....
echo "Z" &
wait

Is there a way to collect the exit codes of the subshells / subprocesses? I have been looking for a way to do this and can't find anything. I need to run these subshells in parallel; otherwise, yes, this would be easier.

I am looking for a generic solution (I have an unknown/dynamic number of sub processes to run in parallel).

edited Feb 12 '17 at 20:19

asked Feb 12 '17 at 8:31

The answer by Alexander Mills, which uses handleJobs, gave me a great starting point, but it also gave me this error:

warning: run_pending_traps: bad value in trap_list[17]: 0x461010

This may be a bash race condition.

Instead, I simply stored the PID of each child, then waited on each child specifically and collected its exit code. I find this cleaner when subprocesses spawn further subprocesses in functions, and it avoids the risk of waiting on a parent process when I meant to wait on a child. It is also clearer what happens because it does not use the trap.

#!/usr/bin/env bash

# Note: do not echo the result and call this function inside $(): command
# substitution runs in a subshell, which cannot wait on the parent's children.
# The result is passed back in the global EXIT_CODE instead.
function wait_and_get_exit_codes() {
    local children=("$@")
    EXIT_CODE=0
    for job in "${children[@]}"; do
        echo "PID => ${job}"
        CODE=0
        wait "${job}" || CODE=$?
        if [[ "${CODE}" != "0" ]]; then
            echo "At least one test failed with exit code => ${CODE}"
            EXIT_CODE=1
        fi
    done
}

DIRN=$(dirname "$0");

commands=(
    "{ echo 'a'; exit 1; }"
    "{ echo 'b'; exit 0; }"
    "{ echo 'c'; exit 2; }"
    )

children_pids=()
for i in "${!commands[@]}"; do   # iterate over the indices of commands
    (echo "${commands[$i]}" | bash) &   # run the command via bash in subshell
    children_pids+=("$!")
    echo "$i ith command has been issued as a background job"
done
# wait; # wait for all subshells to finish - it's still valid to wait for all jobs to finish before processing any exit codes, if we wanted to
#EXIT_CODE=0;  # exit code of overall script
wait_and_get_exit_codes "${children_pids[@]}"

echo "EXIT_CODE => $EXIT_CODE"
exit "$EXIT_CODE"
# end

edited Apr 12 '18 at 15:48

answered Apr 11 '18 at 7:06

arberg

cool, I think for job in "${children[@]}"; do should be for job in "${1}"; do though, for clarity – Alexander Mills Apr 11 '18 at 12:31
the only concern I have with this script, is if children_pids+=("$!") is actually capturing the desired pid for the sub shell. – Alexander Mills Apr 11 '18 at 12:40
I tested with "${1}" and it doesn't work. I'm passing an array to the function, and apparently that needs special attention in bash. $! is the pid of the last spawned job (see tldp.org/LDP/abs/html/internalvariables.html). It seems to work correctly in my tests, and I'm now using it in the unRAID cache_dirs script, where it seems to do its job. I'm using bash 4.4.12. – arberg Apr 12 '18 at 15:51
nice yep seems like you are correct – Alexander Mills Apr 12 '18 at 16:15
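
The "${1}" versus "$@" point from the comments above can be checked with a small sketch. The function names and the stand-in values below are hypothetical and exist only for illustration. When an array is expanded as "${pids[@]}" in a function call, each element becomes its own positional parameter, so "$1" holds only the first one:

```shell
#!/usr/bin/env bash

# Hypothetical helpers, named here only for illustration.
count_first_only() {
    local items=("$1")      # only the first positional parameter
    echo "${#items[@]}"
}

count_all() {
    local items=("$@")      # every positional parameter, one per element
    echo "${#items[@]}"
}

pids=(101 102 103)          # stand-in values, not real PIDs

first_only=$(count_first_only "${pids[@]}")
all=$(count_all "${pids[@]}")

echo "with \"\$1\": $first_only element(s)"
echo "with \"\$@\": $all element(s)"
```

This is why the function in the answer copies "$@" into an array before looping over it.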

Use wait with a PID, which will:

Wait until the child process specified by each process ID pid or job specification jobspec exits and return the exit status of the last command waited for.

You'll need to save the PID of each process as you go:

echo "x" & X=$!
echo "y" & Y=$!
echo "z" & Z=$!

You can also enable job control in the script with set -m and use a %n jobspec, but you almost certainly don't want to - job control has a lot of other side effects.

wait will return the same code as the process finished with. You can use wait $X at any (reasonable) later point to access the final code as $? or simply use it as true/false:

echo "x" & X=$!
echo "y" & Y=$!
...
wait $X
echo "job X returned $?"

wait will pause until the command completes if it hasn't already.

If you want to avoid stalling like that, you can set a trap on SIGCHLD, count the number of terminations, and handle all the waits at once when they've all finished. You can probably get away with using wait alone almost all the time.
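
For an unknown or dynamic number of jobs, the same idea can be sketched with an array of PIDs instead of one variable per job. This is a minimal sketch rather than a definitive implementation: the subshells with fixed exit codes are stand-ins for real commands, and the array names pids and codes are assumptions made for the example. It also shows wait used directly as a true/false condition:

```shell
#!/usr/bin/env bash

# Stand-in commands; each subshell exits with a different status.
pids=()
for code in 0 3 0 7; do
    ( sleep 0.1; exit "$code" ) &
    pids+=("$!")             # remember the PID of the job just started
done

# Wait for each PID in turn and record its exit status.
codes=()
for pid in "${pids[@]}"; do
    if wait "$pid"; then
        codes+=(0)
    else
        codes+=("$?")        # in the else branch, $? is the status of wait
    fi
done

for i in "${!pids[@]}"; do
    echo "job $i (pid ${pids[$i]}) exited with ${codes[$i]}"
done
```

The loop waits in start order, so a slow early job delays reporting for later ones, but every exit code is still collected.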

edited Feb 12 '17 at 9:31

answered Feb 12 '17 at 9:25

If you had a good way to identify the commands, you could print their exit code to a tmp file and then access the specific file you're interested in:

#!/bin/bash

for i in `seq 1 5`; do
    ( sleep $i ; echo $? > /tmp/cmd__${i} ) &
done

wait

for i in `seq 1 5`; do # or even /tmp/cmd__*
    echo "process $i:"
    cat /tmp/cmd__${i}
done

Don't forget to remove the tmp files.
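
One way to make the cleanup automatic is to keep the status files in a private directory and remove it from an EXIT trap. The following is a sketch under stated assumptions: the directory comes from mktemp -d rather than fixed /tmp/cmd__N paths, and the jobs are stand-ins that simply exit with a known status.

```shell
#!/usr/bin/env bash

# Private scratch directory; removed automatically when the script exits.
status_dir=$(mktemp -d)
trap 'rm -rf "$status_dir"' EXIT

for i in 1 2 3; do
    # Stand-in job: job number i exits with status i - 1.
    ( ( exit "$((i - 1))" ); echo "$?" > "$status_dir/cmd_$i" ) &
done

wait    # every status file is complete once all jobs have finished

codes=()
for i in 1 2 3; do
    code=$(cat "$status_dir/cmd_$i")
    echo "process $i: $code"
    codes+=("$code")
done
```

Using mktemp -d also avoids name clashes when two copies of the script run at the same time.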

edited Feb 14 '17 at 8:12

answered Feb 14 '17 at 8:05

Use a compound command - put the statement in parentheses:

( echo "x" ; echo X: $? ) &
( true ; echo TRUE: $? ) &
( false ; echo FALSE: $? ) &

will give output like the following (the jobs run concurrently, so the order of the lines may differ between runs):

x
X: 0
TRUE: 0
FALSE: 1

A very different way to run several commands in parallel is to use GNU Parallel. Put the commands to run in a file named list:

cat > list
sleep 2 ; exit 7
sleep 3 ; exit 55
^D

Run all the commands in parallel and collect the exit codes in the file job.log:

cat list | parallel -j0 --joblog job.log
cat job.log

and the output is:

Seq     Host    Starttime       JobRuntime      Send    Receive Exitval Signal  Command
1       :       1486892487.325       1.976      0       0       7       0       sleep 2 ; exit 7
2       :       1486892487.326       3.003      0       0       55      0       sleep 3 ; exit 55
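
Once the job log exists, the failing commands can be picked out of it with awk. The sketch below assumes the log is tab-separated with Exitval in column 7 and Command in column 9, as in the header shown above. So that it runs without GNU Parallel installed, it recreates the two log lines by hand in a temporary file instead of reading a real job.log.

```shell
#!/usr/bin/env bash

# Recreate the job log shown above (tab-separated) so this runs standalone.
joblog=$(mktemp)
{
    printf 'Seq\tHost\tStarttime\tJobRuntime\tSend\tReceive\tExitval\tSignal\tCommand\n'
    printf '1\t:\t1486892487.325\t1.976\t0\t0\t7\t0\tsleep 2 ; exit 7\n'
    printf '2\t:\t1486892487.326\t3.003\t0\t0\t55\t0\tsleep 3 ; exit 55\n'
} > "$joblog"

# Skip the header; print Exitval and Command for every job that failed.
failed=$(awk -F '\t' 'NR > 1 && $7 != 0 { print $7 ": " $9 }' "$joblog")
echo "$failed"

rm -f "$joblog"
```

With a real run, the same awk line would be pointed at job.log instead of the temporary file.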

edited Feb 12 '17 at 13:18

This is the generic script you're looking for. The only downside is that your commands are in quotes, which means syntax highlighting in your IDE will not really work. Otherwise, I have tried a couple of the other answers and this is the best one. This answer incorporates the idea of using wait <pid> given by @Michael, but goes a step further by using the trap command, which seems to work best.

#!/usr/bin/env bash

set -m # allow for job control
EXIT_CODE=0;  # exit code of overall script

function handleJobs() {
    for job in $(jobs -p); do
        echo "PID => ${job}"
        CODE=0
        wait "${job}" || CODE=$?
        if [[ "${CODE}" != "0" ]]; then
            echo "At least one test failed with exit code => ${CODE}"
            EXIT_CODE=1
        fi
    done
}

trap 'handleJobs' CHLD  # trap command is the key part
DIRN=$(dirname "$0");

commands=(
    "{ echo 'a'; exit 1; }"
    "{ echo 'b'; exit 0; }"
    "{ echo 'c'; exit 2; }"
)

for i in "${!commands[@]}"; do   # iterate over the indices of commands
    (echo "${commands[$i]}" | bash) &   # run the command via bash in subshell
    echo "$i ith command has been issued as a background job"
done

wait; # wait for all subshells to finish

echo "EXIT_CODE => $EXIT_CODE"
exit "$EXIT_CODE"
# end

Thanks to @michael homer for getting me on the right track, but using the trap command is the best approach AFAICT.

edited Feb 18 '17 at 22:40

answered Feb 12 '17 at 11:23

Also "wait -n" will wait for any child and then return the exit status of that child in the $? variable. So you can print progress as each one exits. However note that unless you use the CHLD trap, you may miss some child exits that way. – Chunko Feb 12 '17 at 13:52
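
The wait -n idea from the comment above might look like the following sketch. It assumes bash 4.3 or newer (where wait -n is available), uses stand-in subshells with fixed exit codes, and keeps the loop bounded by counting the jobs it started. It does not tell you which job produced a given status, and it has only been thought through for a handful of jobs, so the caveat about missed exits may still apply in other setups.

```shell
#!/usr/bin/env bash
# Requires bash 4.3 or newer for `wait -n`.

njobs=0
for code in 0 3 0; do
    ( sleep 0.2; exit "$code" ) &    # stand-in job with a known exit code
    njobs=$((njobs + 1))
done

failures=0
# Call `wait -n` once per started job; each call returns when any one
# of the remaining jobs has finished, with that job's exit status.
for ((i = 0; i < njobs; i++)); do
    if wait -n; then
        echo "a job finished successfully"
    else
        echo "a job failed with exit code $?"
        failures=$((failures + 1))
    fi
done

echo "$failures of $njobs jobs failed"
```

Unlike waiting on PIDs in start order, this reports progress as each job finishes.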

@Chunko thanks! that is good info, could you maybe update the answer with something you think is best? – Alexander Mills Feb 12 '17 at 20:04

thanks @Chunko, trap works better, you're right. With wait <pid>, I got fallthrough. – Alexander Mills Feb 13 '17 at 9:37

Can you explain how and why you believe the version with the trap is better than the one without it? (I believe that it’s no better, and therefore that it is worse, because it is more complex with no benefit.) – Scott Mar 29 '18 at 6:54

Another variation of @rolf's answer:

Another way to save the exit status would be something like

mkdir /tmp/status_dir

and then have each script

script_name="${0##*/}"  ## strip path from script name
tmpfile="/tmp/status_dir/${script_name}.$$"
do_something            ## placeholder for the script's actual work
rc=$?
echo "$rc" > "$tmpfile"

This gives each status file a unique name that includes the name of the script that created it and its process ID (in case more than one instance of the same script is running). You can save the name for later reference, and because all the files are in the same place, you can just delete the whole subdirectory when you're done.

You can even save more than one status from each script by doing something like

tmpfile="$(/bin/mktemp -q "/tmp/status_dir/${script_name}.$$.XXXXXX")"

which creates the file as before, but adds a unique random string to its name.

Or, you can just append more status information to the same file.

answered Feb 18 '17 at 19:38

script3 will be executed only if script1 and script2 are successful and script1 and script2 will be executed in parallel:

./script1 &
process1=$!

./script2 &
process2=$!

wait $process1
rc1=$?

wait $process2
rc2=$?

if [[ $rc1 -eq 0 ]] && [[ $rc2 -eq 0 ]]; then
    ./script3
fi
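
The same gating can be generalized to any number of background jobs. In this sketch the helper name all_succeeded is hypothetical, and the subshells stand in for ./script1 and ./script2 so that the snippet runs on its own. The helper waits for every PID it is given, even after one has failed, and succeeds only if all of them exited with status 0:

```shell
#!/usr/bin/env bash

# Hypothetical helper: waits for every PID given and returns 0 only if
# all of them exited with status 0.
all_succeeded() {
    local pid rc=0
    for pid in "$@"; do
        wait "$pid" || rc=1    # keep waiting on the rest even after a failure
    done
    return "$rc"
}

# Stand-ins for ./script1 and ./script2.
( sleep 0.1; exit 0 ) & process1=$!
( sleep 0.1; exit 0 ) & process2=$!

if all_succeeded "$process1" "$process2"; then
    result="would run ./script3"     # placeholder for the real ./script3
else
    result="skipped ./script3"
fi
echo "$result"
```

Waiting on every PID before deciding means no job is left unreaped when an earlier one fails.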