Discussion:
[Help-bash] Waiting for a sub-process to finish
João Eiras
2017-06-01 20:11:04 UTC
Hello!

Here's my problem. I have a script that calls an external program that
outputs its results in plain text. I'm piping the output to a process of
mine to compress it. Something like:

long_command -o >(gzip -c > file.gz)

I do not want to touch stdout or stderr, as those can be clobbered by
status messages, and the correct way to receive the output is via the -o
parameter.

Immediately after the program finishes, I want to check that the produced
file is OK, regarding contents and whatnot. So, something like:

long_command -o >(gzip -c > file.gz)
if gzip -dc file.gz | grep -q bad_line; then
    echo some error
fi

That second line, with the call to "gzip -dc", often fails with an error:
gzip: abort: corrupted input -- invalid deflate data

This happens because the subprocess ">(gzip -c > file.gz)" has not had time
to finish and close the output file.

Question: How can I force bash to wait until all subprocesses have finished?

'wait' does not work here, as these are not background jobs managed by
job control.

A simple test case follows. "my_command_3 ended" should be printed
before "my_command_4 ran" is.

Thank you!
_________________________________________

#!/bin/bash
function my_command_1 {
    echo my_command_1 started >&2
    cat "$1" | sed 's/$/+2/'
    echo my_command_1 ended >&2
}
function my_command_2 {
    echo my_command_2 started >&2
    echo 1
    echo my_command_2 ended >&2
}
function my_command_3 {
    echo my_command_3 started >&2
    cat | sed 's/$/+3/'
    sleep 1
    echo my_command_3 ended >&2
}
function my_command_4 {
    echo my_command_4 ran >&2
}
my_command_1 <(my_command_2) > >(my_command_3)
wait # Doesn't work?
my_command_4
sleep 1.1
Greg Wooledge
2017-06-01 20:27:35 UTC
Post by João Eiras
long_command -o >(gzip -c > file.gz)
This happens because the subprocess ">(gzip -c > file.gz)" has not had time
to finish and close the output file.
Question: How can I force bash to wait until all subprocesses have finished?
'wait' does not work here, as these are not background jobs managed by
job control.
The most portable answer would be to create your own named pipe, run
long_command in the background, and gzip in the foreground:

mkfifo mypipe
long_command -o mypipe &
gzip -c <mypipe >file.gz
wait

(You could also run gzip in the background and let wait wait for it,
but there doesn't seem to be any real gain there.)
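
Fleshed out a bit, with a throwaway directory for the fifo (just a
sketch, assuming mktemp -d is available; adjust to taste):

dir=$(mktemp -d) || exit 1
trap 'rm -rf "$dir"' EXIT

mkfifo "$dir/mypipe"
long_command -o "$dir/mypipe" &   # writer in the background
gzip -c <"$dir/mypipe" >file.gz   # reader in the foreground
wait                              # reaps long_command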

Other than that, bash 4.4 added this new feature:

u. Bash now allows waiting for the most recent process substitution, since it
appears as $!.

I believe you have to explicitly pass the PID to wait for those, rather
than just calling wait with 0 arguments, but it's not something I've
played with much.
Chet Ramey
2017-06-01 20:35:17 UTC
Post by Greg Wooledge
u. Bash now allows waiting for the most recent process substitution, since it
appears as $!.
I believe you have to explicitly pass the PID to wait for those, rather
than just calling wait with 0 arguments, but it's not something I've
played with much.
It's always appeared as $!, and you can wait for it the same as any other
process. The changelog entry refers to wait without arguments, which is
supposed to wait for all background processes.
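
So under bash-4.4 the original example becomes (a minimal sketch; note
that only the most recent process substitution is covered, since only
its pid appears in $!):

long_command -o >(gzip -c > file.gz)
wait "$!"   # bash 4.4+: waits for the gzip process substitution
if gzip -dc file.gz | grep -q bad_line; then
    echo some error
fi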

Chet
--
``The lyf so short, the craft so long to lerne.'' - Chaucer
``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU ***@case.edu http://cnswww.cns.cwru.edu/~chet/
João Eiras
2017-06-01 20:47:14 UTC
Post by Chet Ramey
Post by Greg Wooledge
u. Bash now allows waiting for the most recent process substitution, since it
appears as $!.
I believe you have to explicitly pass the PID to wait for those, rather
than just calling wait with 0 arguments, but it's not something I've
played with much.
It's always appeared as $!, and you can wait for it the same as any other
process. The changelog entry refers to wait without arguments, which is
supposed to wait for all background processes.
I'll try bash 4.4 when I get the chance, but upgrading is not an option
right now, since I have to run the script on many machines. Portability is
not an issue, as the machines all have the same configuration.

I just tried "wait $!" in bash 4.3 and got an error message:

my_command_1 <(my_command_2) > >(my_command_3)
wait $!

wait: pid XXXXX is not a child of this shell

So, I think the feature does not work at all in bash 4.3. But I'm
happy to know it has been added/fixed in future versions.

Using a named pipe is something I want to avoid. This section of the
script is called hundreds of times in parallel, so the less poking at the
file system with temporary files, the better.

I came up with something that I had been using elsewhere, by
implementing my own "wait". Not pretty, but it works:

Thank you very much for your time!
______
function wait_for_subprocesses {
    function get_subprocesses {
        # @param[1] parent pid
        # @param[2] blacklisted pid
        local p
        for p in $(pgrep -P "$1"); do
            [[ "$p" = "$2" ]] && continue
            echo "$p"
            get_subprocesses "$p" "$2"
        done
    }
    local thispid=$BASHPID
    while true; do
        # $BASHPID below expands inside the $( ) subshell, so the
        # subshell running the scan blacklists itself.
        local sub=( $(get_subprocesses "$thispid" "$BASHPID") )
        [[ ${#sub[@]} -eq 0 ]] && break
        sleep 0.005
    done
}
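
Used like this, for reference (a sketch with the example from my first
mail):

long_command -o >(gzip -c > file.gz)
wait_for_subprocesses
if gzip -dc file.gz | grep -q bad_line; then
    echo some error
fi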
João Eiras
2017-06-01 21:11:43 UTC
Talking generically about process substitution:

command <(generate_input) > >(collect_output)
second_command

Looking at that code, I would always expect the three tasks to be
completed before "second_command" executes, since there is no
ampersand (&) anywhere. Otherwise, every single time there is
some process substitution, one would be forced to use job control plus
"wait", which does not make much sense, as "jobs" yields nothing
and $! only returns the last process.

So, every single instruction that adds a substituted process should
be followed by an implicit "wait ${substituted_pids[@]}". At the very
least, I'd like to see such behavior available as a shell option
(https://www.gnu.org/software/bash/manual/html_node/The-Shopt-Builtin.html).
Or, at the least, "jobs" should be populated accordingly and "wait" should
work as such, which is what bash 4.4 does, I presume.

Cheers.
Chet Ramey
2017-06-07 12:15:52 UTC
Post by João Eiras
Talking generically about process substitution
command <(generate_input) > >(collect_output)
second_command
Looking at that code, I would always expect the three tasks to be
completed before "second_command" executes, since there is no
ampersand (&) anywhere. Otherwise, every single time there is
some process substitution, one would be forced to use job control plus
"wait", which does not make much sense, as "jobs" yields nothing
and $! only returns the last process.
So, every single instruction that adds a substituted process should
be followed by an implicit "wait ${substituted_pids[@]}". At the very
least, I'd like to see such behavior available as a shell option
(https://www.gnu.org/software/bash/manual/html_node/The-Shopt-Builtin.html).
Or, at the least, "jobs" should be populated accordingly and "wait" should
work as such, which is what bash 4.4 does, I presume.
Bash-4.4 allows `wait' to wait for the last process substitution, since it
has always set $!. It was an oversight that it wasn't possible before that.
Bash-4.3 and earlier tried to verify that `wait' was waiting for a child
of the shell instead of letting waitpid return an error.
--
``The lyf so short, the craft so long to lerne.'' - Chaucer
``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU ***@case.edu http://cnswww.cns.cwru.edu/~chet/
Russell Lewis
2017-06-01 21:45:06 UTC
Copying my reply to the list.
---------- Forwarded message ----------
From: "Russell Lewis" <***@gmail.com>
Date: Jun 1, 2017 14:44
Subject: Re: [Help-bash] Waiting for a sub-process to finish
To: "João Eiras" <***@gmail.com>
Cc:

A simple (though non-intuitive) solution is to pipe the entire command into
cat:
long_command -o >(gzip -c > file.gz) | cat

Since both long_command and gzip have file handles to the same stdout, and
cat is reading from it, cat will block until both commands close their
stdout.

Of course, this could fail if they both close their stdout before the
process exits, but that's rare.
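
You can see the blocking behavior with a toy sketch, where sleep stands
in for a substituted process that keeps the inherited stdout open:

echo data > >(sleep 2; cat >/dev/null) | cat
echo done   # only prints after the substituted process exits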

(I wish I remembered where I found this, so I could give proper attribution.
Somewhere on StackOverflow, probably.)

Russ

Bob Proulx
2017-06-02 01:08:50 UTC
Post by João Eiras
long_command -o >(gzip -c > file.gz)
if gzip -dc file.gz | grep -q bad_line; then
...
Post by João Eiras
This happens because the subprocess ">(gzip -c > file.gz)" has not had time
to finish and close the output file.
...
Post by João Eiras
Question: How can I force bash to wait until all subprocesses have finished?
I complained about this exact same problem back in 2007. And
apparently mine was the first complaint about it. :-}

https://lists.gnu.org/archive/html/bug-bash/2007-09/msg00019.html

Some years later I learned the "| cat" trick, which uses the closing of
the file handles to join these asynchronous processes back up. But I still
avoid it, because I think it would be confusing to later programmers
reading the code.

I still think this is surprising behavior, and something that bash
should wait for automatically.

Bob
Russell Lewis
2017-06-02 14:46:06 UTC
Indeed, "| cat" is subtle. But it works so well, I use it - just with lots
of comments to explain it, every time. :)

Russ