Discussion:
[Help-bash] How to merge stdout and stderr yet distinguish what is from stdout and what is from stderr?
Peng Yu
2018-02-04 21:34:00 UTC
Hi,

`cmd 2>&1` can be used to merge stdout and stderr into one stream, but
the content from stdout and stderr is no longer distinguishable.

Is there a way to not only merge stdout and stderr but also prepend
each line from stdout with a prefix and prepend each line from stderr
with a different prefix (e.g., 'o' and 'e' respectively) so that they
are distinguishable (we assume that there will be no merged lines
coming from both stdout and stderr)?

Maybe this can be done through an external program, but I am not sure
how to implement it, especially if I want the relative order of the
lines from both streams to stay the same whether or not the prefixes
are added.

Does anybody know a solution to this problem? Thanks.
--
Regards,
Peng
João Eiras
2018-02-04 22:24:04 UTC
Hi !

You have two options

1) easy solution, prepend something to the lines coming from each file
descriptor

( command 1> >(awk '{print "1" $0}') 2> >(awk '{print "2" $0}') ) > all_my_output.txt

Note: both sub-shells running awk will output to stdout, so both
outputs are merged
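As a runnable illustration (the `cmd` function below is a hypothetical
stand-in; substitute the real command):

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the real command: one line per stream.
cmd() { echo hello; echo oops >&2; }

# Prefix stdout lines with "1" and stderr lines with "2"; both awk
# subshells write to the same stdout, which goes to the file.
( cmd 1> >(awk '{print "1" $0}') 2> >(awk '{print "2" $0}') ) > all_my_output.txt

# The awk subshells may still be flushing when the subshell returns
# (the usual process-substitution race), so wait briefly before reading.
sleep 0.2
sort all_my_output.txt
```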

2) fancy solution, keep both stdout and stderr separate and handle
them independently.

This code is more complicated, but it is conveniently wrapped in a function:

function run_command_with_output_handler {
  local output_handler="$1"
  shift

  local p1=/tmp/stdout.pipe.$BASHPID p2=/tmp/stderr.pipe.$BASHPID
  mkfifo "$p1" "$p2"

  "$@" 1>"$p1" 2>"$p2" &

  (while true ; do
    line=
    pipe_broken=0
    timed_out=0

    for fd in 101 102; do
      read -r -t 0.005 -u $fd line
      readstatus=$?

      [[ $readstatus -gt 127 && ${#line} = 0 ]] && timed_out=$(($timed_out + 1))
      [[ $readstatus = 0 || ${#line} != 0 ]] && break
      [[ $readstatus = 1 ]] && pipe_broken=$(($pipe_broken + 1))
    done

    # Both pipes are closed, so the task ended
    [[ $pipe_broken -ge 2 ]] && break

    # Both pipes timed out, so the task has not written anything meanwhile.
    # Ideally this would use select(2)
    # (http://man7.org/linux/man-pages/man2/select.2.html) instead of a sleep.
    [[ "$timed_out" = 2 ]] && { sleep 0.05 ; continue ; }

    # Now fd is either 101 or 102 and $line is not empty, use at will.
    "$output_handler" "${fd:2}" "$line"
  done) 101<"$p1" 102<"$p2"

  wait

  rm -f "$p1" "$p2"
}

function my_output_handler {
  echo "Fd: $1. line: $2"
}

some_command=( lots of stuff here and arguments )
run_command_with_output_handler my_output_handler "${some_command[@]}"

Note: part of this code was just typed in the e-mail and not tested.

Hope this helps.

Cheers.
Andy Chu
2018-02-04 22:33:17 UTC
Post by Peng Yu
Hi,
`cmd 2>&1` can be used to merge stdout stderr into one stream. But the
content from stdout and stderr are not distinguishable anymore.
Is there a way to not only merge stdout and stderr but also prepend
each line from stdout with a prefix and prepend each line from stderr
with a different prefix (e.g., 'o' and 'e' respectively) so that they
are distinguishable (we assume that there will be no merged lines
coming from both stdout and stderr)?
Joao just gave the same answer as I was typing it, but:

https://github.com/oilshell/blog-code/blob/master/stdout-stderr/demo.sh

I don't like process substitution that much, because the syntax looks odd.
For me it helps to think of it as

prog > $stdin(awk ... ) 2> $stdin(awk ...) # imaginary syntax

The $ makes it clearer that it expands to a word like /dev/fd/64. >() is
confusing especially when preceded by the redirect operators!


$ ./demo.sh prog
1
2
3
4
5
6
7
8
9

$ ./demo.sh filter
o 1
o 2
o 3
e 4
e 5
e 6
o 7
o 8
o 9


Andy

(reply to all this time)
Peng Yu
2018-02-05 00:03:37 UTC
Post by Andy Chu
Post by Peng Yu
Hi,
`cmd 2>&1` can be used to merge stdout stderr into one stream. But the
content from stdout and stderr are not distinguishable anymore.
Is there a way to not only merge stdout and stderr but also prepend
each line from stdout with a prefix and prepend each line from stderr
with a different prefix (e.g., 'o' and 'e' respectively) so that they
are distinguishable (we assume that there will be no merged lines
coming from both stdout and stderr)?
https://github.com/oilshell/blog-code/blob/master/stdout-stderr/demo.sh
I don't like process substitution that much, because the syntax looks odd.
For me it helps to think of it as
prog > $stdin(awk ... ) 2> $stdin(awk ...) # imaginary syntax
The $ makes it clearer that it expands to a word like /dev/fd/64. >() is
confusing especially when preceded by the redirect operators!
$ ./demo.sh prog
1
2
3
4
5
6
7
8
9
$ ./demo.sh filter
o 1
o 2
o 3
e 4
e 5
e 6
o 7
o 8
o 9
I tried it multiple times, but the order is not guaranteed to be
maintained. I guess it is due to the buffering that awk may use when
reading from stdout or stderr. So, technically speaking, this
solution does not solve my original question.
--
Regards,
Peng
Andy Chu
2018-02-05 03:40:30 UTC
Post by Peng Yu
I tried it multiple times. But the order is not guaranteed to be
maintained, I guess it is due the buffer that might be used with awk
when reading from stdout or stderr. So technically speaking, this
solution does not solve my original question.
Well, if you want the same order line-by-line, I think you have to write a
program that redirects stdout and stderr to separate pipes, and does
select() over them.

Even then, I'm not sure there is any guarantee. If your program does this:

print 'stdout'
print >>sys.stderr, 'stderr'

Then when the select() wakes up, it should have data on both file
descriptors. And then it won't know which one to print first -- stdout or
stderr.

So it could be impossible not just in shell, **but in Unix**.

But if anyone has any other ideas I'm curious.

As I see it, there is a fundamental problem:

- A pipe/buffer in Unix is inherently ordered (in the kernel). If both
stdout and stderr are connected to the same pipe, lines will have a natural
order.
- If you want to tell stdout and stderr apart, then you need to write them
to separate pipes/buffers.
- Once they're in separate pipes/buffers, there is no longer a global
order. You lost that information.

The calling program would have to number its lines. But I don't think it
can be done without modifying the calling program.

Andy
Tadeus Prastowo
2018-02-05 10:20:12 UTC
Post by Peng Yu
I tried it multiple times. But the order is not guaranteed to be
maintained, I guess it is due the buffer that might be used with awk
when reading from stdout or stderr. So technically speaking, this
solution does not solve my original question.
--
Regards,
Peng
What about the following, with `-u' (unbuffered) added to the sed commands?

Again, I assume that a write is atomic at line level:

exec {mystdout}>&1
coproc pA (sed -u 's%^%[From stdout] %' >&${mystdout})
coproc pB (sed -u 's%^%[From stderr] %' >&${mystdout})
coproc (echo 'Forget pB as pA has already been forgotten' >/dev/null)
cmd 1>&${pA[1]} 2>&${pB[1]}; exec {pA[1]}>&- {pB[1]}>&-

If that does not work, you can try to attach timestamp using GNU `sed'
e command and use `sort' to order your stdout based on the timestamp.
I will let you tinker with that.

--
Best regards,
Tadeus
Dennis Williamson
2018-02-04 22:34:06 UTC
On Feb 4, 2018 3:34 PM, "Peng Yu" <***@gmail.com> wrote:

Hi,

`cmd 2>&1` can be used to merge stdout stderr into one stream. But the
content from stdout and stderr are not distinguishable anymore.

Is there a way to not only merge stdout and stderr but also prepend
each line from stdout with a prefix and prepend each line from stderr
with a different prefix (e.g., 'o' and 'e' respectively) so that they
are distinguishable (we assume that there will be no merged lines
coming from both stdout and stderr)?

Maybe this can be done through an external program, but I am not sure
how to implement, especially, if I want the order of the lines
combined from both streams unchanged whether there are prefixes or
not.

Does anybody know a solution to this problem? Thanks.

--
Regards,
Peng


Please see https://mywiki.wooledge.org/BashFAQ/106
Peng Yu
2018-02-05 00:05:51 UTC
Post by Dennis Williamson
Please see https://mywiki.wooledge.org/BashFAQ/106
I am not sure which specific solution you referred to, but it seems
that any solution there would also have the problem I mentioned
for Andy Chu's solution. Let me know if that is not the case.
--
Regards,
Peng
Russell Lewis
2018-02-05 05:01:47 UTC
Dennis, I skimmed the link you posted. You're correct that, when you use
process substitution, the 'tee' programs run in the background, and it's
hard to wait on them.

However, I found a hackish solution for that: a trailing 'cat':
cmd > >(tee logfile) | cat
Basically, the 'cat' is reading from the stdout of the things before the
pipe, which includes the stdout of the 'tee'. So 'cat' won't terminate
until it has read everything from 'tee' and copied it to the screen. And
bash will wait for 'cat' to terminate.
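A minimal demonstration of the trick (the file name `logfile` and the
printf producer are just examples):

```shell
#!/usr/bin/env bash
# Without the trailing `| cat` the script could continue before tee has
# finished writing the log; cat reads tee's stdout, so bash waits for
# cat, and cat only sees EOF once tee exits.
printf '%s\n' one two three > >(tee logfile) | cat

# By here, logfile is guaranteed to be complete.
wc -l < logfile
```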

Russ

On Sun, Feb 4, 2018 at 3:34 PM, Dennis Williamson <
Post by Peng Yu
Hi,
`cmd 2>&1` can be used to merge stdout stderr into one stream. But the
content from stdout and stderr are not distinguishable anymore.
Is there a way to not only merge stdout and stderr but also prepend
each line from stdout with a prefix and prepend each line from stderr
with a different prefix (e.g., 'o' and 'e' respectively) so that they
are distinguishable (we assume that there will be no merged lines
coming from both stdout and stderr)?
Maybe this can be done through an external program, but I am not sure
how to implement, especially, if I want the order of the lines
combined from both streams unchanged whether there are prefixes or
not.
Does anybody know a solution to this problem? Thanks.
--
Regards,
Peng
Please see https://mywiki.wooledge.org/BashFAQ/106
Greg Wooledge
2018-02-05 13:54:50 UTC
Post by Peng Yu
`cmd 2>&1` can be used to merge stdout stderr into one stream. But the
content from stdout and stderr are not distinguishable anymore.
Is there a way to not only merge stdout and stderr but also prepend
each line from stdout with a prefix and prepend each line from stderr
with a different prefix (e.g., 'o' and 'e' respectively) so that they
are distinguishable (we assume that there will be no merged lines
coming from both stdout and stderr)?
No.

For more details on that "no", see <https://mywiki.wooledge.org/BashFAQ/106>.

But ultimately it's still a "no".
Post by Peng Yu
I tried it multiple times. But the order is not guaranteed to be
maintained,
Correct! THAT is the real issue here.
Tadeus Prastowo
2018-02-05 14:31:12 UTC
Post by Greg Wooledge
Post by Peng Yu
I tried it multiple times. But the order is not guaranteed to be
maintained,
Correct! THAT is the real issue here.
Won't `stdbuf' help?

Assuming that cmd is single threaded, `stdbuf -o0 -e0 cmd' should wake
up either the stdout or the stderr reader process after writing a single
character to either stream. So, if the reader processes are also run
with `stdbuf -i0 -oL', it increases the chance that cmd and its reader
processes run in lock step. And since cmd is single threaded, the
relative ordering of the entries in the merged stdout-stderr pipe
should be almost correct.

BTW, I am having the following in mind when I talk about "merged
stdout-stderr pipe":
exec {mystdout}>&1
coproc pA (sed 's%^%[From stdout] %' >&${mystdout})
coproc pB (sed 's%^%[From stderr] %' >&${mystdout})
coproc (echo 'Forget pB as pA has already been forgotten' >/dev/null)
cmd 1>&${pA[1]} 2>&${pB[1]}; exec {pA[1]}>&- {pB[1]}>&- {mystdout}>&-

A sample cmd that I have in mind is attached, and its expected output
with regard to the correct relative ordering in the merged
stdout-stderr pipe is as follows:
[From stdout] 1
[From stdout] 2
[From stdout] 3
[From stdout] 4
[From stderr] 5
[From stderr] 6
[From stderr] 7
[From stderr] 8
[From stdout] 9
[From stderr] 10
[From stdout] 11
[From stderr] 12
[From stdout] 13
[From stderr] 14
[From stdout] 15
[From stderr] 16
[From stdout] 17
[From stdout] 18
[From stdout] 19
[From stderr] 20
[From stderr] 21
[From stdout] 22

--
Best regards,
Tadeus
Greg Wooledge
2018-02-05 15:22:09 UTC
Post by Tadeus Prastowo
Post by Greg Wooledge
Post by Peng Yu
I tried it multiple times. But the order is not guaranteed to be
maintained,
Correct! THAT is the real issue here.
Won't `stdbuf' help?
No, because it's not a buffering issue. It's a race condition.

When you take stream A and run it through filter program A, filter
program A introduces some nondeterministic delay between the time the
input is received and the time the filtered output is written.

Then you take stream B and run it through filter program B, which
introduces *its* own delay.

If the delay of line 1 (stream A) is just a few milliseconds longer
than expected for any reason, then it may occur *after* line 2
(stream B) is written.

If you want to work around that, you could timestamp each line in
addition to filtering it, and then run both timestamped-and-filtered
streams through a third program that merges the lines back into the
correct order by using the timestamps, and then removes the timestamps.
Russell Lewis
2018-02-05 16:29:15 UTC
I agree with you, Greg about the race condition.

However, I'd quibble about your last paragraph just a hair: if you have the
ability to modify the source application so as to change how it prints
output, then it's easy to solve this problem - just annotate the output as
the original poster desired. :)

Russ
Post by Greg Wooledge
Post by Tadeus Prastowo
Post by Greg Wooledge
Post by Peng Yu
I tried it multiple times. But the order is not guaranteed to be
maintained,
Correct! THAT is the real issue here.
Won't `stdbuf' help?
No, because it's not a buffering issue. It's a race condition.
When you take stream A and run it through filter program A, filter
program A introduces some nondeterministic delay between the time the
input is received and the time the filtered output is written.
Then you take stream B and run it through filter program B, which
introduces *its* own delay.
If the delay of line 1 (stream A) is just a few milliseconds longer
than expected for any reason, then it may occur *after* line 2
(stream B) is written.
If you want to work around that, you could timestamp each line in
addition to filtering it, and then run both timestamped-and-filtered
streams through a third program that merges the lines back into the
correct order by using the timestamps, and then removes the timestamps.
Tadeus Prastowo
2018-02-05 16:25:10 UTC
Post by Greg Wooledge
Post by Tadeus Prastowo
Post by Greg Wooledge
Post by Peng Yu
I tried it multiple times. But the order is not guaranteed to be
maintained,
Correct! THAT is the real issue here.
Won't `stdbuf' help?
No, because it's not a buffering issue. It's a race condition.
When you take stream A and run it through filter program A, filter
program A introduces some nondeterministic delay between the time the
input is received and the time the filtered output is written.
Then you take stream B and run it through filter program B, which
introduces *its* own delay.
If the delay of line 1 (stream A) is just a few milliseconds longer
than expected for any reason, then it may occur *after* line 2
(stream B) is written.
If you want to work around that, you could timestamp each line in
addition to filtering it, and then run both timestamped-and-filtered
streams through a third program that merges the lines back into the
correct order by using the timestamps, and then removes the timestamps.
Then timestamping won't help either, will it? Suppose cmd outputs
line L at time t_1 on stdout and line L' at time t_1+D on stderr.
The scheduler then happens to decide that the process waiting on
stderr goes first, so that process stamps L' with time t_1+D+D'.
Afterwards, the scheduler gives the processor to the process waiting
on stdout, which stamps L with time t_1+D+D'+D''. So the correct
relative order of L and L' as output by cmd is lost, because by their
timestamps L' would be ordered before L.

--
Best regards,
Tadeus
John Kearney
2018-02-06 08:34:53 UTC
There is a small tool called "logapp" that can do this; realising it
in a bash script is not really worth the hassle.
Post by Tadeus Prastowo
Post by Greg Wooledge
Post by Tadeus Prastowo
Post by Greg Wooledge
Post by Peng Yu
I tried it multiple times. But the order is not guaranteed to be
maintained,
Correct! THAT is the real issue here.
Won't `stdbuf' help?
No, because it's not a buffering issue. It's a race condition.
When you take stream A and run it through filter program A, filter
program A introduces some nondeterministic delay between the time the
input is received and the time the filtered output is written.
Then you take stream B and run it through filter program B, which
introduces *its* own delay.
If the delay of line 1 (stream A) is just a few milliseconds longer
than expected for any reason, then it may occur *after* line 2
(stream B) is written.
If you want to work around that, you could timestamp each line in
addition to filtering it, and then run both timestamped-and-filtered
streams through a third program that merges the lines back into the
correct order by using the timestamps, and then removes the timestamps.
Then, timestamping won't help either, will it? Suppose cmd output
line L at time t_1 at stdout and were still be able to output line L'
at time t_1+D at stderr. The scheduler then happened to decide that
the process waiting at stderr were to go first. So, the process
stamped L' with time t_1+D+D'. Afterwards, the scheduler gave the
processor to the process waiting at stdout, which would stamp L with
time t_1+D+D'+D''. So, the correct relative order of L and L' as
output by cmd is screwed because by their timestamps, L' would be
ordered before L.
--
Best regards,
Tadeus
Tadeus Prastowo
2018-02-05 09:57:56 UTC
If I can assume that a write is atomic at line level, then the
following should work:

exec {mystdout}>&1
coproc pA (sed 's%^%[From stdout] %' >&${mystdout})
coproc pB (sed 's%^%[From stderr] %' >&${mystdout})
coproc (echo 'Forget pB as pA has already been forgotten' >/dev/null)
cmd 1>&${pA[1]} 2>&${pB[1]}; exec {pA[1]}>&- {pB[1]}>&-

--
Best regards,
Tadeus
Post by Peng Yu
Hi,
`cmd 2>&1` can be used to merge stdout stderr into one stream. But the
content from stdout and stderr are not distinguishable anymore.
Is there a way to not only merge stdout and stderr but also prepend
each line from stdout with a prefix and prepend each line from stderr
with a different prefix (e.g., 'o' and 'e' respectively) so that they
are distinguishable (we assume that there will be no merged lines
coming from both stdout and stderr)?
Maybe this can be done through an external program, but I am not sure
how to implement, especially, if I want the order of the lines
combined from both streams unchanged whether there are prefixes or
not.
Does anybody know a solution to this problem? Thanks.
--
Regards,
Peng