[Help-bash] pipe character at end of command ?
Ulf Andersson A
2016-11-23 15:46:17 UTC
Hello,

I am at my wits' end. I have searched the bash manual and man page as well as numerous wikis and tutorials, all to no avail. I am trying this out on a Red Hat Linux machine. From uname -a I get this:

Linux themachine 2.6.32-642.6.1.el6.x86_64 #1 SMP Thu Aug 25 12:42:19 EDT 2016 x86_64 x86_64 x86_64 GNU/Linux

Here is my little example:
--8><--------------------------------------------
#!/bin/sh
# Ape
# Banana
# Ladder
# A random comment

spunk()
{
sed '/Ape/d' |
sed '/Banana/d' |
sed '/Ladder/d'
}

cat $0 | spunk
--8><--------------------------------------------
I have figured out what each of the three sed commands does by itself, but I have still to figure out what the pipe characters actually do here. And no, I did not forget to put any continuation characters at the end of the lines.

The above example produces this output:
--8><--------------------------------------------
#!/bin/sh
# A random comment

spunk()
{
}

cat $0 | spunk
--8><--------------------------------------------

Now if I remove the pipe characters from the script above, I get this output:
--8><--------------------------------------------
#!/bin/sh
# Banana
# Ladder
# A random comment

spunk()
{
sed '/Banana/d'
sed '/Ladder/d'
}

cat $0 | spunk# A random comment
--8><--------------------------------------------

Clearly there is some hand waving with the pipe going on, but as I said above, I have not found any kind of documentation of this behaviour. At least not any that I could understand. I might be blind, or something... :)

Could anyone explain this to me, please?

Best regards,

/Ulf Andersson A
Davide Brini
2016-11-23 16:07:58 UTC
On Wed, 23 Nov 2016 15:46:17 +0000, Ulf Andersson A
Post by Ulf Andersson A
Could anyone explain this to me, please?
In your second example, the first sed command consumes the whole input
and writes directly to stdout; the second and third sed commands are
executed but with an empty stdin, so they do nothing.

In the first example, on the other hand, the pipes allow the data to "flow"
from the first sed, to the second, to the third.
--
D.
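
The difference described above can be reproduced without the original script. The following is only a minimal sketch: the function names no_pipes and with_pipes and the three-line input are made up for illustration, and the input is assumed to arrive through a pipe, as in `cat $0 | spunk`.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not part of the original script.

no_pipes() {
    sed '/Ape/d'      # reads stdin up to EOF, prints every line except "Ape"
    sed '/Banana/d'   # stdin is already at EOF, so nothing is read or printed
}

with_pipes() {
    sed '/Ape/d' |    # stdout of this sed feeds the next one
    sed '/Banana/d'
}

input='Ape
Banana
Ladder'

out_no_pipes=$(printf '%s\n' "$input" | no_pipes)
out_with_pipes=$(printf '%s\n' "$input" | with_pipes)

printf 'without pipes:\n%s\n' "$out_no_pipes"
printf 'with pipes:\n%s\n' "$out_with_pipes"
```

Without the pipes only the first filter has any effect, so "Banana" survives; with the pipes both filters are applied to the same stream.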
Greg Wooledge
2016-11-23 16:11:51 UTC
Post by Ulf Andersson A
spunk()
{
sed '/Ape/d' |
sed '/Banana/d' |
sed '/Ladder/d'
}
cat $0 | spunk
--8><--------------------------------------------
I have figured out what the three sed commands do each by themselves, but I have still to figure out what the pipe characters actually do here. And no, I did not forget to put any continuation characters at the end of the lines.
The function is exactly equivalent to this:

spunk() {
sed '/Ape/d' | sed '/Banana/d' | sed '/Ladder/d'
}

When you write a line in such a way that there *must be more* of it in
order to make a complete command, bash continues reading the next
line to get the rest of the command. In these cases, an explicit \ is
not needed.

Try it interactively, with simple cases, and you'll see:

imadev:~$ echo true |
> cat
true

imadev:~$ true &&
> echo yes
yes

imadev:~$ echo "hello
> world"
hello
world

"> " is bash's default value for PS2, the internal variable that's used
to prompt for the continuation of a multi-line command.

When writing long commands in a script, most people will try to divide
up the command in a way that makes it easy to read (for humans). When
there are natural divisions like |, it makes sense to use these.

The following two multi-line commands are completely equivalent, but
one of them is much easier to read than the other:

sed '/Ape/d' |
sed '/Banana/d' |
sed '/Ladder/d'

sed '/Ape/d' | sed \
'/Banana/d' | sed '/Ladder/d'
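
To check the equivalence claim in a script rather than by eye, something like the following could be used. It is only a sketch: the variable names and the four-line sample are assumptions made for illustration. One version ends each line with |, the other uses the explicit backslash style.

```shell
#!/bin/sh
# Sample input modelled on the comment lines of the original script.
sample='# Ape
# Banana
# Ladder
# A random comment'

# Layout 1: every line ends in |, so the shell keeps reading the next line.
out_trailing_pipe=$(printf '%s\n' "$sample" |
    sed '/Ape/d' |
    sed '/Banana/d' |
    sed '/Ladder/d')

# Layout 2: explicit backslash-newline continuation before each |.
out_backslash=$(printf '%s\n' "$sample" \
    | sed '/Ape/d' \
    | sed '/Banana/d' \
    | sed '/Ladder/d')

if [ "$out_trailing_pipe" = "$out_backslash" ]; then
    echo "same output: $out_trailing_pipe"
else
    echo "outputs differ"
fi
```

Both layouts parse to the same pipeline; the choice between them is purely a matter of readability.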
Ulf Andersson A
2016-11-23 17:29:34 UTC
My replies interspersed below.
-----Original Message-----
Sent: 23 November 2016 17:12
Subject: Re: [Help-bash] pipe character at end of command ?
Post by Ulf Andersson A
spunk()
{
sed '/Ape/d' |
sed '/Banana/d' |
sed '/Ladder/d'
}
cat $0 | spunk
--8><--------------------------------------------
I have figured out what the three sed commands do each by themselves,
but I have still to figure out what the pipe characters actually do here. And
no, I did not forget to put any continuation characters at the end of the lines.
spunk() {
sed '/Ape/d' | sed '/Banana/d' | sed '/Ladder/d'
}
When you write a line in such a way that there *must be more* of it in order
to make a complete command, bash continues reading the next line to get
the rest of the command. In these cases, an explicit \ is not needed.
Yes! That's it. Thank you.

OK, so now I know what happened. Where do I find this documented?
I am pretty sure I will need to explain this to other people in the future.
imadev:~$ echo true |
> cat
true
imadev:~$ true &&
> echo yes
yes
imadev:~$ echo "hello
> world"
hello
world
"> " is bash's default value for PS2, the internal variable that's used
to prompt for the continuation of a multi-line command.
When writing long commands in a script, most people will try to divide
up the command in a way that makes it easy to read (for humans). When
there are natural divisions like |, it makes sense to use these.
That is true. The old geezers' way was to add the continuation characters.
Clearly things have changed since last I studied shell scripting.
The following two multi-line commands are completely equivalent, but
sed '/Ape/d' |
sed '/Banana/d' |
sed '/Ladder/d'
sed '/Ape/d' | sed \
'/Banana/d' | sed '/Ladder/d'
Greg Wooledge
2016-11-23 18:17:53 UTC
Post by Ulf Andersson A
That is true. The old geezers' way was to add the continuation characters.
Clearly things have changed since last I studied shell scripting.
The shells haven't changed. Your learning material and examples simply
never covered it.

Even the pre-posix Bourne shell acts exactly the same way, even using
the PS2 variable to hold the prompt.

imadev:~$ /usr/old/bin/sh
$ PS2='*#> '
$ true &&
*#>

As for documentation, see "PROMPTING" in the bash man page. It's
another one of those things that's so common, and has been around for
so long, that it may not even be spelled out explicitly, but the
PROMPTING section at least mentions it.
Andy Chu
2016-11-23 19:24:16 UTC
Post by Greg Wooledge
As for documentation, see "PROMPTING" in the bash man page. It's
another one of those things that's so common, and has been around for
so long, that it may not even be spelled out explicitly, but the
PROMPTING section at least mentions it.
FWIW it is in the POSIX grammar:

pipe_sequence : command
              | pipe_sequence '|' linebreak command


http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_10_02

Although I'm not sure of the history, I'm pretty sure the grammar
hasn't changed much in a long time. All shells I've tested (bash,
dash, busybox ash, mksh, zsh to some extent) are very POSIX compliant
in terms of their parsing -- the grammar matches reality. It only
covers a small portion of the language, but shells seem to all agree
on that portion.
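
A quick way to see what the `linebreak` in that rule buys you is to compare a newline after the | with a newline before it. This is only a sketch: it assumes an `sh` on the PATH, and it checks only for zero versus non-zero exit status because the exact error code differs between shells.

```shell
#!/bin/sh
# Newline AFTER the pipe: allowed by "pipe_sequence '|' linebreak command".
ok_status=0
sh -c 'echo hello |
cat' >/dev/null 2>&1 || ok_status=$?

# Newline BEFORE the pipe: "echo hello" is already a complete command,
# so the next line starts with | and the shell reports a syntax error.
bad_status=0
sh -c 'echo hello
| cat' >/dev/null 2>&1 || bad_status=$?

echo "trailing pipe: exit status $ok_status"
echo "leading pipe:  exit status $bad_status"
```

The grammar has no `linebreak` in front of the '|', which is why the leading-pipe layout needs an explicit backslash at the end of the previous line.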

I just wrote a parser that handles almost all real bash scripts, and
documented some shell trivia I found along the way here:

http://www.oilshell.org/blog/

index by topic: http://www.oilshell.org/blog/2016/11/20.html

I was partially inspired by Chet Ramey's chapter on bash in the AOSA
book [1], where he mentions that yacc causes a bunch of problems. My
parser is top-down like essentially all other parsers except bash (and
the old but recently open-sourced mwc-sh [2]):

[1] http://www.aosabook.org/en/bash.html

[2] https://github.com/roytam1/mwc-sh/blob/master/parse.y
    https://en.wikipedia.org/wiki/Coherent_(operating_system)

Andy

Eric Blake
2016-11-23 16:19:36 UTC
Post by Ulf Andersson A
I am at my wits' end. I have searched the bash manual and man page as well as numerous wikis and tutorials.
It's there: look for the section titled Pipelines.
Post by Ulf Andersson A
spunk()
{
sed '/Ape/d' |
sed '/Banana/d' |
sed '/Ladder/d'
By the way, this form is comparatively expensive (three fork/exec
pairs). You can achieve the same results with:

spunk()
{
sed '/Ape/d; /Banana/d; /Ladder/d'
}

which only forks once. (In general, MOST cases of piping 'sed' or
'grep' into a second 'sed' pass are candidates for evaluation on whether
a single 'sed' invocation can do all the work itself.)
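
For anyone who wants to convince themselves that the single invocation really is a drop-in replacement, here is a small sketch. The sample text and variable names are made up for illustration; the `-e` variant is an extra spelling of the same script, not something from the original post.

```shell
#!/bin/sh
# Sample input modelled on the comment lines of the original script.
sample='# Ape
# Banana
# Ladder
# A random comment'

# Three processes connected by two pipes.
out_pipeline=$(printf '%s\n' "$sample" |
    sed '/Ape/d' | sed '/Banana/d' | sed '/Ladder/d')

# One sed process running all three delete commands.
out_single=$(printf '%s\n' "$sample" |
    sed '/Ape/d; /Banana/d; /Ladder/d')

# The same script spelled with one -e option per command.
out_multi_e=$(printf '%s\n' "$sample" |
    sed -e '/Ape/d' -e '/Banana/d' -e '/Ladder/d')

printf '%s\n' "$out_pipeline" "$out_single" "$out_multi_e"
```

All three variants should print the same single surviving line; only the number of processes differs.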
Post by Ulf Andersson A
--8><--------------------------------------------
I have figured out what the three sed commands do each by themselves, but I have still to figure out what the pipe characters actually do here.
Without the pipelines, you are executing three separate command lines,
each with the same stdin and stdout (the first such command eats all of
stdin, so the second and third have no input to consume). With the
pipeline, you are executing a single command line (the pipe control
operator is one of the shell operators that works without needing a line
continuation backslash); within that command line, the stdout of the
process on the left of a | is hooked to the stdin of the process on the
right. So the sed on Ape sees your original stdin, but only the sed
on Ladder writes to your original stdout; the other four fds (Ape stdout
and Banana stdin, and Banana stdout and Ladder stdin) are using the
results of the pipe(2) system call to pass data between processes
without it ever landing on disk.
Post by Ulf Andersson A
Clearly there is some hand waving with the pipe going on, but as I said above, I have not found any kind of documentation of this behaviour.
It's documented; it's just that pipes are something that is SO
fundamental to shell operation that most users already know what they
do, so it's hard to point out good beginner's documentation to something
that so many of us already take for granted.
--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org
Ulf Andersson A
2016-11-23 17:32:48 UTC
My comment interspersed below.
-----Original Message-----
From: Ulf Andersson A
Sent: 23 November 2016 18:30
Subject: RE: [Help-bash] pipe character at end of command ?
My comments interspersed below.
-----Original Message-----
Sent: 23 November 2016 17:20
Subject: Re: [Help-bash] pipe character at end of command ?
Post by Ulf Andersson A
I am at my wits' end. I have searched the bash manual and man page as
well
as numerous wikis and tutorials.
It's there: look for the section titled Pipelines.
Well, I looked at the section titled Pipelines and it says a lot about pipelines
but it says nothing at all about implicit continuation characters. If the mystery
lines had had explicit continuation characters, I would have had absolutely no
problem understanding the pipeline whatsoever. But Greg Woodledge
explained this implicit thing in his reply to my question.
I am soo sorry. It is Greeg Wooledge, nothing else.
I blame the mittens... :(
Post by Ulf Andersson A
spunk()
{
sed '/Ape/d' |
sed '/Banana/d' |
sed '/Ladder/d'
By the way, this form is comparatively expensive (three fork/exec pairs).
spunk()
{
sed '/Ape/d; /Banana/d; /Ladder/d'
}
Yes.
which only forks once. (In general, MOST cases of piping 'sed' or
'grep' into a second 'sed' pass are candidates for evaluation on whether a
single 'sed'
invocation can do all the work itself.)
Post by Ulf Andersson A
--8><--------------------------------------------
I have figured out what the three sed commands do each by
themselves,
but I have still to figure out what the pipe characters actually do here.
Without the pipelines, you are executing three separate command lines,
each with the same stdin and stdout (the first such command eats all
of stdin, so the second and third have no input to consume). With the
pipeline, you are executing a single command line (the pipe control
operator is one of the shell operators that works without needing a
line continuation backslash); within that command line, the stdout of
the process on the left of a | is hooked to the stdin of the process
on the right. So the sed on Ape sees your original stdin, but
only the sed on Ladder writes to your original stdout; the other four
fds (Ape stdout and Banana stdin, and Banana stdout and Ladder stdin)
are using the results of the pipe(2) system call to pass data between
processes without it ever landing on disk.
Post by Ulf Andersson A
Clearly there is some hand waving with the pipe going on, but as I
said
above, I have not found any kind of documentation of this behaviour.
It's documented; it's just that pipes are something that is SO
fundamental to shell operation that most users already know what they
do, so it's hard to point out good beginner's documentation to
something that so many of us already take for granted.
As I said, my problem was not the pipes. I got that. The problem was the
bash behaviour of implicitly adding continuation characters. I didn't get that
magic.
It is all much clearer to me now that I have got that magic.