Discussion:
[Help-bash] multiline random paste or how to properly use (named) streams
Garreau, Alexandre
2018-03-13 15:49:47 UTC
Hi,

Recently a friend of mine asked me if I had a ready-made solution
to interleave the lines of a file foo and a file bar (m3u files actually),
in such a way that each run of 1 to 5 (at random) lines of “foo” would be
surrounded by 2 to 4 (at random) lines of “bar”, themselves picked at random
and preferably (but not necessarily) unique, i.e. not appearing several
times in total. The variable number of interleaved lines is what puts this
problem beyond the scope of paste from coreutils.

As the problem interested me, I generalized it into “interleaving
random-length runs of lines from an arbitrary number of files”,
with the option of filtering some input with sort -R at the end. I could
have put all the contents in a variable, but I thought that wouldn’t end
up simple and readable, and I didn’t want to open any file several times.
Hence I started learning named streams (and also process substitution, but
I didn’t find that useful): even though I was going to work on an
arbitrary number of files (thus numbered rather than named), I saw in the
bash manual that behavior with fds above 9 was unspecified, and I wanted
something robust yet able to operate on more than 9 files.
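
On the descriptor-number concern: bash 4.1 and later can choose a free descriptor (10 or above) by itself through the {varname} redirection form, so no number has to be hard-coded. The following is only a minimal sketch under that assumption. The names open_all, close_all and fds are hypothetical and are not taken from the script discussed in this thread.

```shell
#!/bin/bash
# Sketch: open any number of files once each and let bash pick the descriptors.
# Assumes bash >= 4.1 (needed for the {varname} redirection form).

# open_all FILE...
# Opens each FILE for reading and records its descriptor, in argument order,
# in the global array "fds" (hypothetical name used only in this sketch).
open_all() {
    local file fd
    fds=()
    for file in "$@"; do
        # bash stores the number of the newly allocated descriptor in $fd.
        exec {fd}< "$file" || return 1
        fds+=("$fd")
    done
}

# close_all
# Closes every descriptor recorded in "fds" and empties the array.
close_all() {
    local fd
    for fd in "${fds[@]}"; do
        exec {fd}<&-
    done
    fds=()
}
```

After "open_all foo bar", the first line of foo could be read with: IFS= read -r line <&"${fds[0]}". Each file is opened exactly once, and successive reads on the same descriptor continue where the previous one stopped.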

The interface I came up with is (with a later option to specify randomness):
interleave [<file> <interval-begin>-<interval-end>] ...
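
For illustration, the paired arguments of such an interface could be split up front as sketched below. This is a hedged example, not the poster's code: parse_specs, pick_count and the arrays files, lows and highs are hypothetical names, and the interval is assumed to be written as two non-negative integers joined by a hyphen.

```shell
#!/bin/bash
# Sketch: split "FILE LO-HI FILE LO-HI ..." into three parallel arrays.

# parse_specs FILE LO-HI [FILE LO-HI]...
# Fills the global arrays files, lows and highs (hypothetical names).
parse_specs() {
    files=()
    lows=()
    highs=()
    while (( $# >= 2 )); do
        local file=$1 range=$2
        shift 2
        # Accept only "<digits>-<digits>".
        if [[ ! $range =~ ^([0-9]+)-([0-9]+)$ ]]; then
            printf 'bad interval: %s\n' "$range" >&2
            return 1
        fi
        # 10# forces base 10 so that "08" is not rejected as invalid octal.
        local lo=$((10#${BASH_REMATCH[1]})) hi=$((10#${BASH_REMATCH[2]}))
        if (( lo > hi )); then
            printf 'empty interval: %s\n' "$range" >&2
            return 1
        fi
        files+=("$file")
        lows+=("$lo")
        highs+=("$hi")
    done
    # A leftover argument is a file name without its interval.
    if (( $# != 0 )); then
        printf 'missing interval for: %s\n' "$1" >&2
        return 1
    fi
}

# pick_count VAR LO HI
# Stores a random integer between LO and HI (inclusive) in the variable VAR.
pick_count() {
    printf -v "$1" '%d' "$(( $2 + RANDOM % ($3 - $2 + 1) ))"
}
```

With "parse_specs foo 1-5 bar 2-4", index 0 describes foo and index 1 describes bar, so "pick_count n "${lows[0]}" "${highs[0]}"" would set n to a value from 1 to 5. pick_count writes into a named variable rather than printing, which avoids drawing RANDOM inside a command substitution.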

I ended up with this code, whose unreadability, introduced by correctly
opening and naming each stream, doesn’t satisfy me as proof that I have
learned and mastered named streams well enough (I don’t like
“parse all the arguments, and only then do the actual work”). I’m especially
dissatisfied with how complicated it is to use my array (which I use to
number the streams, since there is an arbitrary number of them), a
consequence of my function’s arguments being interleaved (I also thought of
maybe changing my interface to something like “[<file>:<interval>]...” in
order to unify and simplify retrieval of the correct arguments):
Greg Wooledge
2018-03-13 16:57:45 UTC
Post by Garreau, Alexandre
> Recently a friend of mine asked me if I had a ready-made solution
> to interleave the lines of a file foo and a file bar (m3u files actually),
Music playlists. Good. Having the context is helpful. An m3u file
has one filename per line (either CR-LF or newline terminated), as I
understand it. No blank lines, no markup or groupings or comments or
anything else.
Post by Garreau, Alexandre
> in a way that each (at random) 1 to 5 lines of “foo” would be surrounded
> by (at random) 2 to 4 (preferably but not necessarily) random unique
> (not appearing several times in total) lines of “bar”
When you say "surrounded by", do you actually mean "followed by"?

So, as I understand it, the problem is as follows:

1) We have two text files containing multiple lines. There is no semantic
meaning to the lines (they are filenames, but we're not going to be
opening or statting the files). They are of roughly equal sizes.

2) The first file is to be used in sequential order. The second file is
to be randomly shuffled.

2a) A recent GNU coreutils is available for the shuffling.

3) A single output file is to be produced, containing lines from both
input files, following a sequence of steps. At each step, read 1-5
lines from the first file, and write them to the output. Then read
2-4 lines from the (shuffled) second file, and write those to the
output.

If this is the full problem spec, then here's one approach:


#!/bin/bash

exec 3< file1
exec 4< <(shuf file2)

while true; do
  # Read 1-5 lines from first file.
  n=$((1 + RANDOM%5))
  for ((i=1; i<=n; i++)); do
    IFS= read -r line <&3 || break
    printf %s\\n "$line"
  done

  # Read 2-4 lines from second file.
  n=$((2 + RANDOM%3))
  for ((i=1; i<=n; i++)); do
    IFS= read -r line <&4 || break
    printf %s\\n "$line"
  done
done

exec 3<&- 4<&-


This terminates as soon as it reaches EOF in either file. This may
cause some lines from the *other* file (whichever one didn't hit EOF)
to be unused. If that's a problem, well, you can fix it. ;-)
Greg Wooledge
2018-03-13 17:47:47 UTC
Post by Greg Wooledge
> while true; do
>   # Read 1-5 lines from first file.
>   n=$((1 + RANDOM%5))
>   for ((i=1; i<=n; i++)); do
>     IFS= read -r line <&3 || break
Both of these "break" commands should have been "break 2". Sorry 'bout
that. Also, the script as written is completely untested, so there
could be other bugs lurking.
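
For reference, here is a sketch of the script with that correction applied, wrapped in a function so the two input files are parameters. Several details are assumptions of this sketch rather than part of the original post: the function name interleave_two, the use of {varname} redirections (bash 4.1+) instead of the fixed descriptors 3 and 4, and the two cat commands that copy out whatever the loop left unread. It assumes GNU shuf is installed and that every input line, including the last, ends with a newline.

```shell
#!/bin/bash
# interleave_two FILE1 FILE2   (hypothetical name)
# Writes 1-5 lines of FILE1 in order, then 2-4 lines of a shuffled FILE2,
# and repeats until either input reaches EOF.
interleave_two() {
    local fd1 fd2 n i line
    # Let bash allocate the descriptors (bash >= 4.1).
    exec {fd1}< "$1" || return 1
    exec {fd2}< <(shuf -- "$2")

    while true; do
        # Read 1-5 lines from the first file.
        n=$((1 + RANDOM % 5))
        for ((i = 1; i <= n; i++)); do
            # "break 2" leaves both the for loop and the enclosing while loop.
            IFS= read -r line <&"$fd1" || break 2
            printf '%s\n' "$line"
        done

        # Read 2-4 lines from the shuffled second file.
        n=$((2 + RANDOM % 3))
        for ((i = 1; i <= n; i++)); do
            IFS= read -r line <&"$fd2" || break 2
            printf '%s\n' "$line"
        done
    done

    # Addition not in the original: copy the unread tail of the other input.
    # The input that hit EOF contributes nothing here.
    cat <&"$fd1"
    cat <&"$fd2"

    exec {fd1}<&- {fd2}<&-
}
```

With a plain "break", only the inner for loop ends, so the outer "while true" would keep spinning once an input is exhausted. Removing the two cat lines restores the behavior described earlier, where leftover lines of the longer input are dropped.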
Greg Wooledge
2018-03-13 19:16:40 UTC
> Ideally I wanted to work on a more general problem,
Then you and bash are going to be at war very shortly.

Bash is NOT suited for solving generalized anything, or creating reusable
anything. Every script is a one-off, to solve a single problem. You
can learn techniques and tricks that can be applied to future scripts,
but any kind of "framework" or "library" is not happening.
