Discussion:
[Help-bash] group
Val Krem
2016-12-10 19:52:22 UTC
Hi all,

In one folder I have several files (more than 200). These files all have the same columns (variables). The file names follow two patterns, and I want to concatenate the files into two groups based on their file-name pattern.

The file names look like this:
Region-19940726.csv
Region-19940816.csv
Region-19940920.csv
Region-19941018.csv
Region-19941122.csv
Region.csv
RegionAB-19940726.csv
RegionAB-19940816.csv
RegionAB-19940920.csv
RegionAB-19941018.csv
RegionAB-19941122.csv
RegionAB.csv

The first file (file1) should include all files that start with 'RegionAB', namely the following files:

RegionAB-19940726.csv
RegionAB-19940816.csv
RegionAB-19940920.csv
RegionAB-19941018.csv
RegionAB-19941122.csv
RegionAB.csv

The second output file (file2) should include all files that start with 'Region' (but not with 'RegionAB'):

Region-19940726.csv
Region-19940816.csv
Region-19940920.csv
Region-19941018.csv
Region-19941122.csv
Region.csv

Is there an efficient way of doing this?

Thank you in advance.
Greg Wooledge
2016-12-12 13:25:49 UTC
Post by Val Krem
The file names look like this:
Region-19940726.csv
Region-19940816.csv
Region-19940920.csv
Region-19941018.csv
Region-19941122.csv
Region.csv
RegionAB-19940726.csv
RegionAB-19940816.csv
RegionAB-19940920.csv
RegionAB-19941018.csv
RegionAB-19941122.csv
RegionAB.csv
The first file (file1) should include all files that start with 'RegionAB', namely the following files:
cat RegionAB-*.csv RegionAB.csv > file1
Post by Val Krem
The second output file (file2) should include all files that start with 'Region'
Region-19940726.csv
Region-19940816.csv
Region-19940920.csv
Region-19941018.csv
Region-19941122.csv
Region.csv
cat Region-*.csv Region.csv > file2
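If you want both output files from one place, the two cat lines above can be wrapped in a small function. This is a minimal sketch, not part of Greg's reply: the name concat_group is hypothetical, it assumes the PREFIX.csv / PREFIX-YYYYMMDD.csv naming shown in the question, and it uses only POSIX sh features so it also runs under #!/bin/sh.

```shell
#!/bin/sh
# Hypothetical helper: concatenate PREFIX-*.csv and PREFIX.csv into OUTPUT.
# Dated files come first and the undated file last, matching Greg's cat lines.
# The pattern "Region-*.csv" cannot match "RegionAB-..." files, because a
# hyphen must follow "Region" directly.
concat_group() {
    prefix=$1
    out=$2
    set --                                  # reuse "$@" as the list of inputs
    for f in "$prefix"-*.csv "$prefix".csv; do
        # An unmatched glob stays as the literal pattern; skip anything missing.
        [ -e "$f" ] && set -- "$@" "$f"
    done
    if [ "$#" -eq 0 ]; then
        printf 'concat_group: no files for prefix %s\n' "$prefix" >&2
        return 1
    fi
    cat -- "$@" > "$out"
}

# Usage (run in the folder that holds the CSV files):
#   concat_group RegionAB file1
#   concat_group Region   file2
```

If each CSV file begins with its own header row (the question does not say), plain cat will repeat that header inside file1 and file2.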
John McKown
2016-12-12 14:08:28 UTC
Post by Val Krem
Post by Val Krem
The file names look like this:
Region-19940726.csv
Region-19940816.csv
Region-19940920.csv
Region-19941018.csv
Region-19941122.csv
Region.csv
RegionAB-19940726.csv
RegionAB-19940816.csv
RegionAB-19940920.csv
RegionAB-19941018.csv
RegionAB-19941122.csv
RegionAB.csv
The first file (file1) should include all files that start with
'RegionAB', namely the following files:
cat RegionAB-*.csv RegionAB.csv > file1
Post by Val Krem
The second output file (file2) should include all files that start
with 'Region'
Post by Val Krem
Region-19940726.csv
Region-19940816.csv
Region-19940920.csv
Region-19941018.csv
Region-19941122.csv
Region.csv
cat Region-*.csv Region.csv > file2
Greg gave you the canonical answer. But if you just want to read those
files, and (1) you don't want to merge them into a cumulative file
(perhaps to save space) and (2) your application is not written to handle
multiple separate files, let me mention "process substitution". In most
shells, you would need to do something like:

cat Region-*.csv Region.csv >tempfile.csv
some-program tempfile.csv
rm tempfile.csv
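For shells without process substitution, the temporary-file pattern above can be made a little safer by letting mktemp pick an unused name and by deleting the file even when the program fails. A sketch under stated assumptions: run_on_merged is a hypothetical name, some-program is still the placeholder from the message, and mktemp is not specified by POSIX, although it is widely available.

```shell
#!/bin/sh
# Hypothetical helper: merge FILE... into a temporary file, run PROGRAM on it,
# and remove the temporary file afterwards, whatever the outcome.
# Usage: run_on_merged PROGRAM FILE...
run_on_merged() {
    prog=$1
    shift
    tmp=$(mktemp) || return 1          # unique name instead of a fixed tempfile.csv
    cat -- "$@" > "$tmp" && "$prog" "$tmp"
    status=$?                          # status of cat if it failed, else of the program
    rm -f -- "$tmp"
    return "$status"
}

# Example: run_on_merged some-program Region-*.csv Region.csv
```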

But with BASH, you can use process substitution:

some-program <(cat Region{,-*}.csv)

Of course, the above only works if "some-program" only wants to _read_
those files, not update them.
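One way to see what the program actually receives: bash replaces the process substitution with a file name (something like /dev/fd/63; the exact name varies by system), and reading that name yields cat's output. In this bash-only sketch, wc -l stands in for the placeholder some-program, and the function names are hypothetical. Because the name refers to a pipe, the data can be read only once, from start to finish, so a program that needs to seek in its input or rewrite it will not work with it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count the lines of FILE... without writing a merged copy.
merged_line_count() {
    # <(...) expands to a file name connected to cat's output;
    # wc opens that name and reads it like an ordinary read-only file.
    wc -l <(cat -- "$@")
}

# Print the name bash substitutes (for example /dev/fd/63; it varies by system).
show_procsub_name() {
    echo <(:)
}

# Example: merged_line_count Region{,-*}.csv
```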

And, just to be complete, note that "Region{,-*}.csv" expands into
"Region.csv" and "Region-*.csv" and then "Region-*.csv" expands into every
file which starts with "Region-" and ends with ".csv". I don't know how
many people use the {} or "brace expansion". But I really like it for
three reasons (none really good): (1) I'm lazy and don't like to retype
stuff; (2) it "looks cool"; (3) it messes with other people's minds.
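The two expansion steps described above can be watched separately. With pathname expansion switched off (set -f), only the brace expansion runs, so the second pattern is printed literally. This is a bash sketch with hypothetical function names. It also shows that brace expansion keeps its left-to-right order: Region{,-*}.csv puts Region.csv before the dated files, whereas Greg's cat Region-*.csv Region.csv puts it last, which matters if the row order of the merged output matters.

```shell
#!/usr/bin/env bash
# Step 1 only: with globbing disabled, Region{,-*}.csv becomes the two words
# "Region.csv" and "Region-*.csv" (the * is left as is).
brace_step_only() {
    set -f                      # disable pathname expansion (assumes it was on)
    printf '%s\n' Region{,-*}.csv
    set +f                      # turn pathname expansion back on
}

# Both steps: brace expansion first, then Region-*.csv is matched
# against the files in the current directory.
brace_then_glob() {
    printf '%s\n' Region{,-*}.csv
}
```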
--
Heisenberg may have been here.

http://xkcd.com/1770/

Maranatha! <><
John McKown
Greg Wooledge
2016-12-12 15:22:45 UTC
some-program <(cat Region{,-*}.csv)
For the record, note that you need a relatively new version of bash to
use a brace expansion inside a process substitution.

imadev:~$ bash-3.2 -c 'echo <(x {y,z})'
/var/tmp//sh-np-1481557898 /var/tmp//sh-np-3906145805
imadev:~$ bash-4.0 -c 'echo <(x {y,z})'
/var/tmp//sh-np-1481549372

In 3.2 and older, <(x {y,z}) was parsed in such a way that the brace
expansion happened FIRST, and was therefore equivalent to

<(x y) <(x z)

This was changed in bash 4.0.

Personally I never use brace expansions in a script. I might VERY
occasionally use one in an interactive shell.
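The parsing difference shown in the transcript can be checked directly, and it can be avoided altogether by doing the filename expansion outside the process substitution, written here without braces at all. A bash sketch: the function names are hypothetical, and prog stands in for some-program.

```shell
#!/usr/bin/env bash
# How many file names does a single <(...) containing {y,z} produce?
# Per the transcript above: 1 on bash 4.0 and later, 2 on bash 3.2 and older.
count_procsub_words() {
    set -- <(: {y,z})
    echo "$#"
}

# Version-independent form: expand the names into an array first (no braces),
# so nothing inside <(...) depends on how brace expansion is parsed.
# Without nullglob, an unmatched Region-*.csv is passed to cat literally.
run_on_regions() {
    local prog=$1
    local -a files
    files=( Region.csv Region-*.csv )
    "$prog" <(cat -- "${files[@]}")
}

# Example: run_on_regions some-program
```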
John McKown
2016-12-12 16:07:59 UTC
Post by Greg Wooledge
some-program <(cat Region{,-*}.csv)
For the record, note that you need a relatively new version of bash to
use a brace expansion inside a process substitution.
Thanks, I hadn't realized that. I tend to stay very current on Fedora.
Post by Greg Wooledge
imadev:~$ bash-3.2 -c 'echo <(x {y,z})'
/var/tmp//sh-np-1481557898 /var/tmp//sh-np-3906145805
imadev:~$ bash-4.0 -c 'echo <(x {y,z})'
/var/tmp//sh-np-1481549372
In 3.2 and older, <(x {y,z}) was parsed in such a way that the brace
expansion happened FIRST, and was therefore equivalent to
<(x y) <(x z)
This was changed in bash 4.0.
Personally I never use brace expansions in a script. I might VERY
occasionally use one in an interactive shell.
I am also very conservative in an actual script, which is why I use
#!/bin/sh as my "magic" line instead of #!/bin/bash. If I understand the
doc correctly, that makes BASH run in POSIX mode, more like a general
Bourne shell. That's more portable.
Pierre Gaston
2016-12-12 17:36:09 UTC
Post by John McKown
Post by Greg Wooledge
some-program <(cat Region{,-*}.csv)
For the record, note that you need a relatively new version of bash to
use a brace expansion inside a process substitution.
Thanks, I hadn't realized that. I tend to stay very current on Fedora.
Bash 4 is already almost eight years old, so depending on your mileage, it
may not seem too new to you ;)
Val Krem
2016-12-13 00:40:32 UTC
John and Greg,


Thank you so much for the help.

Val