Discussion:
[Help-bash] Using PE to specify an array
Bruce Hohl
2018-09-14 05:09:07 UTC
Hello List & thanks for any comments on this:

These commands work:
var1='ugly_hash'; declare -a _$var1; _ugly_hash+=(123);

But this does not:
_$var1+=(456)
Error: bash: syntax error near unexpected token `456'

Question:
Is there a way to specify _ugly_hash using Parameter Expansion of $var1? In
the above, I wanted _$var1 to expand to _ugly_hash so that I could append (+=)
an additional element to that array.
Bruce Hohl
2018-09-14 05:34:28 UTC
I retract my question as some trial & error with quoting revealed this:

$ var1='ugly_hash'; declare -a _$var1; _ugly_hash+=(123);

$ declare -p _$var1
declare -a _ugly_hash=([0]="123")

# ADDING QUOTES ON THE RIGHT SIDE addresses the problem
$ declare -a _$var1+="(456)"
$ declare -p _$var1
declare -a _ugly_hash=([0]="123" [1]="456")

# I DID EXPECT THIS TO WORK as it seems to expand as desired.
$ _$var1+="(789)"
bash: _ugly_hash+=(789): command not found

# AND the same assignment works when the name is written literally
$ _ugly_hash+=(789)
$ declare -p _$var1
declare -a _ugly_hash=([0]="123" [1]="456" [2]="789")
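For anyone landing here later: the bare `_$var1+=(...)` form fails because bash decides whether a word is an assignment before it expands parameters, so the name to the left of `+=` has to be written literally. Without quotes, `(` is then a syntax error; with quotes, the expanded word is run as a command name. `declare` works because it receives its argument after expansion and parses the assignment itself. Below is a minimal sketch of the nameref alternative Greg suggests elsewhere in the thread. It assumes bash 4.3 or newer, and `ref` is just an illustrative name.

```shell
#!/usr/bin/env bash
# Sketch: append to an array whose name is built at run time (bash 4.3+).
var1='ugly_hash'
declare -a "_$var1"         # create the array _ugly_hash (no value given)
_ugly_hash+=(123)

declare -n ref="_$var1"     # ref is now a name reference to _ugly_hash
ref+=(456)                  # appends to _ugly_hash, not to a variable named ref
ref+=(789)

declare -p "_$var1"         # shows the three elements of _ugly_hash
```

The `declare -a "_$var1"` line passes only a name, with no quoted compound value, so the dynamic part is limited to choosing which array the nameref points at.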
Jesse Hathaway
2018-09-14 15:06:33 UTC
Post by Bruce Hohl
# I DID EXPECT THIS TO WORK as it seems to expand as desired.
$ _$var1+="(789)"
bash: _ugly_hash+=(789): command not found
eval is a possible option

eval "_${var1}+=(789)"

unless there is a more idiomatic way of having bash perform the
interpolation on the variable name.
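If you do use eval, remember that the whole string is parsed as code. It is worth checking that the dynamically built name is a valid identifier first, and letting eval see the value only as a variable reference, never as pasted-in text. Here is a minimal sketch. `is_valid_name` is a hypothetical helper, and the odd-looking value is made up to show that its contents are never executed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if $1 is a valid shell variable name
# (a letter or underscore, followed by letters, digits or underscores).
is_valid_name() {
    [[ $1 =~ ^[A-Za-z_][A-Za-z0-9_]*$ ]]
}

var1='ugly_hash'
name="_${var1}"
value='file name; with $(odd) chars'   # illustrative value, kept literal

if is_valid_name "$name"; then
    # Only the validated name is pasted into the code string; the value is
    # written as "$value", so eval expands it as data and never parses it.
    eval "$name+=(\"\$value\")"
else
    printf 'refusing to eval invalid name: %q\n' "$name" >&2
    exit 1
fi

declare -p "$name"
```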
Bruce Hohl
2018-09-17 15:36:25 UTC
@Greg, it is an interesting happenstance that you replied, as my question
arose from my pass at completing your duplicate file finder "exercise" at
mywiki.wooledge.org/BashProgramming/04: "If you want to "fix" this
"problem", you might suppress all the printing until the end, and then
iterate over the whole array and print only those values that contain a
newline. (This is left as an exercise.)" So, with your suggestion to use
namerefs, the following seems to work:

=== Duplicate file finder exercise === (NO comments)
#!/bin/bash
while read -r md5_hash file; do
    var_hash=md5_$md5_hash
    declare -n ind_var_hash=$var_hash
    [[ ${#ind_var_hash[@]} -eq 1 ]] && declare -a dup_array+="($var_hash)"
    declare -a ${!ind_var_hash}+="('$file')"
done < <(find "${1:-.}" -name $'*\n*' -prune -o -type f -exec md5sum {} +)

declare -n e
for e in ${dup_array[@]}; do
    echo ${!e}
    for f in ${e[@]}; do echo " $f"; done
done

=== Duplicate file finder exercise === (WITH comments):
#!/bin/bash

# Usage: finddups [directory]
# If no directory is specified, start in .

# This script uses Linux-specific md5sum
# and bash 4.3+ namerefs

while read -r md5_hash file; do
    var_hash=md5_$md5_hash
    # prefix the hash so the string does not begin
    # with a digit (not allowed in a variable name)

    declare -n ind_var_hash=$var_hash
    # use a nameref here & below to avoid the substitution errors
    # that occur with attempts at nested parameter expansion

    [[ ${#ind_var_hash[@]} -eq 1 ]] && declare -a dup_array+="($var_hash)"
    # if array ind_var_hash already holds one element (from a prior
    # iteration), then the current file is a duplicate, so add $var_hash
    # to dup_array (this happens once per hash, on the second file)

    declare -a ${!ind_var_hash}+="('$file')"
    # declare/add $file to the related $var_hash array; regarding "('$file')":
    # the outside quotes are required, else the assignment fails with a syntax error;
    # the inside single quotes prevent word splitting

done < <(find "${1:-.}" -name $'*\n*' -prune -o -type f -exec md5sum {} +)

declare -n e
for e in ${dup_array[@]}; do
    echo ${!e}
    for f in ${e[@]}; do echo " $f"; done
done
# 'declare -n e' -n assigns the bash nameref attribute to var e
# see man page regarding effect within a for loop
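The man page note referred to above says that when a for-loop variable has the nameref attribute, each word in the list is treated as a variable name, and the reference is re-bound to it on every iteration. Below is a minimal sketch of the same idea, assuming bash 4.3+. It also appends through the nameref with `ind_var_hash+=("$2")` instead of building a quoted `"('$file')"` string, which would break on a filename containing a single quote. `add_file` and the `aaa`/`bbb` hashes are hypothetical placeholders standing in for real md5sum output.

```shell
#!/usr/bin/env bash
# Sketch (bash 4.3+); the hashes below are made-up placeholders.
dup_array=()

add_file() {                              # hypothetical helper: add_file HASH FILE
    local var_hash="md5_$1"
    declare -ga "$var_hash"               # make sure the global array exists
    declare -n ind_var_hash="$var_hash"   # local nameref to that array
    # One element already stored => this file is the first duplicate.
    if [[ ${#ind_var_hash[@]} -eq 1 ]]; then
        dup_array+=("$var_hash")
    fi
    ind_var_hash+=("$2")                  # no inner quoting needed, even for "it's.txt"
}

add_file aaa 'one.txt'
add_file aaa "it's a copy.txt"
add_file bbb 'unique.txt'

# A nameref used as a for-loop variable is re-bound to each listed name.
declare -n e
for e in "${dup_array[@]}"; do
    echo "${!e}"                          # ${!e} gives the name e refers to
    for f in "${e[@]}"; do echo " $f"; done
done
```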

Thanks for your reply & I definitely appreciate your wiki.
It's been very helpful for improving my understanding of bash.
Greg Wooledge
2018-09-17 16:19:41 UTC
Post by Bruce Hohl
@Greg, it is an interesting happen-stance that you replied as my question
arose from my pass at completing your duplicate file finder "exercise" at
mywiki.wooledge.org/BashProgramming/04: "If you want to "fix" this
"problem", you might suppress all the printing until the end, and then
iterate over the whole array and print only those values that contain a
newline. (This is left as an exercise.)" So with your suggestion to use
=== Duplicate file finder exercise === (NO comments)
#!/bin/bash
while read -r md5_hash file; do
var_hash=md5_$md5_hash
declare -n ind_var_hash=$var_hash
declare -a ${!ind_var_hash}+="('$file')"
done < <(find "${1:-.}" -name $'*\n*' -prune -o -type f -exec md5sum {} +)
declare -n e
echo ${!e}
done
So your approach was to experiment with bash commands until you found
something that would approximate giving you the ability to have a hash
of lists (associative array of indexed arrays).

And what you came up with was using the entire bash variable namespace
as your hash, and storing each list as a separate indexed array within
that namespace.

That's... definitely not how I would have done it. ;-)

You're also missing some quotes.
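Greg doesn't say which quotes he means. One likely spot in Bruce's script is the unquoted `${e[@]}` in the printing loop, where each element is word-split on whitespace and glob-expanded. Here is a minimal demonstration with made-up filenames:

```shell
#!/usr/bin/env bash
# Illustrative filenames; one contains a space.
files=('my file.txt' 'other.txt')

unquoted=0
for f in ${files[@]}; do unquoted=$((unquoted + 1)); done   # word-split: 3 words

quoted=0
for f in "${files[@]}"; do quoted=$((quoted + 1)); done     # one word per element

echo "unquoted: $unquoted, quoted: $quoted"
```

With default IFS this prints `unquoted: 3, quoted: 2`.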

Anyway, here is the solution that I had in mind for that:

=====================================================
#!/bin/bash
declare -A seen
while read -r md5 file; do
    if [[ ${seen[$md5]} ]]; then
        seen[$md5]+=$'\n'$file
    else
        seen[$md5]=$file
    fi
done < <(find "${1:-.}" -name $'*\n*' -prune -o -type f -exec md5sum {} +)

for i in "${!seen[@]}"; do
    if [[ ${seen[$i]} = *$'\n'* ]]; then
        printf 'Matching MD5:\n%s\n\n' "${seen[$i]}"
    fi
done
=====================================================

The stuff I wrote in the text was really quite literal: "store multiple
filenames for each MD5 value (in a newline-delimited pseudo-list)" and
"iterate over the whole array and print only those values that contain
a newline". That's what I'm doing here.

This is also a hack, using newlines to store multiple elements of a list
in a string variable, and this only works because we're already excluding
filenames that have a newline in them. This frees up the newline character
to act as a list delimiter.
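If you later need the entries of such a newline-delimited pseudo-list as separate items, `mapfile -t` can split one value back into a real indexed array, one element per line. Here is a minimal sketch, assuming bash 4+. The key `hash1` and the filenames are made-up placeholders, built the same way as `seen` in the script above.

```shell
#!/usr/bin/env bash
# Sketch (bash 4+): split a newline-delimited value back into an array.
declare -A seen
seen[hash1]='./a.txt'
seen[hash1]+=$'\n''./copy of a.txt'      # same append pattern as above

for i in "${!seen[@]}"; do
    mapfile -t names <<< "${seen[$i]}"   # one array element per line
    printf '%s: %d files\n' "$i" "${#names[@]}"
    printf '  %s\n' "${names[@]}"
done
```

This relies on the same guarantee Greg describes: no filename contains a newline.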

In the absence of that opening, I would simply have written the program
in a different language -- one that allows you to create a hash of lists
without needing special hacks and tricks.

For example, a relatively straight conversion to Tcl:

=====================================================
#!/usr/bin/env tclsh
if {[llength $argv]} {set start [lindex $argv 0]} else {set start .}
foreach line [split \
        [exec find $start -name "*\n*" -prune -o -type f -exec md5sum "{}" +] \
        \n] {
    set md5 [string range $line 0 31]
    set file [string range $line 34 end]
    lappend seen($md5) $file
}

foreach i [array names seen] {
    if {[llength $seen($i)] < 2} continue
    puts [format "Matching MD5: %s" [join $seen($i) { }]]
}
=====================================================

The output format is slightly different, but of course that can
be adjusted. The elements of "seen" are simply lists of filenames,
as this language supports this directly. I'm sure a similar solution
could be written in Python (which I don't know well enough to write in).

The only reason this solution excludes filenames with newlines is the
md5sum command's output format.
Bruce Hohl
2018-09-17 19:32:00 UTC
Thanks for those comments and a clean answer. I wasn't really all that
excited about my solution, as it seemed overly complicated. I was a bit
biased toward shoehorning in the nameref feature. In hindsight,
creating a variable for every unique file hash seems even more ridiculous now
than it did at the time :) I understand your comments about the limits of
bash and its appropriate use. Just trying to kick my bash understanding up a
few steps.
Greg Wooledge
2018-09-14 12:42:32 UTC
Post by Bruce Hohl
Is there a way to specify _ugly_hash using Parameter Expansion of $var1? In
the above I desired _$var1 to expand to _ugly_hash in order to add (+=) an
additional element to that array.
It sounds like you're looking for namerefs. declare -n.

Of course, like almost every other "feature" in bash, they only work
some of the time.

If what you REALLY REALLY WANT is a pointer to an array, or to pass
an array to a function by reference, just switch to a real programming
language. Bash is not going to make you happy.
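For the "pass an array to a function by reference" case, a nameref declared inside the function is the usual bash 4.3+ tool. Below is a minimal sketch; `append_all` is a hypothetical helper. Part of the "only work some of the time" caveat is that a nameref resolves names dynamically. If the caller's array happens to share a name with a local inside the function, the reference binds to the local instead. The unusual `_aa_` prefix is only a convention to make that unlikely, not a guarantee.

```shell
#!/usr/bin/env bash
# Sketch (bash 4.3+): pass an array to a function "by reference".
append_all() {                 # hypothetical helper: append_all ARRAYNAME VALUE...
    local -n _aa_ref=$1        # the nameref itself is local to the function
    shift
    _aa_ref+=("$@")            # appends to the caller's array
}

list=(a)
append_all list 'b c' d
declare -p list
```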