Discussion:
[Help-bash] Announcement: Started project to parse bash script to AST.
Mike Mestnik
2017-07-22 03:16:26 UTC
Off list, please cc me on reply.

The project builds an [1]Abstract Syntax Tree from bash source, to be
used to beautify or minify it. It's a work in progress, but the [2]code
already passes a handful of example [3]tests. I'd appreciate a
collection of small examples (fewer than 5 lines each) that exercise the
various edge cases supported by the bash shell. If the maintainer of an
online bash script testing site wouldn't mind running a DB query for me,
that would be awesome. I'm also looking for a volunteer or two who can
assist with development in Perl.

Thank you.

1. https://en.wikipedia.org/wiki/Abstract_syntax_tree
2. https://github.com/cheako/MarpaX-Languages-Bash-AST/blob/9952b3e0f5a3b80aab2f9801c9e6a745350ffb83/lib/MarpaX/Languages/Bash/AST.pm
3. https://travis-ci.org/cheako/MarpaX-Languages-Bash-AST/builds/256283379#L331
Andy Chu
2017-07-22 17:13:34 UTC
Hi Mike,

You might be interested in my project Oil, which has a well-tested bash
parser. It parses the program up front in a single pass, rather than
interleaving parsing and execution like bash and every other shell does. I
gave a summary in this comment (it was partially inspired by the bash AOSA
book chapter):

https://news.ycombinator.com/item?id=14550523

The blog goes into detail:

http://www.oilshell.org/blog/

The first 10 or 20 posts are all about parsing, including what algorithms
are used, and testing it on about a million lines of real bash code found
in the wild.

I've also been in contact with the author of this shell formatter, which
does a similar thing, for a different purpose: https://github.com/mvdan/sh

My parser is hand-written, but I would be interested in specifying it in a
meta-language more suitable than Yacc. Yacc is not the right tool for it.
For one, top-down parsing suits the shell better than bottom-up.

What I found is that if the lexer is relatively sophisticated, then the
parser doesn't need to be very powerful.

I had come across the Marpa parsing algorithm in my search. The general
idea I got was that it was more powerful and sophisticated than I need, and
maybe it trades off memory for speed. But I could be wrong about that.
However, my parser is also linear time, and it needs just a single token of
lookahead. (The parser asks for tokens in one of 13 lexical modes.)
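
As a rough illustration of lexical modes, here is a toy sketch in Python. It
is not Oil's actual lexer: the two modes, the token names, and the `tokens`
function are all hypothetical, and real shell quoting is far more involved.
It only shows that the same character (`#` here) is tokenized differently
depending on which mode is active.

```python
import re

# Hypothetical lexical modes: each maps to an ordered list of (kind, regex).
MODES = {
    "OUTER": [
        ("COMMENT", re.compile(r"#[^\n]*")),
        ("DQUOTE", re.compile(r'"')),
        ("NEWLINE", re.compile(r"\n")),
        ("SPACE", re.compile(r"[^\S\n]+")),
        # Simplification: a word may contain '#', but may not start with it.
        ("WORD", re.compile(r'[^\s"#][^\s"]*')),
    ],
    "DQ": [
        ("DQUOTE", re.compile(r'"')),
        ("LITERAL", re.compile(r'[^"]+')),
    ],
}


def tokens(src):
    """Yield (kind, text) pairs, switching lexical mode at double quotes."""
    mode, pos = "OUTER", 0
    while pos < len(src):
        for kind, pattern in MODES[mode]:
            m = pattern.match(src, pos)
            if m:
                break
        else:
            raise SyntaxError(f"unexpected character at offset {pos}")
        yield kind, m.group()
        pos = m.end()
        if kind == "DQUOTE":
            # In a fuller design the parser would request the mode; this
            # driver loop flips it itself to keep the sketch short.
            mode = "DQ" if mode == "OUTER" else "OUTER"
```

Because every character ends up in exactly one token, joining the token
texts reproduces the input, which is the property a formatter needs in
order to keep comments.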

I'd be interested in hearing more about the Marpa algorithm and how it
relates to the bash use case.

Also if you want to try it, you can download and run it with "osh -n
foo.sh". It will give you a pretty-printed AST as in this post:

https://www.oilshell.org/blog/2017/01/21.html

I think the instructions for setup might be a little bit out of date -- if
so let me know:

https://github.com/oilshell/oil/wiki/Contributing

I'm about to make a release which will have easier instructions (configure,
make, etc.)

Andy
Mike Mestnik
2017-07-23 00:33:16 UTC
I started my project because I was looking for a beautifier that would
break long lines, combine short lines, and generally do much more than
one can do with a handful of regexes, as in
https://github.com/shri314/beautify_bash. Oil might satisfy my
requirements; I didn't think I would find an AST generator that would
keep comments.

Despite the name, my project is written in pure Perl and doesn't use
Marpa. The name was chosen because that's where some of the other AST
generators are. I did make a good attempt at using Marpa's scanless
interface, but found that it wouldn't easily (for me at least) handle
the most basic tasks, like:

* Parsing out a comment. It's only good if you want to send comments to
/dev/null. I also need a Lossless Syntax Tree.
* Splitting n words on n-1 delimiters. It can only handle preceding
or trailing delimiters... at least I hope it can do that.
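
Both items in the list above (keeping comments, and splitting n words on
n-1 delimiters) come down to producing a lossless token stream. A minimal
sketch in Python, not taken from the project (which is written in Perl):
the `split_lossless` helper and the token kinds are hypothetical, and
quoting is ignored entirely.

```python
import re

# One alternative per token kind; every character matches exactly one of them.
TOKEN = re.compile(r"(?P<COMMENT>#[^\n]*)|(?P<DELIM>\s+)|(?P<WORD>[^\s#]\S*)")


def split_lossless(line):
    """Return (kind, text) pairs whose texts concatenate back to `line`."""
    out = []
    pos = 0
    while pos < len(line):
        m = TOKEN.match(line, pos)
        # lastgroup is the name of the one alternative that matched.
        out.append((m.lastgroup, m.group()))
        pos = m.end()
    return out
```

For the input `a b c` this yields three WORD tokens separated by two DELIM
tokens, and a trailing `# comment` is kept as its own token instead of
being discarded.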

Having failed to do that, I stopped any effort to use Marpa. I
started writing a huge elsif tree, but was later shown the Deterministic
Finite Automaton and started using my own variant of that.
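
For readers unfamiliar with the technique, a table-driven finite automaton
replaces a long elsif chain with a lookup from (state, character class) to
the next state. The sketch below is in Python rather than Perl and is not
the project's code; the states, the character classes, and the
`is_single_word` helper are hypothetical. It only decides whether a string
is one shell-like word made of bare characters and single-quoted segments.

```python
def char_class(ch):
    """Map a character to one of three hypothetical input classes."""
    if ch == "'":
        return "quote"
    if ch.isspace():
        return "space"
    return "other"


# (state, character class) -> next state; a missing entry means "reject".
TRANSITIONS = {
    ("START", "quote"): "SQ",
    ("START", "other"): "WORD",
    ("WORD", "quote"): "SQ",
    ("WORD", "other"): "WORD",
    ("SQ", "quote"): "WORD",
    ("SQ", "space"): "SQ",
    ("SQ", "other"): "SQ",
}

ACCEPTING = {"WORD"}


def is_single_word(text):
    """Run the automaton over `text` and report whether it ends accepted."""
    state = "START"
    for ch in text:
        state = TRANSITIONS.get((state, char_class(ch)))
        if state is None:
            return False
    return state in ACCEPTING
```

Adding a new construct then means adding rows to the table rather than
another branch to the conditional.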

I've been working from observation, and thus far I've seen bash do a
number of things I wish it wouldn't, like failing on both of these:

***@debian:~$ case in in a) ;; esac) ;;
bash: syntax error near unexpected token `)'
***@debian:~$ case in in a) ;; esac | a) ;;
bash: syntax error near unexpected token `)'

I forget what the other one was, but when I discovered it my thought
was that if it were corrected, there would be only a 0.001% chance that
anyone has such code and depends on the current "do nothing" behaviour. I
remember bringing it up on freenode, but I can't find a log for that
channel. It didn't seem like anyone around at the time was keen on
changing anything related to bash... as if there was never
going to be another version.
Chet Ramey
2017-07-24 15:17:21 UTC
Post by Mike Mestnik
I've been working off of observation and thus far I've seen bash do a
number of things I wish it wouldn't like failing on both of these.
bash: syntax error near unexpected token `)'
bash: syntax error near unexpected token `)'
These are both syntax errors. Why would you rather the shell not
catch them?
--
``The lyf so short, the craft so long to lerne.'' - Chaucer
``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU ***@case.edu http://cnswww.cns.cwru.edu/~chet/
Barry Walker
2017-07-23 10:07:34 UTC
Hi Mike...
You might be interested in this, licenced GPL3:-

https://www.shellcheck.net/
--
73...

Bazza, G0LCU...

Team AMIGA...

The less that I speak, the smarter I sound.