EmbeddedRelated.com
The 2026 Embedded Online Conference

Parsers for extensible grammars?

Started by Don Y October 22, 2014
In article <m2gps1$12c$1@dont-email.me>, rickman  <gnuarm@gmail.com> wrote:
>On 10/25/2014 2:14 PM, Les Cargill wrote:
>
>In many ways Forth is amazingly simple. You define words (subroutines)
>that have actions. Words are stored in the dictionary. Forth has built
>in a "parser" that scans the input for words and numbers (in that
>order). The dictionary is searched for words in the input stream and
>when found they are executed. If no word is found Forth checks to see
>if the "word" is actually a number. If it is a number it is pushed onto
>the stack. Pretty simple, no?
This is one implementation. Other systems just look up the word in the dictionary. If it matches, it is executed. Numbers are handled by accepting an incomplete match, i.e. the word "1" accepts all numbers starting with 1. (I see two advantages: you can have strings as easily as numbers, and the system is extensible.)
>
>The action of a word can make use of system words to further parse the
>input stream. This is done if the input "grammar" is not RPN style with
>the values first (nouns) and the words (verbs) last. This is done even
>for some Forth words like "TO" which is used to store a value in a
>variable, e.g. "99 TO BottlesOfBeer".
Although the Standard states that TO is required to parse, it has been discovered that one can't write a standard program to test this. So implementations of TO that do not parse at all are possible and still standard compliant.

That is not to say that parsing words are not possible or useful. One of my favorites is the defining word OS-IMPORT:

  "/usr/bin/ee" OS-IMPORT my-favorite-editor

This defines a word my-favorite-editor with the path as given. Then

  my-favorite-editor a b c

generates the string "/usr/bin/ee a b c" and passes it to SYSTEM.
>
>Most people find the use of the stack to be a problem for them while it
>is really no big deal. It's just different. Forth has other issues
>which relate to the fact that not so many people use it. But it seems
>to be a very useful tool to me.
I'm inclined to say that a stack works purely sequentially, while infix notation with high-priority operators and brackets gives much more of a visual clue about what is intended. a*b+c*d (after training) groups things together. For

  a b * c d * +

the question "what is the first term of the addition?" is indeed slightly harder to answer. You have to think; the visual system doesn't help you here.
>
>--
>
>Rick
Groetjes Albert.
--
Albert van der Horst, UTRECHT, THE NETHERLANDS
Economic growth -- being exponential -- ultimately falters.
albert@spe&ar&c.xs4all.nl &=n http://home.hccnet.nl/a.w.m.van.der.horst
On 10/26/2014 3:21 PM, Albert van der Horst wrote:
> In article <m2gps1$12c$1@dont-email.me>, rickman <gnuarm@gmail.com> wrote:
>>
>> Most people find the use of the stack to be a problem for them while it
>> is really no big deal. It's just different. Forth has other issues
>> which relate to the fact that not so many people use it. But it seems
>> to be a very useful tool to me.
>
> I'm inclined to say that a stack works purely sequentially, while
> infix notation with high-priority operators and brackets gives
> much more of a visual clue about what is intended.
> a*b+c*d (after training) groups things together.
> For
> a b * c d * +
> the question "what is the first term of the addition?" is indeed
> slightly harder to answer. You have to think; the visual system
> doesn't help you here.
Really? You can't see the bias from your early education where you learned to use parentheses? I can clearly pick out the terms from your example.

If you asked about a more complex equation, then yes, postfix would not be as clear. But that is a very tiny part of Forth and I have yet to find it to be a problem. When I have an equation much more complex than the example you give, I usually break it down into simpler pieces anyway, no matter the language.

--
Rick
Hi George,

On 10/26/2014 11:36 AM, George Neuner wrote:

>>> The relevance (if any) depends on whether you want to allow grammar
>>> extensions direct access to your device API: i.e. whether the
>>> extensions should be allowed to manipulate your (part of the) device
>>> or whether they have to go through your provided language to do it.
>>
>> The goal is for people to be able to extend the "system" -- in whichever
>> directions THEY choose -- and provide a simple way of configuring those
>> enhancements (software/hardware/systems). It's a "low value" operation
>> and, as such, doesn't merit lots of resources. OTOH, it shouldn't
>> require "surgery" to tweak variables deep in the sources (these sorts
>> of "selections" should be very visible without detailed inspection).
>
> I guess the real question is why *you* should be providing that?
> You've already given them a way to control your gizmo and they have
> the gizmo's API with which to construct a different command set if
> they want.
I have to provide some of these features to get ANY system configured, etc. I can do this as a completely independent "service" and require others to invent their own solution to THEIR problems (the customizations that they may implement). *Or*, I can put facilities in place that encourage them to "do as I have done" in the hopes of a tighter, more consistent implementation.

I expect people to be inherently lazy and opt to *modify* something that is proven as "working" rather than trying to create something new from scratch. But, in doing so, I have to make a decent attempt at anticipating the sorts of things they *may* want/need to do. (E.g., expecting everything to be VERB NOUN pairs might rub some folks the wrong way.)
>> I've considered it in the context of "settings" -- almost like twiddling
>> envars -- except it need not be. I.e., the "commands" could initiate
>> *actions* if so desired.
>>
>> I *hadn't* considered it as a "programming/scripting language" (which
>> is why I called it a "command parser") and presented command line
>> argument parsing and configuration file parsing as the examples to
>> illustrate its intent.
>>
>> There *could* be some value to a real procedural language, there.
>> But, when I conveyed the "Forth" suggestion to colleagues, it didn't
>> fare well (I suspect that had more to do with "Forth" than the
>> "programming language aspect" of the idea -- almost provincial!)
>
> Forth - and Lisp, too - has a bad rep in many circles ... which is
> unfortunate because sometimes it may be the simplest answer to the
> problem.
That's true of all tools. How many folks avoid dynamic memory allocation in the belief that it is (inherently!) unsafe? Or, recursive implementations ("what happens if the recursion never stops?" "Well, same thing if your iteration never stops!")
> From what I've read I don't think you're really needing a programming
> language. But then I'm still trying to figure out how sophisticated
> the commands are likely to be ... they don't look like much from your
> examples, but the discussion hints at more complexity.
I don't know, as I can't envision the varieties of hardware (and software) that will be globbed on. E.g., I had not envisioned the need for a conditional operation -- yet, in hindsight, it seems like it could be extremely useful! ("Do this -- unless you CAN'T! In which case, do this OTHER thing!")

But how far do you take this before you begin to delve into the realm of "just add it to the executable directly"?

[When I offered the "C interpreter" as an alternative to Forth -- to see if the resistance was due to the language or the mechanism -- this was the reply: "If you're going to write code, then why create another means of writing code OUTSIDE the system??"]
>>>> It's a tough call -- assume competence in EVERYTHING? Or, just
>>>> assume competence in their knowledge of their "extensions"??
>>>
>>> It's reasonable to expect that many developers will have little
>>> experience with _formal_ parsing methods and tools ... quite a lot of
>>> applications can get by with RegEx or even just separator tokenization
>>> and have no need of any formal grammar.
>>
>> Exactly. That was my (upthread) observation of what I've encountered
>> in config file parsers, command line option parsing, etc. When you
>> consider how "rich" some of those environments are (options, etc.),
>> it's a wonder that so much ad hoc parsing is done! I can see
>> "evolution"/feeping creaturism explaining some of it (i.e., "it wasn't
>> this complex when we started") but I doubt that's the real cause.
>>
>> I suspect it's more one of familiarity: you write "generic code"
>> everyday. So, writing something to extract a numeric value from
>> the "second whitespace-delimited field" in a statement is trivial.
>> OTOH, writing a formal grammar so that a *tool* can do this for
>> you AUTOMATICALLY probably means you spend more time relearning the
>> tool than you would have spent writing the code!
>
> I think it's more panic upon seeing the manual's introduction to the
> method theory. The vast majority of programmers today have little or
> no formal schooling, and language theory waters (doesn't matter
> syntactic or semantic) get deep very fast.
Or, a lack of patience -- as exemplified by wanting to write code "on day one" (before they even know where that code is *headed*).
> As with most things, deep understanding isn't necessary to use the
> tools, but a programmer does need at least a passing familiarity with
> the tool's method to understand what it is trying to do and why it is
> failing. In this regard parser generators are less forgiving than the
> average compiler and it doesn't much matter which method(s) the tool
> uses - you can make as big a mess with LL(n) as with (LA|S)?LR(1) and
> quite easily hang yourself using PEG or GLR.
My experiences with lex/yacc were far less "pleasant" than writing arbitrarily complex code in a procedural/object language! Too much head scratching, as there were interactions "across" the grammar that were only apparent in hindsight. The way I thought of the various constructs "as a human being" was very different from the way the tools NEEDED me to think of them. This doesn't appear in *most* languages (aside from unfortunate issues like "foo =- 2;").
> Learning to use a parser generator isn't really any harder than
> learning any other programming language, but the documentation tends
> to be scarier. This isn't helped by the tendency of tool developers
> to minimally document and to relate their wares to existing tools that
> the user is presumed already to be familiar with.
There tends to be far more abstraction involved. People seem to have a problem with abstraction -- always wanting to know "why do you want to do that" or "what are you trying to do".

 "Timmy has three apples. Mary has one. How many apples would Timmy have
 to give to Mary in order for them to each have the same number of apples?"

 "Why should Timmy have to give her ANY? Why couldn't *Mary* bring the
 right number of apples to school??"

[Of course, the correct answer is that Timmy should quickly EAT two of his apples and the problem then disappears! :> ]
>> <shrug>
>
> Nothing to be done about it except try to convince people that it
> really isn't that hard and that, for most purposes, they really should
> be using a generator tool rather than rolling their own.
Check your mail. I think the solution I proposed will be *so* tempting to others that they'll just "cut and paste; edit to taste". I figured easier to take this offlist than deal with the inevitable complaints about "lengthy posts"... :>
> The only really valid exceptions are very simple interface "languages"
> and very tiny systems 8-) that can't afford the overhead of a
> generated parser. The constant overhead of a generated parser is
> fairly large: with care using Bison you can shoehorn a fairly complex
> language into ~10KB ... but even the simplest VERB NOUN command parser
> is difficult to bring in under ~3KB. And Bison actually is pretty
> good at making small parsers - there are a few tools that are better,
> but many more are worse.
>
> However, I think even the parser size argument is losing weight as the
> average "small" system keeps getting bigger (32-bit ARM running Linux
> toasting bread, etc.). Moreover, many small system developers are
> comfortable using (some form of) RegEx to handle interfaces and
> probably most have no clue about its intrinsic overhead.
But even big systems often have "regions" in which severe constraints are placed on the implementation. E.g., many of those "big Linux implementations" compress the kernel to economize on FLASH/ROM -- this suggests an unwillingness to add "unnecessary resources" for a one-time event (decompressing the kernel/system on IPL). The same sorts of arguments can apply to "setting configuration options" ("Can't we just set some bitmaps in a hidden corner of FLASH?"). Or things like delivering options in BOOTP packets, etc.

Sometimes you just can't "do what you want" (or what would be "ideal").
On Sun, 26 Oct 2014 12:37:06 -0700, Don Y <this@is.not.me.com> wrote:

>On 10/26/2014 11:36 AM, George Neuner wrote:
>
>[When I offered the "C interpreter" as an alternative to Forth -- to see
>if the resistance was due to the language or the mechanism -- this was
>the reply! "If you're going to write code, then why create another
>means of writing code OUTSIDE the system??"]
That shows a lack of imagination. The reason for providing a scripting language is obvious - assuming there is any possible use for scripting in the first place.
>> As with most things, deep understanding isn't necessary to use [parser]
>> tools, but a programmer does need at least a passing familiarity with
>> the tool's method to understand what it is trying to do and why it is
>> failing. In this regard parser generators are less forgiving than the
>> average compiler and it doesn't much matter which method(s) the tool
>> uses - you can make as big a mess with LL(n) as with (LA|S)?LR(1) and
>> quite easily hang yourself using PEG or GLR.
>
>My experiences with lex/yacc were far less "pleasant" than writing
>arbitrarily complex code in a procedural/object language! Too much
>head scratching as there were interactions "across" the grammar
>that were only apparent in hindsight. The way I thought of the various
>constructs "as a human being" was very different from the way the tools
>NEEDED me to think of them. This doesn't appear in *most* languages
>(aside from unfortunate issues like "foo =- 2;")
You would have had a much different experience with a recursive descent LL tool. You still would have had problems until you understood the grammar model, but your errors would have been somewhat easier to locate. Some tools like Antlr (particularly using the AntlrWorks GUI) are really helpful for locating problems.
>> Learning to use a parser generator isn't really any harder than
>> learning any other programming language, but the documentation tends
>> to be scarier. This isn't helped by the tendency of tool developers
>> to minimally document and to relate their wares to existing tools that
>> the user is presumed already to be familiar with.
>
>There tends to be far more abstraction involved. People seem to have
>a problem with abstraction -- always wanting to know "why do you want
>to do that" or "what are you trying to do".
>
> "Timmy has three apples. Mary has one. How many apples would Timmy have
> to give to Mary in order for them to each have the same number of apples?"
>
> "Why should Timmy have to give her ANY? Why couldn't *Mary* bring the
> right number of apples to school??"
>
>[Of course, the correct answer is that Timmy should quickly EAT two of his
>apples and the problem then disappears! :> ]
Mary has 5 kittens. If she gives 2 kittens to Timmy, how many kittens will Mary have?

5! Mary won't give Timmy any kittens.

Your point re: abstraction is valid ... but the response to it is that the abstraction of a formal grammar is the entire point of using a parser generator.
>> The only really valid exceptions are very simple interface "languages"
>> and very tiny systems 8-) that can't afford the overhead of a
>> generated parser.
>> :
>> However, I think even the parser size argument is losing weight as the
>> average "small" system keeps getting bigger (32-bit ARM running Linux
>> toasting bread, etc.).
>
>But even big systems often have "regions" in which severe constraints
>are placed on the implementation. E.g., many of those "big Linux
>implementations" compress the kernel to economize on FLASH/ROM -- this
>suggests an unwillingness to add "unnecessary resources" for a one-time
>event (decompressing the kernel/system on IPL). The same sorts of
>arguments can apply to "setting configuration options" ("Can't we just
>set some bitmaps in a hidden corner of FLASH?")
True. But Linux tries to be all things to all people. In the "normal" desktop/server environment, there really is no reason for the kernel to be compressed - it's only ~100KB and damn few distributions will even start in less than 256MB. Besides which, the uncompressed kernel is there on the disk too - used for paging kernel code if that is enabled.

And how often is compiling for a small system done *on* the small system? That's what cross-development is for, and I think you can count on small developers to understand that (at least for the foreseeable future).

George