The 2026 Embedded Online Conference

Parsers for extensible grammars?

Started by Don Y October 22, 2014
On 10/25/2014 8:29 PM, Paul E Bennett wrote:
> rickman wrote: >> >> Don may be looking for something that will let him check the input for >> the correct syntax, number of values, etc. But as usual he has not >> really defined the problem he is trying to solve. > > I don't know if Don has tried playing with Forth or not, I just made a > suggestion that what he was seeking to do sounded somewhat Forth-like in > nature. With somewhat more difficulty he could probably look at re-creating > a MSDOS type environment which would also suit the bill (new commands in > batch files etc or added programmes in the COMMAND directory). > > You are probably right about a lack of Clear, Concise, Correct, Coherent, > Complete & Confirmable (Testable) specification of the requirements.
It sounds like you are setting the bar a bit higher than I was looking for. I'd just like to get an idea of what he is doing. There is a long history of Don's posts asking for info on a very specific item. When people ask him for details which are often where the crux of the problem lies, he shares sparingly... but only in the sense of info, while being very prolific with words. As people try to suggest he is decomposing the problem in an odd way which makes the problem harder to solve he throws out more details that seem to justify a rather unique approach. In the end those who are trying to help get frustrated because of the difficulty and Don gets frustrated because people seem to be argumentative rather than helpful. I think his original post on this was something along the lines of, "How long is a piece of string?" or maybe a better analogy is, "Where can I get some string that will be long enough to do my job and strong enough to not break on the job but will break when I want to break it." I haven't read all the chit-chat that has evolved from that. I was going to suggest Forth might be useful and saw your post. -- Rick
On 10/25/2014 6:05 PM, Simon Clubley wrote:
> On 2014-10-25, Don Y <this@is.not.me.com> wrote: >> On 10/25/2014 4:15 AM, Simon Clubley wrote: >>> >>> Commands can exist outside of the above infrastructure if one wishes; >>> they are called foreign commands and are treated pretty much like a >>> Unix command would be - there's no validation of options prior to >>> the executable starting and hence there's no nicely pre-parsed options >>> and option values ready to be read from within the program itself. >> >> You can *freely* mark any command as either type? I.e., can I mark >> a command that relies on the DCL stuff for option parsing as a FOREIGN >> command? And, thus, screw it up (at runtime)? > > Yes, but it would fail in a clean way with a status code returned from > the CLI routine called by the program to return the (non-existent) > pre-parsed information.
Understood. I.e., the DCL-ed executables aren't *fed* arguments that the front-end has testified as "valid". Rather, it *fetches* that information from the front end. As such, if the front end wasn't in place when the executable was invoked, the executable is aware of that.
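A rough sketch of that "fetch" model may help readers who have not seen it. It is written in Python and is not the VMS API: the `DEFINITIONS` table, `front_end`, `get_value`, the `NoParsedCommand` exception, and the `verb key=value` syntax are all hypothetical names invented for illustration. The point is only that the program asks the front end for validated values, and gets a clean failure if no front end ever parsed a command line.

```python
class NoParsedCommand(Exception):
    """Raised when a program asks for parsed values but no front end ran."""

# Hypothetical command definitions: verb -> allowed qualifiers and their types.
DEFINITIONS = {
    "copy": {"count": int, "dest": str},
}

_parsed = None  # filled in by the front end, fetched later by the program

def front_end(line):
    """Validate 'verb key=value ...' against DEFINITIONS before the program runs."""
    global _parsed
    _parsed = None
    verb, *pairs = line.split()
    spec = DEFINITIONS.get(verb)
    if spec is None:
        raise ValueError(f"unknown verb: {verb}")
    values = {}
    for pair in pairs:
        key, sep, raw = pair.partition("=")
        if not sep or key not in spec:
            raise ValueError(f"bad qualifier: {pair}")
        values[key] = spec[key](raw)  # type conversion doubles as validation
    _parsed = (verb, values)

def get_value(key):
    """Called from inside the program to fetch a pre-parsed value."""
    if _parsed is None:
        # The "foreign command" case: nothing was parsed on our behalf.
        raise NoParsedCommand("program was not started through the front end")
    return _parsed[1].get(key)
```

Here an exception stands in for the status code Simon mentions; a C implementation would more likely return an error value.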
> The idea behind pointing you to the DCL CLD material was to give you > some possible ideas about how extendable CLIs with validated options > are handled in another environment (ie: VMS).
Yes.
> Like I said at the time, it's not an exact match for your requirements > but I thought you might be interested in seeing how a similar problem > was handled in VMS, including the syntax used in the command definition > file (and compare it to how the problem is fully pushed to the executable > itself in Unix land).
Exactly!
>>> IOW, having foreign commands available as an option doesn't stop you >>> from also having the native DCL integrated approach. If you try to run >>> a DCL integrated executable as a foreign command, no values will be >>> available to read from within your program so you are forced to run it >>> via the DCL mechanism above which means you also get the DCL level >>> validation as well. >> >> So, this (DCL) mechanism is meant as an *aid*/service -- not as a means >> of ensuring software integrity (?). > > It's a way of expressing an expected structure for a command line which > can be somewhat validated by DCL before the program even starts and > provides a robust, operating system level, method for a program to > obtain command line parameters and options in a way (and with > functionality) that leaves getopt and friends standing in the dust. > > The relevance here is that I've encountered people who have never > been exposed to the VMS way of handling this and who think that ad-hoc > getopt style functionality is the only _possible_ way to parse command > lines. I just wanted to make reference to another way of doing this in > case it gave you some ideas even though there's probably nothing you > can _directly_ use here.
Understood. I'm not fond of the VMS syntax. But, will be studying the implementation to see if I can extract some ideas and apply them to a more suitable syntax.
> Sometimes Don you leave your questions wide open, presumably in order > to invite a wide range of options in response, including ones you had > never even considered. The difficulty with that is that sometimes it's > hard to understand what additional unspoken constraints might exist. :-)
I pass along the minimum constraints under which I am operating. If I *add* constraints, then I risk steering solutions into a specific direction that isn't necessarily required for the problem at hand. Or, discounting solutions that could be viable. I engage clients with the same sort of approach: "Is what you are telling me a REQUIREMENT or just your idea of how I *might* approach the problem?" The fewer the requirements, the more flexibility in the solution. This almost always leads to a more optimized solution. By contrast, wanting extra ARTIFICIAL constraints boxes the design into a corner. "Why are you doing it like that?" "Because that's how everyone else does it!" or "Because that's how I *thought* it would be done." In this case, I asked for a "command parser" that was "extensible". And, indicated that those extensions would be implemented at compile time (so, any solution need not accommodate dynamic modifications). I also introduced contrived examples for folks who couldn't think entirely in the abstract: "Imagine these were the common commands and you wanted to add commands LIKE these..." Anything beyond this would change the problem definition. Note that I didn't ask for a "programming language" -- or even a scripting language (yet, at least one proposed solution fits that description AND fits my requirement!). OTOH, if I had augmented my question as "an extensible scripting language", I would risk YOUR solution potentially disqualifying itself. :-/
On 10/24/2014 7:46 AM, Don Y wrote:

[Forth]

> Admittedly, it lets you (the user) do lots of things that you > wouldn't be able to even consider in, e.g., a conventional BIOS. > (And, would be far more "future safe" from my point of view). > > I'll have to think real hard on that. If the "typical" sorts > of things "look mildly familiar", then it might be possible > to slip it in under the radar (without religious issues biasing > the decision/acceptance). OTOH, if things start to look too > funky, then it just gives people an excuse to dislike it. :-/
Wow! I was *amazed* at the intensity of the reactions to this suggestion when I circulated it among colleagues! In hindsight, I wish I had presented the suggestion as "a generic scripting language" instead of "Forth" as many of the replies seemed to *choke* on the mention of Forth! <frown> Despite the fact that several of us use Open Firmware. [Still waiting for replies from two other folks, but...] I guess I don't really understand the "disapproval" (for want of a better word). Granted, it's been almost 40 years since I used Forth but I don't have a "bad taste" associated with the experience. I tried to quickly shift the discussion away from Forth by offering other "simple"/small languages as alternatives. But, I think the idea had already taken on a taint. I even proposed a small C interpreter for that role ("Why interpret C when you can COMPILE it??"). I have an even simpler "language" I'll try proposing but fear it won't fly, either! (sigh) I will have to wait for attitudes to settle back down before looking for clarification of this "rejection". And, perhaps reexamine Forth to get a feel for what folks might be objecting to...
rickman wrote:
> On 10/25/2014 2:14 PM, Les Cargill wrote: >> rickman wrote: >>> On 10/23/2014 11:30 PM, Les Cargill wrote: >>>> Don Y wrote: >>>>> Hi, >>>>> >>>>> [I probably should direct this at George... :> ] >>>>> >>>>> I'm writing a command parser. Some set of commands are "common". >>>>> But, other instances of the parser are augmented with additional >>>>> commands/syntax. >>>>> >>>>> [These additions are known at compile time, not "dynamic"] >>>>> >>>>> Ideally, I want a solution that allows folks developing those >>>>> "other" commands to just "bolt onto" what I have done. E.g., >>>>> creating a single "grammar definition" (lex/yacc) is probably >>>>> not a good way to go (IME, most small parsers tend to be ad hoc). >>>>> >>>>> [Note: I can't even guarantee that the extensions will be >>>>> consistent or "harmonious" with the grammar that I implement] >>>>> >>>>> A naive approach (?) might be for my code to take a crack at >>>>> a particular "statement"/command and, in case of FAIL, invoke >>>>> some bolt-on parser to give it a chance to make sense out of >>>>> the input. If *it* FAILs, then the input is invalid, etc. >>>>> >>>>> This sounds like an incredible kluge. Any better suggestions? >>>> >>>> >>>> If all else fails, strstr() keywords, then have a corresponding >>>> parsers for each keyword. >>>> >>>> A better approach is a container (struct) of keywords or arguments ( if >>>> it's not a keyword, it's an argument ) , an integer for the index of >>>> keywords in s string table, and an index for the position of the token >>>> within the command. >>>> >>>> Extensibility should not be difficult. >>> >>> That is starting to sound a lot like Forth. >>> >> >> Forth could easily be a better choice. I need to spend >> more time on Forth. >> >> I tend towards Tcl because I have a large codebase of scripts >> for it. It also excels at socket/serial port handling. > > In many ways Forth is amazingly simple. You define words (subroutines) > that have actions. 
> Words are stored in the dictionary. Forth has built in a "parser" that scans the input for words and numbers (in that order). The dictionary is searched for words in the input stream and when found they are executed. If no word is found Forth checks to see if the "word" is actually a number. If it is a number it is pushed onto the stack. Pretty simple, no?
Very.
> The action of a word can make use of system words to further parse the > input stream. This is done if the input "grammar" is not RPN style with > the values first (nouns) and the words (verbs) last. This is done even > for some Forth words like "TO" which is used to store a value in a > variable, e.g. "99 TO BottlesOfBeer". > > Most people find the use of the stack to be a problem for them while it > is really no big deal. It's just different. Forth has other issues > which relate to the fact that not so many people use it. But it seems > to be a very useful tool to me. >
-- Les Cargill
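The interpreter loop rickman describes (look the token up in the dictionary, otherwise try it as a number, and let a word such as TO pull further tokens from the input) can be sketched in a few lines. This is a toy in Python, not a real Forth: the `Interp` class, its `define` and `next_token` methods, and the two sample words are hypothetical names chosen for illustration.

```python
class Interp:
    """Toy outer interpreter: dictionary lookup first, then number, else error."""

    def __init__(self):
        self.stack = []
        self.variables = {}
        self.words = {}         # the "dictionary": name -> callable(interp)
        self.tokens = iter(())  # remaining input stream

    def define(self, name, action):
        self.words[name.upper()] = action

    def next_token(self):
        """Let a word consume the next token from the input stream."""
        tok = next(self.tokens, None)
        if tok is None:
            raise ValueError("word expected more input")
        return tok

    def run(self, line):
        self.tokens = iter(line.split())
        for tok in self.tokens:
            action = self.words.get(tok.upper())
            if action is not None:
                action(self)                      # known word: execute it
            else:
                try:
                    self.stack.append(int(tok))   # otherwise: is it a number?
                except ValueError:
                    raise ValueError(f"unknown word: {tok}") from None

def add(it):
    b, a = it.stack.pop(), it.stack.pop()
    it.stack.append(a + b)

def to(it):
    # A parsing word: takes the NEXT token as a name, pops the value for it.
    it.variables[it.next_token()] = it.stack.pop()

forth = Interp()
forth.define("+", add)
forth.define("TO", to)
forth.run("2 3 + TO Total 99 TO BottlesOfBeer")
# forth.variables is now {"Total": 5, "BottlesOfBeer": 99}; the stack is empty
```

Extending the command set is then a matter of calling `define` with another word, which is the "bolt-on" property the thread is after.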
On Fri, 24 Oct 2014 07:38:12 -0700, Don Y <this@is.not.me.com> wrote:

>On 10/23/2014 1:49 PM, George Neuner wrote: > >> Your "core" can be operated with it's provided command set, but you >> can't easily provide for new commands targeting an environment you >> know nothing about. > >Yes. And, I can't "guess" to the types of data/actions that may >need to be "encoded" in that command set. > >Nor can I ensure the next guy will have the same approach to the >problem -- half the commands may "feel like mine" while the other >half "feel like his/hers". And, neither group "feels consistent" >with the other.
I like Simon's idea re: DCL. I only used VMS briefly in school and I wasn't aware that DCL could be extended in that way. It's a nice way to regularize at least the argument input. If all you want is to regularize input, another way might be to borrow from the HTML fast-cgi interface. It doesn't guarantee anything about interoperability per se, but it does arrange that programs get their arguments in recognizable key:value pairs.
>The (forced) "computer upgrade" cost me my mail archive >(I don't leave mail stored on server :< )
Ouch. I don't leave anything on provider servers either, but my own IMAP server is backed up regularly. What I was thinking about is less important given compile time parser generation ... we were talking about providing API object and entry point addresses to runtime generated code. The relevance (if any) depends on whether you want to allow grammar extensions direct access to your device API: i.e. whether the extensions should be allowed to manipulate your (part of the) device or whether they have to go through your provided language to do it.
>>>> In either case, the additional parser should get the first crack at >>>> input with Don's own parser as a fallback. >>> >>> Hmmm... I had considered the opposite approach: let me verify the >>> input is/isn't a "common command" before exposing the input to the >>> "other" parser. The thought being: common commands are then >>> implicitly invariant (you can't change the syntax of them) AND >>> they "always work" (even if the additional parser has bugs in its >>> implementation). >> >> You don't want to "front end" the extensions ... if one of your >> commands is redefined, it won't work if you grab it first. > >That was exactly the point: that the common commands are *common* >and always behave exactly the same. (as well as "always working" >regardless of how much the other guy mucks with the interface).
That's the safe approach, but it's not terribly friendly ... particularly in the case of a reusable component where you don't have any say in the use. A new developer may want to completely change the approach to the command set.
>It's a tough call -- assume competence in EVERYTHING? Or, just >assume competence in their knowledge of their "extensions"??
It's reasonable to expect that many developers will have little experience with _formal_ parsing methods and tools ... quite a lot of applications can get by with RegEx or even just separator tokenization and have no need of any formal grammar. George
On 2014-10-26, Don Y <this@is.not.me.com> wrote:
> On 10/24/2014 7:46 AM, Don Y wrote: >> >> I'll have to think real hard on that. If the "typical" sorts >> of things "look mildly familiar", then it might be possible >> to slip it in under the radar (without religious issues biasing >> the decision/acceptance). OTOH, if things start to look too >> funky, then it just gives people an excuse to dislike it. :-/ > > Wow! I was *amazed* at the intensity of the reactions to this > suggestion when I circulated it among colleagues! In hindsight, > I wish I had presented the suggestion as "a generic scripting > language" instead of "Forth" as many of the replies seemed to > *choke* on the mention of Forth! <frown> Despite the fact > that several of us use Open Firmware. >
Do any of these people use RPN calculators?

Simon.

--
Simon Clubley, clubley@remove_me.eisner.decus.org-Earth.UFP
Microsoft: Bringing you 1980s technology to a 21st century world
Hi George,

On 10/26/2014 12:47 AM, George Neuner wrote:
> On Fri, 24 Oct 2014 07:38:12 -0700, Don Y <this@is.not.me.com> wrote: >> On 10/23/2014 1:49 PM, George Neuner wrote: >> >>> Your "core" can be operated with it's provided command set, but you >>> can't easily provide for new commands targeting an environment you >>> know nothing about. >> >> Yes. And, I can't "guess" to the types of data/actions that may >> need to be "encoded" in that command set. >> >> Nor can I ensure the next guy will have the same approach to the >> problem -- half the commands may "feel like mine" while the other >> half "feel like his/hers". And, neither group "feels consistent" >> with the other. > > I like Simon's idea re: DCL. I only used VMS briefly in school and I > wasn't aware that DCL could be extended in that way. It's a nice way > to regularize at least the argument input.
I'm looking into the implementation to see which ideas I can borrow. I think I have a "prettier" way of creating the syntax.
> If all you want is to regularize input, another way might be to borrow > from the HTML fast-cgi interface. It doesn't guarantee anything about > interoperability per se, but it does arrange that programs get their > arguments in recognizable key:value pairs. > >> The (forced) "computer upgrade" cost me my mail archive >> (I don't leave mail stored on server :< ) > > Ouch. I don't leave anything on provider servers either, but my own > IMAP server is backed up regularly.
I periodically move my mail archive to a little server that does my DNS, TFTP, font, DHCP, etc. services. Lets me view it as mail (instead of as a raw "mailbox"). That box is the only thing that runs 24/7/365 here, so it's the destination for just about everything! [It's more of an appliance than a server]

In my eagerness to get rid of kit before the annual equipment upgrade cycle, I opted to replace that box -- it had been running for several years (uptime was over 1000 days) so I felt it "deserved" to be replaced. I moved everything onto a *smaller*, faster, less power-hungry box. Life was good. Another bit of kit into the bin! :>

Unfortunately, the box only lasted a few weeks before the disk (apparently) spun a bearing. Current thinking is it wasn't a suitable orientation to *mount* the disk and/or too close to the main heatsink (the box is REALLY small! had to shoehorn the 2.5" disk in there just to get it to fit!).

Anyway, easy to recreate everything for which I had sources -- even my databases! Mail archive has never been something I considered "precious" so that was *the* backup (aside from what was on my MUA at the time).

Rather than risk another disk in the "new" appliance (which was never intended to have a physical disk drive within), I replaced *that* with another appliance that *does* accommodate a disk drive (even having a fan just for the drive). <shrug> Moral: "If it ain't broke, don't fix it!"

When the box that hosts the MUA went down, I took the opportunity to replace it (preserving the mail that *was* on it at the time). So, I managed to get rid of some kit -- and the most recent chunk of my mail archive in the process. <frown>
> What I was thinking about is less important given compile time parser > generation ... we were talking about providing API object and entry > point addresses to runtime generated code. > > The relevance (if any) depends on whether you want to allow grammar > extensions direct access to your device API: i.e. whether the > extensions should be allowed to manipulate your (part of the) device > or whether they have to go through your provided language to do it.
The goal is for people to be able to extend the "system" -- in whichever directions THEY choose -- and provide a simple way of configuring those enhancements (software/hardware/systems). It's a "low value" operation and, as such, doesn't merit lots of resources. OTOH, it shouldn't require "surgery" to tweak variables deep in the sources (these sorts of "selections" should be very visible without detailed inspection).

I've considered it in the context of "settings" -- almost like twiddling envars -- except it need not be. I.e., the "commands" could initiate *actions* if so desired.

I *hadn't* considered it as a "programming/scripting language" (which is why I called it a "command parser" and presented command line argument parsing and configuration file parsing as the examples to illustrate its intent).

There *could* be some value to a real procedural language, there. But, when I conveyed the "Forth" suggestion to colleagues, it didn't fare well (I suspect that had more to do with "Forth" than the "programming language aspect" of the idea -- almost provincial!) [No desire to take on THAT fight, thankyouverymuch!]
>>>>> In either case, the additional parser should get the first crack at >>>>> input with Don's own parser as a fallback. >>>> >>>> Hmmm... I had considered the opposite approach: let me verify the >>>> input is/isn't a "common command" before exposing the input to the >>>> "other" parser. The thought being: common commands are then >>>> implicitly invariant (you can't change the syntax of them) AND >>>> they "always work" (even if the additional parser has bugs in its >>>> implementation). >>> >>> You don't want to "front end" the extensions ... if one of your >>> commands is redefined, it won't work if you grab it first. >> >> That was exactly the point: that the common commands are *common* >> and always behave exactly the same. (as well as "always working" >> regardless of how much the other guy mucks with the interface). > > That's the safe approach, but it's not terribly friendly ...
That's why they were called "common commands" -- things that you can *rely* upon regardless of implementation.
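The two orderings under discussion differ only in which parser is consulted first. A minimal sketch, assuming each parser returns a result or `None` for input it does not recognize; `parse_common`, `parse_extension`, `dispatch`, and the sample verbs are hypothetical and not taken from any code in this thread:

```python
def parse_common(line):
    """Hypothetical core parser: returns a result, or None if not recognized."""
    verb, *args = line.split()
    if verb == "set" and len(args) == 2:
        return ("common", "set", args[0], args[1])
    if verb == "show" and len(args) == 1:
        return ("common", "show", args[0])
    return None

def parse_extension(line):
    """Hypothetical bolt-on parser; note that it also tries to claim 'show'."""
    verb, *args = line.split()
    if verb == "show":
        return ("ext", "show", *args)
    if verb == "blink" and len(args) == 1:
        return ("ext", "blink", args[0])
    return None

def dispatch(line, parsers):
    """Offer the input to each parser in order; the first match wins."""
    if not line.split():
        raise ValueError("empty input")
    for parse in parsers:
        result = parse(line)
        if result is not None:
            return result
    raise ValueError(f"invalid input: {line!r}")

core_first = [parse_common, parse_extension]  # common commands are invariant
ext_first = [parse_extension, parse_common]   # extensions may redefine them
```

With `core_first`, `show ip` always reaches the common parser no matter what the extension does; with `ext_first`, the extension's redefinition of `show` takes over. In both orderings `blink led1` falls through to the extension.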
> particularly in the case of a reusable component where you don't have > any say in the use. A new developer may want to completely change the > approach to the command set.
Well, he can always rewrite the subsystem! :> I doubt it will take me more than a week or two to put it in place (though in this first instance, there are two other subsystems that I want to add/modify in the process for a more integrated experience).
>> It's a tough call -- assume competence in EVERYTHING? Or, just >> assume competence in their knowledge of their "extensions"?? > > It's reasonable to expect that many developers will have little > experience with _formal_ parsing methods and tools ... quite a lot of > applications can get by with RegEx or even just separator tokenization > and have no need of any formal grammar.
Exactly. That was my (upthread) observation of what I've encountered in config file parsers, command line option parsing, etc. When you consider how "rich" some of those environments are (options, etc.), it's a wonder that so much ad hoc parsing is done! I can see "evolution"/feeping creaturism explaining some of it (i.e., "it wasn't this complex when we started") but I doubt that's the real cause.

I suspect it's more one of familiarity: you write "generic code" every day. So, writing something to extract a numeric value from the "second whitespace-delimited field" in a statement is trivial. OTOH, writing a formal grammar so that a *tool* can do this for you AUTOMATICALLY probably means you spend more time relearning the tool than you would have spent writing the code! <shrug>
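For a sense of scale, the "second whitespace-delimited field" case really is only a few lines either way. The sketch below shows an ad hoc version and a regular-expression version in Python; the function names and the sample statement are made up for illustration, and the two versions are only roughly equivalent (they can disagree on unusual numeric spellings).

```python
import re

def second_field_adhoc(statement):
    """Split on whitespace and convert the second field by hand."""
    fields = statement.split()
    if len(fields) < 2:
        raise ValueError("statement has fewer than two fields")
    return int(fields[1])  # int() raises ValueError on non-numeric text

# One non-blank field, whitespace, then a signed integer ending at a blank or EOL.
PATTERN = re.compile(r"\s*\S+\s+([+-]?\d+)(?:\s|$)")

def second_field_regex(statement):
    """Same job, with the expected shape stated as a pattern."""
    m = PATTERN.match(statement)
    if m is None:
        raise ValueError("second field is not an integer")
    return int(m.group(1))
```

Neither version says anything about the rest of the statement, which is where ad hoc parsers tend to accumulate special cases.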
On 10/26/2014 5:15 AM, Simon Clubley wrote:
> On 2014-10-26, Don Y <this@is.not.me.com> wrote: >> On 10/24/2014 7:46 AM, Don Y wrote: >>> >>> I'll have to think real hard on that. If the "typical" sorts >>> of things "look mildly familiar", then it might be possible >>> to slip it in under the radar (without religious issues biasing >>> the decision/acceptance). OTOH, if things start to look too >>> funky, then it just gives people an excuse to dislike it. :-/ >> >> Wow! I was *amazed* at the intensity of the reactions to this >> suggestion when I circulated it among colleagues! In hindsight, >> I wish I had presented the suggestion as "a generic scripting >> language" instead of "Forth" as many of the replies seemed to >> *choke* on the mention of Forth! <frown> Despite the fact >> that several of us use Open Firmware. > > Do any of these people use RPN calculators ?
I'm sure they/we all have in the past (I don't use a calculator). And, several of us "regularly" use Forth in configuring, for example, SPARCstations, etc. (OpenFirmware)! But, it's not something that is done *often* so there's always a relearning experience involved. Forth is quirky -- remembering which words serve which functionality, etc. (what word displays my IP address? how do I remove an alias from NVRAMrc? etc.) By contrast, they/we have probably used a BASIC dialect even LESS recently than OFW -- yet, I would imagine all of us could craft a little BASIC "script" in a matter of seconds... and, be assured it would work first time. <shrug> Dunno. I'm only guessing as to the source of the "resistance" (resentment?). As I said, I tried to shift the discussion away from Forth, per se, to see if the idea of a procedural language was the issue ("We don't need that level of functionality -- just parse COMMANDS!") or the proposed language... I'll revisit it, later (just for my own edification) but won't chase that approach after this sort of "reception" :-/
On 10/26/2014 12:02 PM, Don Y wrote:
> On 10/26/2014 5:15 AM, Simon Clubley wrote: >> On 2014-10-26, Don Y <this@is.not.me.com> wrote: >>> On 10/24/2014 7:46 AM, Don Y wrote: >>>> >>>> I'll have to think real hard on that. If the "typical" sorts >>>> of things "look mildly familiar", then it might be possible >>>> to slip it in under the radar (without religious issues biasing >>>> the decision/acceptance). OTOH, if things start to look too >>>> funky, then it just gives people an excuse to dislike it. :-/ >>> >>> Wow! I was *amazed* at the intensity of the reactions to this >>> suggestion when I circulated it among colleagues! In hindsight, >>> I wish I had presented the suggestion as "a generic scripting >>> language" instead of "Forth" as many of the replies seemed to >>> *choke* on the mention of Forth! <frown> Despite the fact >>> that several of us use Open Firmware. >> >> Do any of these people use RPN calculators ? > > I'm sure they/we all have in the past (I don't use a calculator). > > And, several of us "regularly" use Forth in configuring, for example, > SPARCstations, etc. (OpenFirmware)! > > But, it's not something that is done *often* so there's always > a relearning experience involved. Forth is quirky -- remembering > which words serve which functionality, etc. (what word displays > my IP address? how do I remove an alias from NVRAMrc? etc.) > > By contrast, they/we have probably used a BASIC dialect even > LESS recently than OFW -- yet, I would imagine all of us could > craft a little BASIC "script" in a matter of seconds... and, be > assured it would work first time. > > <shrug> Dunno. I'm only guessing as to the source of the > "resistance" (resentment?). As I said, I tried to shift the > discussion away from Forth, per se, to see if the idea of a > procedural language was the issue ("We don't need that level > of functionality -- just parse COMMANDS!") or the proposed > language... 
> > I'll revisit it, later (just for my own edification) but > won't chase that approach after this sort of "reception" :-/
What exactly is the environment for this command parser? Will it run as a program under an OS? As a command processor for the OS? At the same level as OpenFirmware?

If it runs as an app, then any Forth for that OS is a nearly instant solution. You can just ignore the added capabilities. The memory footprint of any Forth I have used is very small compared to the OS.

As to the comment, "We don't need that level of functionality -- just parse COMMANDS!", Forth largely *is* a command parser.

I can't explain why you can remember how to display an IP address in BASIC but not in Forth. That would seem to be a personal problem. How do you "remove an alias from NVRAMrc?" using BASIC? Is that really a program you can write in seconds and have work the first time? I'm not sure what that even means, lol.

-- Rick
Hi Don,

On Sun, 26 Oct 2014 08:41:42 -0700, Don Y <this@is.not.me.com> wrote:

>On 10/26/2014 12:47 AM, George Neuner wrote: > >> What I was thinking about is less important given compile time parser >> generation ... we were talking about providing API object and entry >> point addresses to runtime generated code. >> >> The relevance (if any) depends on whether you want to allow grammar >> extensions direct access to your device API: i.e. whether the >> extensions should be allowed to manipulate your (part of the) device >> or whether they have to go through your provided language to do it. > >The goal is for people to be able to extend the "system" -- in whichever >directions THEY choose -- and provide a simple way of configuring those >enhancements (software/hardware/systems). It's a "low value" operation >and, as such, doesn't merit lots of resources. OTOH, it shouldn't >require "surgery" to tweek variables deep in the sources (these sorts >of "selections" should be very visible without detailed inspection).
I guess the real question is why *you* should be providing that? You've already given them a way to control your gizmo and they have the gizmo's API with which to construct a different command set if they want.
>I've considered it in the context of "settings" -- almost like twiddling >envars -- except it need not be. I.e., the "commands" could initiate >*actions* if so desired. > >I *hadn't* considered it as a "programming/scripting language" (which >is why I called it a "command parser" and presented command line >argument parsing and configuration file parsing as the examples to >illustrate its intent. > >There *could* be some value to a real procedural language, there. >But, when I conveyed the "Forth" suggestion to colleagues, it didn't >fare well (I suspect that had more to do with "Forth" than the >"programming language aspect" of the idea -- almost provincial!
Forth - and Lisp, too - has a bad rep in many circles ... which is unfortunate because sometimes it may be the simplest answer to the problem. From what I've read I don't think you're really needing a programming language. But then I'm still trying to figure out how sophisticated the commands are likely to be ... they don't look like much from your examples, but the discussion hints at more complexity.
>>> It's a tough call -- assume competence in EVERYTHING? Or, just >>> assume competence in their knowledge of their "extensions"?? >> >> It's reasonable to expect that many developers will have little >> experience with _formal_ parsing methods and tools ... quite a lot of >> applications can get by with RegEx or even just separator tokenization >> and have no need of any formal grammar. > >Exactly. That was my (upthread) observation of what I've encountered >in config file parsers, command line option parsing, etc. When you >consider how "rich" some of those environments are (options, etc.), >it's a wonder that so much ad hoc parsing is done! I can see >"evolution"/feeping creaturism explaining some of it (i.e., "it wasn't >this complex when we started") but I doubt that's the real cause. > >I suspect it's more one of familiarity: you write "generic code" >everyday. So, writing something to extract a numeric value from >the "second whitespace-delimited field" in a statement is trivial. >OTOH, writing a formal grammar so that a *tool* can do this for >you AUTOMATICALLY probably means you spend more time relearning the >tool than you would have spent writing the code!
I think it's more panic upon seeing the manual's introduction to the method theory. The vast majority of programmers today have little or no formal schooling, and language theory waters (doesn't matter syntactic or semantic) get deep very fast. As with most things, deep understanding isn't necessary to use the tools, but a programmer does need at least a passing familiarity with the tool's method to understand what it is trying to do and why it is failing. In this regard parser generators are less forgiving than the average compiler and it doesn't much matter which method(s) the tool uses - you can make as big a mess with LL(n) as with (LA|S)?LR(1) and quite easily hang yourself using PEG or GLR. Learning to use a parser generator isn't really any harder than learning any other programming language, but the documentation tends to be scarier. This isn't helped by the tendency of tool developers to minimally document and to relate their wares to existing tools that the user is presumed already to be familiar with.
><shrug>
Nothing to be done about it except try to convince people that it really isn't that hard and that, for most purposes, they really should be using a generator tool rather than rolling their own. The only really valid exceptions are very simple interface "languages" and very tiny systems 8-) that can't afford the overhead of a generated parser. The constant overhead of a generated parser is fairly large: with care using Bison you can shoehorn a fairly complex language into ~10KB ... but even the simplest VERB NOUN command parser is difficult to bring in under ~3KB. And Bison actually is pretty good at making small parsers - there are a few tools that are better, but many more are worse. However, I think even the parser size argument is losing weight as the average "small" system keeps getting bigger (32-bit ARM running Linux toasting bread, etc.). Moreover, many small system developers are comfortable using (some form of) RegEx to handle interfaces and probably most have no clue about its intrinsic overhead. George