On 10/23/2014 9:36 AM, Stefan Reuther wrote:
> Don Y wrote:
>> On 10/22/2014 1:13 PM, Tim Wescott wrote:
>>> If it's always something simple like
>>>
>>> <command> <argument> <argument> <argument>
>>>
>>> and the only thing that's changing is the command and (possibly) the
>>> argument types (text, numerical, etc.) that are bound to it, then a
>>> parser should be pretty easy -- I wouldn't even bother with Lex and Yacc.
>>
>> The problem I'm stuck with is not knowing what is likely to be added.
>> While I suspect most will be similarly trivial, I can also imagine
>> some esoteric additions!
>
> You could try to bring every command into the <command> <argument>...
> form with some clever syntax. I'm tempted to say "Lisp", but that
> probably doesn't enjoy universal love :-) However, you can easily bring
>
>     if (foo) { bar(); baz(); } else { fred(); }
>
> into "<command> <argument>..." form by making "{}" string delimiters.
> This makes this a command 'if' with four parameters: an expression
> '(foo)', a string for the 'if' command, an unquoted word 'else', and a
> string for the 'else' command.

Note that the (contrived) example I posted (elsewhere in this thread) attempts to do that -- by decomposing what might otherwise be "complex" (multivalued) commands into a series of trivial (single-parameter) commands. E.g., it's easier to accept an 8-digit hex value for an IP address (and another for a mask) than to accept four dotted-decimal values (for each).

But I can't assume others will take a similar approach. Nor can I imagine the variety of things that they might need/want to address via this interface. (sigh)

> I recently built a command shell like that, and I believe Tcl does it
> the same way. So all you would need to give your users would be a
> possibility to register a command verb, and a way to evaluate strings as
> command sequences.

I suspect I will end up with a set of routines that parse specific types of tokens (number, hex number, word, etc.) that the DEVELOPER could call upon in fashioning an ad hoc parser. Perhaps I'm too jaded, but past observations suggest that's how additions will end up being implemented. I think it's probably easier for folks to think in terms of: "OK, now I'll expect an integer in the range of XXX to YYY."

Rather than let them naively do something like read the characters into a fixed-size buffer (buffer-overrun potential!), I can handle SAFELY scanning the next token AS AN INTEGER and report a value that they can just test (plus an error code if the token isn't an integer). That, at least, gives them something that can be leveraged safely.
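[A minimal sketch of such a "safe token scanner" in C. The function name, error codes, and interface here are illustrative -- not from any actual codebase -- but show how the caller gets a range-checked value without ever touching a raw character buffer:]

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

typedef enum { TOK_OK, TOK_NOT_INT, TOK_RANGE } tok_err;

/* Scan the next whitespace-delimited token from *cursor as a long,
 * check it against [lo, hi], and advance the cursor past the token.
 * The caller never reads into a fixed-size buffer, so there is no
 * overrun to get wrong. */
tok_err scan_int(const char **cursor, long lo, long hi, long *out)
{
    const char *p = *cursor;
    char *end;
    long v;

    while (isspace((unsigned char)*p))
        p++;

    errno = 0;
    v = strtol(p, &end, 0);          /* base 0: accepts 0x.. hex too */
    if (end == p)
        return TOK_NOT_INT;          /* no digits consumed */
    if (*end != '\0' && !isspace((unsigned char)*end))
        return TOK_NOT_INT;          /* trailing junk, e.g. "12ab" */
    if (errno == ERANGE || v < lo || v > hi)
        return TOK_RANGE;

    *cursor = end;
    *out = v;
    return TOK_OK;
}
```

The developer's ad hoc parser then becomes a sequence of calls like `scan_int(&p, 0, 255, &octet)`, each followed by a simple test of the returned code.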
Parsers for extensible grammars?
Started by ●October 22, 2014
Reply by ●October 23, 2014
Reply by ●October 23, 2014
On 10/23/2014 11:45 AM, Tim Wescott wrote:
> On Wed, 22 Oct 2014 22:54:13 -0700, Don Y wrote:
>>>> Ideally, I want a solution that allows folks developing those "other"
>>>> commands to just "bolt onto" what I have done. E.g., creating a
>>>> single "grammar definition" (lex/yacc) is probably not a good way to
>>>> go (IME, most small parsers tend to be ad hoc).
>>>>
>>>> [Note: I can't even guarantee that the extensions will be consistent
>>>> or "harmonious" with the grammar that I implement]
>>
>> The problem I'm stuck with is not knowing what is likely to be added.
>> While I suspect most will be similarly trivial, I can also imagine some
>> esoteric additions!
>
> Who's going to be adding stuff, why, and why won't you have control over
> the process?

Other developers. To address the needs of their environments. Because it's not my codebase.

>> The "common" commands are trivial enough that, if I knew ALL would be of
>> similar form, I would almost adopt an approach of sequentially
>> attempting to parse a command with individual templates -- until one
>> "fit". Extensions would just be tacked onto the end of a (short) series
>> of conditionals:
>>
>>     if (parse(string, template1))
>>         command1();
>>     else if (parse(string, template2))
>>         command2();
>>     else if ...
>
> I have an in-house parser that I use for debugging and service on a number
> of projects. It's OO in C++, but it basically supports a syntax of
>
>     <keyword> <anything else>
>
> You create an object and give it the keyword, which automagically
> registers it with the first-layer parser. When someone types something in,
> the first-layer parser looks for the keyword, and if it's found, hands it
> to the object for further parsing.
>
> It's crude but effective. I keep telling myself that I'll make more
> library-ish functionality for parsing the <anything else> bit, but so far
> that's all ad hoc.

I actually think a viable solution might be as above. I.e., write something that parses for *just* one particular command. If no match, try invoking the thing that parses for the *next* particular command. Lather, rinse, repeat until a match is found (successful parse) *or* there are no more possibilities (FAIL). Again, this gets used with very low frequency, so it doesn't have to be super efficient.

With subroutines/functions that the developer can invoke (because I had to write them to parse the commands that *I* implemented), it could be just a few lines of code extracting parameters from syntactic sugar, etc. I *don't* want to intimidate a developer into having to embrace some more formal framework with which he may not be comfortable. That way lies bugs.
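[The try-each-parser-in-turn dispatch described above might be sketched in C as below. The two commands are made up for illustration; the point is that extending the interpreter is just one more entry in the table:]

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Each command supplies one function that either recognizes and
 * executes the whole line (returning true) or declines (false). */
typedef bool (*cmd_parser)(const char *line);

static bool parse_reboot(const char *line)
{
    if (strcmp(line, "reboot") != 0)
        return false;
    puts("rebooting...");
    return true;
}

static bool parse_echo(const char *line)
{
    if (strncmp(line, "echo ", 5) != 0)
        return false;
    puts(line + 5);
    return true;
}

/* Extensions get tacked onto the end of this array. */
static const cmd_parser parsers[] = { parse_reboot, parse_echo };

/* Try each parser in turn; FAIL only if none of them claims the line. */
bool dispatch(const char *line)
{
    for (size_t i = 0; i < sizeof parsers / sizeof parsers[0]; i++)
        if (parsers[i](line))
            return true;
    return false;
}
```

It's O(n) in the number of commands, but as noted above, the usage frequency makes that irrelevant.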
Reply by ●October 23, 2014
Hi Don

On 2014-10-22, Don Y <this@is.not.me.com> wrote:
>
> Ideally, I want a solution that allows folks developing those
> "other" commands to just "bolt onto" what I have done. E.g.,
> creating a single "grammar definition" (lex/yacc) is probably
> not a good way to go (IME, most small parsers tend to be ad hoc).
>
> [Note: I can't even guarantee that the extensions will be
> consistent or "harmonious" with the grammar that I implement]

To be honest, I'm not clear on what that last part means. Adding new "commands" is relatively straightforward, but new syntax patterns are surely impossible in the general case without revising the grammar directly. To clarify: anything you could express inside a C function is probably straightforward, but a new language construct, such as a new form of loop or a switch statement that accepts non-constant case labels, is going to be tricky. Hell, in the truly general case even basic stuff such as determining where a statement begins and ends is impossible if arbitrary modifications are permissible.

I did write a parser for the former, simpler case a year or two ago: it's easy enough if approached from the right direction at the outset. The key isn't so much in the parser but the lexer. In the parser, my (slightly simplified) YYSTYPE was something along the lines of:

    %union {
        struct {
            char *text;
        } string;
        struct {
            char *text;
            int64_t value;
        } number;
        struct {
            char *text;
            int (*function)(YYSTYPE *params);
        } command;
    }

(If you're wondering why every option has a text argument, it's because any token can be converted back to the original input if a string is expected in context.)

The lexer then proceeded more or less as normal, but with only a single pattern matching a generic potential command name, i.e. a sequence of all-alphabetic characters. The candidate command name was then given to a lookup function that did a binary search of a static array, but obviously how the lookup function works depends on the context.

If there was no match, the lexer returned the token as a string; if there was a match, the function pointer was obtained from that array, along with a token code typing its parameters, e.g. a command with two numeric parameters, or a command with a numeric and two string parameters. That approach means that the "plumbing" to add a new command is simply an extra line in the array definition. Things get a little more complex with optional parameters, but it remains practical provided the number of possible combinations remains smallish.

The parameter code is what yylex() returns as its result, so the parser knows what should follow. In my case yylex() was handwritten, but that was so it could directly access my app's internal representation; there's no reason the same approach couldn't be done in lex.

If you wanted to completely generalise it, I suppose you could have a syntax rule to convert any possible parameter token to a single "parameter" term and have a single rule for "command (array of parameters)", but I'd try and avoid that if possible -- it moves checking from the parser to each individual command. Still, it's possibly something to have in reserve for oddball commands that don't fit a general pattern.

-- 
Andrew Smallshaw
andrews@sdf.lonestar.org
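[The lookup side of that scheme might look roughly like this in C. The command names, token codes, and handlers below are invented for illustration -- the real table would mirror whatever the grammar defines:]

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Token codes the lexer would return so the parser knows what must
 * follow, e.g. two numeric parameters vs. numeric plus two strings. */
enum { CMD_NN = 258, CMD_NSS = 259 };

struct cmd_entry {
    const char *name;               /* table kept sorted for bsearch() */
    int param_code;                 /* token code typing the parameters */
    int (*handler)(void *params);
};

static int do_move(void *p)  { (void)p; return 1; }
static int do_print(void *p) { (void)p; return 2; }

/* Must stay sorted by name; adding a command is one extra line here. */
static const struct cmd_entry commands[] = {
    { "move",  CMD_NN,  do_move  },
    { "print", CMD_NSS, do_print },
};

static int cmp_entry(const void *key, const void *elem)
{
    return strcmp((const char *)key,
                  ((const struct cmd_entry *)elem)->name);
}

/* NULL means "not a command" -- the lexer would then return the
 * candidate word as an ordinary string token. */
const struct cmd_entry *lookup_cmd(const char *word)
{
    return bsearch(word, commands,
                   sizeof commands / sizeof commands[0],
                   sizeof commands[0], cmp_entry);
}
```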
Reply by ●October 23, 2014
Hi Don,

On Thu, 23 Oct 2014 04:14:09 -0700, Don Y <this@is.not.me.com> wrote:
>On 10/23/2014 1:45 AM, George Neuner wrote:
>
>> Or just parallel parsers.
>>
>> Don said "bolt onto" and "harmonious", but it isn't clear to me
>> whether what he wants is to provide some kind of macro facility in
>> which new commands are fully expressible in *his* language, or whether
>> he wants to allow development of a parallel language that remains
>> interoperable with his but also is separate from it.
>
>I hadn't considered the idea of expressing "new" commands in terms
>of "old"/common commands.

You and I have talked about this before: it doesn't really matter how simple the interface is - the way to think about it is as if you are creating a scripting language in which "operators" are the control functions exposed by the device.

>What I am looking for is the ability to define a set of commands
>that will "always" be used/useful. Then, allow others to add
>to that set of commands to address specific needs that are
>necessary/unique to their environment. (i.e., the environment
>in which these common commands are deployed)
>
>Contrived example:
>
>   :
>
>I.e., the "arguments" can't be easily foreseen from the perspective
>of the first/common commands.

The problem is, that also means the complete set of *functionality* can't be foreseen ... else you would have provided for it already. Your "core" can be operated with its provided command set, but you can't easily provide for new commands targeting an environment you know nothing about.

You may want to look up our (ancient) exchange about runtime parser generators. You don't need that here, but the discussion about providing for JIT-generated code to link to your device API is relevant.

>> In either case, the additional parser should get the first crack at
>> input with Don's own parser as a fallback.
>
>Hmmm... I had considered the opposite approach: let me verify the
>input is/isn't a "common command" before exposing the input to the
>"other" parser. The thought being: common commands are then
>implicitly invariant (you can't change the syntax of them) AND
>they "always work" (even if the additional parser has bugs in its
>implementation).

You don't want to "front end" the extensions ... if one of your commands is redefined, it won't work if you grab it first.

George
Reply by ●October 24, 2014
Don Y wrote:
> Hi,
>
> [I probably should direct this at George... :> ]
>
> I'm writing a command parser. Some set of commands are "common".
> But, other instances of the parser are augmented with additional
> commands/syntax.
>
> [These additions are known at compile time, not "dynamic"]
>
> Ideally, I want a solution that allows folks developing those
> "other" commands to just "bolt onto" what I have done. E.g.,
> creating a single "grammar definition" (lex/yacc) is probably
> not a good way to go (IME, most small parsers tend to be ad hoc).
>
> [Note: I can't even guarantee that the extensions will be
> consistent or "harmonious" with the grammar that I implement]
>
> A naive approach (?) might be for my code to take a crack at
> a particular "statement"/command and, in case of FAIL, invoke
> some bolt-on parser to give it a chance to make sense out of
> the input. If *it* FAILs, then the input is invalid, etc.
>
> This sounds like an incredible kluge. Any better suggestions?

If all else fails, strstr() keywords, then have a corresponding parser for each keyword. A better approach is a container (struct) of keywords or arguments (if it's not a keyword, it's an argument), an integer for the index of the keyword in a string table, and an index for the position of the token within the command. Extensibility should not be difficult.

-- 
Les Cargill
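[That keyword/argument container might be sketched as below. The keyword set, field sizes, and function names are arbitrary placeholders for illustration:]

```c
#include <assert.h>
#include <string.h>

/* String table of keywords; anything not found here is an argument. */
static const char *keywords[] = { "set", "show", "reset" };
enum { NKEYWORDS = sizeof keywords / sizeof keywords[0] };

struct token {
    int kw;             /* index into keywords[], or -1 for an argument */
    int pos;            /* position of the token within the command     */
    char text[32];
};

static int keyword_index(const char *word)
{
    for (int i = 0; i < NKEYWORDS; i++)
        if (strcmp(word, keywords[i]) == 0)
            return i;
    return -1;
}

/* Split `line` on spaces into at most `max` tokens; returns the count.
 * "If it's not a keyword, it's an argument." */
int tokenize(const char *line, struct token *out, int max)
{
    char buf[128];
    int n = 0;

    strncpy(buf, line, sizeof buf - 1);
    buf[sizeof buf - 1] = '\0';

    for (char *w = strtok(buf, " "); w && n < max; w = strtok(NULL, " ")) {
        out[n].kw = keyword_index(w);
        out[n].pos = n;
        strncpy(out[n].text, w, sizeof out[n].text - 1);
        out[n].text[sizeof out[n].text - 1] = '\0';
        n++;
    }
    return n;
}
```

A per-keyword parser then just inspects the (kw, pos) pairs instead of re-scanning raw text.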
Reply by ●October 24, 2014
Don Y wrote:
> Hi,
>
> [I probably should direct this at George... :> ]
>
> I'm writing a command parser. Some set of commands are "common".
> But, other instances of the parser are augmented with additional
> commands/syntax.
>
> [These additions are known at compile time, not "dynamic"]
>
> Ideally, I want a solution that allows folks developing those
> "other" commands to just "bolt onto" what I have done. E.g.,
> creating a single "grammar definition" (lex/yacc) is probably
> not a good way to go (IME, most small parsers tend to be ad hoc).
>
> [Note: I can't even guarantee that the extensions will be
> consistent or "harmonious" with the grammar that I implement]
>
> A naive approach (?) might be for my code to take a crack at
> a particular "statement"/command and, in case of FAIL, invoke
> some bolt-on parser to give it a chance to make sense out of
> the input. If *it* FAILs, then the input is invalid, etc.
>
> This sounds like an incredible kluge. Any better suggestions?

Hi Don,

Have you looked at Forth? What you are describing sounds like the problem that Forth solved back in 1968. It is the sort of thing that Forth programmers have been doing for nearly 50 years, and it works very successfully for us.

See <http://www.mpeforth.com/> and <http://www.forth.com/>

-- 
********************************************************************
Paul E. Bennett IEng MIET.....<email://Paul_E.Bennett@topmail.co.uk>
Forth based HIDECS Consultancy.............<http://www.hidecs.co.uk>
Mob: +44 (0)7811-639972 Tel: +44 (0)1235-510979
Going Forth Safely ..... EBA. www.electric-boat-association.org.uk..
********************************************************************
Reply by ●October 24, 2014
On Thursday, October 23, 2014 12:54:03 AM UTC-5, Don Y wrote:
> On 10/22/2014 1:13 PM, Tim Wescott wrote:
> > On Wed, 22 Oct 2014 12:14:06 -0700, Don Y wrote:
>
> The problem I'm stuck with is not knowing what is likely to be added.
> While I suspect most will be similarly trivial, I can also imagine
> some esoteric additions!

Once upon a time, I realized that if one uses distinct left and right operator precedence values, one can include brackets as operators (high "outside" precedence and matching low "inside" precedence). I never figured out how to parse an operator that had simultaneous infix, prefix, and suffix versions. The appeal is that the parser is table driven, and that the table is simple and easy to amend. I am told that this approach has limitations and has been tried before.
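[For what it's worth, the high-outside/low-inside bracket trick can be seen in a toy Pratt-style evaluator like the one below. This is a sketch of the general technique, not the table-driven parser described above: single-digit operands, only + and *, and '(' treated as just another operator whose inside parses at the lowest precedence:]

```c
#include <assert.h>
#include <ctype.h>

static const char *src;      /* cursor into the expression being parsed */

static int expr(int min_bp); /* forward declaration */

static int atom(void)
{
    char c = *src++;
    if (c == '(') {          /* low "inside" precedence: restart at 0 */
        int v = expr(0);
        assert(*src == ')'); /* toy error handling */
        src++;
        return v;
    }
    assert(isdigit((unsigned char)c));
    return c - '0';
}

/* Left binding power per operator; the table is trivial to amend. */
static int lbp(char op)
{
    switch (op) {
    case '+': return 10;
    case '*': return 20;
    default:  return 0;      /* ')' and '\0' stop the loop */
    }
}

static int expr(int min_bp)
{
    int lhs = atom();
    while (lbp(*src) > min_bp) {
        char op = *src++;
        /* recursing at the operator's own binding power gives
         * left associativity */
        int rhs = expr(lbp(op));
        lhs = (op == '+') ? lhs + rhs : lhs * rhs;
    }
    return lhs;
}

int eval(const char *s) { src = s; return expr(0); }
```

For example, eval("2+3*4") honors precedence and eval("(2+3)*4") shows the parenthesis behaving as an operator that binds tightly outside but parses loosely inside.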
Reply by ●October 24, 2014
Hi George,

On 10/23/2014 1:49 PM, George Neuner wrote:
> On Thu, 23 Oct 2014 04:14:09 -0700, Don Y <this@is.not.me.com> wrote:
>> What I am looking for is the ability to define a set of commands
>> that will "always" be used/useful. Then, allow others to add
>> to that set of commands to address specific needs that are
>> necessary/unique to their environment. (i.e., the environment
>> in which these common commands are deployed)
>>
>> Contrived example:
>>
>>    :
>>
>> I.e., the "arguments" can't be easily foreseen from the perspective
>> of the first/common commands.
>
> The problem is, that also means the complete set of *functionality*
> can't be foreseen ... else you would have provided for it already.

Exactly. Because I can't anticipate the functionality that will be needed in the future. E.g., the contrived example.

> Your "core" can be operated with its provided command set, but you
> can't easily provide for new commands targeting an environment you
> know nothing about.

Yes. And I can't "guess" at the types of data/actions that may need to be "encoded" in that command set. Nor can I ensure the next guy will have the same approach to the problem -- half the commands may "feel like mine" while the other half "feel like his/hers". And neither group "feels consistent" with the other. Like me using infix operators and he using RPN.

> You may want to look up our (ancient) exchange about runtime parser
> generators. You don't need that here, but the discussion about
> providing for JIT generated code to link to your device API is
> relevant.

Can't. The (forced) "computer upgrade" cost me my mail archive (I don't leave mail stored on the server :< )

>>> In either case, the additional parser should get the first crack at
>>> input with Don's own parser as a fallback.
>>
>> Hmmm... I had considered the opposite approach: let me verify the
>> input is/isn't a "common command" before exposing the input to the
>> "other" parser. The thought being: common commands are then
>> implicitly invariant (you can't change the syntax of them) AND
>> they "always work" (even if the additional parser has bugs in its
>> implementation).
>
> You don't want to "front end" the extensions ... if one of your
> commands is redefined, it won't work if you grab it first.

That was exactly the point: that the common commands are *common* and always behave exactly the same (as well as "always working", regardless of how much the other guy mucks with the interface).

E.g., if "run" were a common command and he wanted to "overload" it to add/alter functionality, he'd have to create a new command ("walk_very_very_quickly") to bind to the actions he desires. Perhaps he wants to turn on a big red light prior to activating the mechanism ("run"-ning). If I let him redefine "run" to do this, I risk him forgetting to verify safety mechanisms are in place (which other users would ASSUME when invoking "run").

It's a tough call -- assume competence in EVERYTHING? Or just assume competence in their knowledge of their "extensions"??
Reply by ●October 24, 2014
Hi Paul,

On 10/24/2014 1:56 AM, Paul E Bennett wrote:
> Don Y wrote:
>
>> I'm writing a command parser. Some set of commands are "common".
>> But, other instances of the parser are augmented with additional
>> commands/syntax.
>>
>> [These additions are known at compile time, not "dynamic"]
>
> Have you looked at Forth? What you are describing sounds like the problem
> that Forth solved back in 1968. It is the sort of thing that Forth
> programmers have been doing for nearly 50 years and it works very
> successfully for us.

I am leery of taking things in a different direction from what is (and has been) expected in the codebase. IIRC, someone did something like this as part of an early FreeBSD release (?). And I think SPARCs have a Forth-based "Open Firmware". So it's not a "revolutionary" concept that I'd have to expend effort defending...

Admittedly, it lets you (the user) do lots of things that you wouldn't be able to even consider in, e.g., a conventional BIOS. (And it would be far more "future safe" from my point of view.)

I'll have to think real hard on that. If the "typical" sorts of things "look mildly familiar", then it might be possible to slip it in under the radar (without religious issues biasing the decision/acceptance). OTOH, if things start to look too funky, then it just gives people an excuse to dislike it. :-/

Thanks!
--don
Reply by ●October 24, 2014
On 10/24/2014 7:05 AM, jim.brakefield@ieee.org wrote:
> On Thursday, October 23, 2014 12:54:03 AM UTC-5, Don Y wrote:
>> On 10/22/2014 1:13 PM, Tim Wescott wrote:
>>> On Wed, 22 Oct 2014 12:14:06 -0700, Don Y wrote:
>>
>> The problem I'm stuck with is not knowing what is likely to be added.
>> While I suspect most will be similarly trivial, I can also imagine some
>> esoteric additions!
>>
> Once upon a time, realized that if one uses distinct left and right
> operator precedence values one can include brackets as operators (high
> outside precedence and matching low inside precedence). Never figured
> out how to parse an operator that had simultaneous infix, prefix and
> suffix versions. The appeal is that the parser is table driven and that
> the table is simple and easy to amend. Am told that this approach has
> limitations and has been tried.

The ideal would be to write a usage() for each command -- to be displayed when "help" is invoked. Then, algorithmically extract the parse information from the example syntax present in that usage() presentation.

But I think that's too lofty a goal. It would, by necessity, make the usage() very verbose -- to qualify all the restrictions on the tokens in the example syntax (unless the code was smart enough to recognize common constructs and document them elsewhere). "Small" tends to preclude much of that added smarts.







