Parser, again!

Started by jmariano December 14, 2013
On Sunday, December 15, 2013 9:18:39 PM UTC+2, Tom Gardner wrote:
> ... > I've even seen Them strip out all comments from a > well-documented library on the /principal/ that "comments > get out of sync with code" and "good code doesn't need > comments". Often valid, but not when the comments > describe the subtleties of /why/ the library is > implemented that way and /how/ to use it.
Oh come on, uncommented code is not worth the disk space it occupies. At least half the source text must be comments.

On Sunday, December 15, 2013 10:47:24 PM UTC+2, Les Cargill wrote:
> ... > All programming paradigms uncover the latent "Spanish Inquisitor" > in people.
Hey, that is the wisest observation I have read in a long time! :D :D . Very true - yet people constantly try to create more and more complete phrase books instead of simply using the language into which the phrase book has been written (admittedly most people don't know what to do with the language and have it simpler with a few readily available sentences to repeat though...).

Dimiter

------------------------------------------------------
Dimiter Popoff, TGI http://www.tgi-sci.com
------------------------------------------------------
http://www.flickr.com/photos/didi_tgi/sets/72157600228621276/
On 15/12/13 20:47, Les Cargill wrote:
> Tom Gardner wrote: >> On 15/12/13 18:41, Les Cargill wrote: >>> Tom Gardner wrote: >>>> On 14/12/13 21:08, jmariano wrote: >>>>> Dear All, >>> <snip> >>>> >>>> If that is a possibility, it is probably much cleaner >>>> simpler, faster (speed and soon) to embed a Forth interpreter >>>> from the outset. >>>> >>> >>> Great idea, although it depends on how much trouble that is >>> to actually do in cases. >>> >>>> Yes, I know the XP/agile fraternity will frown on that. >>>> Tough; some of that brigade doesn't know their limits! >>>> >>> >>> I'd think Agile would *support* that, since it should be >>> easier to test. >> >> It tends to go against the holy commandments of "don't >> do big up-front design" to "do the simplest thing", because >> "you can always refactor it later". >> > > How is Forth anything *but* the simplest thing? My own > uncertainty is simply the actions necessary to install the > interpreter for a given environment - the various ones > linked to look pretty easy to use.
Very rational, which is quite irrelevant in the face of the fashionable religion du jour.
> This being said, if you just have a simple line-oriented > thing, a table-driven "parser" is pretty easy. I posted > some actual partial-code in the thread...
That's the rationale! If I've written it it must be easier/simpler/better than something I can't be bothered to understand. Me a cynic? Shurely shome mishtake.
> Forth is *extremely* hackerish, and therefore ( IMO ) > consonant with XP. Flexibility ot the goal, right?
NIH. A common complaint.
>> Too many XPers/agilistas, IMNSHO, treat XP/agile as a >> religion, i.e. something in which there are (12 IIRC) >> Commandments To Be Obeyed in order that the magic >> recipe works. >> > > We can't help that; what my takeaway from Agile/XP is > test-first, in cases pairs and otherwise eschewing big waterfall.
You clearly have applied common sense, and haven't read the religious texts nor derived religious texts written by wannabe acolytes or "XP trainers".
> All programming paradigms uncover the latent "Spanish Inquisitor" > in people.
Yup.
>> That's ridiculous, of course, since XP/agile is immensely >> valuable when deployed intelligently in appropriate >> circumstances. >> > Of course. I'd say I first used it in the late '80s, > although it wasn't called that. We got stuff done.
Yup. Earlier in my case!
>> I've even seen Them strip out all comments from a >> well-documented library on the /principal/ that "comments >> get out of sync with code" and "good code doesn't need >> comments". Often valid, > > Not if you have good shop culture to support comments. > reviews help, too - if you can manage them well.
Ah, but the religion dictates that the comments /always/ get out of sync with the code. The code is The One Executable Truth (fair enough), therefore comments (which will become wrong) are a heresy that should be eliminated in case they mislead. Remarkable, but I've heard it. The next sound was that of me going ballistic.
>> but not when the comments >> describe the subtleties of /why/ the library is >> implemented that way and /how/ to use it.
On 15/12/13 21:05, dp wrote:
> On Sunday, December 15, 2013 9:18:39 PM UTC+2, Tom Gardner wrote: >> ... >> I've even seen Them strip out all comments from a >> well-documented library on the /principal/ that "comments >> get out of sync with code" and "good code doesn't need >> comments". Often valid, but not when the comments >> describe the subtleties of /why/ the library is >> implemented that way and /how/ to use it. > > Oh come on, uncommented code is not worth the disk space > it occupies. At least half the source text must be comments.
Very rational, although I wouldn't specify a proportion since the religious acolytes will want to prove their devotion to the cause (and thereby be promoted) and will turn it into a commandment.
> On Sunday, December 15, 2013 10:47:24 PM UTC+2, Les Cargill wrote: >> ... >> All programming paradigms uncover the latent "Spanish Inquisitor" >> in people. > > Hey, that is the wisest observation I have read in a long time! > :D :D . > Very true - yet people constantly try to create more and more > complete phrase books instead of simply using the language into > which the phrase book has been written (admittedly most people > don't know what to do with the language and have it simpler > with a few readily available sentences to repeat though...).
There's a very interesting debate there. Is it better to have
- a domain specific language, or
- a domain specific library

I prefer the latter, because of tool support, ease of hiring more staff and usually because of higher quality.
On Sunday, December 15, 2013 11:51:30 PM UTC+2, Tom Gardner wrote:
> On 15/12/13 21:05, dp wrote: > > On Sunday, December 15, 2013 9:18:39 PM UTC+2, Tom Gardner wrote: > >> ... > >> I've even seen Them strip out all comments from a > >> well-documented library on the /principal/ that "comments > >> get out of sync with code" and "good code doesn't need > >> comments". Often valid, but not when the comments > >> describe the subtleties of /why/ the library is > >> implemented that way and /how/ to use it. > > > > Oh come on, uncommented code is not worth the disk space > > it occupies. At least half the source text must be comments. > > Very rational, although I wouldn't specify a proportion > since the religious acolytes will want to prove their > devotion to the cause (and thereby be promoted) > and will turn it into a commandment.
I agree, one should be careful not to give ideas too easy to twist :D .
> > On Sunday, December 15, 2013 10:47:24 PM UTC+2, Les Cargill wrote: > >> ... > >> All programming paradigms uncover the latent "Spanish Inquisitor" > >> in people. > > > > Hey, that is the wisest observation I have read in a long time! > > :D :D . > > Very true - yet people constantly try to create more and more > > complete phrase books instead of simply using the language into > > which the phrase book has been written (admittedly most people > > don't know what to do with the language and have it simpler > > with a few readily available sentences to repeat though...). > > There's a very interesting debate there. Is it better to have > - a domain specific language, or > - a domain specific library > > I prefer the latter, because of tool support, ease of > hiring more staff and usually because of higher quality.
I think - if I understand you correctly - the library based on a language is indeed a lot more efficient. Humans can learn languages, and translation from one language into another is generally possible (though it might take some extension to the destination language). I can't think of a decent novel written by someone using a language at phrasebook level. Obviously I am generalizing around your statement, probably because I have some strong feelings about languages (not surprisingly so, since I use mostly VPA, "Virtual Processor Assembler", which is my creation on top of 68k assembler, having evolved for over a decade now).

Dimiter
On Sun, 15 Dec 2013 19:25:29 +0200, Tauno Voipio
<tauno.voipio@notused.fi.invalid> wrote:

>On 15.12.13 17:34, upsidedown@downunder.com wrote: >> On Sun, 15 Dec 2013 15:58:55 +0200, Tauno Voipio >> <tauno.voipio@notused.fi.invalid> wrote: >> >>> On 15.12.13 02:10, Vladimir Vassilevsky wrote: >>>> On 12/14/2013 3:08 PM, jmariano wrote: >>>>> >>>>> The box is under the command of a PC, using RS232 or USB, in a >>>>> master-slave model, the PC being the master. I want to use a message >>>>> based command language, similar to SCPI, but not so complicated (no >>>>> tree structure). I was thinking in something like START, STOP SETADC >>>>> 1000, REAADC 1, etc. >>>> >>>> MODBUS protocol? >>>> >>>> >>>> Vladimir Vassilevsky >>>> DSP and Mixed Signal Designs >>>> www.abvolt.com >>> >>> Before he start on it - DO NOT! >>> >>> Modbus has been designed with far too many networking blunders. >> >> Modbus ASCII is OK (start character, end characters) as well as >> Modbus/TCP is OK (frame size in header) but Modbus RTU is not, since >> it depends on very critical timing (which is a no no for any PC based >> systems). >> >> While you might be able to work around the timing issues on RTU master >> (but not as a terminal emulator), things gets quite ugly on the slave >> side, especially in multidrop circuits. > >Agreed. The problem is in finding the framing in the binary >Modbus RTU format. The message boundaries should be findable >without parsing the whole message. This breaks the protocol >layering badly and complicates the data link layer code. >The addition of timing constraints to framing is poison >to sensible line drivers, e.g. a FIFO -buffered interface >cannot be used, as it destroys the inter-character timing >information. The same applies to delays caused by the other >tasks of the operating system. I have not yet met a Modbus >handler in a PC which did obey the timing constraints. > >Modbus/TCP is wrong because it uses TCP. The Modbus messages are, >by definition, datagram messages, and the only correct transport >in the TCP/IP suite is UDP. 
Using TCP ports the frame boundary >problem to the receiving program. Contrary to popular belief, >TCP does not preserve record boundaries. The only transport >guarantee is that all octets sent will arrive in the same order >as sent, but they may be packed to entirely different set of >TCP segments.
Finding the boundaries is no harder than with the ASCII version. And with UDP you have to build in a lot more error recovery.
jmariano <jmariano65@gmail.com> writes:
> I want to use a message based command language, similar to SCPI, but > not so complicated (no tree structure).
There are probably SCPI libraries you can download, rather than writing something yourself. This showed up as first hit in a web search for "SCPI library": https://github.com/j123b567/scpi-parser
> 1 - Regarding the language definition: Are there god examples of such > language that I can get inspiration from?
What about SCPI?
> 2 - Regarding the parser: Is it really a parser that I need... I just > don't what to read the full dragon book
You don't need anything as complicated as the types of parsers you find in the dragon book. If you really find you want a programmable extension language and if your microprocessor is relatively powerful, try Lua or maybe Guile. If it's a mid-sized microprocessor, Forth might be ok, though it's not for everyone. Any of these will bring you into a level of software hacking that you might not want to deal with given that you're (I'm guessing) mostly a hardware person.
On Sun, 15 Dec 2013 17:05:20 -0600, Robert Wessel
<robertwessel2@yahoo.com> wrote:

>On Sun, 15 Dec 2013 19:25:29 +0200, Tauno Voipio ><tauno.voipio@notused.fi.invalid> wrote: > >>On 15.12.13 17:34, upsidedown@downunder.com wrote: >>> On Sun, 15 Dec 2013 15:58:55 +0200, Tauno Voipio >>> <tauno.voipio@notused.fi.invalid> wrote: >>> >>>> On 15.12.13 02:10, Vladimir Vassilevsky wrote: >>>>> On 12/14/2013 3:08 PM, jmariano wrote: >>>>>> >>>>>> The box is under the command of a PC, using RS232 or USB, in a >>>>>> master-slave model, the PC being the master. I want to use a message >>>>>> based command language, similar to SCPI, but not so complicated (no >>>>>> tree structure). I was thinking in something like START, STOP SETADC >>>>>> 1000, REAADC 1, etc. >>>>> >>>>> MODBUS protocol? >>>>> >>>>> >>>>> Vladimir Vassilevsky >>>>> DSP and Mixed Signal Designs >>>>> www.abvolt.com >>>> >>>> Before he start on it - DO NOT! >>>> >>>> Modbus has been designed with far too many networking blunders. >>> >>> Modbus ASCII is OK (start character, end characters) as well as >>> Modbus/TCP is OK (frame size in header) but Modbus RTU is not, since >>> it depends on very critical timing (which is a no no for any PC based >>> systems). >>> >>> While you might be able to work around the timing issues on RTU master >>> (but not as a terminal emulator), things gets quite ugly on the slave >>> side, especially in multidrop circuits. >> >>Agreed. The problem is in finding the framing in the binary >>Modbus RTU format. The message boundaries should be findable >>without parsing the whole message. This breaks the protocol >>layering badly and complicates the data link layer code. >>The addition of timing constraints to framing is poison >>to sensible line drivers, e.g. a FIFO -buffered interface >>cannot be used, as it destroys the inter-character timing >>information. The same applies to delays caused by the other >>tasks of the operating system. I have not yet met a Modbus >>handler in a PC which did obey the timing constraints.
The reason is that PCs are typically used as Modbus masters, where the timing is not so critical because larger than minimum delays are used. The same applies to a point-to-point slave. In multidrop slaves, the only critical event is when another slave has sent a response and the master sends the next request to you. At that point, you must know where the request frame starts after the 3.5 character time gap.
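To put a number on the 3.5 character gap mentioned above, here is a small sketch of the arithmetic. It assumes the usual 11-bit RTU character (start bit, 8 data bits, parity, stop bit) and the fixed 1.75 ms value that the Modbus serial line guide recommends above 19200 baud; the function name is made up for this example.

```python
def rtu_t35_seconds(baud, bits_per_char=11):
    """Inter-frame silent interval (3.5 character times) for Modbus RTU.

    Assumption: 11 bits per character on the wire. Above 19200 baud the
    serial line guide recommends a fixed 1.75 ms instead of scaling.
    """
    if baud > 19200:
        return 0.00175
    return 3.5 * bits_per_char / baud


if __name__ == "__main__":
    for baud in (9600, 19200, 115200):
        print(baud, "baud:", round(rtu_t35_seconds(baud) * 1000, 2), "ms")
```

At 9600 baud the gap is about 4 ms, which is well below the scheduling jitter of a typical desktop operating system. That is why a slave on a PC struggles with it.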
>>Modbus/TCP is wrong because it uses TCP. The Modbus messages are, >>by definition, datagram messages, and the only correct transport >>in the TCP/IP suite is UDP.
Both TCP and UDP should have been specified. In fact, many vendors implement Modbus/UDP. Modbus/TCP is nice in WANs; we have even run Modbus/TCP connections using GPRS across the continent. Modbus/UDP is the natural choice for a small LAN.
>>Using TCP ports the frame boundary >>problem to the receiving program.
There are hundreds of implementations that simply prefix a 1 or 2 byte count field to an existing message. These seem to work quite well despite this trivial structure. As long as the receiver flushes any characters received during a previous connection, a new connection should always start at the beginning of a frame, and frame synchronization should then be maintained. Modbus/TCP has a fixed 6 byte header, containing the remaining byte count (2 bytes). Unfortunately, the protocol ID is 0000h; a better choice might have been some less likely value such as 1234h.
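As an illustration of that 6 byte header, the sketch below decodes the transaction ID, protocol ID and remaining byte count with Python's `struct` module. The function name and the error handling are assumptions for this example, not part of any particular Modbus library.

```python
import struct

# Three big-endian 16-bit fields: transaction id, protocol id, remaining byte count.
MBAP_PREFIX = struct.Struct(">HHH")


def parse_mbap_prefix(buf):
    """Return (transaction_id, remaining_byte_count) from the first 6 bytes.

    Raises ValueError if the buffer is too short or the protocol id is not 0.
    """
    if len(buf) < MBAP_PREFIX.size:
        raise ValueError("need at least 6 bytes")
    transaction_id, protocol_id, remaining = MBAP_PREFIX.unpack_from(buf)
    if protocol_id != 0:
        # 0000h is a weak sanity check: a stream of zeros also passes it.
        raise ValueError("unexpected protocol id 0x%04X" % protocol_id)
    return transaction_id, remaining


if __name__ == "__main__":
    request = bytes.fromhex("000100000006" "1103006B0003")
    print(parse_mbap_prefix(request))
```

The weakness noted above shows up directly here: the only constant the receiver can test is a pair of zero bytes.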
>>Contrary to popular belief, >>TCP does not preserve record boundaries. The only transport >>guarantee is that all octets sent will arrive in the same order >>as sent, but they may be packed to entirely different set of >>TCP segments.
For this reason, most simple TCP based protocols use the byte count to split the stream into frames.
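A minimal sketch of that byte-count approach, assuming a 2 byte big-endian count in front of each message (the class name and the field width are choices made for this example). The receiver accumulates whatever chunks TCP delivers and only hands out complete frames.

```python
class LengthPrefixDeframer:
    """Rebuild frames from a byte stream where each frame is preceded by a
    2 byte big-endian count (hypothetical layout, for illustration only)."""

    def __init__(self):
        self._buf = bytearray()

    def feed(self, chunk):
        """Add received bytes; return the list of frames completed so far."""
        self._buf += chunk
        frames = []
        while len(self._buf) >= 2:
            count = int.from_bytes(self._buf[:2], "big")
            if len(self._buf) < 2 + count:
                break  # frame not complete yet, wait for more data
            frames.append(bytes(self._buf[2:2 + count]))
            del self._buf[:2 + count]
        return frames


if __name__ == "__main__":
    d = LengthPrefixDeframer()
    stream = b"\x00\x03abc\x00\x02hi"
    # Deliver the stream in arbitrary pieces, as TCP is allowed to do.
    for piece in (stream[:1], stream[1:4], stream[4:]):
        print(d.feed(piece))
```

Note that there is no way to resynchronize here if a count is ever misread, which is the point made in the next reply.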
>Finding the boundaries is no harder than with the ASCII version.
Modbus ASCII is the simplest: just skip everything until you get the colon. In a TCP based protocol, if you lose sync and have no robust preamble, the only practical way to recover is to disconnect and start over with a new connection.
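The "skip everything until the colon" rule can be sketched as a small receiver. This is an illustration only: the class name and the length cap are assumptions, and checksum (LRC) verification is left out.

```python
class AsciiFrameReceiver:
    """Collect ':' ... CR LF frames from a noisy byte stream."""

    def __init__(self, max_len=256):
        self._buf = None          # None means "hunting for the colon"
        self._max_len = max_len   # arbitrary cap against runaway frames

    def feed(self, data):
        """Process received bytes; return the payloads of completed frames."""
        frames = []
        for b in data:
            if b == ord(":"):
                self._buf = bytearray()      # (re)start, even in mid-frame
            elif self._buf is None:
                continue                     # noise before start of message
            elif b == ord("\n"):
                if self._buf.endswith(b"\r"):
                    frames.append(bytes(self._buf[:-1]))
                self._buf = None             # go back to hunting
            else:
                self._buf.append(b)
                if len(self._buf) > self._max_len:
                    self._buf = None         # too long, drop and resync
        return frames


if __name__ == "__main__":
    rx = AsciiFrameReceiver()
    print(rx.feed(b"garbage:0103\r\nxx:AB"))
    print(rx.feed(b"CD\r\n"))
```

Because the colon never appears inside a frame, the receiver recovers from any amount of garbage at the next start character.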
>And >with UDP you have to build in a lot more error recovery.
The error recovery is practically the same whether serial, raw Ethernet (MAC) or UDP frames are used. The master sends the request, and if no (valid) response is received within a specified time, it simply repeats the request one or more times.

In a TCP/IP based protocol, you must also monitor the state of the connection (how? The keep-alive time is usually far too long), and if the link has failed you must try to re-establish the connection. If the response does not arrive in time (communication failure or slave busy), what do you do? Resend the request on the same TCP connection, or shut down the old connection (in an orderly way?), create a new one and send the request over that? The problem is that the TCP transmitting side will keep trying to deliver the message for a long time by buffering the requests, even if they have been made obsolete by the upper level protocol.

Think about the situation where a heavy machine fails and someone goes into the controlled machine to study the problem. Another person, without visual contact with the machine, notes "Hey, this Ethernet cable has been disconnected from the switch, I'll put it back". All the commands in the master's TCP Tx queue are sent at once and the slave executes all these queued commands, starting the heavy machine and injuring the person inside it.

Thus, you really have to be careful with the TCP Tx queue to prevent a backlog, and to detect obsolete commands on the slave side. With serial or UDP, sent and lost requests are not queued.
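The request/timeout/repeat scheme for serial or UDP links might look roughly like this. It is a sketch under assumptions: `send` and `receive` are hypothetical callables standing in for the real port or socket, and `receive` is assumed to return `None` on timeout.

```python
def request_with_retry(send, receive, request, timeout_s=0.5, retries=2,
                       is_valid=lambda response: True):
    """Send a request and wait for a valid response, repeating on timeout.

    send(request) transmits one frame; receive(timeout_s) returns the
    response bytes, or None if nothing arrived in time. Returns the
    response, or None when every attempt failed.
    """
    for _ in range(1 + retries):
        send(request)
        response = receive(timeout_s)
        if response is not None and is_valid(response):
            return response
    return None


if __name__ == "__main__":
    sent = []
    replies = iter([None, b"OK"])        # first attempt times out
    print(request_with_retry(sent.append, lambda t: next(replies), b"READ"))
    print(sent)
```

Each attempt is a fresh transmission decided by the master at that moment, so nothing is queued behind its back. A failed request is simply gone.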
Dear All,

Thank you very much for your valuable input. I lost myself a little on the XP/Agile debate of Tom, Les and Dimiter [:-)] but as for the rest this was a first class lesson!

To make things a little clearer, my gizmo behaves more like a programmable instrument, such as an oscilloscope or a multimeter, than a PLC. It measures pH, conductivity and temperature and has to activate relays (pumps). For those who like chemistry, this is a titration experiment and my device is an automatic titrator. The PC and the device will talk to each other over RS232 on a point-to-point connection, the PC being the master, asking all the questions and setting values. I need a readable language to make things easier to debug and test (with a terminal) and to give the users the possibility of writing their own programs.

Actually I have done this before, the syntax and the decoding routines, 3 or 4 years ago, using brute force. I was looking for a more formal way to approach the problem.

What I could understand from your contributions is:
- For my application, using Forth or Lisp is a little like using a sledgehammer to crack a nut. I don't think I will need that. A simple ASCII protocol will do the job. Thanks for the suggestion anyway.
- Modbus is not very human-readable, but its frame format has interesting features that I can use. With this and the input of Dimiter, Mel, Robert, upsid.. and Y, I'll be able to come up with a syntax. I finally understood that the commas and spaces and stuff are used for the benefit of humans, not machines. The only important thing for the machine is the existence of SOM or EOM or both. I think I'll go for the :xxxx<CRLF> format.
- The parsing routine is not difficult, but there is no generally accepted algorithm (Paul). Thanks for the ideas, Les and Y.
- Paul, the last time I had this problem was with a research NMR machine I built. At that time I spent a lot of time searching the web for free SCPI libraries without success. This was 4 years ago. This time I just didn't search! Thanks for the tip.
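For what it is worth, a table-driven parser for a colon/CRLF line format can be very small. The sketch below uses the command names from the first post (START, STOP, SETADC, REAADC); the argument counts, the integer-only arguments and the function name are assumptions made for illustration, and it is written in Python rather than in the firmware's language.

```python
# Command table: name -> number of integer arguments (assumed values).
COMMANDS = {"START": 0, "STOP": 0, "SETADC": 1, "REAADC": 1}


def parse_command(line):
    """Parse b':NAME ARG...\\r\\n' into (name, [int args]).

    Raises ValueError for a missing SOM/EOM, an unknown command,
    a wrong argument count or a non-numeric argument.
    """
    if not (line.startswith(b":") and line.endswith(b"\r\n")):
        raise ValueError("missing start or end of message")
    fields = line[1:-2].decode("ascii").split()
    if not fields:
        raise ValueError("empty command")
    name, args = fields[0].upper(), fields[1:]
    if name not in COMMANDS:
        raise ValueError("unknown command: " + name)
    if len(args) != COMMANDS[name]:
        raise ValueError("wrong number of arguments for " + name)
    return name, [int(a) for a in args]


if __name__ == "__main__":
    print(parse_command(b":SETADC 1000\r\n"))
    print(parse_command(b":start\r\n"))
```

Adding a command then means adding one table entry, and the same table can drive a help listing or a dispatch to handler functions.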


Regards

Mariano
jmariano <jmariano65@gmail.com> writes:
> - For my application, using Forth or Lisp is a little like using a > sledgehammer to crack a nut. I don't think will need that. A simple > ascii protocol will do the job. Thanks for the suggestion anyway. ...
One thing that should have been asked: what is the target processor? Why are you even writing the code in C? Yes, using something like Lisp (or these days Python) might be "overkill", but just in the sense that it's overkill to drive a 2000 pound car to pick up a quart of milk from a store 2 km away, especially if the "car" burns no fuel. If it's a small MCU then you kind of have to use C, but if it's something like a Raspberry Pi then you can use almost anything. The simplest practice with such a system might be to use XML or JSON, depending. The parsing and encoding libraries are already there.
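As a rough idea of how little code the JSON route needs on a board that runs Python, here is a sketch using the standard `json` module. The message layout (`cmd` and `params` keys, one message per line) is an assumption made up for this example.

```python
import json


def encode_command(name, **params):
    """Serialize one command as a single newline-terminated JSON line."""
    message = {"cmd": name, "params": params}
    return (json.dumps(message, separators=(",", ":")) + "\n").encode("utf-8")


def decode_command(line):
    """Inverse of encode_command: return (name, params dict)."""
    message = json.loads(line.decode("utf-8"))
    return message["cmd"], message.get("params", {})


if __name__ == "__main__":
    wire = encode_command("SETADC", value=1000)
    print(wire)
    print(decode_command(wire))
```

The trade-off is line length and readability on a dumb terminal: a hand-typed `SETADC 1000` is friendlier than the JSON equivalent.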
On Monday, December 16, 2013 10:55:00 PM UTC+2, jmariano wrote:
> ... > To make things a little more clear, my gizmo behaves more like an > programmable instrument, an oscilloscope or a multimeter, than a PLC. > It measures ph, conductivity and temp and was to activate relays (pumps). > For those who like chemistry, this is a titration experiment and my > device is an automatic titrator.
A few years ago I designed a pH and conductivity meter for a guy; making sure the conductivity meter did not interfere with the pH electrode was why this came my way. But I only did the analog front end, which delivered the two analog values; that's where help was needed.
> The only important things for the machine is the existence of SOM or > EOM or both. I think I'll go for the :xxxx<CRLF> format.
Since your application will likely involve long times from the host command to the device knowing it has succeeded (pumps staying active etc.), you might want to do something similar to the command-ack-outcome_notification sequence in my protocol which I posted earlier. [ the text is at http://tgi-sci.com/misc/sdvctl.txt ] It makes life a lot easier: the host issues a command and immediately gets an ACK reply, which also carries information about how long (at most) the command will take to finish; upon finishing, the device notifies the host of how it worked out. This can take seconds if not minutes, so in the meantime the host can keep on issuing other commands to the device, like reading status or various measured values, making some light blink etc.

Dimiter
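The host side of such a command/ACK/outcome sequence could keep its books roughly as follows. This is only a sketch of the idea described above, not the protocol from the linked text: the class, the command IDs and the method names are all hypothetical.

```python
class PendingCommands:
    """Host-side record of commands that were ACKed but have not finished."""

    def __init__(self):
        self._deadlines = {}  # command id -> latest expected finish time (s)

    def on_ack(self, cmd_id, max_duration_s, now):
        """The device accepted the command and promised to finish in time."""
        self._deadlines[cmd_id] = now + max_duration_s

    def on_outcome(self, cmd_id, success):
        """The device reported how the command worked out."""
        if self._deadlines.pop(cmd_id, None) is None:
            return "unexpected"          # outcome for a command we never saw
        return "ok" if success else "failed"

    def overdue(self, now):
        """Commands whose promised maximum duration has passed."""
        return sorted(c for c, t in self._deadlines.items() if now > t)


if __name__ == "__main__":
    pending = PendingCommands()
    pending.on_ack(1, max_duration_s=30.0, now=100.0)   # e.g. run a pump
    pending.on_ack(2, max_duration_s=5.0, now=100.0)
    print(pending.overdue(now=106.0))
    print(pending.on_outcome(1, success=True))
```

Between the ACK and the outcome the link stays free, so status reads and other short commands can be interleaved; the `overdue` check gives the host a defined point at which to raise an alarm.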