
Ftn I/Os documentation best practices

Started by Don Y June 26, 2022
I add a boilerplate to each function definition that
declares constraints on inputs, expectations of outputs,
performance issues, etc.  I use this to add invariants
to the code to detect/enforce these conditions.

But, there is nothing that ensures that I've done
this -- other than discipline.

I'm looking at ways to create an IDL that will allow
for more specific criteria to be included in the
declaration that could also drive the IDL compiler
to add suitable invariants as applicable.

[This makes RPC much more effective but can also
benefit traditional ftn invocations]

Any pointers to similar schemes?  I've been looking
through CORBA et al. for hints but they seem to
focus on bigger machines (where there is more tolerance
over data types and more overhead expected).
On 26/06/2022 21:35, Don Y wrote:
> I add a boilerplate to each function definition that
> declares constraints on inputs, expectations of outputs,
> performance issues, etc.  I use this to add invariants
> to the code to detect/enforce these conditions.
What programming language are you using? If your answer is "C", it's wrong.

If you are just putting these things in comments, then they will get out of sync with the code. The best you can do is to write something like a Python script that will read the C code and check for the pattern of comments.

If you want something really useful, you need a programming language that will let you write the contracts in the language itself - then they can be checked and enforced. Ada, D, and Scala are examples. C++ has the Boost.Contract library, and language support for contracts is due in C++23 (last I heard - but it might be delayed again).
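The "script that checks for the pattern of comments" idea can be sketched in a few lines. This is a minimal sketch in C rather than Python (to stay in the thread's language), built on loudly simplifying assumptions: a function definition is recognised only when a line containing `(` ends in `)` and the next line starts with `{` in column 0, and "documented" means the preceding line closes a `/* ... */` block. `count_undocumented` and its helpers are hypothetical names; a real checker would need a proper C parser.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Nonzero if s ends with suffix, ignoring trailing whitespace in s. */
static int ends_with(const char *s, const char *suffix)
{
    size_t n = strlen(s), m = strlen(suffix);
    while (n > 0 && strchr(" \t\r\n", s[n - 1]) != NULL)
        n--;
    return n >= m && memcmp(s + n - m, suffix, m) == 0;
}

/* Hypothetical heuristic: line i starts a definition if it contains '(',
 * ends with ')', and the next line opens a brace in column 0. */
static int is_definition(const char *const lines[], size_t n, size_t i)
{
    return i + 1 < n && strchr(lines[i], '(') != NULL &&
           ends_with(lines[i], ")") && lines[i + 1][0] == '{';
}

/* Report and count definitions not immediately preceded by a line
 * that closes a block comment. */
size_t count_undocumented(const char *const lines[], size_t n)
{
    size_t missing = 0;
    assert(lines != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (!is_definition(lines, n, i))
            continue;
        if (i == 0 || !ends_with(lines[i - 1], "*/")) {
            printf("line %zu: definition without a preceding comment block\n", i + 1);
            missing++;
        }
    }
    return missing;
}
```

Such a check only proves that a comment exists, not that it is correct, which is exactly the weakness discussed in the rest of the thread.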
On 2022-06-27, David Brown <david.brown@hesbynett.no> wrote:
> On 26/06/2022 21:35, Don Y wrote:
>> I add a boilerplate to each function definition that
>> declares constraints on inputs, expectations of outputs,
>> performance issues, etc.
>
> What programming language are you using? If your answer is "C",
> it's wrong.
>
> If you are just putting these things in comments, then they will get out
> of sync with the code.
I'd have to agree. I've worked with many projects and third-party libraries over the decades which had a big template of comments for every function which described the input/output parameters, return value, global variables used, and so on.

Often these templates generated documents by using something like Doxygen.

And on _every_single_one_ of those projects and libraries, the comments were wrong often enough that nobody who knew which way was up paid any attention to them. If you wanted to know what the parameters were for, what the function returned, and so on, you read the C code.

A lot of the time, even the number and names of the parameters described in the template didn't match the code.

The auto-generated PDF documents and HTML web site looked nice, though.

--
Grant
On 27/06/2022 17:18, Grant Edwards wrote:
> On 2022-06-27, David Brown <david.brown@hesbynett.no> wrote:
>> If you are just putting these things in comments, then they will get out
>> of sync with the code.
>
> I'd have to agree. I've worked with many projects and third-party
> libraries over the decades which had a big template of comments for
> every function which described the input/output parameters, return
> value, global variables used, and so on.
>
> And on _every_single_one_ of those projects and libraries, the
> comments were wrong often enough that nobody who knew which way was up
> paid any attention to them.
Accuracy of such in-code documentation varies, but there is generally no way to check it automatically. That's one of the reasons it is better to use constructs in the programming language, where possible, rather than documentation and comments. For preconditions, postconditions and invariants, you need a language that has support for contracts. For other languages, usually the best you can do is careful choice of names and types, along with assert statements. Still, Doxygen-like comments in code are usually better synchronised with the code than external documentation!
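In plain C, the "careful choice of names and types, along with assert statements" approach might look like the sketch below. `month_t` and `days_in_month` are hypothetical names, and leap years are ignored to keep it short. The wrapper struct makes passing a bare `int` where a month is expected a compile-time error, and the `assert` calls act as run-time pre- and postconditions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical wrapper type: a month is 1..12, and the compiler will
 * reject a bare int (or a day number) passed where a month_t is expected. */
typedef struct { int value; } month_t;

static bool month_is_valid(month_t m)
{
    return m.value >= 1 && m.value <= 12;
}

/* Days in month m of a non-leap year (leap years ignored for brevity). */
int days_in_month(month_t m)
{
    static const int days[12] = { 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31 };
    assert(month_is_valid(m));              /* precondition */
    int d = days[m.value - 1];
    assert(d >= 28 && d <= 31);             /* postcondition */
    return d;
}
```

Bear in mind that `assert` is compiled out when `NDEBUG` is defined, so these checks are a debugging aid rather than an enforced contract.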
On 6/27/2022 8:18 AM, Grant Edwards wrote:
> On 2022-06-27, David Brown <david.brown@hesbynett.no> wrote:
>> On 26/06/2022 21:35, Don Y wrote:
>>> I add a boilerplate to each function definition that
>>> declares constraints on inputs, expectations of outputs,
>>> performance issues, etc.
>
>> What programming language are you using? If your answer is "C",
>> it's wrong.
>>
>> If you are just putting these things in comments, then they will get out
>> of sync with the code.
>
> I'd have to agree. I've worked with many projects and third-party
> libraries over the decades which had a big template of comments for
> every function which described the input/output parameters, return
> value, global variables used, and so on.
You perhaps missed the balance of my post:

"I use this to add invariants to the code to detect/enforce these conditions."

...

"I'm looking at ways to create an IDL that will allow for more specific criteria to be included in the declaration that could also drive the IDL compiler to add suitable invariants as applicable."

I.e., a "specification language" (I am currently using an enhanced form of OCL) FROM WHICH the IDL compiler can create the code -- in whatever language binding is selected AT COMPILE TIME.

So, if I say:

    month > 0 AND month < 13

as constraints *in* the function's "prototype", then the IDL compiler generates an invariant that throws a "range error" OR panics (depending on an IDL compiler switch) AT RUN TIME if the function is invoked with a "month" parameter that does not comply with those constraints.

The OCL *documents* the calling constraints of the function (and its return values) in a language-neutral manner. I.e., you could create an ASM binding for the IDL compiler's output and the programmer would be none the wiser.

The advantage of driving the code generator this way is that the "documentation" creates the code -- if you don't *document* (declare) a constraint, then it isn't enforced. It ensures that the code and documentation agree and that every bit of documentation has a corresponding bit of code (but not necessarily the other way around).
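A hand-written guess at what such a generated client-side stub could look like in a C binding, for the `month > 0 AND month < 13` constraint. The IDL compiler, the `IDL_ON_VIOLATION_PANIC` switch, `idl_status`, and both function names are hypothetical; the thread does not show any actual generated code.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

typedef enum { IDL_OK = 0, IDL_RANGE_ERROR } idl_status;

/* Developer-written implementation (real body elided). It may rely on the
 * stub having already enforced the declared constraint. */
static idl_status set_month_impl(int month)
{
    assert(month > 0 && month < 13);
    (void)month;
    return IDL_OK;
}

/* What a generated stub for a declaration carrying the constraint
 *     month > 0 AND month < 13
 * might look like. Defining IDL_ON_VIOLATION_PANIC (a hypothetical
 * compiler switch) makes a violation abort instead of returning an error. */
idl_status set_month(int month)
{
    if (!(month > 0 && month < 13)) {
#ifdef IDL_ON_VIOLATION_PANIC
        fprintf(stderr, "set_month: 'month > 0 AND month < 13' violated (month=%d)\n", month);
        abort();
#else
        return IDL_RANGE_ERROR;
#endif
    }
    return set_month_impl(month);
}
```

Compiled normally, an out-of-range month returns `IDL_RANGE_ERROR`; compiled with `-DIDL_ON_VIOLATION_PANIC`, it aborts instead. Either way the check is derived from the declaration, so the only way to drop the check is to drop the declared constraint.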
> Often these templates generated documents by using something like
> Doxygen.
>
> And on _every_single_one_ of those projects and libraries, the
> comments were wrong often enough that nobody who knew which way was up
> paid any attention to them. If you wanted to know what the parameters
> were for, what the function returned, and so on, you read the C code.
You *always* read the code. The OCL declarations *are* effectively code; the stub generated *will* reference "month" and not "moth" or "monday" (or whatever). But, they are formally expressed in a syntax defined by the "specification language" (~OCL in my case).

Invoking the exemplar with a month of "13" could possibly work within the body of the function, as implemented -- perhaps treating this as year++ with month=1 -- but the invariant won't let the value *into* the function, because the intent was *not* to invoke the function with a bogus month value. 19A0 is not 2000!

The whole point is to encourage the developer to codify (in OCL) the constraints on the code so that the IDL compiler can create the actual instruction sequence (in the language bound to that set of command line switches) to enforce those constraints.

*But*, you are still reliant on discipline; if the developer doesn't declare those constraints, then the compiler can't create any code to enforce them and is simply resigned to creating the code to marshal arguments and pack the message for transport.

One can casually inspect the IDL files to see if there is an abundance -- or a dearth -- of constraints without having to parse countless source files. The IDL files *generate* the "header" files, so you can't skip that step.

Additionally, it can generate the server-side stubs (in whichever language binding is appropriate *there*) to unpack and parse the message, convert the arguments to whatever format is "native" for the server (knowing that their values are "legitimized" by the client-side stub) and hand them off to the server-side function. [Similarly handling the return message.]
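The client/server stub split described above might be sketched in C as follows, assuming a hypothetical 3-byte big-endian wire format (2-byte year, then 1-byte month). `pack_set_date`, `unpack_set_date`, and `record_date` are illustrative names, not output from any real IDL compiler. The client stub enforces the constraint before marshaling; the server stub only unpacks, converts to native types, and hands off.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical wire format: year (2 bytes, big-endian), then month (1 byte). */
enum { DATE_MSG_LEN = 3 };

/* Client-side stub: enforce the declared constraint, then marshal.
 * Returns the message length, or -1 if the constraint is violated. */
int pack_set_date(uint8_t buf[DATE_MSG_LEN], uint16_t year, uint8_t month)
{
    if (month < 1 || month > 12)
        return -1;                          /* bogus value never reaches the wire */
    buf[0] = (uint8_t)(year >> 8);
    buf[1] = (uint8_t)(year & 0xFFu);
    buf[2] = month;
    return DATE_MSG_LEN;
}

/* Signature of the server's native implementation. */
typedef void (*set_date_fn)(unsigned year, unsigned month);

/* Server-side stub: unpack, convert to native types, hand off. */
void unpack_set_date(const uint8_t buf[DATE_MSG_LEN], set_date_fn handler)
{
    unsigned year = ((unsigned)buf[0] << 8) | buf[1];
    unsigned month = buf[2];
    assert(month >= 1 && month <= 12);      /* already legitimized by the client */
    handler(year, month);
}

/* Example server function for demonstration: records what it received. */
static unsigned last_year, last_month;
static void record_date(unsigned year, unsigned month)
{
    last_year = year;
    last_month = month;
}
```

In a real system the two stubs could be generated in different language bindings; keeping them in one C file here is only for brevity.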
> A lot of the time, even the number and names of the parameters
> described in the template didn't match the code.
>
> The auto-generated PDF documents and HTML web site looked nice, though.
There's no point in generating "prose" from such a specification. What are you going to do, pretty-print the generated stubs? Or, the OCL-expressed constraints?
On 27 Jun 2022 at 17:18:07 CEST, "Grant Edwards" <invalid@invalid.invalid> wrote:

>> If you are just putting these things in comments, then they will get out
>> of sync with the code.
>
> I'd have to agree. I've worked with many projects and third-party
> libraries over the decades which had a big template of comments for
> every function which described the input/output parameters, return
> value, global variables used, and so on.
>
> Often these templates generated documents by using something like
> Doxygen.
For the last 20 years or so, virtually all our manuals have been created by our own "literate programming" system called DocGen. DocGen is optimised for Forth, but it would not be a big job to write a version for C.

DocGen diverges from Doxygen and friends in several ways. In particular, it does not need template blocks. If your C code is so bad that another programmer cannot read the declaration, you need far more help than DocGen or Doxygen can give you. The main entry for a function follows the declaration:

    float someFunc( int how, double x, double y )
    // *G The purpose of *\c{someFunc} is ...
    // ** ...
    {
      ...
    }

The lines starting // *x are formal comments to be processed by DocGen. The *X parts are formatting commands, and the *\<name>{} parts are text macros.

The ideas behind DocGen are that the code and the documentation are never separated, and that the DocGen portion is not much larger than the descriptive comments you should have in your code anyway. Keeping the code in sync with the documentation is a matter of company culture and management.

Whenever we receive third party code to include in our products, we *always* DocGen it before release and we *always* find some bugs. Overall, I estimate that writing the documentation alongside the code costs about 10% extra, paid for by the reduction in bug level.

Stephen

--
Stephen Pelc, stephen@vfxforth.com
MicroProcessor Engineering, Ltd. - More Real, Less Time
133 Hill Lane, Southampton SO15 5AF, England
tel: +44 (0)23 8063 1441, +44 (0)78 0390 3612, +34 649 662 974
http://www.mpeforth.com - free VFX Forth downloads
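To show how little machinery the "formal comment after the declaration" idea needs, here is a toy C extractor that pairs each declaration with the `// *` lines immediately following it. It is a hypothetical illustration of the general technique, not DocGen: it does not interpret the `*X` formatting commands or `*\<name>{}` macros, and simply copies the comment text after `// *`. All names are made up for the sketch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Text after "// *" if the line is a formal comment, otherwise NULL. */
static const char *formal_comment(const char *line)
{
    while (*line == ' ' || *line == '\t')
        line++;
    return strncmp(line, "// *", 4) == 0 ? line + 4 : NULL;
}

/* Append "prefix text\n" to out, silently stopping once out is full. */
static void append(char *out, size_t outsz, size_t *used,
                   const char *prefix, const char *text)
{
    if (*used >= outsz)
        return;
    int w = snprintf(out + *used, outsz - *used, "%s%s\n", prefix, text);
    if (w > 0)
        *used += (size_t)w;
}

/* Copy each declaration that is immediately followed by formal comments,
 * plus those comments (indented), into out. Returns how many were found. */
size_t extract_docs(const char *const lines[], size_t n, char *out, size_t outsz)
{
    size_t used = 0, count = 0;
    assert(out != NULL && outsz > 0);
    out[0] = '\0';
    for (size_t i = 0; i + 1 < n; i++) {
        if (formal_comment(lines[i]) != NULL || formal_comment(lines[i + 1]) == NULL)
            continue;                       /* not a declaration followed by docs */
        append(out, outsz, &used, "", lines[i]);
        for (size_t j = i + 1; j < n && formal_comment(lines[j]) != NULL; j++)
            append(out, outsz, &used, "    ", formal_comment(lines[j]));
        count++;
    }
    return count;
}
```

Because the formal comments sit directly under the declaration, the extractor never has to guess which function a comment belongs to, which is one practical benefit of keeping code and documentation in the same place.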
On 27/06/2022 23:34, Don Y wrote:
> On 6/27/2022 8:18 AM, Grant Edwards wrote:
>> The auto-generated PDF documents and HTML web site looked nice, though.
>
> There's no point in generating "prose" from such a specification.
> What are you going to do, pretty-print the generated stubs?  Or,
> the OCL-expressed constraints?
That is /exactly/ what you do with tools like Doxygen - it extracts /interface/ information (function prototypes, type declarations, etc.), strips it of implementation-specific details, merges the comments (which should hopefully be in sync with the code), and generates clear, readable, searchable, cross-referenced documentation.

You use tools like that precisely so that people using your library or code do /not/ read the C code. You don't even have to read the header files.

And if you are formalising your prototypes with some kind of interface description language to include preconditions, postconditions and invariants, then you want them included in the generated documentation. Ideally, that's what people will read, rather than the IDL source code or the generated C headers.

The key point of separation of interfaces and implementations is that people using the code should /only/ use the documented interfaces, and not rely on anything involved in the implementation. So make the information about those interfaces clear and precise - such as good quality generated documentation - and make it accurate - such as by using an IDL.
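Doxygen already has `@pre` and `@post` (and `@invariant`) commands that render as separate "Precondition" and "Postcondition" sections, so hand-written contracts can at least appear in the generated documentation next to the checks that enforce them. `bcd_to_bin` below is a hypothetical example function.

```c
#include <assert.h>

/**
 * @brief  Convert a packed-BCD byte (e.g. from an RTC register) to binary.
 * @param  bcd  Two BCD digits, tens in the high nibble.
 * @pre    Each nibble of @p bcd is in the range 0..9.
 * @post   The result is in the range 0..99.
 * @return The binary value of @p bcd.
 */
int bcd_to_bin(unsigned char bcd)
{
    /* Precondition, checked at run time (compiled out if NDEBUG is defined). */
    assert((bcd >> 4) <= 9 && (bcd & 0x0F) <= 9);
    int result = (bcd >> 4) * 10 + (bcd & 0x0F);
    assert(result >= 0 && result <= 99);   /* postcondition */
    return result;
}
```

Note that nothing ties the `@pre`/`@post` text to the `assert` lines; keeping them in agreement is still manual, which is the gap an IDL that generates both the documentation and the checks would close.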
On 6/28/2022 1:30 AM, Stephen Pelc wrote:
> For the last 20 years or so, virtually all our manuals have been created
> by our own "literate programming" system called DocGen. DocGen is
> optimised for Forth, but it would not be a big job to write a version for C.
>
> The ideas behind DocGen are that the code and the documentation
> are never separated, and that the DocGen portion is not much larger
> than the descriptive comments you should have in your code anyway.
> Keeping the code in sync with the documentation is a matter of
> company culture and management.
I do this by using a specific "paragraph tag" in FrameMaker documents (e.g., "Code") and then have a simple utility that extracts all thusly tagged paragraphs to create the "source file" -- which is then compiled <however>. [FM files are relatively easy to parse and the format has been consistent for many releases; I wouldn't think of this sort of approach with MSWord acting as "container"!]

It adds an extra step to the process (because the source doesn't exist until extracted from the document). But, it is ill-suited to producing "manuals", as the presentation must be linear with the code; you can't tangle/weave to arrange the code in a different order than the documentation.

OTOH, it is excellent for mixing multimedia with "code"; I can put an illustration between "if" and "then". Or, a sound snippet to indicate what a particular (audio) waveform -- expressed as an array of floats -- *sounds* like, adjacent to those constants. This is particularly helpful with domain-specific constructs, mechanisms and phenomena with which a generic programmer might not have prior experience.

I document the "rationale" and "strategy" behind the code elsewhere. That can take the "30,000 ft view" of the code and usually needs infrequent maintenance. E.g., why was Q12.4 format chosen? Show me the error analysis behind that choice relative to other formats.

Keeping modules short and supporting other non-text annotations makes it relatively easy for folks to understand the specifics of an implementation.

But, all of these techniques (yours included) rely on discipline. There's nothing that mechanically verifies the code and comments agree. Even semi-automatic mechanisms rely on the developer having *created* them (e.g., #including an audio file that was generated by extracting those floats and converting them to audio). Too often, the "solution" is simply to remove comments rather than ensuring they are maintained.
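The kind of error analysis asked for above ("why was Q12.4 format chosen?") can start from a small numerical check. This sketch assumes a signed 16-bit container and round-half-away-from-zero conversion (both assumptions; the thread gives no details of the actual data) and compares Q12.4 with Q8.8 over a sample range. With f fractional bits the step is 2^-f, so the worst-case rounding error is half a step: 1/32 for Q12.4 and 1/512 for Q8.8.

```c
#include <assert.h>
#include <stdint.h>

/* Quantize x to a signed 16-bit fixed-point value with frac_bits fractional
 * bits, rounding half away from zero. Assumes x is in range for the format. */
static int16_t to_fixed(double x, int frac_bits)
{
    double scaled = x * (double)(1 << frac_bits);
    return (int16_t)(scaled >= 0.0 ? scaled + 0.5 : scaled - 0.5);
}

static double from_fixed(int16_t q, int frac_bits)
{
    return (double)q / (double)(1 << frac_bits);
}

/* Worst-case absolute rounding error over n evenly spaced samples in [lo, hi].
 * Analytically this is bounded by half a step: 2^-(frac_bits + 1). */
double max_quantization_error(int frac_bits, double lo, double hi, int n)
{
    double worst = 0.0;
    assert(frac_bits >= 0 && frac_bits < 15 && n >= 2 && hi > lo);
    for (int i = 0; i < n; i++) {
        double x = lo + (hi - lo) * i / (n - 1);
        double err = x - from_fixed(to_fixed(x, frac_bits), frac_bits);
        if (err < 0.0)
            err = -err;
        if (err > worst)
            worst = err;
    }
    return worst;
}
```

The trade-off shows up directly: in a 16-bit container, Q8.8 cuts the worst-case rounding error by a factor of 16 but limits the range to about ±128, while Q12.4 reaches about ±2048.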
Sadly, my experience has been that folks aren't keen on keeping docs and code in sync and the more documentation, the less it tends to track the code. Especially for projects that "evolved" instead of being "designed". (each refactor requiring a substantial reframing of the commentary)
On 28 Jun 2022 at 14:49:41 CEST, "Don Y" <blockedofcourse@foo.invalid> wrote:
>> The ideas behind DocGen are that the code and the documentation
>> are never separated, and that the DocGen portion is not much larger
>> than the descriptive comments you should have in your code anyway.
>> Keeping the code in sync with the documentation is a matter of
>> company culture and management.
>
> Sadly, my experience has been that folks aren't keen on keeping
> docs and code in sync and the more documentation, the less it
> tends to track the code. Especially for projects that "evolved"
> instead of being "designed". (each refactor requiring a substantial
> reframing of the commentary)
As others have said, it needs discipline. Discipline comes from management. As the boss, I have made it quite clear that use of DocGen is a requirement to work at the company. In turn, it is my job to ensure that people know how to use the tool.

Stephen
On 6/28/2022 7:35 AM, Stephen Pelc wrote:
> As others have said it needs discipline. Discipline comes from
> management. As the boss, I have made it quite clear that use
> of DocGen is a requirement to work at the company. In turn
> it is my job to ensure that people know how to use the tool.
You can "legislate" the use of a tool or adherence to a standard. But, these are subjective issues -- not like "derate all caps by 40%" (which can be independently, mathematically verified). You rely on individual "employees" for their judgement as to the effectiveness of their documentation. Likewise, the efficacy of their test/validation efforts.

EVERY employer and client I've ever worked with has had formal standards regarding code "style", documentation, testing, etc. "The Boss" in these cases has ranged from accountants, to mechanical engineers, to electrical engineers ("no longer practicing"), to economists. I.e., they can mandate but aren't qualified to evaluate the quality of the work performed.

You can have peers review each other's work. But, I've not seen that improve the work of folks who just don't have the drive to "do better". (And I can't remember anyone EVER being fired for incompetence!)

The true test of this is handing the design to another party (i.e., SELLING the design) and seeing how well the new owner can come up to speed on the product. If you have staff available "later" who can be consulted wrt their previous work on a design, then folks need not rely completely on printed documentation.
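For contrast with the subjective judgements above, a rule like "derate all caps by 40%" really can be checked mechanically. A minimal C sketch, assuming the rule means the worst-case applied voltage must not exceed 60% of the rated voltage, over a hypothetical BOM table (`cap_t` and `count_derating_violations` are illustrative names).

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical BOM entry: rated voltage and worst-case applied voltage. */
typedef struct {
    const char *refdes;
    double rated_v;
    double applied_v;
} cap_t;

/* Count (and report) parts whose applied voltage exceeds
 * rated_v * (1 - derate); derate = 0.40 means "use at most 60%". */
size_t count_derating_violations(const cap_t *caps, size_t n, double derate)
{
    size_t bad = 0;
    assert(derate >= 0.0 && derate < 1.0);
    for (size_t i = 0; i < n; i++) {
        double limit = caps[i].rated_v * (1.0 - derate);
        if (caps[i].applied_v > limit) {
            printf("%s: %.1f V applied, limit %.1f V\n",
                   caps[i].refdes, caps[i].applied_v, limit);
            bad++;
        }
    }
    return bad;
}
```

There is no comparable function for "is this documentation good enough", which is the asymmetry the post is pointing at.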