Hi George,
[small earthquake ~100 miles east last night. Pretty lame, I
imagine, as earthquakes go. But, a first for me! Cool!]
On 6/27/2014 10:10 AM, George Neuner wrote:
>> What annoys me about most FOSS is that most don't treat their
>> "output" as a genuine (supportable) *product*.
>
> This shouldn't surprise you: it is traditional for hackers to release
> (what they think are) "useful" programs into the wild and then forget
> about them and move on to something else. If you're not making money
> from the project, apart from personal pride there's little incentive
> to keep supporting it.
Note that I didn't say "keep supporting it" -- I said produce a
supportable product! What they release isn't a product but,
rather, a smattering of code that did what *they* wanted it to
do (perhaps) -- for the effort they were willing to invest.
It's *not* something they are "proud" of but, rather, something
they are opting to "publish" instead of just "discard" -- in the
hope that someone else will wade through their *code* (not
documentation!) and try to figure out what they were trying to
do (probably "on a shoestring" instead of "Right") and, *possibly*,
be INSPIRED and WILLING to take it a step further (possibly *back*
to the direction it should have taken in the first place!).
For one of the TTS algorithms I'm currently implementing, I include:
- a "document" presenting the subset of the IPA appropriate for
(nominal) English. In addition to introducing the glyphs used
to represent each sound and "sample words" containing those sounds,
it contains actual sound recordings *of* those sounds *in* those
word samples -- because different regional accents lead folks to
"hear" (what's the aural version of "visualize"?) those PRINTED
examples differently. It also explains terms like front/back vowel,
palatalization, etc. The point behind the document is to bring
developers who may not be familiar with this application domain
"up to speed" so further discussions don't have to be trivialized.
- the original "research" defining the algorithm. This lets those
developers double-check *my* interpretation of the research by
returning to the unaltered source.
- my *explanation* of that research along with the errors I've uncovered
in that presentation (and implementation). Again, having the source
available allows others to correct any mistakes *I* may have made
and/or reinterpret the original material -- possibly in a different
(better?) light.
- a description of my *implementation* of the algorithm as this is
significantly different from the original source. Many of these
differences are attributable to the "non research" nature of my
implementation (e.g., I don't have a big disk to store rules;
a "mainframe" to execute them in some sluggish language; etc.).
I also discuss the enhancements I've made to the algorithm, any
micro-optimizations (e.g., rule reorderings), and how I've adapted
the rules to the sorts of input *I* expect to encounter. E.g., I
am less concerned that "aardvark" is pronounced correctly than I
am about "gigabytes". This allows developers to cherry-pick just
this component of my system *out* for reuse in some other
system -- one with different vocabulary requirements!
- the test suite that I used to evaluate the quality of the algorithm
along with "rationalizations" as to why each "failure" is accepted
instead of "fixed" (some fixes would require adding rules that would
break *other* things -- English is full of exceptions!). It also
provides statistics that tell me how often each rule is encountered
when processing the sample input (allows me to adjust the algorithm's
efficiency). And, lets me "watch" to see how a particular "word"
is processed (to help formulate better rules).
- a tool that converts from the human readable form of the rules to
the machine readable encoding. This allows a developer to deal with
the normalized representation (i.e., something that a linguist could
understand and assist with) while freeing the implementation from its
particular form.
- a tool to build the tables that the algorithm uses from that input.
This reduces the probability of transcription errors between the
(hundreds of) cryptic rules and their representation for the
implementation (efficiency instead of readability).
- a tool to port changes in that input back into the documentation
(so the documentation can be kept up to date without transcription
errors creeping in, there!)
- a tool to evaluate *your* "sample text" for its accuracy, tabulate
relative frequencies of occurrence, etc.
Those tools are written in the same language as the application that
"consumes" that data object (table). So, a future developer wanting
to change the run-time implementation of the algorithm need not learn
some other language to "alter" the table generated from the chart!
As the algorithm inherently processes a bunch of lists (lists of
affixes, lists of letter sequences, lists of phonemes, etc.), I wrote
the preliminary "proof-of-concept" converters in Scheme and, later,
Limbo. This let me explore encoding options for the resulting
tables more easily as compile-time performance wasn't important!
Once the form of the table was settled, I rewrote this in C to mimic
the code that was *consuming* that table.
Now, imagine I had left the Scheme versions of these tools in the
final build environment (LESS work for me as I wouldn't have to
then create the C versions). Now, the future developer has to have
Scheme available; the version that he has available must be compatible
with any specific features/extensions/etc. upon which I relied; he
needs to *know* Scheme in order to be able to understand what that
algorithm is doing to sufficient degree to be able to *change* it AND
he has to be familiar with whatever development/debug/test environment
I've opted to employ in my maintenance of *that* tool!
I would end up raising the bar too high -- and, effectively forcing
developers to stick with *my* encoding scheme because it's the path
of least resistance (vs. having to drag in that other dependency).
This, then, discourages them from altering the run-time algorithms
that consume that data, etc. I.e., I've made it too tedious for them
to alter/improve upon the code. I've *tied* them to my implementation.
[When I was working on the Klatt synthesizer, this sort of thing
was very evident. The eyes that had poked at it previously were
much too tentative about attacking gross inefficiencies in the
code and/or restructuring it for fear of breaking something that
they didn't understand -- or didn't WANT to understand -- well.]
>> Yeah, I know... documentation and testing are "no fun". But,
>> presumably, you *want* people to use your "product" (even if it
>> is FREE) so wouldn't you want to facilitate that? I'm pretty
>> sure folks don't want to throw lots of time and effort into something
>> only to see it *not* used!
>
> That's a little harsh. I don't think it's fair in general to expect
> the same level of professionalism from part time developers as from
> full time.
But these same folks explain away the lack of testing and documentation
in their "professional" efforts by blaming their PHB! As if to say,
"Yeah, I know. I really wish I could do the formal testing and
documentation that I, AS A PROFESSIONAL, know is representative of
my PRODUCT... but, my boss just never gives me the time to do so...".
There's no one forcing you to release YOUR CODE, now. Or, forcing
you to *stop* working on a test suite, documentation, etc. AFTER a
"beta" release. Yet, despite the opportunities to finish it up
properly (the way you suggest you WISH you could do in your 9-to-5),
you, instead, lose interest and hope someone else cleans up the mess
you've left.
[This isn't true of all FOSS projects but is true of probably *most*!]
> Lots of people who can find and download a program online aren't
> sophisticated enough to know whether a problem with that program is a
> bug or if they screwed up somehow in trying to use it. If you browse
> support forums (for projects that *are* supported), it quickly becomes
> apparent that many reported problems come from trying to use the
> software in ways for which it wasn't designed or in environments under
> which it hasn't been tested.
*Which* environment(s) has it been "tested" in? What were the
"tests"? How can I even begin to come up with a set of tests for
*my* environment if I can't see what *you* did in yours? What is
the product *supposed* to do (not in flowery language but in
measurable specifics)? How can I know if it's working if I don't
know what to test against?
Yes, most "software consumers" just want to know "which button do I
press". They don't want to understand the product.
But, they would be just as happy with a CLOSED, PROPRIETARY solution
released as a "free" binary! (I.e., they just don't want to "spend
money"!) The whole point of FOSS is to enable others to modify and
improve upon the initial work. One would think you would want to
facilitate this!
> Sans a formal verification effort, it's almost impossible to guarantee
> that a project of any size is bug free ... the best you can hope for
> is that any bugs that are there are innocuous if they are triggered.
You don't have to ensure "bug free". But, you should be able to
point to a methodical attempt undertaken by you -- and repeatable
as well as augmentable by others -- that demonstrates the conditions
that you have verified *in* your code base. So, I can "suspect"
a problem, examine your test cases and see (or NOT see) that you
have (or have NOT) tested for that case before I waste additional
time chasing a possibly unlikely issue.
"Why is 'has' mispronounced? Oh, I see..."
I make no claim that, for example, the TTS algorithm above is
"bug free". Nor "optimal". But, I show what my thinking was
and the conditions under which I evaluated the algorithm. And,
provided that framework for the next guy who may want to
evaluate it under a different set of conditions.
[E.g., I am not concerned with "proper nouns" to the extent that
someone wanting to reappropriate the code to read names and
addresses out of a telephone directory would be! So, the rules
for that sort of application would be different, have different
relative priorities, etc. But, you'd still need a test framework
(populated with a different set of words/names) and a way to
evaluate the NEW algorithm's performance on that input set]
> No hobby developer - and damn few professionals - realistically can
> maintain test platforms for every possible configuration under which
> someone might try to run their software. Moreover, the majority of
> projects get no feedback whatsoever, so if a problem bug does slip
> through whatever testing is being done, only rarely does the developer
> find out about it.
How does *a* developer (need not be the original developer) know what
*has* been tested and what hasn't? Does he keep this on a scrap of
paper in his desk drawer? Or, does he just make up test cases WHILE
CODING THE ALGORITHM to increase his confidence in his solution?
(which has no direct bearing on how 'correct' the code is... just
how correct he *thinks* it is at that time!)
> I have developed FDA certified systems for diagnostic radiology and
> for pharmaceutical production. I have the experience of worrying
> about people being hurt if I screw up, large sums of money being lost,
> and of potentially being sued or even criminally prosecuted as a
> result. Percentage-wise, very few developers have experience of
> litigious and safety critical backgrounds to guide their efforts.
It shouldn't matter whether you are worrying about stockholders or
a life saved/lost. Unless you are implicitly saying "this product
isn't important... it doesn't HAVE TO WORK! It has NO INTRINSIC
VALUE -- because I make no guarantees that it does ANYTHING!".
All I am asking the FOSS developer to do is *state* what he claims
the value of his "product" to be. And, show me why he believes
that to be the case. If all it is guaranteed to do is burn CPU
cycles, then I won't bother wasting my time on it; there are lots
of permutations of opcodes that will yield those results! :>
I need something that allows me to decide where to invest *my*
time -- both as a consumer and contributor.
> There's a push to certify software developers in the same way that
> some engineers routinely are certified. It hasn't achieved much
> traction because so few software developers have the educational
> background to pass the proposed tests.
Ain't going to happen. Too many "programmers". And, business
only pays lip service to wanting those things -- they'd rather
just *hope* the diploma mills will crank out a new crop that
chases wherever their current (business) efforts are headed.
>> Granted, the "development" issue that I initially discussed is a
>> tough one -- how can I *expect* all FOSS developers to "settle"
>> on a common/shared set of tools/approach? While this is common
>> *within* an organization, it would be ridiculous to expect Company
>> A to use the same tools and process that Company B uses! And,
>> griping about it would be presumptuous.
>>
>> OTOH, it's fair to gripe when Company (Organization) X does things
>> in a way that is needlessly more complicated or dependent (on a
>> larger base of tools) than it needs to be.
>
> Yes and no. Working to the lowest common denominator often means
> exponentially more work. I agree that using oddball tools that few
> people have heard of is bad, but I disagree that using a relatively
> well known tool that just doesn't happen to be in the default OS
> installation is a problem.
I didn't imply it had to be *in* the default OS. Rather, that a
*set* of tools be used without adding more tools "willy nilly"
("I really like coding in Ruby so I'll do this little task in
Ruby instead of ________"). It's not just having a binary
available but, also, the skillset that then becomes a requirement
for the product's maintenance.
>> When I started my current set of (FOSS) projects, I was almost in
>> a state of panic over the "requirements" it would impose on others.
>> Too many tools, too much specialized equipment, skillsets, etc.
>>
>> After fretting about this for quite some time -- constantly trying
>> to eliminate another "dependency" -- I finally realized the "minimum"
>> is "a lot" and that's just the way it is!
>
> Yes. You and I have had a few conversations about this too.
And things will only get *worse* as software becomes increasingly more
complex!
E.g., it is almost impossible for <someone> to "trace" the execution
of a "program" in my environment with the same expectations they
have in a more traditional target. There are just too many virtual
layers involved, different physical processors, etc. Should I
build a tool to facilitate these efforts for others in the future?
Will that effectively NEEDLESSLY *tie* them into my particular
implementation?
Or, can I, instead, strive for a better functional partitioning
to reduce the need to comprehensively trace a "macro action"
as it wanders through the system? And, illustrate how you can
effectively debug just by watching subsystem interfaces?
>> ... And, *I* will endeavor to pick good tools that adequately
>> address their respective needs so you aren't forced to use two
>> *different* tools (e.g., languages) for the same class of "task".
>
> Above and beyond what most developers have come to expect. 8-)
<frown> There are still *huge* areas where I am unhappy with the level
of documentation, etc. that I am including. I just think most of our
tools (and practices) aren't mature enough to address the various
competing issues involved.
It would be like an architect having to use a blueprint to indicate
the placement of the support structures in an edifice -- and,
ENTIRELY DIFFERENT DOCUMENTS (format) to indicate the placement
of the various "services" (electric, plumbing, data, etc.) within
that structure (instead of indicating them on the same set of prints).
Mixing text, illustration, animation, interactive demos, sound,
video, etc. in *with* the "code" is a challenging environment.
Just this documentation aspect alone is what led me to ignore
the requirements I've imposed on "those who follow" -- it just
takes too many different tools to do all of these things and
trying to tie *my* hands just to cut down on what the next guy
needs to "invest" (esp if the next guy isn't modifying any of
those aspects of the documentation) was silly.
Sunday lunch. Finestkind. Then, make some ice cream for dessert!
(that's the one consolation of dozens of hundred degree days -- you
can make ice cream OFTEN without raising eyebrows! :> )
--don