EmbeddedRelated.com

Constrained vocabulary speech synthesis

Started by Don Y June 3, 2014
Hi George,

On 6/6/2014 12:05 AM, George Neuner wrote:
> On Tue, 03 Jun 2014 09:09:19 -0700, Don Y<this@is.not.me.com> wrote:
>
>> [with a tiny bit of effort, you can imagine lots of
>> similar constructs that require significant knowledge
>> of grammar, PoS, and other context to "get right". Let
>> alone oddities like Billerica, Worcester, Usa, etc.]
>
> Let's just agree that Massachusetts is hopeless and write it off. Even
> making allowances for pronunciation, poor diction and odd
> colloquialisms, there are too many MA natives who badly misspeak
> [including many who theoretically have been well educated].
>
> And don't pick on poor Worcester: it's a nice city ... in England ...
> that isn't responsible for what Massachusetts did to its name.
> Historically it was pronounced the same as the English city - as were
> Gloucester, Medford, Woburn, Salisbury, etc. The scratch-your-head
> "huh?" pronunciations are all post Revolution (some post 1812).
There are lots of "towns" in the US that have bastardized their "original" foreign pronunciations.

  Berlin MA/CT/WI (BURR-lin)
  Italy TX (IT-lee)
  New Madrid MO (MAD-rid)
  Milan NY (MY-lun)
  Russia OH (ROO-shee)
  Cairo IL (KAY-row)
  etc.
> [I can mock MA because I'm from MA: I was born there and I live there
> currently. Thankfully, during my formative years I lived elsewhere.]
Proper nouns are always good candidates for "exception dictionaries" (e.g., "Kurzweil" was one of the first words added to the KRM's exception dictionary -- for fairly obvious reasons :> )

But, English is so "screwed" that even commonplace words are exceptions (wrt the "rules" typically applied to other words in the language): two, of, this/these/them/that/etc., Wednesday, woman/women, etc.

And, forget local variations: TX twang, southern drawl, Boston 'R' (vs. New York 'R'), "bash/mash" (esp in places like OH), "oil" in NJ, Louisiana Cajun, Mainers, Wisconsin's odd stress assignment, etc. INsurance vs. inSURance, POlice vs. poLICE, etc. (I've heard Arkansas pronounced areKANSAS)

Trying for a "nominal" US speaking pattern is an exercise in futility! OTOH, most of us (US) are accustomed to encountering folks with different speech mannerisms -- let alone entirely different terms, colloquialisms, etc. <shrug>
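[A minimal sketch of the exception-dictionary idea, for readers who have not seen one. The table contents, the function names, and the stand-in for the letter-to-sound rules are all assumptions made for illustration; the respellings are the ones quoted in this thread, and a real synthesizer would most likely store phonemes rather than respellings.]

```python
# Hypothetical exception dictionary: spelling -> hand-written pronunciation.
# Entries reuse the US town respellings quoted in this thread.
EXCEPTIONS = {
    "berlin": "BURR-lin",
    "cairo": "KAY-row",
    "milan": "MY-lun",
}


def rule_based_guess(word: str) -> str:
    """Stand-in for the letter-to-sound rules (assumed, not shown here)."""
    return word.lower()


def pronounce(word: str) -> str:
    """Consult the exception dictionary first, then fall back to the rules."""
    key = word.lower()
    if key in EXCEPTIONS:
        return EXCEPTIONS[key]
    return rule_based_guess(word)
```

The lookup happens before any rule is applied, so an entry always wins over whatever the general rules would have produced.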
On Thu, 05 Jun 2014 19:41:26 +0200, Don Y <this@is.not.me.com> wrote:
> On 6/5/2014 6:10 AM, Boudewijn Dijkstra wrote:
>
> [attrs elided]
>
>>>>> But, I can't tell the user about issues that the (remote)
>>>>> server wants to communicate -- unless I also constrain *it*!
>>>>> (and never let it evolve without requiring software updates
>>>>> of all potential clients).
>>>>
>>>> These are issues that are most likely not directly helpful to the
>>>> user.
>>>> At this point the device might output perfectly synthesized text, DTMF
>>>> tones, a fax message, it doesn't really matter as the user cannot
>>>> directly employ the information to make things work again. In other
>>>> words, this kind of information is generally best relayed to a
>>>> helpdesk
>>>> of some sort. So, speech synthesis should not be an absolute
>>>> requirement.
>>>
>>> How does the user *know* what the device is wanting to tell him
>>> in order to relay that information to the help desk? I.e., you
>>> have to get the information *to* the user before he can relay it
>>> to "Support".
>>
>> Assuming that the device is not subdermally implanted, the user doesn't
>> need to hear or understand the information at all! The device could say:
>> "Please hook me up to a phone line, I wish to send a problem report" or
>> something similar. Then the user could listen in and wait for the
>> exchange to finish.
>
> So now the device needs to be able to connect to a phone line
> (acoustically or otherwise) *and* there needs to be a phone
> line *handy* (as well as accessible to the device's dialing
> capabilities -- e.g., not behind a PBX).
Or ask the user to dial the helpdesk.
> And, to know how to
> report dialing/connection problems there, as well.
The users are not deemed capable of that themselves?
> All this just to avoid being able to convey "alien" messages and
> pronounce numbers in various formats in an intelligent manner?
Yes. To me it sounded like an option worth considering.
> [...] D'uh... :<
Indeed.

--
(Remove the obvious prefix to reply privately.)
Made with Opera's e-mail program: http://www.opera.com/mail/
Hi Boudewijn,

On 6/6/2014 4:53 AM, Boudewijn Dijkstra wrote:

[attrs elided]

>>>> How does the user *know* what the device is wanting to tell him
>>>> in order to relay that information to the help desk? I.e., you
>>>> have to get the information *to* the user before he can relay it
>>>> to "Support".
>>>
>>> Assuming that the device is not subdermally implanted, the user doesn't
>>> need to hear or understand the information at all! The device could say:
>>> "Please hook me up to a phone line, I wish to send a problem report" or
>>> something similar. Then the user could listen in and wait for the
>>> exchange to finish.
>>
>> So now the device needs to be able to connect to a phone line
>> (acoustically or otherwise) *and* there needs to be a phone
>> line *handy* (as well as accessible to the device's dialing
>> capabilities -- e.g., not behind a PBX).
>
> Or ask the user to dial the helpdesk.
OK. The device can, presumably, speak the phone number to the user so the user doesn't have to keep that "handy". If he happens to be on a (commuter) train at the time, he can presumably have a cell phone handy.

Having dialed the help desk, how does the device *talk* to the help desk -- hold it up to the phone and have it "beep and bop" into the phone's mouthpiece? This would have to be a SIMPLEX data exchange else the help desk would have to alternately be telling the user "OK, now hold the mouthpiece of the device up to the earpiece of the phone", etc.
>> And, to know how to
>> report dialing/connection problems there, as well.
>
> The users are not deemed capable of that themselves?
I assumed you would be thinking in terms of an *automated* help desk (since all that facility would be doing is interpreting beeps and bops from the device)
>> All this just to avoid being able to convey "alien" messages and
>> pronounce numbers in various formats in an intelligent manner?
>
> Yes. To me it sounded like an option worth considering.
Now, imagine the user of a visual display device being forced into the same situation. Nice full-graphic display available for him but all he sees is "Error 2341. Please dial 555-1212 for assistance" followed by a string of hex digits that he must somehow communicate to the person on the help desk (e.g., DTMF if the user is mute).

[Because the spec for the device and the servers says, "These are the a priori known error codes and what they really mean. Any other error code will be reported as a 4 decimal digit code followed by a binary object presented in a form suitable for the device's output modality"]

I don't see this as any easier -- and far less convenient (user will already be anxious because he is unable to use device) -- than just coming up with some reasonable constraints on the sorts of numerics that can be expressed in these "alien" messages.

Falling back on "reading strings of digits" seems so kludgey and tedious...
>> [...] D'uh... :<
>
> Indeed.
So far, I have been impressed with the thoroughness of the "patterns" included! I'll have to inquire as to where he found them -- or, if he sat down and created each of them himself (though it appears he'd have had to do a fair bit of *research* to know what all of these data/number formats *are* before coding the regexes! -- license plate identifiers, phone numbers for various foreign dialing plans, etc.)
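[For illustration only: a minimal sketch of what a small table of such patterns might look like. These regexes are NOT the ones being praised above; the category names and expressions are assumptions, written to match only the sample strings that appear in this thread (555-1212, 555-1212 x3-2211, $19.95, 9:00A).]

```python
import re

# Hypothetical pattern table: (category, compiled regex).
# Each pattern is deliberately narrow and covers only the sample
# formats quoted in this thread.
PATTERNS = [
    ("phone", re.compile(r"\d{3}-\d{4}(?: x\d+(?:-\d+)*)?")),
    ("money", re.compile(r"\$\d+\.\d{2}")),
    ("time", re.compile(r"\d{1,2}:\d{2}[AP]?")),
    ("date", re.compile(r"\d{1,2}/\d{1,2}/\d{4}")),
]


def classify(token: str) -> str:
    """Return the first category whose pattern matches the whole token."""
    for name, pattern in PATTERNS:
        if pattern.fullmatch(token):
            return name
    return "unknown"
```

Anything that falls through to "unknown" would still have to be read digit by digit, which is the fallback described above as kludgey.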
On Thu, 05 Jun 2014 10:41:26 -0700, Don Y <this@is.not.me.com> wrote:

>Hi Boudewijn,
>
>On 6/5/2014 6:10 AM, Boudewijn Dijkstra wrote:
>>
>> Assuming that the device is not subdermally implanted, the user doesn't
>> need to hear or understand the information at all! The device could say:
>> "Please hook me up to a phone line, I wish to send a problem report" or
>> something similar. Then the user could listen in and wait for the
>> exchange to finish.
>
>So now the device needs to be able to connect to a phone line
>(acoustically or otherwise) *and* there needs to be a phone
>line *handy* (as well as accessible to the device's dialing
>capabilities -- e.g., not behind a PBX). And, to know how to
>report dialing/connection problems there, as well.
My new washing machine can do just that. At some point in the conversation with customer service you can hold your phone up to a particular spot on the device, press another button for a few seconds, and it'll transmit a problem report.

Obviously that sidesteps the problem of establishing the phone connection, as you have to be talking to support anyway.
Hi Robert,

On 6/6/2014 9:12 AM, Robert Wessel wrote:

[attrs elided]

>>> Assuming that the device is not subdermally implanted, the user doesn't
>>> need to hear or understand the information at all! The device could say:
>>> "Please hook me up to a phone line, I wish to send a problem report" or
>>> something similar. Then the user could listen in and wait for the
>>> exchange to finish.
>>
>> So now the device needs to be able to connect to a phone line
>> (acoustically or otherwise) *and* there needs to be a phone
>> line *handy* (as well as accessible to the device's dialing
>> capabilities -- e.g., not behind a PBX). And, to know how to
>> report dialing/connection problems there, as well.
>
> My new washing machine can do just that. At some point in the
> conversation with customer service you can hold your phone up to a
> particular spot on the device, press another button for a few seconds,
> and it'll transmit a problem report.
>
> Obviously that sidesteps the problem of establishing the phone
> connection, as you have to be talking to support anyway.
But, your washing machine probably doesn't have some *other* means of conveying generic messages to the user. E.g., a graphic/text display. If it had, presumably, it could display an N-digit "code" (embodying all of the pertinent information) along with instructions that you could dictate to the support tech.

Or, a REAL MESSAGE! (gadzooks!)

I've already got a means of communicating with the user. It's now a question of whether I let other devices (e.g., remote server) *use* that mechanism to convey their diagnostic/status messages to the user *or* force them to produce an "error code" that can *always* be conveyed to the user -- but, that the user then has to resolve via some other agency (e.g., help desk).

It would be annoying to have to contact support only to be told "your device is telling you that your (paid) subscription has expired and, for a limited time, you could renew for $19.95 (plus shipping and handling)" :-/

Or, that the server is shedding load, "*PLEASE* use another server unless absolutely necessary. Otherwise, your request will be handled, shortly."

Or, "A newer, faster service is available at XXXX. But, feel free to keep using this slower, outdated service if you like".

Or, "Notice: a security exploit was detected last Tuesday. Folks who used this service on XX/XX/XXXX should contact the System Administrator at 555-1212 x3-2211 ASAP."

Or, "Upgraded <devices> are available at no cost from the service department from 9:00A - 5:00P. Ask for Bob."

[BTW, Our washing machine makes all sensor information available to the user via the front panel in a "service mode" (published in the User Guide) using the numeric display inherent in the front panel]
On Fri, 06 Jun 2014 02:07:11 -0700, Don Y <this@is.not.me.com> wrote:

>Proper nouns are always good candidates for "exception dictionaries"
>(e.g., "Kurzweil" was one of the first words added to the KRM's
>exception dictionary -- for fairly obvious reasons :> )
No kidding. You wouldn't believe what people do to my name ... and it's relatively simple to guess the American if you don't know the proper German pronunciation (I answer to either, at least initially).

George
Hi George,

On 6/7/2014 1:15 PM, George Neuner wrote:
> On Fri, 06 Jun 2014 02:07:11 -0700, Don Y<this@is.not.me.com> wrote:
>
>> Proper nouns are always good candidates for "exception dictionaries"
>> (e.g., "Kurzweil" was one of the first words added to the KRM's
>> exception dictionary -- for fairly obvious reasons :> )
>
> No kidding. You wouldn't believe what people do to my name ... and
> it's relatively simple to guess the American if you don't know the
> proper German pronunciation (I answer to either, at least initially).
Too many different rule systems involved -- you'd need to be able to recognize *which* language's rules should apply, etc. E.g., Polish 'w'. So, "don't bother". :>

I suspect my ruleset would even choke on many (common) *first* names -- though I've not run a formal test with that sort of input (e.g., Stephen, John, Valerie, Alan, etc.). I suspect stress assignment would also be incorrect.

I'll add that to my ToDo list -- just to see (and laugh?). I think I have a list of names here, somewhere...
On Fri, 06 Jun 2014 10:43:46 -0700, Don Y <this@is.not.me.com> wrote:

>Hi Robert,
>
>On 6/6/2014 9:12 AM, Robert Wessel wrote:
>
>[attrs elided]
>
>>>> Assuming that the device is not subdermally implanted, the user doesn't
>>>> need to hear or understand the information at all! The device could say:
>>>> "Please hook me up to a phone line, I wish to send a problem report" or
>>>> something similar. Then the user could listen in and wait for the
>>>> exchange to finish.
>>>
>>> So now the device needs to be able to connect to a phone line
>>> (acoustically or otherwise) *and* there needs to be a phone
>>> line *handy* (as well as accessible to the device's dialing
>>> capabilities -- e.g., not behind a PBX). And, to know how to
>>> report dialing/connection problems there, as well.
>>
>> My new washing machine can do just that. At some point in the
>> conversation with customer service you can hold your phone up to a
>> particular spot on the device, press another button for a few seconds,
>> and it'll transmit a problem report.
>>
>> Obviously that sidesteps the problem of establishing the phone
>> connection, as you have to be talking to support anyway.
>
>But, your washing machine probably doesn't have some *other*
>means of conveying generic messages to the user. E.g., a
>graphic/text display. If it had, presumably, it could display
>an N-digit "code" (embodying all of the pertinent information)
>along with instructions that you could dictate to the support tech.
>
>Or, a REAL MESSAGE! (gadzooks!)
>
>I've already got a means of communicating with the user. It's
>now a question of whether I let other devices (e.g., remote
>server) *use* that mechanism to convey their diagnostic/status
>messages to the user *or* force them to produce an "error code"
>that can *always* be conveyed to the user -- but, that the user
>then has to resolve via some other agency (e.g., help desk).
>
>It would be annoying to have to contact support only to be told
>"your device is telling you that your (paid) subscription has
>expired and, for a limited time, you could renew for $19.95
>(plus shipping and handling)" :-/
>
>Or, that the server is shedding load, "*PLEASE* use another
>server unless absolutely necessary. Otherwise, your request
>will be handled, shortly."
>
>Or, "A newer, faster service is available at XXXX. But, feel
>free to keep using this slower, outdated service if you like".
>
>Or, "Notice: a security exploit was detected last tuesday.
>Folks who used this service on XX/XX/XXXX should contact
>the System Administrator at 555-1212 x3-2211 ASAP."
>
>Or, "Upgraded <devices> are available at no cost from the
>service department from 9:00A - 5:00P. Ask for Bob."
>
>[BTW, Our washing machine makes all sensor information available
>to the user via the front panel in a "service mode" (published in
>the User Guide) using the numeric display inherent in the
>front panel]
I was merely providing an example of an implementation of something being discussed.

In any event, the data transmitted is likely much larger than what the display (and there is one) could reasonably accommodate (or the user could usefully interpret, record, copy, etc.). The manual says it can take as long as 17 seconds to transmit the burst, and even with the most pessimistic assumptions, several hundred bytes of diagnostic information (after considering framing, error correction, etc.), should be possible.
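[A back-of-the-envelope check of the "several hundred bytes" estimate. Only the 17-second figure comes from the post; the 300 bit/s raw rate and the 50% allowance for framing and error correction are assumptions picked to be pessimistic, not figures from the washing machine's manual.]

```python
def payload_bytes(seconds: float, bits_per_second: float, efficiency: float) -> int:
    """Usable payload in whole bytes for a burst of the given length.

    efficiency is the fraction of raw bits left for data after framing
    and error correction (an assumed figure, see above).
    """
    return int(seconds * bits_per_second * efficiency / 8)


# 17 s is from the post; 300 bit/s and 0.5 are assumed values.
print(payload_bytes(17, 300, 0.5))
```

With those assumed figures the result is 318 bytes, which is consistent with the estimate above.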
Hi Robert,

On 6/8/2014 10:42 AM, Robert Wessel wrote:

>> [BTW, Our washing machine makes all sensor information available
>> to the user via the front panel in a "service mode" (published in
>> the User Guide) using the numeric display inherent in the
>> front panel]
>
> I was merely providing an example of an implementation of something
> being discussed.
Understood. OTOH, presumably these "diagnostic purposes" are more involved than providing information to the user that the *user* can make sense of. E.g., instead of "login failed", a server might return:
- "login denied (which maps to 'login failed') due to nonpayment of fees"
- "login denied; this account only accessible in non-peak hours. The current server time is XX:XX"
- "login denied; this account suspended pending disciplinary action"

Rather than trying to anticipate every *local* policy that might be implemented at some future date, allow the server to return a result code that the device *always* knows how to interpret ("login denied") along with a message INTENDED FOR THE HUMAN USER.

This tends to be how most server replies are designed, nowadays. I.e., the "client" doesn't parse the text of the message but, rather, just the "error code" -- optionally passing the text on to the user for the user's perusal.
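[A minimal sketch of that split between result code and human-readable text. The `Reply` type, the code numbers, and the `handle` function are hypothetical names invented for this example; they do not describe any particular protocol.]

```python
from dataclasses import dataclass

# Hypothetical result codes the device is built to understand.
KNOWN_CODES = {0: "ok", 1: "login denied"}


@dataclass
class Reply:
    code: int     # machine-readable; the only part the client interprets
    message: str  # free text intended for the human user


def handle(reply: Reply) -> tuple[str, str]:
    """Return (action, text_for_user). The message text is never parsed."""
    action = KNOWN_CODES.get(reply.code, "unknown error")
    return action, reply.message
```

A server could change its wording, or add a new local policy, without any client update: `handle(Reply(1, "login denied due to nonpayment of fees"))` still maps to the "login denied" action and hands the text through untouched.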
> In any event, the data transmitted is likely much larger than what the
> display (and there is one) could reasonably accommodate (or the user
> could usefully interpret, record, copy, etc.). The manual says it can
> take as long as 17 seconds to transmit the burst, and even with the
> most pessimistic assumptions, several hundred bytes of diagnostic
> information (after considering framing, error correction, etc.),
> should be possible.
Consumer kit tends to have skimpy displays. E.g., our washer has only a few seven segment displays (not counting "icons" or other annunciators/indicators). But, you can map a fair bit of information onto those beyond the "idiot light" that tells the user "something is wrong".

[Granted, in our case, this is an interactive process but one that I could easily see a "tech" guiding a user through over the phone -- assuming they don't just want to dispatch a technician directly]

In my case, I could conceivably recite *paragraphs* of speech (as it is an inherently serial output device and "capacity" is limited solely by the user's wetware). And, since I can already recite from a nontrivial vocabulary (i.e., not just digits), there's little to gain by placing a limit on that *now* -- especially when the issue is really just one of identifying likely numeric formats that have *implied* content that isn't explicitly conveyed by the individual "characters".

Forcing the user to contact "support" when providing access to this message content would remove that *need* for the contact seems silly. (i.e., should I similarly HIDE the accompanying text message for users with *visual* display devices? "Call support if you want to know the text that follows this 'error number'")

I think I've got a reasonably small "lexicon" of templates that cover most numeric presentations. It's enlightening to see how much we take for granted/assume/imply in these presentations!

I've been browsing various server sources to get an idea of the types of "accompanying messages" with which their "result codes" are tagged -- as well as poking at various online services to see what *they* want to disclose ("raw").

For "US" servers, I think I can cover almost everything that I've encountered (with the exception of personal names) -- let someone else worry about other localities! :)

Thx,
--don
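[To make the "implied content" point concrete, here is a small sketch of how the same digit characters might be spoken differently depending on the template they match. The function names and phrasing are assumptions for illustration, and `say_money` only handles dollar amounts below 100; this is not the lexicon of templates described above.]

```python
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def say_number(n: int) -> str:
    """Spell out an integer in the range 0..99."""
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] + ("" if ones == 0 else " " + ONES[ones])


def say_digits(text: str) -> str:
    """Read a string digit by digit, skipping punctuation (phone-number style)."""
    return " ".join(ONES[int(c)] for c in text if c.isdigit())


def say_money(text: str) -> str:
    """Read a "$D.CC" amount as dollars and cents (dollars below 100 only)."""
    dollars, cents = text.lstrip("$").split(".")
    return f"{say_number(int(dollars))} dollars and {say_number(int(cents))} cents"
```

Here `say_money("$19.95")` gives "nineteen dollars and ninety five cents", while `say_digits("555-1212")` gives "five five five one two one two": the "19" is spoken as a quantity in one template and would be spoken as two separate digits in the other.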