
FPU vs soft library vs. fixed point

Started by Don Y May 25, 2014
Hi,

I'm exploring tradeoffs in implementation of some computationally
expensive routines.

The easy (coding) solution is to just use doubles everywhere and
*assume* the noise floor is sufficiently far down that the ULPs
don't impact the results in any meaningful way.

But, that requires either hardware support (FPU) or a library
implementation or some "trickery" on my part.

Hardware FPU adds cost and increases average power consumption
(for a generic workload).  It also limits the choices I have
(severe cost and space constraints).

OTOH, a straightforward library implementation burns more CPU
cycles to achieve the same result.  Eventually, I will have to
instrument a design to see where the tipping point lies -- how
many transistors are switching in each case, etc.

Fixed point solutions mean a lot more up-front work verifying
no loss of precision throughout the calculations.  Do-able but
a nightmare for anyone having to maintain the codebase.
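
A minimal sketch of what that per-step bookkeeping looks like, assuming
a Q15 representation (the format, helper name, and rounding policy are
illustrative assumptions, not from any particular codebase):

    #include <stdint.h>

    /* Q15 multiply with rounding and saturation -- a minimal illustration
     * of the bookkeeping a fixed-point implementation carries at every
     * step.  Operands are fractions in [-1, 1) stored as int16_t scaled
     * by 2^15.  Arithmetic right shift of signed values is assumed.     */
    static int16_t q15_mul(int16_t a, int16_t b)
    {
        int32_t p = (int32_t)a * b;     /* Q15 * Q15 -> Q30, exact       */
        p += 1 << 14;                   /* round to nearest              */
        p >>= 15;                       /* back to Q15                   */

        /* The only value that can overflow is (-1) * (-1) = +1, which
         * does not fit in Q15, so saturate it.                           */
        if (p > INT16_MAX) p = INT16_MAX;
        return (int16_t)p;
    }

Every operation in the chain needs an argument like the one in the
comments -- which is exactly the verification burden described above.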

OTOOH, a non-generic library approach *could*, possibly, eke out
a win by eliminating unnecessary operations that an FPU (or a
generic library approach) would naively undertake.  But, this
similarly encumbers code maintainers -- to know *why* certain
operations can be elided at certain points in the algorithms, etc.

So, for a specific question:  anyone have any *real* metrics
regarding how efficient (power, cost) hardware FPU (or not!)
is in FP-intensive applications?  (by "FP-intensive", assume
20% of the operations performed by the processor fall into
that category).

Thx,
--don
On 5/25/2014 4:25 PM, Don Y wrote:
> [...]
>
> So, for a specific question:  anyone have any *real* metrics
> regarding how efficient (power, cost) hardware FPU (or not!)
> is in FP-intensive applications?  (by "FP-intensive", assume
> 20% of the operations performed by the processor fall into
> that category).
So far I haven't seen any requirements for either the computations
vis-a-vis the noise floor, or power consumption, or anything else.  You
seem to understand the basic concepts and tradeoffs, but perhaps you
have no practical experience to know where, even approximately, the
tradeoffs work best.

I also don't have lots of experience with floating point, but I would
expect if you are doing a lot of floating point the hardware would use
less power than a software emulation.  I can't imagine the cost would be
very significant unless you are building a million of them.

I think finding general "metrics" on FP approaches will be a lot harder
than defining your requirements and looking for a solution that suits.
Do you have requirements at this point?

--
Rick
On Sun, 25 May 2014 17:18:15 -0400, rickman wrote:

> [...]
>
> I also don't have lots of experience with floating point, but I would
> expect if you are doing a lot of floating point the hardware would use
> less power than a software emulation.  I can't imagine the cost would be
> very significant unless you are building a million of them.
The cost is often more in that the pool of available processors shrinks
dramatically, and it's hard to get physically small parts.

--
Tim Wescott
Control system and signal processing consulting
www.wescottdesign.com
Hi Rick,

On 5/25/2014 2:18 PM, rickman wrote:
> On 5/25/2014 4:25 PM, Don Y wrote:
>> I'm exploring tradeoffs in implementation of some computationally
>> expensive routines.
>> So, for a specific question:  anyone have any *real* metrics
>> regarding how efficient (power, cost) hardware FPU (or not!)
>> is in FP-intensive applications?
>
> So far I haven't seen any requirements for either the computations
> vis-a-vis the noise floor, or power consumption, or anything else.  You
> seem to understand the basic concepts and tradeoffs, but perhaps you
> have no practical experience to know where, even approximately, the
> tradeoffs work best.
>
> I also don't have lots of experience with floating point, but I would
> expect if you are doing a lot of floating point the hardware would use
> less power than a software emulation.  I can't imagine the cost would be
> very significant unless you are building a million of them.
I'm pricing in 100K quantities -- which *tends* to make cost
differences diminish.

But, regardless of quantity, physics dictates the volume of a
battery/cell required to power the things!  Increased quantity
doesn't make it draw less power, etc.  :<
> I think finding general "metrics" on FP approaches will be a lot harder
> than defining your requirements and looking for a solution that suits.
> Do you have requirements at this point?
I can't discuss two of the applications.  But, to "earn my training
wheels", I set out to redesign another app with similar constraints.
It's a (formant) speech synthesizer that runs *in* a BT earpiece
(i.e., the size of the device is severely constrained -- which has
repercussions on power available, etc.).

A shirt-cuff analysis of the basic processing loop shows ~60 FMUL,
~30 FADD and a couple of trig/transcendental operations per iteration.
That's running at ~20 kHz (lower sample rates make it hard to
synthesize female and child voices with any quality).  Not a tough
requirement to meet WITHOUT the power+size constraints.  But, throw
it in a box with a ~0.5 WHr power source and see how long it lasts
before you're back on the charger!  :-/

There are some computations that I've just omitted from this tally
as they would be hard to quantify in their current forms -- one would
be foolish to naively implement them (e.g., glottal waveform synthesis).

I think this would be a good "get my feet wet" application because
all of the math is constrained a priori.  While I can't *know* what
the synthesizer will be called upon to speak, I *do* know what all
of the CONSTANTS are that drive the algorithms.

As such, I *know* there is no need to support exceptions, I know
the maximum and minimum values to be expected at each stage in
the computation, etc.

At the same time, it has fixed (hard) processing limits -- I can't
"preprocess" speech and play it back out of a large (RAM) buffer...
there's no RAM lying around to exploit in that wasteful a manner.

It also highlights the potential problem of including FPU hardware
in a design if it isn't *always* in use -- does having an *idle*
FPU carry any other recurring (operational) costs?  (in theory,
CMOS should have primarily dynamic currents... can I be sure the
FPU is truly "idle" when I'm not executing FP opcodes?)

And, how much "assist" do the various FPAs require?  Where is the
break-even point for a more "tailored" approach?

Note that hardware FPUs and software *generic* libraries have to
accommodate all sorts of use/abuse.  They can't know anything about
the data they are being called upon to process, so they always have
to "play it safe".  (imagine how much extra work they do when summing
a bunch of numbers of similar magnitudes!)

I'm hoping someone has either measured hardware vs. software
implementations (if not, that's the route I'll be pursuing)
*or* looked at the power requirements of each approach...

--don
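
A rough back-of-envelope sketch of what that workload implies.  The
~90 ops/iteration and ~20 kHz rate are from the post above; the
3-cycle FPU and 100-cycle soft-float figures are placeholder
assumptions to be replaced with measured numbers for a real core and
library:

    #include <stdio.h>

    /* Rough budget for the synthesizer loop described above.  The
     * cycles-per-operation figures are ASSUMED placeholders -- substitute
     * measured numbers for your core and float library.                  */
    int main(void)
    {
        const double ops_per_iter   = 60 + 30;   /* FMUL + FADD, trig excluded */
        const double sample_rate_hz = 20e3;
        const double fp_ops_per_sec = ops_per_iter * sample_rate_hz;  /* ~1.8M */

        const double cyc_hw_fpu = 3.0;    /* assumed avg cycles/op, hardware FPU */
        const double cyc_softfp = 100.0;  /* assumed avg cycles/op, soft library */

        printf("FP ops/sec       : %.2e\n", fp_ops_per_sec);
        printf("MHz needed (FPU) : %.1f\n", fp_ops_per_sec * cyc_hw_fpu / 1e6);
        printf("MHz needed (soft): %.1f\n", fp_ops_per_sec * cyc_softfp / 1e6);
        return 0;
    }

Under those assumed per-op costs, the soft-float case lands in the
hundred-plus MHz range versus a handful of MHz for hardware, which is
where the power question starts to bite.
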
Hi Tim,

On 5/25/2014 3:40 PM, Tim Wescott wrote:

>> I also don't have lots of experience with floating point, but I would
>> expect if you are doing a lot of floating point the hardware would use
>> less power than a software emulation.  I can't imagine the cost would be
>> very significant unless you are building a million of them.
>
> The cost is often more in that the pool of available processors shrinks
> dramatically, and it's hard to get physically small parts.
Exactly.  Especially if the "other" functions that the processor
is performing do not benefit from the extra (hardware) complexity.

"advanced RISC machine"

Would you use a processor with a GPU (G, not F) to control a
CNC lathe?  Even if it had a GUI?  Or, would you "tough it out"
and write your own blitter and take the performance knock on
the (largely static!) display tasks?
On Sun, 25 May 2014 15:45:24 -0700, Don Y wrote:

> [...]
>
> I'm pricing in 100K quantities -- which *tends* to make cost
> differences diminish.
>
> [...]
>
> I can't discuss two of the applications.  But, to "earn my training
> wheels", I set out to redesign another app with similar constraints.
> It's a (formant) speech synthesizer that runs *in* a BT earpiece
> (i.e., the size of the device is severely constrained -- which has
> repercussions on power available, etc.).
<< snip >>
> I'm hoping someone has either measured hardware vs. software
> implementations (if not, that's the route I'll be pursuing)
> *or* looked at the power requirements of each approach...
If you're a direct employee and you're working at 100K quantities, it
should be exceedingly easy to get the attention of the chip company's
applications engineers.  Maybe too easy -- there have been times in my
career that I haven't asked an app engineer a question because I
couldn't handle the thought of fending him off for the next six months.
At any rate, you could ask for white papers, or at those quantities you
may even prompt someone to do some testing.

Assuming that a processor with software emulation could get the job
done, a processor with an FPU may still be more power efficient because
it could be mostly turned off.

Be careful shopping for processors -- there are a lot of processors out
there that have a single-precision FPU that does nothing for
double-precision operations: you often have to dig to find out what a
processor can really do.

My gut feel is that at 100K quantities you can do the work in
fixed-point, document the hell out of it, and come out ahead.  If
time-to-market is an issue, call me!  Matrix multiplies of fixed-point
numbers with block-floating point coefficients can be very efficient on
DSP-ish processors, and doing the math to make sure you're staying
inside the lines isn't all that hard.

--
Tim Wescott
Control system and signal processing consulting
www.wescottdesign.com
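
To make the block-floating-point idea concrete, here is a minimal
sketch (illustrative only, not Tim's actual method; the Q15 format,
function name, and scaling convention are assumptions): the coefficient
block shares one scale factor, so the inner loop is a plain integer
multiply-accumulate and the scaling is applied once at the end.

    #include <stdint.h>

    /* Minimal block-floating-point dot product sketch (illustrative only).
     * All coefficients in the block were normalized to Q15 mantissas that
     * share one scale factor of 2^-block_shift, so the inner loop is a
     * plain integer MAC.  Arithmetic right shift of signed values assumed. */
    int32_t bfp_dot(const int16_t *x,     /* Q15 samples                   */
                    const int16_t *coef,  /* Q15 mantissas, shared scale   */
                    unsigned block_shift, /* shared scale = 2^-block_shift */
                    int n)
    {
        int64_t acc = 0;                  /* wide accumulator              */

        for (int i = 0; i < n; i++)
            acc += (int32_t)x[i] * coef[i];  /* Q15*Q15 -> Q30 partials    */

        /* Undo the Q30 product scaling and the shared block scale in one
         * shift; a production version would round and saturate before
         * narrowing back to the working width.                            */
        return (int32_t)(acc >> (15 + block_shift));
    }

The "staying inside the lines" math Tim mentions amounts to bounding
the accumulator and picking block_shift so the mantissas use their full
range without overflowing.
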
Hi Tim,

On 5/25/2014 4:19 PM, Tim Wescott wrote:

[attrs elided]

>> I'm hoping someone has either measured hardware vs. software
>> implementations (if not, that's the route I'll be pursuing)
>> *or* looked at the power requirements of each approach...
>
> If you're a direct employee and you're working at 100K quantities, it
> should be exceedingly easy to get the attention of the chip company's
> applications engineers.  Maybe too easy -- there have been times in my
> career that I haven't asked an app engineer a question because I
> couldn't handle the thought of fending him off for the next six months.
I found that running big numbers by vendors usually ended up with them
camped out on my doorstep as if they were expecting the delivery of
their first child!  :<

Often, I think clients farm out projects deliberately so any queries
that might be "revealing" don't originate from their offices.  More
than once I've had sales reps "gossip" about projects at competitors'
firms -- so why would they expect *me* to trust them with proprietary
details??  :<
> At any rate, you could ask for white papers, or at those quantities
> you may even prompt someone to do some testing.
The sort of testing that they would do, I could *easily* do!  Turn off
FP support, drag in an emulation library, measure elapsed time, power
consumption, etc.

That's an unoptimized comparison.  It forces the "software" approach to
bear the same sorts of costs that the hardware implementation *must*
bear (i.e., you can't purchase a 27b FPU... or one that elides support
for any ops that you never use, etc.  OTOH, you *can* do this with a
software approach!)
> Assuming that a processor with software emulation could get the job
> done, a processor with an FPU may still be more power efficient because
> it could be mostly turned off.
Yes, assuming the static currents are effectively zero. And, that the dynamic currents don't alter the average power capacity of the battery, etc. (e.g., a prolonged low power drain *may* be better for the battery than one that exhibits lots of dynamism. Esp given battery chemistries intended for quick charge/discharge cycles)
> Be careful shopping for processors -- there are a lot of processors out
> there that have a single-precision FPU that does nothing for
> double-precision operations: you often have to dig to find out what a
> processor can really do.
Yes. And, many "accelerators" instead of genuine FPUs. This just complicates finding the sweet spot (e.g., normalizing and denormalizing values is expensive in software... perhaps a win for a *limited* FPA?)
> My gut feel is that at 100K quantities you can do the work in
> fixed-point, document the hell out of it, and come out ahead.  If
> time-to-market is an issue, call me!  Matrix multiplies of fixed-point
> numbers with block-floating point coefficients can be very efficient on
> DSP-ish processors, and doing the math to make sure you're staying
> inside the lines isn't all that hard.
I have become ever-increasingly verbose in my documentation.  To the
point that current docs include animations, interactive presentations,
etc.  (can't embed this sort of stuff in sources)

Yet, folks still seem to act as if they didn't understand the docs *or*
didn't bother to review them!  (Start coding on day #1.  Figure out
*what* you are supposed to be coding around day #38...)

"Why is this <whatever> happening?  I only changed two constants in
the code! ..."

"Um, did you read the footnote on page 13 of the accompanying
description of the algorithm?  And, the constraints that it clearly
described that applied to those constants?  Did you have all the
DEBUG/invariants enabled when you compiled the code?  If so, it would
have thrown a compile-time error alerting you to your folly..."

"No, I just put all that stuff in a folder when I started work on
this.  I figured it would just be easier to ask *you* as you wrote it
all..."

"Ah!  Unable to read at your age?  Amazing you made it through life
thus far!  Tell me... how good are you with *numbers*?  I.e., are you
sure your paycheck is right??  If you'd like, I can speak to your
employer about that.  I suspect I can make *him* much happier with my
suggested adjustments...  :-/ "

People seem to just want to *poke* at working code in the hope that it
ends up -- miraculously -- doing what they want it to do.  Rather than
*understanding* how it works so they can make intelligent changes to
it.

I think the rapid edit-compile-link-debug cycle times contribute to
this.  It's *too* easy to just make a change and see what happens --
without thinking about whether or not the change is what you *really*
want.  How do you ever *know* you got it right?

Anyway... the appeal of an "all doubles" approach is there's little
someone can do to *break* it (other than OBVIOUSLY breaking it!).  I'm
just not keen on throwing away that cost/performance *just* to allow
"lower cost" developers to be hired...  (unless I can put that on the
BoM and get a bean-counter to price it!  :> )

End of today's cynicism.  :)  Time for ice cream!
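
As an aside, the kind of compile-time guard alluded to above is cheap
to get in C11 via static_assert; the constant names and bounds below
are invented purely for illustration, not taken from the synthesizer:

    #include <assert.h>   /* static_assert (C11) */

    /* Hypothetical synthesizer constants -- names and limits are made up
     * for illustration; a real algorithm documents its own constraints. */
    #define F1_CENTER_HZ   700
    #define F1_BW_HZ       130
    #define SAMPLE_RATE_HZ 20000

    /* Refuse to compile if someone "just changes two constants" into a
     * region the algorithm was never designed for.                       */
    static_assert(F1_CENTER_HZ > 0 && F1_CENTER_HZ < SAMPLE_RATE_HZ / 2,
                  "formant center must sit below Nyquist");
    static_assert(F1_BW_HZ > 0 && F1_BW_HZ < F1_CENTER_HZ,
                  "formant bandwidth must be narrower than its center");
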
On 5/25/2014 6:56 PM, Don Y wrote:
> [...]
>
>>> I also don't have lots of experience with floating point, but I would
>>> expect if you are doing a lot of floating point the hardware would use
>>> less power than a software emulation.  I can't imagine the cost would
>>> be very significant unless you are building a million of them.
>>
>> The cost is often more in that the pool of available processors shrinks
>> dramatically, and it's hard to get physically small parts.
Can't say since "small" is not really anything I can measure.  There
are very small packages available (some much smaller than I want to
work with).  I have to assume you mean that many of the parts with FP
also have a larger pin count, meaning 100 pins and up.  But unless I
have a set of requirements, that is getting into some very hard to
compare features.
> Exactly.  Especially if the "other" functions that the processor
> is performing do not benefit from the extra (hardware) complexity.
What? If one part of a design needs an I2C interface you think you should not use a hardware I2C interface because the entire project doesn't need it??? That makes no sense to me.
> "advanced RISC machine" > > Would you use a processor with a GPU (G, not F) to control a > CNC lathe? Even if it had a GUI? Or, would you "tough it out" > and write your own blit'er and take the performance knock on > the (largely static!) display tasks?
This is a total non-sequitur.

Reminds me of a supposed true story where one employee said you get
floor area by dividing length by width rather than multiplying.  When
that was questioned he reasoned with, "How many quarters in a dollar?
How many quarters in two dollars?...  SEE!"  lol

--
Rick
rickman <gnuarm@gmail.com> writes:
> I have to assume you mean that many of the parts with FP also have a
> larger pin count, meaning 100 pins and up.  But unless I have a set of
> requirements, that is getting into some very hard to compare features.
http://www.ti.com/product/tm4c123gh6pm used in the TI Tiva Launchpad is a 64LQFP, still not exactly tiny. There is supposedly a new comparable Freescale part (MK22FN1M0VLH12, also 64LQFP) that is pin compatible with the part in the Teensy 3.1 (pjrc.com), and that has floating point (the Teensy cpu is integer-only). I wonder if the pjrc guy will make a Teensy 3.2 with the new part, which also has more memory. It's a cute little board. The FP on all these parts is unfortunately single precision.
On 5/25/2014 6:45 PM, Don Y wrote:
> [...]
>
> I'm pricing in 100K quantities -- which *tends* to make cost
> differences diminish.
I'm not sure what you are saying about cost differences diminishing. High volume makes cost differences jump out and be noticed! Or are you saying everyone quotes you great prices at those volumes?
> But, regardless of quantity, physics dictates the volume of a
> battery/cell required to power the things!  Increased quantity
> doesn't make it draw less power, etc.  :<
I may have something completely different for you to consider.
>> I think finding general "metrics" on FP approaches will be a lot harder
>> than defining your requirements and looking for a solution that suits.
>> Do you have requirements at this point?
>
> [...]
>
> At the same time, it has fixed (hard) processing limits -- I can't
> "preprocess" speech and play it back out of a large (RAM) buffer...
> there's no RAM lying around to exploit in that wasteful a manner.
I highly recommend that you not use pejoratives like "wasteful" when it comes to engineering. One man's waste is another man's efficiency. It only depends on the numbers. I don't know what your requirements are so I can't say having RAM is wasteful or not.
> It also highlights the potential problem of including FPU hardware
> in a design if it isn't *always* in use -- does having an *idle*
> FPU carry any other recurring (operational) costs?  (in theory,
> CMOS should have primarily dynamic currents... can I be sure the
> FPU is truly "idle" when I'm not executing FP opcodes?)
I think this is a red herring. If you are worried about power consumption, worry about power consumption. Don't start worrying about what is idle and what is not before you even get started. Do you really think the FP instructions are going to be hammering away at the power draw when they are not being executed? Do you worry about the return from interrupt instruction when you aren't using that?
> And, how much "assist" do the various FPAs require?  Where is the
> break-even point for a more "tailored" approach?
What is an FPA?
> Note that hardware FPUs and software *generic* libraries have to
> accommodate all sorts of use/abuse.  They can't know anything about
> the data they are being called upon to process, so they always have
> to "play it safe".  (imagine how much extra work they do when summing
> a bunch of numbers of similar magnitudes!)
>
> I'm hoping someone has either measured hardware vs. software
> implementations (if not, that's the route I'll be pursuing)
> *or* looked at the power requirements of each approach...
If you are designing a device for a 100k production run, it would seem
reasonable to do some basic testing and get real answers to your
questions rather than to ask others for their opinions and biases.

Ok, I'm still not clear on your application requirements, but if you
need some significant amount of computation ability with analog I/O and
power is a constraint, I know of a device you might want to look at.
The GA144 from Green Arrays is an array of 144 async processors, each
of which can run instructions at up to 700 MIPS.  Floating point would
need to be software, but that should not be a significant issue in this
case.  The features that could be great for your app are...

1) Low standby power consumption of 8 uA, active power of 5 mW/processor
2) Instant start up on trigger
3) Integrated ADC and DAC (5 each) with variable resolution/sample rate
   (can do 20 kHz at ~15 bits)
4) Small device in a 1 cm sq 88 pin QFP
5) Small processors use little power and suspend in a single instruction
   time, reducing power to 55 nA each with instant wake up

This device has its drawbacks too.  It is programmed in a Forth-like
language which many are not familiar with.  The I/Os are 1.8 volts,
which should not be a problem in your app.  Each processor is 18 bits
with only 64 words of memory, not sure what your requirements might be.
You can hang an external memory on the device.  It needs a separate SPI
flash to boot and for program storage.  The price is in the $10 ball
park in lower volumes, not sure what it is at 100k units.

One of the claimed apps that has been prototyped on this processor is a
hearing aid app which otherwise requires a pair of TMS320C6xxx
processors using a watt of power (or was it a watt each?).  Sounds a
bit like your app.  :)

Using this device will require you to forget everything you think you
know about embedded processors and let yourself be guided by the force.
But your app might just be a good one for the GA144.

--
Rick
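
For scale, a crude arithmetic check of the quoted 5 mW active figure
against the ~0.5 WHr budget mentioned upthread, assuming a single
continuously busy node and ignoring radio, converter, and standby draw
(all of which would reduce the result):

    #include <stdio.h>

    /* Crude runtime estimate: battery energy / average draw.  The 0.5 WHr
     * and 5 mW figures are quoted in this thread; everything else (one
     * active core, no radio/DAC/standby overhead) is an assumption.      */
    int main(void)
    {
        const double battery_whr   = 0.5;
        const double active_draw_w = 0.005;   /* 5 mW, one busy node */

        printf("upper-bound runtime: %.0f hours\n",
               battery_whr / active_draw_w);  /* ~100 hours */
        return 0;
    }
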
