Vladimir Vassilevsky <nospam@nowhere.com> wrote in 
news:7oSdncAnWruvTU_RnZ2dnUVZ_sadnZ2d@giganews.com:

> 
> 
> Alessandro Basili wrote:
> 
>> Hi,
>> I have a custom designed board which is based on an ADSP2187L dsp, with 
>> a 512KB FLASH, a 128KB SRAM (20 ns cycle) and an FPGA running at 50 MHz.
> 
> 
> ADSP218x is a dinosaur with the address space of 16K words; what is all 
> that external memory for? This DSP is inconvenient for C programming as 
> it doesn't support stack frames and base-index addressing.
> 
> 
>> The dsp task is mainly processing commands and data from external links 
>> (space-wire) on a 200~500 us cycle.
>> Since the actual software has been written in assembler and there are 
>> some concerns about its reliability, I was wondering whether it is worth 
>> while considering the possibility to move to a higher language like C, 
>> in order to gain in maintainability and readability without losing too 
>> much performance.
> 
> Yes, you can program ADSP218x in C; VDSP toolset recommended.
> Yes, there will be significant overhead.
> It is impossible to tell if there is enough of speed without actually 
> knowing the application.
> Why get stuck with 20-year-old technology?
> 
> 
> Vladimir Vassilevsky
> DSP and Mixed Signal Design Consultant
> http://www.abvolt.com

As someone who has written a considerable amount of 218x code, the short 
answer is FORGET USING THE C COMPILER! The ADSP-218x architecture predates 
the emergence of C as a universal language. The registers are not 
orthogonal, the number of pointers is limited, etc.

The good news is that 218x assembly is very easy to read and understand and 
looks more like C than traditional assembly. It is not a difficult 
processor.

I agree with Vladimir that it is essentially obsolete. The only reason I 
see to use it is if you have a large existing code base.

You could use a new ADSP-21489 or ADSP-21479 fixed/floating point DSP for 
less money, easier programming in C or assembly, and substantially more 
performance. I mention SHARC because the assembly language has a similar, 
but actually easier structure and could be ported to C as desired.

You could also use any number of Blackfin targets. These are just the ADI 
choices. There are many other possibilities as well.

Al Clark
www.danvillesignal.com

"Randy Yates" <yates@ieee.org> wrote in message 
news:m3pquf5q8m.fsf@ieee.org...
> Randy Yates <yates@ieee.org> writes:
>
>> Vladimir Vassilevsky <nospam@nowhere.com> writes:
>>
>>> glen herrmannsfeldt wrote:
>>>
>>>
>>>> Much embedded programming is done in C with inline assembler.
>>>
>>> Inline assembler is bad style and a characteristic feature of lame
>>> programmers. It combines disadvantages of both C and assembler.
>>>
>>> If there is a need to use an assembler, make a separate module in
>>> assembler and call it from C as an external function.
>>
>> Amen, brother! My philosophy exactly!
>
> PS: I highly recommend yasm for x86 assembly.
>
>  http://www.tortall.net/projects/yasm/
>

thanks.  I would like to borrow one arsonist.

In comp.dsp David Brown <david@westcontrol.removethisbit.com> wrote:
(snip regarding inline or external assembly code using RDTSC)

>> There is another complication, though, relating to pipelining
>> and RDTSC.  In your example, you want Part A to run before RDTSC,
>> and Part B to run after.  Without the jump, it is more likely that
>> the processor will be able to execute RDTSC out of order, such that
>> the timing doesn't do what you expect.
 
> That is partially true.  As you say yourself, "it is more likely" - 
> forcing a jump makes it more likely that part A runs before RDTSC, and 
> part B runs after it.  But it doesn't guarantee it.

Yes.  There could also be a task switch in between, which should
result in a very large increment in the count.  I have never seen
that in my uses of RDTSC.  I once even did it from Java using JNI
and it seemed to work just fine.  
 
> This is a common problem in situations when you need assembly rather 
> than C, and it is something that a lot of people have trouble 
> understanding.  It is a common mistake to think that "volatile" can be 
> used to get it right - without realising that the compiler can do a lot 
> of re-ordering of non-volatile instructions and accesses over and around 
> the volatile ones.  

(snip)

> The compiler feature you need is a "memory barrier", that tells the 
> compiler to complete any outstanding calculations and stores, and that 
> code afterwards must re-read from memory (and thus can't be executed 
> before the memory barrier).  With gcc, the simplest memory barrier is a 
> "memory clobber" inline assembly  - asm volatile ("" ::: "memory"). 
> Other compilers have different methods, and if you are using an OS that 
> provides barriers, then use them.

For IA32 the closest I find is SFENCE, Store Fence, which guarantees
that all previous stores are complete before following stores are done.
(My words after reading the description.)
 
(snip)
>> Most of the time I try to time a whole loop, such that register
>> use isn't so much of a problem.  Not quite long enough to time
>> with millisecond TOD clocks, though.
 
> Accurate measurement of timing on a processor as big and complex as 
> modern x86's is far from easy - it is always going to be inaccurate, and 
> a rough average over time is the best you can get.

Usually averaged over a long loop it is close enough, at least for
a specific generation of processors.  Also, RDTSC reduces the problem
of variable clock rates, counting clock cycles instead of real time.
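
As a rough sketch of the whole-loop approach, something like the following
could be used.  It swaps in the standard C clock() for RDTSC so that it
stays portable, which means it reports CPU time rather than cycle counts.
The names work() and average_ns_per_call() are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical workload standing in for the loop being measured. */
static uint32_t work(uint32_t n)
{
    volatile uint32_t acc = 0;   /* volatile keeps the loop from being removed */
    for (uint32_t i = 0; i < n; i++)
        acc += i;
    return acc;
}

/* Average CPU time in nanoseconds for one call of work(n), taken over
 * `runs` back-to-back calls so that per-call jitter is smoothed out. */
static double average_ns_per_call(uint32_t n, unsigned runs)
{
    assert(runs > 0);
    clock_t t0 = clock();
    for (unsigned r = 0; r < runs; r++)
        (void)work(n);
    clock_t t1 = clock();
    return (double)(t1 - t0) / CLOCKS_PER_SEC * 1e9 / runs;
}
```

Calling average_ns_per_call(1000, 100000) gives one averaged figure; the
resolution of clock() is coarse, so the run count has to be large.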
 
> But it's a fair point - there is no need to make your code smaller or 
> faster than it actually needs to be.  Premature optimisation is the root 
> of all evil, after all.

Yes, I usually don't do that until it really needs to be done.

-- glen

On 10/11/2010 00:10, glen herrmannsfeldt wrote:
> In comp.dsp David Brown<david@westcontrol.removethisbit.com>  wrote:
> (snip, I wrote)
>
>>> The last few assembly routines I wrote were to use the IA32 RDTSC
>>> instruction.  It conveniently returns the 64 bit result in EDX:EAX,
>>> the usual return registers for (long long) on IA32.  The executable
>>> instructions are RDTSC and RET.
>
> (snip)
>
>> But let's think about what actually happens in the processor the "rdtsc"
>> instruction implemented as an external assembly module, and as inline
>> assembly.  Bear with me on the details here - I have never used assembly
>> on the x86, and I haven't tested this code.
>
>> Case 1 - external module.
>
>> readTimestamp:
>>         rdtsc
>>         ret
> (snip)
>
>> In other words, the call to
>> readTimestamp is a serious block in the flow of the optimiser.
>
>> Secondly, consider the processor executing the code.  The function call
>> and the return are non-conditional, so they will be executed early in
>> the instruction pipeline.  But any jump in the instruction flow means a
>> new block of memory needs to be in the cache, with associated risks of
>> cache misses, page misses, etc.
>
> There is another complication, though, relating to pipelining
> and RDTSC.  In your example, you want Part A to run before RDTSC,
> and Part B to run after.  Without the jump, it is more likely that
> the processor will be able to execute RDTSC out of order, such that
> the timing doesn't do what you expect.
>

That is partially true.  As you say yourself, "it is more likely" - 
forcing a jump makes it more likely that part A runs before RDTSC, and 
part B runs after it.  But it doesn't guarantee it.

This is a common problem in situations when you need assembly rather 
than C, and it is something that a lot of people have trouble 
understanding.  It is a common mistake to think that "volatile" can be 
used to get it right - without realising that the compiler can do a lot 
of re-ordering of non-volatile instructions and accesses over and around 
the volatile ones.  Another common mistake is to think you can write it 
in assembly, and that tricks like forcing an unnecessary jump or call 
will ensure things are executed in the right order - without realising 
that the processor can do substantial re-ordering.

In fact, you need both parts - you need compiler-specific functionality 
to tell the compiler to keep part A and part B separate, and you need 
target-specific functionality to tell the processor to stall until 
in-flight instructions are completed.

The compiler feature you need is a "memory barrier", that tells the 
compiler to complete any outstanding calculations and stores, and that 
code afterwards must re-read from memory (and thus can't be executed 
before the memory barrier).  With gcc, the simplest memory barrier is a 
"memory clobber" inline assembly  - asm volatile ("" ::: "memory"). 
Other compilers have different methods, and if you are using an OS that 
provides barriers, then use them.
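
A minimal sketch of the gcc-style barrier, assuming gcc or clang.  The names
COMPILER_BARRIER, part_a_buf, part_a_done, publish() and read_back() are
hypothetical, not from any library.  The barrier constrains only the
compiler's ordering of memory accesses; it emits no instruction and does
nothing about reordering inside the processor.

```c
#include <assert.h>
#include <stdint.h>

/* Compiler-only barrier: an empty asm statement with a "memory" clobber. */
#define COMPILER_BARRIER() __asm__ volatile ("" ::: "memory")

static uint32_t part_a_buf[4];   /* data written by "part A" */
static uint32_t part_a_done;     /* flag written by "part B" */

static void publish(uint32_t v)
{
    for (int i = 0; i < 4; i++)
        part_a_buf[i] = v + (uint32_t)i;  /* part A: ordinary stores */
    COMPILER_BARRIER();   /* compiler must emit the stores above before this */
    part_a_done = 1;      /* part B: cannot be hoisted above the barrier */
}

static uint32_t read_back(int i)
{
    assert(i >= 0 && i < 4);
    COMPILER_BARRIER();   /* forces a fresh load instead of a cached value */
    return part_a_buf[i];
}
```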

The cpu feature will be some sort of "sync" instruction.  I don't know 
what that might be on the x86 - it might be quite complicated depending 
on the exact effect you are trying to achieve.  Again, if your OS or 
compiler has such functions, use them.  (They will typically be 
implemented as macros and inline assembly.)
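
For what it is worth, on x86 the sequence usually suggested is an LFENCE on
either side of RDTSC (Intel's manuals describe this use of LFENCE; RDTSCP
and CPUID are alternatives).  A sketch under those assumptions, for gcc or
clang, with hypothetical function names.  It needs SSE2 for LFENCE, reads
the two halves through separate "=a" and "=d" outputs so that it also builds
for x86-64, and falls back to clock() on other targets.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Reads a tick counter with ordering on both sides of the read. */
static inline uint64_t read_cycles_ordered(void)
{
#if defined(__x86_64__) || defined(__i386__)
    uint32_t lo, hi;
    /* "memory" clobber: compiler barrier.  lfence: earlier instructions
     * finish before rdtsc runs, and later ones wait until it is done. */
    __asm__ volatile ("lfence\n\trdtsc\n\tlfence"
                      : "=a" (lo), "=d" (hi)
                      :
                      : "memory");
    return ((uint64_t)hi << 32) | lo;
#else
    return (uint64_t)clock();   /* assumption: coarse fallback, not cycles */
#endif
}

/* Difference between two readings taken on the same core. */
static inline uint64_t elapsed_ticks(uint64_t start, uint64_t end)
{
    assert(end >= start);       /* assumes the counter did not go backwards */
    return end - start;
}
```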

But in any case, a method that is "more likely to work" is not a solution.

>> In other words, it can be quite costly to read the timestamp this way.
>
>> With inline assembly, you have just test.c :
>
>> static inline uint64_t readTimestamp(void) {
>>         uint64_t x;
>>         asm (" rdtsc " : "=A" (x) :: );
>>         return x;
>> }
> (snip)
>
>> The use of this inline assembly within the test() function is identical.
>>   But now the compiler knows everything about it - it knows that
>> readTimestamp changes EDX:EAX, but leaves EBX and ECX untouched, and
>> neither reads from nor writes to memory.  This gives it much more
>> flexibility in optimising the code in test().  The generated code will
>> be nothing more than the single "rdtsc" instruction and whatever
>> register movements are needed, and as it is inline there is no change of
>> flow when the processor is executing it.
>
> Most of the time I try to time a whole loop, such that register
> use isn't so much of a problem.  Not quite long enough to time
> with millisecond TOD clocks, though.
>

Accurate measurement of timing on a processor as big and complex as 
modern x86's is far from easy - it is always going to be inaccurate, and 
a rough average over time is the best you can get.

But it's a fair point - there is no need to make your code smaller or 
faster than it actually needs to be.  Premature optimisation is the root 
of all evil, after all.

On the other hand, there is no point in making your code needlessly 
bigger and slower, especially when better alternatives are easier to write.

>> For processors with more registers, there is even more to be gained with
>> inline assembly.  On the PPC, for example, I used this inline assembly
>> function recently in code that converted a set of values from
>> little-endian to big-endian:
>
>> static inline uint32_t readByteSwapped(const uint32_t * p, uint32_t x) {
>>         uint32_t y;
>>         asm (" lwbrx %[y], %[x], %[p] " :
>>                  [y] "=r" (y) :
>>                 [x] "r" (x), [p] "r" (p) );
>>         return y;
>> }
>
>> By using inline assembly, the compiler can allocate registers and
>> pipeline the byte-swapped reads optimally.  If I made this an external
>> assembly routine following the C calling conventions, the code would
>> have been ten times slower - using shifts and masks for the conversion
>> would have been faster.
>
> At least for IA32, I have been told that many compilers have
> a built-in inlined bswap() such that you don't need to write one.
>

Different processors and different compilers have different features - 
this code was specific for the ppc using Code Warrior (it would also 
work with gcc).  If I were using Diab Data's compiler (I can't remember 
what they are called now), I'd use its extensions that let you declare 
data as big or little-endian explicitly, and let the compiler generate 
the best code.

> (snip)
>
>> Assembly is /always/ ugly and non-portable.  Being inline or external
>> makes no difference there.
>
> (snip)
>
> -- glen

In comp.dsp David Brown <david@westcontrol.removethisbit.com> wrote:
(snip, I wrote)

>> The last few assembly routines I wrote were to use the IA32 RDTSC
>> instruction.  It conveniently returns the 64 bit result in EDX:EAX,
>> the usual return registers for (long long) on IA32.  The executable
>> instructions are RDTSC and RET.
 
(snip)

> But let's think about what actually happens in the processor the "rdtsc" 
> instruction implemented as an external assembly module, and as inline 
> assembly.  Bear with me on the details here - I have never used assembly 
> on the x86, and I haven't tested this code.
 
> Case 1 - external module.
 
> readTimestamp:
>        rdtsc
>        ret
(snip)

> In other words, the call to 
> readTimestamp is a serious block in the flow of the optimiser.
 
> Secondly, consider the processor executing the code.  The function call 
> and the return are non-conditional, so they will be executed early in 
> the instruction pipeline.  But any jump in the instruction flow means a 
> new block of memory needs to be in the cache, with associated risks of 
> cache misses, page misses, etc.

There is another complication, though, relating to pipelining
and RDTSC.  In your example, you want Part A to run before RDTSC,
and Part B to run after.  Without the jump, it is more likely that
the processor will be able to execute RDTSC out of order, such that
the timing doesn't do what you expect.
 
> In other words, it can be quite costly to read the timestamp this way.
 
> With inline assembly, you have just test.c :
 
> static inline uint64_t readTimestamp(void) {
>        uint64_t x;
>        asm (" rdtsc " : "=A" (x) :: );
>        return x;
> }
(snip)

> The use of this inline assembly within the test() function is identical. 
>  But now the compiler knows everything about it - it knows that 
> readTimestamp changes EDX:EAX, but leaves EBX and ECX untouched, and 
> neither reads from nor writes to memory.  This gives it much more 
> flexibility in optimising the code in test().  The generated code will 
> be nothing more than the single "rdtsc" instruction and whatever 
> register movements are needed, and as it is inline there is no change of 
> flow when the processor is executing it.

Most of the time I try to time a whole loop, such that register
use isn't so much of a problem.  Not quite long enough to time
with millisecond TOD clocks, though.   

> For processors with more registers, there is even more to be gained with 
> inline assembly.  On the PPC, for example, I used this inline assembly 
> function recently in code that converted a set of values from 
> little-endian to big-endian:
 
> static inline uint32_t readByteSwapped(const uint32_t * p, uint32_t x) {
>        uint32_t y;
>        asm (" lwbrx %[y], %[x], %[p] " :
>                 [y] "=r" (y) :
>                [x] "r" (x), [p] "r" (p) );
>        return y;
> }
 
> By using inline assembly, the compiler can allocate registers and 
> pipeline the byte-swapped reads optimally.  If I made this an external 
> assembly routine following the C calling conventions, the code would 
> have been ten times slower - using shifts and masks for the conversion 
> would have been faster.

At least for IA32, I have been told that many compilers have
a built-in inlined bswap() such that you don't need to write one.
 
(snip)

> Assembly is /always/ ugly and non-portable.  Being inline or external 
> makes no difference there.

(snip)

-- glen

On 09/11/2010 05:12, glen herrmannsfeldt wrote:
> In comp.dsp David Brown<david.brown@removethisbit.hesbynett.no>  wrote:
> (snip, I wrote)
>>>> Much embedded programming is done in C with inline assembler.
>
>> On 08/11/10 21:30, Vladimir Vassilevsky wrote:
>
>>> Inline assembler is bad style and a characteristic feature of lame
>>> programmers. It combines disadvantages of both C and assembler.
>
>>> If there is a need to use an assembler, make a separate module in
>>> assembler and call it from C as an external function.
>
>> I disagree.  It's a matter of taste and style, of course.  It is also a
>> matter of the application and the target - for DSPs with typical DSP
>> code, assembly can make a much bigger difference over C than for more
>> common microprocessor code.
>
>> Sometimes you can't avoid using assembly.  A typical good reason is
>> because you need access to low-level features that can't be expressed in
>> C (such as registers that are accessible only with special
>> instructions), for a few bits of startup code, as part of an interrupt
>> routine (if your compiler does a poor job), or for handling key parts of
>> an RTOS.
>
> The last few assembly routines I wrote were to use the IA32 RDTSC
> instruction.  It conveniently returns the 64 bit result in EDX:EAX,
> the usual return registers for (long long) on IA32.  The executable
> instructions are RDTSC and RET.
>

On the x86, you have so few registers that the compiler can't help you 
much in the allocation, and many instructions have implicit register 
usage.  It also means that the function call overhead is less, since your 
data is on the stack anyway (though the actual processor 
implementation may keep copies internally in registers).  So it really 
doesn't matter if you have fixed registers in your assembly code, and 
with a bit of luck the processor will handle the "ret" instruction early 
on in the instruction pipeline.

But let's think about what actually happens in the processor when the 
"rdtsc" instruction is implemented as an external assembly module, and 
as inline assembly.  Bear with me on the details here - I have never 
used assembly on the x86, and I haven't tested this code.

Case 1 - external module.

In assembly.s, you have:

readTimestamp:
	rdtsc
	ret

In test.c, you have :

#include <stdint.h>

extern uint64_t readTimestamp(void);

void test(void) {
	// part A
	uint64_t t = readTimestamp();
	// part B
}

First, consider what the compiler knows.  All it knows about 
readTimestamp is that it follows the C calling convention, and returns a 
value in EDX:EAX.  It must assume that the function may change any 
volatile registers (according to the standard x86 ABI), and it may read 
or write any memory.  This means any values held in local registers in 
part A, such as loop counters, pointers, etc., must be preserved on the 
stack before calling readTimestamp(), and restored afterwards. 
Similarly, any outstanding memory writes must be done, and in part B any 
values from memory must be re-read.  In other words, the call to 
readTimestamp is a serious block in the flow of the optimiser.

Secondly, consider the processor executing the code.  The function call 
and the return are non-conditional, so they will be executed early in 
the instruction pipeline.  But any jump in the instruction flow means a 
new block of memory needs to be in the cache, with associated risks of 
cache misses, page misses, etc.

In other words, it can be quite costly to read the timestamp this way.

With inline assembly, you have just test.c :

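#include <stdint.h>

static inline uint64_t readTimestamp(void) {
	uint64_t x;
	// "=A" means EDX:EAX on 32-bit x86 only; "volatile" stops gcc from
	// merging or removing repeated reads of the counter.
	asm volatile (" rdtsc " : "=A" (x));
	return x;
}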

void test(void) {
	// part A
	uint64_t t = readTimestamp();
	// part B
}

The use of this inline assembly within the test() function is identical. 
  But now the compiler knows everything about it - it knows that 
readTimestamp changes EDX:EAX, but leaves EBX and ECX untouched, and 
neither reads from nor writes to memory.  This gives it much more 
flexibility in optimising the code in test().  The generated code will 
be nothing more than the single "rdtsc" instruction and whatever 
register movements are needed, and as it is inline there is no change of 
flow when the processor is executing it.

For processors with more registers, there is even more to be gained with 
inline assembly.  On the PPC, for example, I used this inline assembly 
function recently in code that converted a set of values from 
little-endian to big-endian:

static inline uint32_t readByteSwapped(const uint32_t * p, uint32_t x) {
	uint32_t y;
	asm (" lwbrx %[y], %[x], %[p] " :
		 [y] "=r" (y) :
		[x] "r" (x), [p] "r" (p) );
	return y;
}

By using inline assembly, the compiler can allocate registers and 
pipeline the byte-swapped reads optimally.  If I made this an external 
assembly routine following the C calling conventions, the code would 
have been ten times slower - using shifts and masks for the conversion 
would have been faster.
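
For comparison, a portable sketch of the shift-and-mask version next to the
compiler builtin.  __builtin_bswap32 is a real gcc/clang builtin;
bswap32_portable() and read_byte_swapped() are names made up here, and
unlike the lwbrx version this one takes a plain pointer with no index.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Shift-and-mask byte swap; works with any C compiler. */
static inline uint32_t bswap32_portable(uint32_t v)
{
    return ((v & 0x000000FFu) << 24) |
           ((v & 0x0000FF00u) <<  8) |
           ((v & 0x00FF0000u) >>  8) |
           ((v & 0xFF000000u) >> 24);
}

/* Loads a 32-bit word and returns it with its byte order reversed. */
static inline uint32_t read_byte_swapped(const uint32_t *p)
{
    assert(p != NULL);
#if defined(__GNUC__)
    /* The compiler can schedule this and pick registers freely, and it
     * usually becomes a single instruction where the target has one. */
    return __builtin_bswap32(*p);
#else
    return bswap32_portable(*p);
#endif
}
```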

>> A typical bad reason is speed-optimising code.  Sometimes it /is/
>> important to get the best possible speed out of the system within a
>> small section of the code.  You may also be writing library code or
>> other heavily-used code, where it can be worth the effort.  But in a
>> great many cases when people think they need to write code in assembly
>> for speed, they are wrong - either it is not worth the cost (in terms of
>> development time, maintainability, readability, correctness, robustness,
>> portability, etc.), or the C compiler will actually do a good job if the
>> programmer just learned to use it properly.
>
> Yes, those are the cases where inline assembler works best, but
> also, as mentioned, is ugly, non-portable, etc.  You can write
> just the inner loop, maybe only a few instructions, in assembler
> with the rest in C.
>

Assembly is /always/ ugly and non-portable.  Being inline or external 
makes no difference there.

>> So all in all, for most targets and most applications, you don't often
>> need assembly when using modern tools.  Since well-written high-level
>> code is /generally/ clearer and more portable than even well-written
>> assembly (though you can write bad code in any language), the preference
>> should always be for using high level coding unless you have overriding
>> reasons for using assembly.
>
>> Then you have the choice - separate assembly modules, or inline assembly.
>
>> If you are writing large sections of assembly code, then assembly
>> modules make sense.  It is clearer to stick to a single language at a
>> time, and use tools suitable for that language, and large "asm(...)"
>> statements are as messy as large multi-line pre-processor macros.
>
> Last I wrote inline assembler was in Dynamic C for the Rabbit 3000,
> which is pretty much a Z180.  Unlike the Z80, code has a 24 bit
> addressing space, with special call and ret instructions to change
> the high bits.  With inline assembler, the C compiler keeps track
> of the addressing.
>
>> But if you are mixing C and assembly, and want to have minimal assembly,
>> then inline assembly is the way to go, especially if you have good
>> tools.  There are some C compilers that can't deal well with inline
>> assembly - they insist on it being restricted to "assembly functions"
>> only, or they turn off all optimisations for functions that use inline
>> assembly.
>
> For the Z180 there isn't much optimization.
>

The only cure for a poor compiler is to get a better one, if there is 
one available.

>> Other compilers work very well with inline assembly - the
>> compiler will handle register and/or stack allocation, and happily
>> include the inline assembly in its optimisation flow.  If you are using
>> such a compiler (gcc is a well-known example, but there are commercial
>> tools that work well too), then you will probably get the smallest and
>> fastest code by using inline assembly and letting the compiler do its
>> job.  The actual assembly code itself can often be tucked away in
>> "static inline" functions.
>
> (snip)
>
>> If you need to write code requiring a lot of register tracking, then I
>> can see that inline assembly will be messy.  But for a lot of uses, when
>> done properly (both by the user, and by the toolchain vendor), inline
>> assembly is much clearer and simpler than external assembly modules, and
>> results in smaller and faster code.
>
> For the Z180, there aren't many registers, and some have special
> uses, so there really isn't much of a problem with register tracking.
>
> -- glen
>

In comp.dsp David Brown <david.brown@removethisbit.hesbynett.no> wrote:
(snip, I wrote)
>>> Much embedded programming is done in C with inline assembler.

> On 08/11/10 21:30, Vladimir Vassilevsky wrote:

>> Inline assembler is bad style and a characteristic feature of lame
>> programmers. It combines disadvantages of both C and assembler.

>> If there is a need to use an assembler, make a separate module in
>> assembler and call it from C as an external function.

> I disagree.  It's a matter of taste and style, of course.  It is also a 
> matter of the application and the target - for DSPs with typical DSP 
> code, assembly can make a much bigger difference over C than for more 
> common microprocessor code.
 
> Sometimes you can't avoid using assembly.  A typical good reason is 
> because you need access to low-level features that can't be expressed in 
> C (such as registers that are accessible only with special 
> instructions), for a few bits of startup code, as part of an interrupt 
> routine (if your compiler does a poor job), or for handling key parts of 
> an RTOS.

The last few assembly routines I wrote were to use the IA32 RDTSC
instruction.  It conveniently returns the 64 bit result in EDX:EAX,
the usual return registers for (long long) on IA32.  The executable
instructions are RDTSC and RET.
 
> A typical bad reason is speed-optimising code.  Sometimes it /is/ 
> important to get the best possible speed out of the system within a 
> small section of the code.  You may also be writing library code or 
> other heavily-used code, where it can be worth the effort.  But in a 
> great many cases when people think they need to write code in assembly 
> for speed, they are wrong - either it is not worth the cost (in terms of 
> development time, maintainability, readability, correctness, robustness, 
> portability, etc.), or the C compiler will actually do a good job if the 
> programmer just learned to use it properly.

Yes, those are the cases where inline assembler works best, but
also, as mentioned, is ugly, non-portable, etc.  You can write
just the inner loop, maybe only a few instructions, in assembler
with the rest in C.
 
> So all in all, for most targets and most applications, you don't often 
> need assembly when using modern tools.  Since well-written high-level 
> code is /generally/ clearer and more portable than even well-written 
> assembly (though you can write bad code in any language), the preference 
> should always be for using high level coding unless you have overriding 
> reasons for using assembly.
 
> Then you have the choice - separate assembly modules, or inline assembly.
 
> If you are writing large sections of assembly code, then assembly 
> modules make sense.  It is clearer to stick to a single language at a 
> time, and use tools suitable for that language, and large "asm(...)" 
> statements are as messy as large multi-line pre-processor macros.

Last I wrote inline assembler was in Dynamic C for the Rabbit 3000,
which is pretty much a Z180.  Unlike the Z80, code has a 24 bit
addressing space, with special call and ret instructions to change
the high bits.  With inline assembler, the C compiler keeps track
of the addressing.  
 
> But if you are mixing C and assembly, and want to have minimal assembly, 
> then inline assembly is the way to go, especially if you have good 
> tools.  There are some C compilers that can't deal well with inline 
> assembly - they insist on it being restricted to "assembly functions" 
> only, or they turn off all optimisations for functions that use inline 
> assembly.  

For the Z180 there isn't much optimization.

> Other compilers work very well with inline assembly - the 
> compiler will handle register and/or stack allocation, and happily 
> include the inline assembly in its optimisation flow.  If you are using 
> such a compiler (gcc is a well-known example, but there are commercial 
> tools that work well too), then you will probably get the smallest and 
> fastest code by using inline assembly and letting the compiler do its 
> job.  The actual assembly code itself can often be tucked away in 
> "static inline" functions.
 
(snip)

> If you need to write code requiring a lot of register tracking, then I 
> can see that inline assembly will be messy.  But for a lot of uses, when 
> done properly (both by the user, and by the toolchain vendor), inline 
> assembly is much clearer and simpler than external assembly modules, and 
> results in smaller and faster code.

For the Z180, there aren't many registers, and some have special
uses, so there really isn't much of a problem with register tracking.

-- glen

Randy Yates <yates@ieee.org> writes:

> Randy Yates <yates@ieee.org> writes:
>
>> Vladimir Vassilevsky <nospam@nowhere.com> writes:
>>
>>> glen herrmannsfeldt wrote:
>>>
>>>
>>>> Much embedded programming is done in C with inline assembler.
>>>
>>> Inline assembler is bad style and a characteristic feature of lame
>>> programmers. It combines disadvantages of both C and assembler.
>>>
>>> If there is a need to use an assembler, make a separate module in
>>> assembler and call it from C as an external function.
>>
>> Amen, brother! My philosophy exactly! 
>
> PS: I highly recommend yasm for x86 assembly. 
>
>   http://www.tortall.net/projects/yasm/

And yes, I do realize this was about the ADSP2187, but I enjoyed
yasm so much I want to contribute to their advertising campaign.
-- 
Randy Yates                      % "Maybe one day I'll feel her cold embrace,
Digital Signal Labs              %                    and kiss her interface, 
mailto://yates@ieee.org          %            til then, I'll leave her alone."
http://www.digitalsignallabs.com %        'Yours Truly, 2095', *Time*, ELO

Randy Yates <yates@ieee.org> writes:

> Vladimir Vassilevsky <nospam@nowhere.com> writes:
>
>> glen herrmannsfeldt wrote:
>>
>>
>>> Much embedded programming is done in C with inline assembler.
>>
>> Inline assembler is bad style and a characteristic feature of lame
>> programmers. It combines disadvantages of both C and assembler.
>>
>> If there is a need to use an assembler, make a separate module in
>> assembler and call it from C as an external function.
>
> Amen, brother! My philosophy exactly! 

PS: I highly recommend yasm for x86 assembly. 

  http://www.tortall.net/projects/yasm/

--Randy

-- 
Randy Yates                      % "...the answer lies within your soul
Digital Signal Labs              %       'cause no one knows which side
mailto://yates@ieee.org          %                   the coin will fall."
http://www.digitalsignallabs.com %  'Big Wheels', *Out of the Blue*, ELO

On 08/11/10 21:30, Vladimir Vassilevsky wrote:
>
>
> glen herrmannsfeldt wrote:
>
>
>> Much embedded programming is done in C with inline assembler.
>
> Inline assembler is bad style and a characteristic feature of lame
> programmers. It combines disadvantages of both C and assembler.
>
> If there is a need to use an assembler, make a separate module in
> assembler and call it from C as an external function.
>

I disagree.  It's a matter of taste and style, of course.  It is also a 
matter of the application and the target - for DSPs with typical DSP 
code, assembly can make a much bigger difference over C than for more 
common microprocessor code.

Sometimes you can't avoid using assembly.  A typical good reason is 
because you need access to low-level features that can't be expressed in 
C (such as registers that are accessible only with special 
instructions), for a few bits of startup code, as part of an interrupt 
routine (if your compiler does a poor job), or for handling key parts of 
an RTOS.

A typical bad reason is speed-optimising code.  Sometimes it /is/ 
important to get the best possible speed out of the system within a 
small section of the code.  You may also be writing library code or 
other heavily-used code, where it can be worth the effort.  But in a 
great many cases when people think they need to write code in assembly 
for speed, they are wrong - either it is not worth the cost (in terms of 
development time, maintainability, readability, correctness, robustness, 
portability, etc.), or the C compiler will actually do a good job if the 
programmer just learned to use it properly.

So all in all, for most targets and most applications, you don't often 
need assembly when using modern tools.  Since well-written high-level 
code is /generally/ clearer and more portable than even well-written 
assembly (though you can write bad code in any language), the preference 
should always be for using high level coding unless you have overriding 
reasons for using assembly.

Then you have the choice - separate assembly modules, or inline assembly.

If you are writing large sections of assembly code, then assembly 
modules make sense.  It is clearer to stick to a single language at a 
time, and use tools suitable for that language, and large "asm(...)" 
statements are as messy as large multi-line pre-processor macros.

But if you are mixing C and assembly, and want to have minimal assembly, 
then inline assembly is the way to go, especially if you have good 
tools.  There are some C compilers that can't deal well with inline 
assembly - they insist on it being restricted to "assembly functions" 
only, or they turn off all optimisations for functions that use inline 
assembly.  Other compilers work very well with inline assembly - the 
compiler will handle register and/or stack allocation, and happily 
include the inline assembly in its optimisation flow.  If you are using 
such a compiler (gcc is a well-known example, but there are commercial 
tools that work well too), then you will probably get the smallest and 
fastest code by using inline assembly and letting the compiler do its 
job.  The actual assembly code itself can often be tucked away in 
"static inline" functions.

Inline assembly lets you mix assembly with the C to get the /best/ of 
both worlds.

As an example, I once re-wrote the C startup code used by a particular 
compiler, so that the startup code was in C instead of the original 
assembler.  The original assembler code was quite well-written and 
clear, but it is difficult to write assembly that is general, clear, and 
efficient.  You often can't write the code to take advantage of 
particular circumstances (such as optimising based on the values of 
compile-time constants) without it being messy and full of conditional 
assembly.  But the C compiler will do such optimisations fine.  So my C 
code, along with a couple of lines of inline assembly to set the stack 
pointer, was much smaller and clearer in source code, and the target 
code was significantly smaller and faster.  The code couldn't have been 
written in pure C, and the mix with inline assembler was a big 
improvement over the external assembly module.

>> With some compilers, you can switch back and forth in the middle
>> of a function. Depending on the processor, keeping track of
>> which registers to use may or may not be a problem, but in most
>> cases it can be done and result in fast code.
>
> The result is incomprehensible, unalterable, undebuggable and unportable
> write-only code.
>

If you need to write code requiring a lot of register tracking, then I 
can see that inline assembly will be messy.  But for a lot of uses, when 
done properly (both by the user, and by the toolchain vendor), inline 
assembly is much clearer and simpler than external assembly modules, and 
results in smaller and faster code.