Reply by Al Clark November 11, 2010
Vladimir Vassilevsky <nospam@nowhere.com> wrote in 
news:7oSdncAnWruvTU_RnZ2dnUVZ_sadnZ2d@giganews.com:

> Alessandro Basili wrote:
>
>> Hi,
>> I have a custom designed board which is based on an ADSP2187L dsp, with
>> a 512KB FLASH, a 128KB SRAM (20 ns cycle) and an FPGA running at 50 MHz.
>
> ADSP218x is a dinosaur with the address space of 16K words; what is all
> that external memory for? This DSP is inconvenient for C programming as
> it doesn't support stack frames and base-index addressing.
>
>> The dsp task is mainly processing commands and data from external links
>> (space-wire) on a 200~500 us cycle.
>> Since the actual software has been written in assembler and there are
>> some concerns about its reliability, I was wondering whether it is worth
>> while considering the possibility to move to a higher language like C,
>> in order to gain in maintainability and readability without losing too
>> much on performance.
>
> Yes, you can program ADSP218x in C; VDSP toolset recommended.
> Yes, there will be significant overhead.
> It is impossible to tell if there is enough speed without actually
> knowing the application.
> Why get stuck with 20-year-old technology?
>
> Vladimir Vassilevsky
> DSP and Mixed Signal Design Consultant
> http://www.abvolt.com
As someone who has written a considerable amount of 218x code, the short answer is: FORGET USING THE C COMPILER! The ADSP-218x architecture predates the emergence of C as a universal language. The registers are not orthogonal, the number of pointers is limited, etc.

The good news is that 218x assembly is very easy to read and understand, and looks more like C than traditional assembly. It is not a difficult processor.

I agree with Vladimir that it is essentially obsolete. The only reason I see to use it is if you have a large existing code base. You could use a new ADSP-21489 or ADSP-21479 fixed/floating point DSP for less money, easier programming in C or assembly, and substantially more performance. I mention SHARC because its assembly language has a similar, but actually easier, structure and could be ported to C as desired. You could also use any number of Blackfin targets. These are just the ADI choices; there are many other possibilities as well.

Al Clark
www.danvillesignal.com
Reply by Ala November 11, 2010
"Randy Yates" <yates@ieee.org> wrote in message 
news:m3pquf5q8m.fsf@ieee.org...
> Randy Yates <yates@ieee.org> writes:
>
>> Vladimir Vassilevsky <nospam@nowhere.com> writes:
>>
>>> glen herrmannsfeldt wrote:
>>>
>>>> Much embedded programming is done in C with inline assembler.
>>>
>>> Inline assembler is bad style and a characteristic feature of lame
>>> programmers. It combines disadvantages of both C and assembler.
>>>
>>> If there is a need to use an assembler, make a separate module in
>>> assembler and call it from C as an external function.
>>
>> Amen, brother! My philosophy exactly!
>
> PS: I highly recommend yasm for x86 assembly.
>
> http://www.tortall.net/projects/yasm/
Thanks. I would like to borrow one arsonist.
Reply by glen herrmannsfeldt November 10, 2010
In comp.dsp David Brown <david@westcontrol.removethisbit.com> wrote:
(snip regarding inline or external assembly code using RDTSC)

>> There is another complication, though, relating to pipelining
>> and RDTSC. In your example, you want Part A to run before RDTSC,
>> and Part B to run after. Without the jump, it is more likely that
>> the processor will be able to execute RDTSC out of order, such that
>> the timing doesn't do what you expect.
> That is partially true. As you say yourself, "it is more likely" -
> forcing a jump makes it more likely that part A runs before RDTSC, and
> part B runs after it. But it doesn't guarantee it.
Yes. There could also be a task switch in between, which should result in a very large increment in the count. I have never seen that in my uses of RDTSC. I once even did it from Java using JNI and it seemed to work just fine.
> This is a common problem in situations when you need assembly rather
> than C, and it is something that a lot of people have trouble
> understanding. It is a common mistake to think that "volatile" can be
> used to get it right - without realising that the compiler can do a lot
> of re-ordering of non-volatile instructions and accesses over and around
> the volatile ones.
(snip)
> The compiler feature you need is a "memory barrier", that tells the
> compiler to complete any outstanding calculations and stores, and that
> code afterwards must re-read from memory (and thus can't be executed
> before the memory barrier). With gcc, the simplest memory barrier is a
> "memory clobber" inline assembly - asm volatile ("" ::: "memory").
> Other compilers have different methods, and if you are using an OS that
> provides barriers, then use them.
For IA32 the closest I find is SFENCE, Store Fence, which guarantees that all previous stores are complete before following stores are done. (My words after reading the description.)

(snip)
>> Most of the time I try to time a whole loop, such that register
>> use isn't so much of a problem. Not quite long enough to time
>> with millisecond TOD clocks, though.
> Accurate measurement of timing on a processor as big and complex as
> modern x86's is far from easy - it is always going to be inaccurate, and
> a rough average over time is the best you can get.
Usually averaged over a long loop it is close enough, at least for a specific generation of processors. Also, RDTSC reduces the problem of variable clock rates, counting clock cycles instead of real time.
> But it's a fair point - there is no need to make your code smaller or
> faster than it actually needs to be. Premature optimisation is the root
> of all evil, after all.
Yes, I usually don't do that until it really needs to be done.

-- glen
Reply by David Brown November 10, 2010
On 10/11/2010 00:10, glen herrmannsfeldt wrote:
> In comp.dsp David Brown<david@westcontrol.removethisbit.com> wrote:
> (snip, I wrote)
>
>>> The last few assembly routines I wrote were to use the IA32 RDTSC
>>> instruction. It conveniently returns the 64 bit result in EDX:EAX,
>>> the usual return registers for (long long) on IA32. The executable
>>> instructions are RDTSC and RET.
>
> (snip)
>
>> But let's think about what actually happens in the processor with the "rdtsc"
>> instruction implemented as an external assembly module, and as inline
>> assembly. Bear with me on the details here - I have never used assembly
>> on the x86, and I haven't tested this code.
>
>> Case 1 - external module.
>
>> readTimestamp:
>>     rdtsc
>>     ret
>
> (snip)
>
>> In other words, the call to
>> readTimestamp is a serious block in the flow of the optimiser.
>
>> Secondly, consider the processor executing the code. The function call
>> and the return are non-conditional, so they will be executed early in
>> the instruction pipeline. But any jump in the instruction flow means a
>> new block of memory needs to be in the cache, with associated risks of
>> cache misses, page misses, etc.
>
> There is another complication, though, relating to pipelining
> and RDTSC. In your example, you want Part A to run before RDTSC,
> and Part B to run after. Without the jump, it is more likely that
> the processor will be able to execute RDTSC out of order, such that
> the timing doesn't do what you expect.
That is partially true. As you say yourself, "it is more likely" - forcing a jump makes it more likely that part A runs before RDTSC, and part B runs after it. But it doesn't guarantee it.

This is a common problem in situations when you need assembly rather than C, and it is something that a lot of people have trouble understanding. It is a common mistake to think that "volatile" can be used to get it right - without realising that the compiler can do a lot of re-ordering of non-volatile instructions and accesses over and around the volatile ones. Another common mistake is to think you can write it in assembly, and that tricks like forcing an unnecessary jump or call will ensure things are executed in the right order - without realising that the processor can do substantial re-ordering.

In fact, you need both parts - you need compiler-specific functionality to tell the compiler to keep part A and part B separate, and you need target-specific functionality to tell the processor to stall until in-flight instructions are completed.

The compiler feature you need is a "memory barrier", that tells the compiler to complete any outstanding calculations and stores, and that code afterwards must re-read from memory (and thus can't be executed before the memory barrier). With gcc, the simplest memory barrier is a "memory clobber" inline assembly - asm volatile ("" ::: "memory"). Other compilers have different methods, and if you are using an OS that provides barriers, then use them.

The cpu feature will be some sort of "sync" instruction. I don't know what that might be on the x86 - it might be quite complicated depending on the exact effect you are trying to achieve. Again, if your OS or compiler has such functions, use them. (They will typically be implemented as macros and inline assembly.)

But in any case, a method that is "more likely to work" is not a solution.
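A minimal sketch of the compiler-barrier half of this, using the gcc memory clobber described above. The barrier() macro name and the counter-based readTimestamp() stand-in are illustrative, not from the thread; a real measurement would also need a serialising instruction (such as cpuid or lfence on x86) to handle the CPU-side reordering.

```c
#include <stdint.h>

/* Compiler-level memory barrier: an empty asm statement that claims to
 * clobber memory.  The compiler must complete outstanding stores before
 * it and re-read memory after it; it does NOT stop CPU-level reordering
 * (that needs a fence/sync instruction as well). */
#define barrier() __asm__ __volatile__ ("" ::: "memory")

static volatile uint64_t fake_counter;

/* Stand-in for RDTSC so the sketch runs on any target: just bump and
 * return a counter. */
static uint64_t readTimestamp(void) {
    return ++fake_counter;
}

/* Time an (empty) region, with barriers so the compiler cannot move
 * "part A" work past the first read or "part B" work before the
 * second. */
uint64_t timeEmptyRegion(void) {
    barrier();
    uint64_t start = readTimestamp();
    barrier();
    /* region under measurement would go here */
    barrier();
    uint64_t stop = readTimestamp();
    barrier();
    return stop - start;
}
```

The clobber costs no instructions at all; it only constrains what the optimiser may move across that point.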
>> In other words, it can be quite costly to read the timestamp this way.
>
>> With inline assembly, you have just test.c :
>
>> static inline uint64_t readTimestamp(void) {
>>     uint64_t x;
>>     asm (" rdtsc " : "=A" (x) :: );
>>     return x;
>> }
>
> (snip)
>
>> The use of this inline assembly within the test() function is identical.
>> But now the compiler knows everything about it - it knows that
>> readTimestamp changes EDX:EAX, but leaves EBX and ECX untouched, and
>> neither reads from nor writes to memory. This gives it much more
>> flexibility in optimising the code in test(). The generated code will
>> be nothing more than the single "rdtsc" instruction and whatever
>> register movements are needed, and as it is inline there is no change of
>> flow when the processor is executing it.
>
> Most of the time I try to time a whole loop, such that register
> use isn't so much of a problem. Not quite long enough to time
> with millisecond TOD clocks, though.
Accurate measurement of timing on a processor as big and complex as modern x86's is far from easy - it is always going to be inaccurate, and a rough average over time is the best you can get.

But it's a fair point - there is no need to make your code smaller or faster than it actually needs to be. Premature optimisation is the root of all evil, after all. On the other hand, there is no point in making your code needlessly bigger and slower, especially when better alternatives are easier to write.
>> For processors with more registers, there is even more to be gained with
>> inline assembly. On the PPC, for example, I used this inline assembly
>> function recently in code that converted a set of values from
>> little-endian to big-endian:
>
>> static inline uint32_t readByteSwapped(const uint32_t * p, uint32_t x) {
>>     uint32_t y;
>>     asm (" lwbrx %[y], %[x], %[p] " :
>>         [y] "=r" (y) :
>>         [x] "r" (x), [p] "r" (p) );
>>     return y;
>> }
>
>> By using inline assembly, the compiler can allocate registers and
>> pipeline the byte-swapped reads optimally. If I made this an external
>> assembly routine following the C calling conventions, the code would
>> have been ten times slower - using shifts and masks for the conversion
>> would have been faster.
>
> At least for IA32, I have been told that many compilers have
> a built-in inlined bswap() such that you don't need to write one.
Different processors and different compilers have different features - this code was specific for the ppc using Code Warrior (it would also work with gcc). If I were using Diab Data's compiler (I can't remember what they are called now), I'd use its extensions that let you declare data as big- or little-endian explicitly, and let the compiler generate the best code.
> (snip)
>
>> Assembly is /always/ ugly and non-portable. Being inline or external
>> makes no difference there.
>
> (snip)
>
> -- glen
Reply by glen herrmannsfeldt November 9, 2010
In comp.dsp David Brown <david@westcontrol.removethisbit.com> wrote:
(snip, I wrote)

>> The last few assembly routines I wrote were to use the IA32 RDTSC
>> instruction. It conveniently returns the 64 bit result in EDX:EAX,
>> the usual return registers for (long long) on IA32. The executable
>> instructions are RDTSC and RET.
(snip)
> But let's think about what actually happens in the processor with the "rdtsc"
> instruction implemented as an external assembly module, and as inline
> assembly. Bear with me on the details here - I have never used assembly
> on the x86, and I haven't tested this code.
> Case 1 - external module.
> readTimestamp:
>     rdtsc
>     ret
(snip)
> In other words, the call to
> readTimestamp is a serious block in the flow of the optimiser.
> Secondly, consider the processor executing the code. The function call
> and the return are non-conditional, so they will be executed early in
> the instruction pipeline. But any jump in the instruction flow means a
> new block of memory needs to be in the cache, with associated risks of
> cache misses, page misses, etc.
There is another complication, though, relating to pipelining and RDTSC. In your example, you want Part A to run before RDTSC, and Part B to run after. Without the jump, it is more likely that the processor will be able to execute RDTSC out of order, such that the timing doesn't do what you expect.
> In other words, it can be quite costly to read the timestamp this way.
> With inline assembly, you have just test.c :
> static inline uint64_t readTimestamp(void) {
>     uint64_t x;
>     asm (" rdtsc " : "=A" (x) :: );
>     return x;
> }
(snip)
> The use of this inline assembly within the test() function is identical.
> But now the compiler knows everything about it - it knows that
> readTimestamp changes EDX:EAX, but leaves EBX and ECX untouched, and
> neither reads from nor writes to memory. This gives it much more
> flexibility in optimising the code in test(). The generated code will
> be nothing more than the single "rdtsc" instruction and whatever
> register movements are needed, and as it is inline there is no change of
> flow when the processor is executing it.
Most of the time I try to time a whole loop, such that register use isn't so much of a problem. Not quite long enough to time with millisecond TOD clocks, though.
> For processors with more registers, there is even more to be gained with
> inline assembly. On the PPC, for example, I used this inline assembly
> function recently in code that converted a set of values from
> little-endian to big-endian:
> static inline uint32_t readByteSwapped(const uint32_t * p, uint32_t x) {
>     uint32_t y;
>     asm (" lwbrx %[y], %[x], %[p] " :
>         [y] "=r" (y) :
>         [x] "r" (x), [p] "r" (p) );
>     return y;
> }
> By using inline assembly, the compiler can allocate registers and
> pipeline the byte-swapped reads optimally. If I made this an external
> assembly routine following the C calling conventions, the code would
> have been ten times slower - using shifts and masks for the conversion
> would have been faster.
At least for IA32, I have been told that many compilers have a built-in inlined bswap() such that you don't need to write one.

(snip)
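For reference, a sketch using the gcc/clang builtin (assuming __builtin_bswap32, which those compilers do provide; the swapWords wrapper is illustrative, not from the thread):

```c
#include <stddef.h>
#include <stdint.h>

/* Byte-swap a buffer of 32-bit words in place.  With gcc and clang,
 * __builtin_bswap32 compiles to a single bswap instruction on IA32,
 * so there is no need for a hand-written asm routine. */
void swapWords(uint32_t *buf, size_t n) {
    for (size_t i = 0; i < n; i++)
        buf[i] = __builtin_bswap32(buf[i]);
}
```

Because the builtin is just an expression, the compiler can still allocate registers and pipeline the loop freely, much like the inline-asm lwbrx version quoted above.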
> Assembly is /always/ ugly and non-portable. Being inline or external
> makes no difference there.
(snip)

-- glen
Reply by David Brown November 9, 2010
On 09/11/2010 05:12, glen herrmannsfeldt wrote:
> In comp.dsp David Brown<david.brown@removethisbit.hesbynett.no> wrote:
> (snip, I wrote)
>>>> Much embedded programming is done in C with inline assembler.
>
>> On 08/11/10 21:30, Vladimir Vassilevsky wrote:
>
>>> Inline assembler is bad style and a characteristic feature of lame
>>> programmers. It combines disadvantages of both C and assembler.
>
>>> If there is a need to use an assembler, make a separate module in
>>> assembler and call it from C as an external function.
>
>> I disagree. It's a matter of taste and style, of course. It is also a
>> matter of the application and the target - for DSPs with typical DSP
>> code, assembly can make a much bigger difference over C than for more
>> common microprocessor code.
>
>> Sometimes you can't avoid using assembly. A typical good reason is
>> because you need access to low-level features that can't be expressed in
>> C (such as registers that are accessible only with special
>> instructions), for a few bits of startup code, as part of an interrupt
>> routine (if your compiler does a poor job), or for handling key parts of
>> an RTOS.
>
> The last few assembly routines I wrote were to use the IA32 RDTSC
> instruction. It conveniently returns the 64 bit result in EDX:EAX,
> the usual return registers for (long long) on IA32. The executable
> instructions are RDTSC and RET.
On the x86, you have so few registers that the compiler can't help you much in the allocation, and many instructions have implicit register usage. It also means that the function call overhead is less, since your data is on the stack anyway (though the actual processor implementation may keep copies internally in registers). So it really doesn't matter if you have fixed registers in your assembly code, and with a bit of luck the processor will handle the "ret" instruction early on in the instruction pipeline.

But let's think about what actually happens in the processor with the "rdtsc" instruction implemented as an external assembly module, and as inline assembly. Bear with me on the details here - I have never used assembly on the x86, and I haven't tested this code.

Case 1 - external module.

In assembly.s, you have:

readTimestamp:
    rdtsc
    ret

In test.c, you have:

extern uint64_t readTimestamp(void);

void test(void) {
    // part A
    uint64_t t = readTimestamp();
    // part B
}

First, consider what the compiler knows. All it knows about readTimestamp is that it follows the C calling convention, and returns a value in EDX:EAX. It must assume that the function may change any volatile registers (according to the standard x86 ABI), and it may read or write any memory. This means any values held in local registers in part A, such as loop counters, pointers, etc., must be preserved on the stack before calling readTimestamp(), and restored afterwards. Similarly, any outstanding memory writes must be done, and in part B any values from memory must be re-read. In other words, the call to readTimestamp is a serious block in the flow of the optimiser.

Secondly, consider the processor executing the code. The function call and the return are non-conditional, so they will be executed early in the instruction pipeline. But any jump in the instruction flow means a new block of memory needs to be in the cache, with associated risks of cache misses, page misses, etc.
In other words, it can be quite costly to read the timestamp this way.

With inline assembly, you have just test.c:

static inline uint64_t readTimestamp(void) {
    uint64_t x;
    asm (" rdtsc " : "=A" (x) :: );
    return x;
}

void test(void) {
    // part A
    uint64_t t = readTimestamp();
    // part B
}

The use of this inline assembly within the test() function is identical. But now the compiler knows everything about it - it knows that readTimestamp changes EDX:EAX, but leaves EBX and ECX untouched, and neither reads from nor writes to memory. This gives it much more flexibility in optimising the code in test(). The generated code will be nothing more than the single "rdtsc" instruction and whatever register movements are needed, and as it is inline there is no change of flow when the processor is executing it.

For processors with more registers, there is even more to be gained with inline assembly. On the PPC, for example, I used this inline assembly function recently in code that converted a set of values from little-endian to big-endian:

static inline uint32_t readByteSwapped(const uint32_t * p, uint32_t x) {
    uint32_t y;
    asm (" lwbrx %[y], %[x], %[p] " :
        [y] "=r" (y) :
        [x] "r" (x), [p] "r" (p) );
    return y;
}

By using inline assembly, the compiler can allocate registers and pipeline the byte-swapped reads optimally. If I made this an external assembly routine following the C calling conventions, the code would have been ten times slower - using shifts and masks for the conversion would have been faster.
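One caveat on the "=A" constraint used in the thread's inline version: with gcc on 64-bit x86 it no longer denotes the EDX:EAX pair, so that version silently loses the high 32 bits when compiled for x86-64. A sketch of a variant that is safe in both modes follows; the non-x86 fallback branch is an addition of mine, included only so the sketch builds anywhere.

```c
#include <stdint.h>

#if defined(__i386__) || defined(__x86_64__)
/* Read the two halves into explicit registers and combine them, which
 * is correct for both 32-bit and 64-bit x86 (unlike "=A", which on
 * x86-64 allocates a single register rather than the edx:eax pair). */
static inline uint64_t readTimestamp(void) {
    uint32_t lo, hi;
    __asm__ __volatile__ ("rdtsc" : "=a" (lo), "=d" (hi));
    return ((uint64_t)hi << 32) | lo;
}
#else
#include <time.h>
/* Portable stand-in so the sketch also compiles on non-x86 hosts. */
static inline uint64_t readTimestamp(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}
#endif
```

The compiler still sees exactly which registers are touched, so the optimisation argument made above for the inline version is unchanged.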
>> A typical bad reason is speed-optimising code. Sometimes it /is/
>> important to get the best possible speed out of the system within a
>> small section of the code. You may also be writing library code or
>> other heavily-used code, where it can be worth the effort. But in a
>> great many cases when people think they need to write code in assembly
>> for speed, they are wrong - either it is not worth the cost (in terms of
>> development time, maintainability, readability, correctness, robustness,
>> portability, etc.), or the C compiler will actually do a good job if the
>> programmer just learned to use it properly.
>
> Yes, those are the cases where inline assembler works best, but
> also, as mentioned, is ugly, non-portable, etc. You can write
> just the inner loop, maybe only a few instructions, in assembler
> with the rest in C.
Assembly is /always/ ugly and non-portable. Being inline or external makes no difference there.
>> So all in all, for most targets and most applications, you don't often
>> need assembly when using modern tools. Since well-written high-level
>> code is /generally/ clearer and more portable than even well-written
>> assembly (though you can write bad code in any language), the preference
>> should always be for using high level coding unless you have overriding
>> reasons for using assembly.
>
>> Then you have the choice - separate assembly modules, or inline assembly.
>
>> If you are writing large sections of assembly code, then assembly
>> modules make sense. It is clearer to stick to a single language at a
>> time, and use tools suitable for that language, and large "asm(...)"
>> statements are as messy as large multi-line pre-processor macros.
>
> Last I wrote inline assembler was in Dynamic C for the Rabbit 3000,
> which is pretty much a Z180. Unlike the Z80, code has a 24 bit
> addressing space, with special call and ret instructions to change
> the high bits. With inline assembler, the C compiler keeps track
> of the addressing.
>
>> But if you are mixing C and assembly, and want to have minimal assembly,
>> then inline assembly is the way to go, especially if you have good
>> tools. There are some C compilers that can't deal well with inline
>> assembly - they insist on it being restricted to "assembly functions"
>> only, or they turn off all optimisations for functions that use inline
>> assembly.
>
> For the Z180 there isn't much optimization.
The only cure for a poor compiler is to get a better one, if there is one available.
>> Other compilers work very well with inline assembly - the
>> compiler will handle register and/or stack allocation, and happily
>> include the inline assembly in its optimisation flow. If you are using
>> such a compiler (gcc is a well-known example, but there are commercial
>> tools that work well too), then you will probably get the smallest and
>> fastest code by using inline assembly and letting the compiler do its
>> job. The actual assembly code itself can often be tucked away in
>> "static inline" functions.
>
> (snip)
>
>> If you need to write code requiring a lot of register tracking, then I
>> can see that inline assembly will be messy. But for a lot of uses, when
>> done properly (both by the user, and by the toolchain vendor), inline
>> assembly is much clearer and simpler than external assembly modules, and
>> results in smaller and faster code.
>
> For the Z180, there aren't many registers, and some have special
> uses, so there really isn't much of a problem with register tracking.
>
> -- glen
Reply by glen herrmannsfeldt November 9, 2010
In comp.dsp David Brown <david.brown@removethisbit.hesbynett.no> wrote:
(snip, I wrote)
>>> Much embedded programming is done in C with inline assembler.
> On 08/11/10 21:30, Vladimir Vassilevsky wrote:
>> Inline assembler is bad style and a characteristic feature of lame
>> programmers. It combines disadvantages of both C and assembler.
>> If there is a need to use an assembler, make a separate module in
>> assembler and call it from C as an external function.
> I disagree. It's a matter of taste and style, of course. It is also a
> matter of the application and the target - for DSPs with typical DSP
> code, assembly can make a much bigger difference over C than for more
> common microprocessor code.
> Sometimes you can't avoid using assembly. A typical good reason is
> because you need access to low-level features that can't be expressed in
> C (such as registers that are accessible only with special
> instructions), for a few bits of startup code, as part of an interrupt
> routine (if your compiler does a poor job), or for handling key parts of
> an RTOS.
The last few assembly routines I wrote were to use the IA32 RDTSC instruction. It conveniently returns the 64 bit result in EDX:EAX, the usual return registers for (long long) on IA32. The executable instructions are RDTSC and RET.
> A typical bad reason is speed-optimising code. Sometimes it /is/
> important to get the best possible speed out of the system within a
> small section of the code. You may also be writing library code or
> other heavily-used code, where it can be worth the effort. But in a
> great many cases when people think they need to write code in assembly
> for speed, they are wrong - either it is not worth the cost (in terms of
> development time, maintainability, readability, correctness, robustness,
> portability, etc.), or the C compiler will actually do a good job if the
> programmer just learned to use it properly.
Yes, those are the cases where inline assembler works best, but also, as mentioned, is ugly, non-portable, etc. You can write just the inner loop, maybe only a few instructions, in assembler with the rest in C.
> So all in all, for most targets and most applications, you don't often
> need assembly when using modern tools. Since well-written high-level
> code is /generally/ clearer and more portable than even well-written
> assembly (though you can write bad code in any language), the preference
> should always be for using high level coding unless you have overriding
> reasons for using assembly.
> Then you have the choice - separate assembly modules, or inline assembly.
> If you are writing large sections of assembly code, then assembly
> modules make sense. It is clearer to stick to a single language at a
> time, and use tools suitable for that language, and large "asm(...)"
> statements are as messy as large multi-line pre-processor macros.
Last I wrote inline assembler was in Dynamic C for the Rabbit 3000, which is pretty much a Z180. Unlike the Z80, code has a 24 bit addressing space, with special call and ret instructions to change the high bits. With inline assembler, the C compiler keeps track of the addressing.
> But if you are mixing C and assembly, and want to have minimal assembly,
> then inline assembly is the way to go, especially if you have good
> tools. There are some C compilers that can't deal well with inline
> assembly - they insist on it being restricted to "assembly functions"
> only, or they turn off all optimisations for functions that use inline
> assembly.
For the Z180 there isn't much optimization.
> Other compilers work very well with inline assembly - the
> compiler will handle register and/or stack allocation, and happily
> include the inline assembly in its optimisation flow. If you are using
> such a compiler (gcc is a well-known example, but there are commercial
> tools that work well too), then you will probably get the smallest and
> fastest code by using inline assembly and letting the compiler do its
> job. The actual assembly code itself can often be tucked away in
> "static inline" functions.
(snip)
> If you need to write code requiring a lot of register tracking, then I
> can see that inline assembly will be messy. But for a lot of uses, when
> done properly (both by the user, and by the toolchain vendor), inline
> assembly is much clearer and simpler than external assembly modules, and
> results in smaller and faster code.
For the Z180, there aren't many registers, and some have special uses, so there really isn't much of a problem with register tracking.

-- glen
Reply by Randy Yates November 8, 2010
Randy Yates <yates@ieee.org> writes:

> Randy Yates <yates@ieee.org> writes:
>
>> Vladimir Vassilevsky <nospam@nowhere.com> writes:
>>
>>> glen herrmannsfeldt wrote:
>>>
>>>> Much embedded programming is done in C with inline assembler.
>>>
>>> Inline assembler is bad style and a characteristic feature of lame
>>> programmers. It combines disadvantages of both C and assembler.
>>>
>>> If there is a need to use an assembler, make a separate module in
>>> assembler and call it from C as an external function.
>>
>> Amen, brother! My philosophy exactly!
>
> PS: I highly recommend yasm for x86 assembly.
>
> http://www.tortall.net/projects/yasm/
And yes, I do realize this was about the ADSP2187, but I enjoyed yasm so much I want to contribute to their advertising campaign.

-- 
Randy Yates                       % "Maybe one day I'll feel her cold embrace,
Digital Signal Labs               % and kiss her interface,
mailto://yates@ieee.org           % til then, I'll leave her alone."
http://www.digitalsignallabs.com  % 'Yours Truly, 2095', *Time*, ELO
Reply by Randy Yates November 8, 2010
Randy Yates <yates@ieee.org> writes:

> Vladimir Vassilevsky <nospam@nowhere.com> writes:
>
>> glen herrmannsfeldt wrote:
>>
>>> Much embedded programming is done in C with inline assembler.
>>
>> Inline assembler is bad style and a characteristic feature of lame
>> programmers. It combines disadvantages of both C and assembler.
>>
>> If there is a need to use an assembler, make a separate module in
>> assembler and call it from C as an external function.
>
> Amen, brother! My philosophy exactly!
PS: I highly recommend yasm for x86 assembly.

http://www.tortall.net/projects/yasm/

--Randy

-- 
Randy Yates                       % "...the answer lies within your soul
Digital Signal Labs               % 'cause no one knows which side
mailto://yates@ieee.org           % the coin will fall."
http://www.digitalsignallabs.com  % 'Big Wheels', *Out of the Blue*, ELO
Reply by David Brown November 8, 2010
On 08/11/10 21:30, Vladimir Vassilevsky wrote:
> glen herrmannsfeldt wrote:
>
>> Much embedded programming is done in C with inline assembler.
>
> Inline assembler is bad style and a characteristic feature of lame
> programmers. It combines disadvantages of both C and assembler.
>
> If there is a need to use an assembler, make a separate module in
> assembler and call it from C as an external function.
I disagree. It's a matter of taste and style, of course. It is also a matter of the application and the target - for DSPs with typical DSP code, assembly can make a much bigger difference over C than for more common microprocessor code.

Sometimes you can't avoid using assembly. A typical good reason is because you need access to low-level features that can't be expressed in C (such as registers that are accessible only with special instructions), for a few bits of startup code, as part of an interrupt routine (if your compiler does a poor job), or for handling key parts of an RTOS.

A typical bad reason is speed-optimising code. Sometimes it /is/ important to get the best possible speed out of the system within a small section of the code. You may also be writing library code or other heavily-used code, where it can be worth the effort. But in a great many cases when people think they need to write code in assembly for speed, they are wrong - either it is not worth the cost (in terms of development time, maintainability, readability, correctness, robustness, portability, etc.), or the C compiler will actually do a good job if the programmer just learned to use it properly.

So all in all, for most targets and most applications, you don't often need assembly when using modern tools. Since well-written high-level code is /generally/ clearer and more portable than even well-written assembly (though you can write bad code in any language), the preference should always be for using high level coding unless you have overriding reasons for using assembly.

Then you have the choice - separate assembly modules, or inline assembly.

If you are writing large sections of assembly code, then assembly modules make sense. It is clearer to stick to a single language at a time, and use tools suitable for that language, and large "asm(...)" statements are as messy as large multi-line pre-processor macros.
But if you are mixing C and assembly, and want to have minimal assembly, then inline assembly is the way to go, especially if you have good tools. There are some C compilers that can't deal well with inline assembly - they insist on it being restricted to "assembly functions" only, or they turn off all optimisations for functions that use inline assembly. Other compilers work very well with inline assembly - the compiler will handle register and/or stack allocation, and happily include the inline assembly in its optimisation flow. If you are using such a compiler (gcc is a well-known example, but there are commercial tools that work well too), then you will probably get the smallest and fastest code by using inline assembly and letting the compiler do its job. The actual assembly code itself can often be tucked away in "static inline" functions.

Inline assembly lets you mix assembly with the C to get the /best/ of both worlds.

As an example, I once re-wrote the C startup code used by a particular compiler, so that the startup code was in C instead of the original assembler. The original assembler code was quite well-written and clear, but it is difficult to write assembly that is general, clear, and efficient. You often can't write the code to take advantage of particular circumstances (such as optimising based on the values of compile-time constants) without it being messy and full of conditional assembly. But the C compiler will do such optimisations fine. So my C code, along with a couple of lines of inline assembly to set the stack pointer, was much smaller and clearer in source code, and the target code was significantly smaller and faster. The code couldn't have been written in pure C, and the mix with inline assembler was a big improvement over the external assembly module.
>> With some compilers, you can switch back and forth in the middle
>> of a function. Depending on the processor, keeping track of
>> which registers to use may or may not be a problem, but in most
>> cases it can be done and result in fast code.
>
> The result is incomprehensible, unalterable, undebuggable and unportable
> write-only code.
If you need to write code requiring a lot of register tracking, then I can see that inline assembly will be messy. But for a lot of uses, when done properly (both by the user, and by the toolchain vendor), inline assembly is much clearer and simpler than external assembly modules, and results in smaller and faster code.