EmbeddedRelated.com
Forums
The 2024 Embedded Online Conference

New ARM Cortex Microcontroller Product Family from STMicroelectronics

Started by Bill Giovino June 18, 2007
>"Jim Granville" wrote... > One clarify : How credible is your ST contact ? - as I cannot see "share > the same packet buffer" anywhere in the user manual, or drawings, and > nothing suggests that conflict. > Search cannot find 'USB' inside the CAN chapter, nor 'CAN' inside the > USB chapter ? > It also does not make chip-design sense, surely it is harder to > lock/overlap some block resource like that, in this cut/paste world ? > > That said, it is a strange thing to 'make up'/ admit to if untrue, so > perhaps it is coming via the errata pipeline ? > > -jg
My contacts are very credible. And, unfortunately, what we are discussing is something that would not be made obvious in datasheet diagrams. Datasheet diagrams are meant to provide an overview of functionality in as clear a visual format as possible. Like the model of the atom, it almost never reflects 100% what's inside the chip. Chip buffers are RAM, and RAM is very greedy when it comes to die area. Allowing for an extra buffer could price out the chip as non-competitive, or it could be unwieldy from a layout POV. My GUESS - the chip was designed for a primary customer, who wanted either USB or CAN (but not both). During chip design, a smart marketing person asked about adding the extra peripheral. Adding the CAN or USB adds small pieces of a penny to the chip cost. But adding an extra buffer probably priced the chip beyond what was quoted to the target customer. Jim, if you want to take this off-line, I can be reached at the first email address on this page: http://www.microcontroller.com/Embedded.asp?did=23 -Bill.
"Jim Granville" wrote...
> I'm with Jon on this, the semantics matter little; but it would be > good to answer the simple question "Can it execute code from RAM?" > - in some systems, that is useful.
In the case of my original article: http://www.microcontroller.com/news/arm_cortex_stm.asp the answer is YES - the ST part can execute code from RAM off the data (system) bus. Of course, there will be extra cycles. Back when I was but a fledgling FAE, in my presentations I used to label architectures as Harvard, Modified Harvard, Von Neumann, etc. In PPT presentations, engineers would ALWAYS debate amongst themselves as to the differences between these architectures, and sometimes whether or not Von Newman (What, Me Worry???) was spelled right... What you decide to call the architecture is much less important than what you decide to do with it. Bill
Bill Giovino wrote:

> "Jim Granville" wrote... > >>I'm with Jon on this, the semantics matter little; but it would be >>good to answer the simple question "Can it execute code from RAM?" >>- in some systems, that is useful. > > > In the case of my original article: > http://www.microcontroller.com/news/arm_cortex_stm.asp > the answer is YES - the ST part can execute code from RAM off the data (system) bus. Of > course, there will be extra cycles.
In some parts, RAM CODE execution is promoted for speed (due to slower FLASH speeds). Is that not the case in the ST device Core/RAM/FLASH combination ? -jg
"Jim Granville" wrote...
> Bill Giovino wrote: > > > "Jim Granville" wrote... > > > >>I'm with Jon on this, the semantics matter little; but it would be > >>good to answer the simple question "Can it execute code from RAM?" > >>- in some systems, that is useful. > > > > > > In the case of my original article: > > http://www.microcontroller.com/news/arm_cortex_stm.asp > > the answer is YES - the ST part can execute code from RAM off the data (system) bus. Of > > course, there will be extra cycles. > > In some parts, RAM CODE execution is promoted for speed (due to slower > FLASH speeds). > Is that not the case in the ST device Core/RAM/FLASH combination ? > > -jg
Good question... in the ST part, if you are running out of Flash at zero wait states, then you are getting simultaneous fetches from both the instruction & data buses - taking full advantage of the Harvard (ahem!) architecture gets you *mostly* single-cycle execution. But for the same example, if you are running out of RAM, then you are using the same bus for instructions and data, so you lose the advantages of the Harvard architecture and it's slower. However, if you are running out of Flash with the CPU at a higher speed than the Flash, and so the Flash requires wait states while taking advantage of the Harvard architecture - how does the speed compare to running instructions & data out of RAM off the data bus, where there are extra cycles? The answer is - it depends...
On Jun 22, 12:40 am, "Bill Giovino" <conta...@microcontroller.com> wrote:

> However, if you are running out of Flash with the CPU at a higher speed than the Flash, > and so the Flash requires wait states while taking advantage of the Harvard > architecture
Any idea if the ST Cortex M3 can run without wait states from flash at their rated speed? That would be quite impressive. Eric
On Jun 22, 2:34 pm, Eric <englere_...@yahoo.com> wrote:
> On Jun 22, 12:40 am, "Bill Giovino" <conta...@microcontroller.com> > wrote: > > > However, if you are running out of Flash with the CPU at a higher speed than the Flash, > > and so the Flash requires wait states while taking advantage of the Harvard > > architecture > > Any idea if the ST Cortex M3 can run without wait states from flash at > their rated speed? That would be quite impressive.
The data sheet says it requires one wait state from 24 to 48 MHz and 2 wait states above 48 MHz. So compared to the Luminary parts running at 50 MHz with *NO* wait states, I say the ST M3 parts are dogs. The power consumption is not great either, at least not compared to parts like Atmel SAM7. The advertisement says it gets "0.5 mA/MHz in RUN mode from Flash", but this is not very accurate. The power curve does not have a 0.5 mA/MHz slope. The STM32F103 data sheet shows higher current per MHz at low clock speeds with a Y intercept of about 9 mA. I think the lower mA/MHz at higher clock speeds reflects the lower MIPS available due to the required wait states. Accounting for that, the mA/MHz ranges from 0.54 at 24 MHz to 0.88 at 72 MHz. I think this may be better than the Luminary Stellaris parts, but not as good as the Atmel SAM7 parts which are claimed to be a true 0.5 mA/MHz with very low static current in the uA range. I have not looked at the newer Luminary parts in detail. Actually, I guess a power factor would be required for the SAM7 parts as well since they run with one wait state at their top speed. So maybe the STM32 parts do better on power than I realized! I am still waiting for Luminary to announce parts on a smaller geometry process. I was told they would be out toward the end of the year in a 130 nm process, IIRC. These parts should be very low power, but I don't know if they will keep 5 volt tolerance and what the static current will be.
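[Editor's sketch] The Y-intercept point in the post above can be illustrated with a straight-line current model, I = I0 + k*f: with any nonzero intercept, the headline mA/MHz figure (I/f) depends on the clock at which it is quoted. The roughly 9 mA intercept is the value mentioned in the post; the slope used here is a made-up placeholder, not a data sheet number, and the model ignores wait states entirely.

```python
def total_current_ma(f_mhz, intercept_ma, slope_ma_per_mhz):
    """Straight-line supply current model: I = I0 + k * f."""
    return intercept_ma + slope_ma_per_mhz * f_mhz


def ma_per_mhz(f_mhz, intercept_ma, slope_ma_per_mhz):
    """Headline figure of merit: total current divided by clock frequency."""
    return total_current_ma(f_mhz, intercept_ma, slope_ma_per_mhz) / f_mhz


INTERCEPT_MA = 9.0   # approximate Y intercept quoted in the post
SLOPE_MA_MHZ = 0.4   # hypothetical slope, chosen only for illustration

for f in (8, 24, 48, 72):
    figure = ma_per_mhz(f, INTERCEPT_MA, SLOPE_MA_MHZ)
    print(f"{f:3d} MHz: {figure:.3f} mA/MHz")
```

With a fixed intercept, the per-MHz figure only approaches the slope at high clock rates, which is why slowing the clock alone saves less power than the headline number suggests.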
>Bill Giovino wrote >>http://www.microcontroller.com/news/arm_cortex_stm.asp >>STMicroelectronics has introduced the new STM32 microcontroller family, >>based on the Harvard architecture ARM Cortex. >>
FreeRTOS.org wrote:
>Where have you been all this time? ;o) >http://groups.google.com/group/comp.arch.embedded/browse_thread/thread/528fb9dd63e29756/a16733f4109c7f42?lnk=gst&q=%22ST+announce+their+Cortex-M3+micros%22&rnum=1#a16733f4109c7f42
Note to Richard: When posting Google Groups links, the browse_frm paradigm is nicer. http://groups.google.com/group/comp.arch.embedded/browse_frm/thread/528fb9dd63e29756/a16733f4109c7f42?q=announce.their.Cortex.M3.micros In addition, using periods (or hyphens[1]) to form phrases makes things more searchable (no %22ST stuff) ...and lnk=gst& is just noise. [1] A hyphen (grease-monkey) will find e.g. BOTH **grease monkey** AND **greasemonkey**.
"rickman" <gnuarm@gmail.com> wrote in message news:1182540547.184518.91830@o11g2000prd.googlegroups.com...
> On Jun 22, 2:34 pm, Eric <englere_...@yahoo.com> wrote: >> On Jun 22, 12:40 am, "Bill Giovino" <conta...@microcontroller.com> >> wrote: >> >> > However, if you are running out of Flash with the CPU at a higher speed than the Flash, >> > and so the Flash requires wait states while taking advantage of the Harvard >> > architecture >> >> Any idea if the ST Cortex M3 can run without wait states from flash at >> their rated speed? That would be quite impressive. > > The data sheet says it requires one wait state from 24 to 48 MHz and 2 > wait states above 48 MHz. So compared to the Luminary parts running > at 50 MHz with *NO* wait states, I say the ST M3 parts are dogs.
It's not that bad. Cortex-M3 has a prefetch buffer and branch prediction. This means that the cost of a single waitstate can be hidden for conditional branches, ie. only indirect branches have a penalty. With 2 wait states the branch prediction only works on unconditional branches, so you'll get a slowdown. However you can change loops to use an unconditional branch at the end so they run at the speed of zero-wait state memory.
> The power consumption is not great either, at least not compared to > parts like Atmel SAM7. The advertisement says it gets "0.5 mA/MHz in > RUN mode from Flash", but this is not very accurate. The power curve > does not have a 0.5 mA/MHz slope. The STM32F103 data sheet shows > higher current per MHz at low clock speeds with a Y intercept of about > 9 mA. I think the lower mA/MHz at higher clock speeds reflects the > lower MIPS available due to the required wait states.
It is the flash power consumption. When you add wait states the flash power consumption drops to 50% (1 wait state) or 33% (2 wait states). Ie. the flash has identical power consumption at 24, 48 and 72MHz. Of course the secondary effect of adding wait states is the core slows down and so uses less power. Based on their numbers I estimate the slowdown is between 10 and 15% - not too bad for 2 wait states.
> Accounting for > that, the mA/MHz ranges from 0.54 at 24 MHz to 0.88 at 72 MHz. I > think this may be better than the Luminary Stellaris parts, but not as > good as the Atmel SAM7 parts which are claimed to be a true 0.5 mA/MHz > with very low static current in the uA range. I have not looked at > the newer Luminary parts in detail.
I calculate 40mA at 72MHz, so 0.56mA/MHz. Not quite 0.5, but close. But I don't see where you get the idea they are worse than SAM7. I'm not sure what part you were comparing with, but the SAM7A3 (also CAN and USB like STM32F103) shows 70mA at 60MHz, or more than twice at the same frequency. Now consider that an M3 runs twice as fast as a SAM7 at the same frequency, so the MIPS/Watt is 4 times as good!
> Actually, I guess a power factor would be required for the SAM7 parts > as well since they run with one wait state at their top speed. So > maybe the STM32 part do better on power than I realized!
If you're trying to compare MIPS/Watt don't forget that different cores running at the same frequency do not run at the same speed. Wilco
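[Editor's sketch] The arithmetic in the post above can be reproduced in a few lines of Python. The wait-state thresholds and the current figures (40 mA at 72 MHz, 70 mA at 60 MHz) are the ones quoted in this thread; treating exactly 24 MHz and 48 MHz as belonging to the lower wait-state band is an assumption, and the 2x speed factor is the poster's claim, used here as an input rather than a verified number.

```python
def flash_wait_states(cpu_mhz):
    """Wait states quoted in the thread: 0 up to 24 MHz, 1 up to 48 MHz, else 2."""
    if cpu_mhz <= 24:
        return 0
    if cpu_mhz <= 48:
        return 1
    return 2


def flash_access_rate_mhz(cpu_mhz):
    """Maximum rate at which the flash array is actually accessed."""
    return cpu_mhz / (1 + flash_wait_states(cpu_mhz))


def ma_per_mhz(current_ma, cpu_mhz):
    """Supply current normalized by clock frequency."""
    return current_ma / cpu_mhz


# The flash access rate is the same at each band edge, which is the
# basis for the "identical flash power at 24, 48 and 72 MHz" remark.
for f in (24, 48, 72):
    print(f"{f} MHz CPU -> flash accessed at {flash_access_rate_mhz(f):.0f} MHz")

stm32 = ma_per_mhz(40, 72)       # figure quoted for the STM32F103
sam7a3 = ma_per_mhz(70, 60)      # figure quoted for the SAM7A3
current_ratio = sam7a3 / stm32   # how much more current per MHz the SAM7A3 draws

CLAIMED_SPEEDUP = 2.0            # poster's claim, not a measured value
efficiency_ratio = current_ratio * CLAIMED_SPEEDUP

print(f"STM32: {stm32:.2f} mA/MHz, SAM7A3: {sam7a3:.2f} mA/MHz")
print(f"current ratio: {current_ratio:.2f}, with claimed speedup: {efficiency_ratio:.2f}")
```

The "4 times" conclusion depends entirely on the speedup factor, so changing `CLAIMED_SPEEDUP` is the quickest way to see how sensitive the comparison is to that assumption.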
On Jun 22, 7:34 pm, "Wilco Dijkstra" <Wilco_dot_Dijks...@ntlworld.com> wrote:
> "rickman" <gnu...@gmail.com> wrote in message news:1182540547.184518.91830@o11g2000prd.googlegroups.com... > > The data sheet says it requires one wait state from 24 to 48 MHz and 2 > > wait states above 48 MHz. So compared to the Luminary parts running > > at 50 MHz with *NO* wait states, I say the ST M3 parts are dogs. > > It's not that bad. Cortex-M3 has a prefetch buffer and branch prediction. This > means that the cost of a single waitstate can be hidden for conditional branches, > ie. only indirect branches have a penalty. With 2 wait states the branch prediction > only works on unconditional branches, so you'll get a slowdown. However you can > change loops to use an unconditional branch at the end so they run at the speed > of zero-wait state memory.
I don't follow what you are saying at all. Branch prediction relates to pipelining. I don't see how it relates to wait states. The required wait states are added because of a fundamental limitation in the bandwidth of the Flash memory. You can look-ahead all you want, but you can still only return one word from Flash per 3 clock cycles when running at full speed. Unless the Flash word width is increased (as in the NXP designs) or the instruction size is reduced (many Cortex M3 instructions are 16 bits, but they would need to be 10 bits with two wait states and 32 bit memory) this will limit performance in the Cortex M3. Am I completely missing something? I always leave that possibility open...
> > The power consumption is not great either, at least not compared to > > parts like Atmel SAM7. The advertisement says it gets "0.5 mA/MHz in > > RUN mode from Flash", but this is not very accurate. The power curve > > does not have a 0.5 mA/MHz slope. The STM32F103 data sheet shows > > higher current per MHz at low clock speeds with a Y intercept of about > > 9 mA. I think the lower mA/MHz at higher clock speeds reflects the > > lower MIPS available due to the required wait states. > > It is the flash power consumption. When you add wait states the power > consumption flash drops to 50% (1 wait state) or 33% (2 wait states). Ie. > the flash has identical power consumption at 24, 48 and 72MHz. > > Of course the secondary effect of adding wait states is the core slows down > and so uses less power. Based on their numbers I estimate the slowdown is > between 10 and 15% - not too bad for 2 wait states.
Yes, that is all pretty obvious. But it does not address the point of the Y intercept being a hefty 9 mA. This is not as high as the Analog Devices ARM parts, but it is significant. It means you need to use modes and hardware features to get better power savings compared to just slowing the clock which is much simpler to do.
> > Accounting for > > that, the mA/MHz ranges from 0.54 at 24 MHz to 0.88 at 72 MHz. I > > think this may be better than the Luminary Stellaris parts, but not as > > good as the Atmel SAM7 parts which are claimed to be a true 0.5 mA/MHz > > with very low static current in the uA range. I have not looked at > > the newer Luminary parts in detail. > > I calculate 40mA at 72MHz, so 0.56mA/MHz. Not quite 0.5, but close. > But I don't see where you get the idea they are worse than SAM7. I'm not sure > what part you were comparing with, but the SAM7A3 (also CAN and USB like > STM32F103) shows 70mA at 60MHz, or more than twice at the same frequency.
The SAM7A3 is one of the oldest SAM7 parts and is not a useful basis for comparison. Personally, I do not expect to have a use for the CAN controller and I don't expect it was running when the power measurements were made. I was using the SAM7S parts as a point of comparison. I have a spread sheet that was provided by Atmel which shows the power rating of the CPU since you can control all the various power consuming sections. Ignoring the peripherals, the CPU (with PLL running) consumes 0.5 mA/MHz with a very small Y intercept (as I initially said). The power for the STM32 is from the data sheet and includes basic power to the peripherals, although since they are not performing work the power they draw is less than typical. So the comparison is not perfect.
> Now consider that an M3 runs twice as fast as a SAM7 at the same frequency, > so the MIPS/Watt is 4 times as good!
How do you support the claim that the M3 runs twice as fast as the SAM7 at the same frequency??? Maybe I don't want to know... I have not seen anyone claim that the M3 runs twice as fast as an ARM7 clock for clock. I don't even think ARM claims that. I seem to recall that after all the hoopla is removed, you might see from 10% to 25% speedup from the ARM7 to the M3 depending on your application. If you disagree on this basic point, then I think we should not discuss it further. I have seen it discussed before ad nauseam with no hard information to support any given number.
> > Actually, I guess a power factor would be required for the SAM7 parts > > as well since they run with one wait state at their top speed. So > > maybe the STM32 part do better on power than I realized! > > If you're trying to compare MIPS/Watt don't forget that different cores running > at the same frequency do not run at the same speed.
Yes, but that is a small delta compared to adding waitstates with a 2x or 3x reduction in performance and therefore the same effect on power efficiency.
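[Editor's sketch] The bandwidth argument in this post (one Flash word per three clocks at two wait states, hence the "10 bits" remark) can be put into numbers. This is a minimal model under stated assumptions: the 32-bit Flash width is the figure the post itself assumes, not a confirmed value for the STM32, the 128-bit width is an illustrative wider array, and the model ignores prefetch buffers, branches and data reads from Flash.

```python
def flash_bits_per_cycle(flash_width_bits, wait_states):
    """Sustained Flash bandwidth in bits per CPU clock."""
    return flash_width_bits / (1 + wait_states)


def max_instr_per_cycle(flash_width_bits, wait_states, instr_bits):
    """Upper bound on instructions fetched per clock, capped at one per cycle."""
    return min(1.0, flash_bits_per_cycle(flash_width_bits, wait_states) / instr_bits)


# 32-bit wide Flash (the post's assumption) at two wait states
narrow = flash_bits_per_cycle(32, 2)
print(f"32-bit Flash, 2 WS: {narrow:.2f} bits/cycle")
print(f"  16-bit instructions: {max_instr_per_cycle(32, 2, 16):.2f} per cycle at best")
print(f"  32-bit instructions: {max_instr_per_cycle(32, 2, 32):.2f} per cycle at best")

# A wider Flash word (128 bits is an illustrative value) removes the fetch ceiling
print(f"128-bit Flash, 2 WS, 16-bit instructions: "
      f"{max_instr_per_cycle(128, 2, 16):.2f} per cycle at best")
```

Under these assumptions the fetch ceiling comes from the ratio of Flash word width to instruction width, which is why widening the Flash word or shrinking the instructions are the two remedies named in the post.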
"Wilco Dijkstra" wrote...
> > "rickman" wrote... > > On Jun 22, 2:34 pm, Eric wrote: > >> On Jun 22, 12:40 am, "Bill Giovino" wrote: > >> > >> > However, if you are running out of Flash with the CPU at a higher speed than the Flash, > >> > and so the Flash requires wait states while taking advantage of the Harvard > >> > architecture > >> > >> Any idea if the ST Cortex M3 can run without wait states from flash at > >> their rated speed? That would be quite impressive. > > > > The data sheet says it requires one wait state from 24 to 48 MHz and 2 > > wait states above 48 MHz. So compared to the Luminary parts running > > at 50 MHz with *NO* wait states, I say the ST M3 parts are dogs. > > It's not that bad. Cortex-M3 has a prefetch buffer and branch prediction. This > means that the cost of a single waitstate can be hidden for conditional branches, > ie. only indirect branches have a penalty. With 2 wait states the branch prediction > only works on unconditional branches, so you'll get a slowdown. However you can > change loops to use an unconditional branch at the end so they run at the speed > of zero-wait state memory.
Completely correct. But you must remember that devices like these are often not used at their full speed. ST certainly has excellent embedded Flash processes that can run faster than 24MHz and they deliberately chose not to use any of them for this product. In the case of this device, it looks like it was developed specifically for low power applications, where the issue isn't really instructions per second, but milliamps per second. The intelligent peripherals, and especially the non-intrusive DMA, allow developers to run the core slower. When competing with commodity devices, (and anything licensed from ARM has become a commodity), a microcontroller company needs a competitive advantage. ST's advantage is their superior in-house process technology. Only TI (who also licenses the ARM Cortex) competes with ST when it comes to superior in-house process technology, and, hey, ST and TI are so close in process ability I wouldn't bet on the difference between the two. Bill Giovino http://Microcontroller.com http://www.microcontroller.com/news/arm_cortex_stm.asp
