EmbeddedRelated.com

ARM's v7 MMU

Started by Don Y May 6, 2014
Hi,

Any pointers as to idiosyncrasies in ARM's v7 MMU?  (no, I'm
not looking for information as to how to *use* it; rather,
pointers regarding any "unexpected behaviors" that I might
encounter -- especially when mixing page sizes, etc.)

Also, any pointers to particular silicon to avoid/favor
in terms of potential problems in the MMU implementation?

Thx!
--don
On 2014-05-06, Don Y <this@is.not.me.com> wrote:
> Hi,
>
> Any pointers as to idiosyncrasies in ARM's v7 MMU? (no, I'm
> not looking for information as to how to *use* it; rather,
> pointers regarding any "unexpected behaviors" that I might
> encounter -- especially when mixing page sizes, etc.)
>
There are 4 indirect things I've come across:

1) As you will be aware, the whole caching/buffering subsystem was totally
reworked for ARMv6 and the ARMv5 subsystem is no longer supported in ARMv7.

I've found the configuration on memory/device regions is much more
sensitive/fragile than it is with ARMv5 devices when the MMU is enabled.

A specific example: if you are experiencing device lockups when enabling
the MMU, try changing the device type attributes in the paging table for
the peripheral region.

2) If you are using the MMU on a device with Security Extensions enabled,
don't forget that some register bits which are otherwise R/W become R/O
in Non-Secure mode.

3) Don't forget that on ARMv7 class devices, some register updates may
be posted across a bus meaning they are not updated immediately. When
you turn on instruction and data caching, interrupt handling code can
run fast enough that you get a race condition with the interrupt hardware
firing the interrupt for a second time unless you use the usual DSB
instructions.

I've seen this happen on the AM3359.

4) There's no longer any way to invalidate the whole data cache in one
go. You now have to do it by MVA or set/way.
> Also, any pointers to particular silicon to avoid/favor
> in terms of potential problems in the MMU implementation?
>
The AM3359 in the Beaglebone Black caused me way more trouble than the
Allwinner A10s did. However, the AM3359 is heavily documented (unlike
the Chinese jobs... :-()

Simon.

--
Simon Clubley, clubley@remove_me.eisner.decus.org-Earth.UFP
Microsoft: Bringing you 1980s technology to a 21st century world
On 06.5.2014 22:27, Don Y wrote:
> Hi,
>
> Any pointers as to idiosyncrasies in ARM's v7 MMU? (no, I'm
> not looking for information as to how to *use* it; rather,
> pointers regarding any "unexpected behaviors" that I might
> encounter -- especially when mixing page sizes, etc.)
>
> Also, any pointers to particular silicon to avoid/favor
> in terms of potential problems in the MMU implementation?
>
> Thx!
> --don
Hi Don,

if on that version they still have that ridiculous MMU tagging
pages by logical address - so you have to flush all caches etc. on
task switch - maybe your best chance is to simply disable it,
if you have the option (I don't know ARM). Or switch to a power
architecture processor, their MMUs work OK.

Dimiter

------------------------------------------------------
Dimiter Popoff, TGI             http://www.tgi-sci.com
------------------------------------------------------
http://www.flickr.com/photos/didi_tgi/sets/72157600228621276/
On 2014-05-07, Dimiter_Popoff <dp@tgi-sci.com> wrote:
>
> Hi Don,
>
> if on that version they still have that ridiculous MMU tagging
> pages by logical address - so you have to flush all caches etc. on
> task switch - maybe your best chance is to simply disable it,
The problem is the caches on ARM don't work unless the MMU is enabled.
> if you have the option (I don't know ARM). Or switch to a power
> architecture processor, their MMUs work OK.
>
Can you pick up capable battery operated small Power boards for
about 20-30 British pounds ?

You can for ARM but you can't for Power (at least the last time
I checked) which is why there's a hobbyist and experimenting
ecosystem around ARM but there isn't around Power.

Simon.

--
Simon Clubley, clubley@remove_me.eisner.decus.org-Earth.UFP
Microsoft: Bringing you 1980s technology to a 21st century world
On 07.5.2014 15:03, Simon Clubley wrote:
> ...
> Can you pick up capable battery operated small Power boards for
> about 20-30 British pounds ?
>
> You can for ARM but you can't for Power (at least the last time
> I checked) which is why there's a hobbyist and experimenting
> ecosystem around ARM but there isn't around Power.
>
> Simon.
>
I know, for whatever reason Power is kept out of reach for the
hobbyist market. There are (very) powerful chips which allow sub-$100
boards but that's about all.

However, I don't think that would stop Don, my guess is he is
just looking for the cheapest hardware which will do the job
for him - which may well be ARM based but then maybe not.

Dimiter
Hi Simon,

On 5/6/2014 7:36 PM, Simon Clubley wrote:
> On 2014-05-06, Don Y<this@is.not.me.com> wrote:
>
>> Any pointers as to idiosyncrasies in ARM's v7 MMU? (no, I'm
>> not looking for information as to how to *use* it; rather,
>> pointers regarding any "unexpected behaviors" that I might
>> encounter -- especially when mixing page sizes, etc.)
>
> There are 4 indirect things I've come across:
>
> 1) As you will be aware, the whole caching/buffering subsystem was totally
> reworked for ARMv6 and the ARMv5 subsystem is no longer supported in ARMv7.
Yeah, I really would have liked the "tiny" page size (actually, even tiny *quarterpages*!) Aside from the (slight) performance gain, the sections/supersections I'd gladly trade away in that case!
> I've found the configuration on memory/device regions is much more
> sensitive/fragile than it is with ARMv5 devices when the MMU is enabled.
Is this just a case of "doing extra homework" (i.e., making sure you understand the repercussions of each flag setting)? Or, do certain targets behave differently (thus *requiring* different settings)?
> A specific example: if you are experiencing device lockups when enabling
> the MMU, try changing the device type attributes in the paging table for
> the peripheral region.
I assume you mean beyond the obvious "make sure the page is wired down", cacheability setting, etc.? Said another way (for all of the above), when you discover(ed) the source of the problem, did you slap your head and utter "D'oh!" (i.e., "damn, I should have known better!") *or* did you find yourself uncomfortably wondering why *that* fixed the problem? [The former I can deal with; the latter would leave me anxious!]
> 2) If you are using the MMU on a device with Security Extensions enabled,
> don't forget that some register bits which are otherwise R/W become R/O
> in Non-Secure mode.
OK
> 3) Don't forget that on ARMv7 class devices, some register updates may
> be posted across a bus meaning they are not updated immediately. When
> you turn on instruction and data caching, interrupt handling code can
> run fast enough that you get a race condition with the interrupt hardware
> firing the interrupt for a second time unless you use the usual DSB
> instructions.
>
> I've seen this happen on the AM3359.
I'm not sure I understand your point. Can you embellish with an example?
> 4) There's no longer any way to invalidate the whole data cache in one
> go. You now have to do it by MVA or set/way.
Hmmm.... this could be annoying. OTOH, there are few cases where I would need to invalidate more than just a cache line, "typically". So, the extra cost/complexity may disappear in practice.
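
For anyone who does need the whole-cache case, here is a minimal sketch of how a set/way walk is usually structured on ARMv7-A. It only builds the operand that the set/way maintenance operations expect (way index in the top bits, set index above the line-offset bits, cache level in bits [3:1]) and hands it to a callback. The function names and the callback type are hypothetical, and the geometry in the usage note is an assumed example; on real hardware it would be read from CCSIDR.

```c
#include <stdint.h>

/* ceil(log2(n)) for n >= 1 */
static unsigned log2_ceil(uint32_t n)
{
    unsigned bits = 0;
    while ((1u << bits) < n)
        bits++;
    return bits;
}

/* Build the register operand for a set/way cache operation.
 * level_index is the cache level minus one (0 for L1). */
uint32_t setway_operand(uint32_t way, uint32_t set, uint32_t level_index,
                        uint32_t num_ways, uint32_t line_bytes)
{
    unsigned a = log2_ceil(num_ways);    /* width of the way field */
    unsigned l = log2_ceil(line_bytes);  /* set field starts at this bit */
    uint32_t way_field = (a == 0) ? 0u : (way << (32u - a));

    return way_field | (set << l) | (level_index << 1);
}

/* Hypothetical callback type. On an ARMv7-A target it would wrap
 *   mcr p15, 0, <Rt>, c7, c6, 2    (DCISW: invalidate by set/way)
 * and the caller would issue a DSB once the walk has finished. */
typedef void (*setway_op_fn)(uint32_t operand);

/* Visit every line of one cache level; returns the number of lines visited.
 * A null op is allowed so the walk can be dry-run on a host. */
uint32_t for_each_set_way(uint32_t num_sets, uint32_t num_ways,
                          uint32_t line_bytes, uint32_t level_index,
                          setway_op_fn op)
{
    uint32_t count = 0;

    for (uint32_t way = 0; way < num_ways; way++) {
        for (uint32_t set = 0; set < num_sets; set++) {
            if (op)
                op(setway_operand(way, set, level_index, num_ways, line_bytes));
            count++;
        }
    }
    return count;
}
```

With an assumed geometry of 4 ways, 128 sets and 64-byte lines, the walk issues 512 operations, which gives a feel for the extra cost compared with the old single "invalidate all" operation.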
>> Also, any pointers to particular silicon to avoid/favor
>> in terms of potential problems in the MMU implementation?
>
> The AM3359 in the Beaglebone Black caused me way more trouble than the
> Allwinner A10s did. However, the AM3359 is heavily documented (unlike
> the Chinese jobs... :-()
Was the "trouble" attributable to "learning curve"? I.e., did the
A10 benefit from "previous experience" on the BB?

Thanks!
--don
Hi Simon & Dimiter,

On 5/7/2014 5:03 AM, Simon Clubley wrote:
> On 2014-05-07, Dimiter_Popoff<dp@tgi-sci.com> wrote:
>> if you have the option (I don't know ARM). Or switch to a power
>> architecture processor, their MMUs work OK.
>
> Can you pick up capable battery operated small Power boards for
> about 20-30 British pounds ?
I'm not really interested in (ready-made) "boards" but the point is the same -- I want inexpensive and low power (that also tends to suggest a high level of integration). My power budget per node (including all "I/O loads") is ~10W. In some cases, much of that 10W is I/O so the processor needs to be in the 1-2W ballpark.
> You can for ARM but you can't for Power (at least the last time
> I checked) which is why there's a hobbyist and experimenting
> ecosystem around ARM but there isn't around Power.
There's (I think) also a bigger selection (vendors, configurations)
with ARM.

--don
Hi Dimiter,

On 5/7/2014 5:27 AM, Dimiter_Popoff wrote:
> On 07.5.2014 15:03, Simon Clubley wrote:
>> ...
>> Can you pick up capable battery operated small Power boards for
>> about 20-30 British pounds ?
>>
>> You can for ARM but you can't for Power (at least the last time
>> I checked) which is why there's a hobbyist and experimenting
>> ecosystem around ARM but there isn't around Power.
>
> I know, for whatever reason Power is kept out of reach for the
> hobbyist market. There are (very) powerful chips which allow sub-$100
> boards but that's about all.
>
> However, I don't think that would stop Don, my guess is he is
> just looking for the cheapest hardware which will do the job
> for him - which may well be ARM based but then maybe not.
Yes, I'd like to keep cost and power requirements down. OTOH,
I am (now) trying to cut some (development) corners.

My original design would have required me to create *three*
different RTOS's with compatible features/capabilities as they
would execute on targets at different price/complexity points
(e.g., "Intel", Cortex-A and Cortex-M). Hard to get such a
heterogeneous system to "play nice together" :<

OTOH, if I can 86 (ha!) the Intel targets, that gives me another
degree of freedom in the design (and, *forces* me to steer clear
of that ever-changing platform!).

Now, I'm trying to rationalize replacing the Cortex-M devices with
(more expensive) Cortex-A's... just to eliminate yet another variation
and have a truly homogeneous system! Size may prove to be a problem...

--don
On 2014-05-07, Don Y <this@is.not.me.com> wrote:
> Hi Simon,
>
> On 5/6/2014 7:36 PM, Simon Clubley wrote:
>> On 2014-05-06, Don Y<this@is.not.me.com> wrote:
>>
>>> Any pointers as to idiosyncrasies in ARM's v7 MMU? (no, I'm
>>> not looking for information as to how to *use* it; rather,
>>> pointers regarding any "unexpected behaviors" that I might
>>> encounter -- especially when mixing page sizes, etc.)
>>
>> There are 4 indirect things I've come across:
>>
>> 1) As you will be aware, the whole caching/buffering subsystem was totally
>> reworked for ARMv6 and the ARMv5 subsystem is no longer supported in ARMv7.
>
> Yeah, I really would have liked the "tiny" page size (actually,
> even tiny *quarterpages*!) Aside from the (slight) performance
> gain, the sections/supersections I'd gladly trade away in that
> case!
>
>> I've found the configuration on memory/device regions is much more
>> sensitive/fragile than it is with ARMv5 devices when the MMU is enabled.
>
> Is this just a case of "doing extra homework" (i.e., making sure you
> understand the repercussions of each flag setting)? Or, do certain
> targets behave differently (thus *requiring* different settings)?
>
The latter.

When I took the perfectly working settings from the A10s for the
peripheral memory region to the BBB, the BBB locked up solid every time
the MMU was enabled.

Turns out that on the AM3359, the peripheral memory region must be marked
as shareable device or it simply will not work. Marking the region as
non-shareable device caused a solid lockup every time. This was not an
issue on the A10s.
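
To make the two settings concrete, here is a minimal sketch of how a first-level section entry for a peripheral region differs between them. It assumes the ARMv7-A short-descriptor format with TEX remap disabled (SCTLR.TRE = 0), domain 0, and full read/write access; the function and macro names are hypothetical and this is not the code that ran on either board.

```c
#include <stdint.h>

/* Memory types for a short-descriptor first-level "section" entry,
 * assuming TEX remap is disabled (SCTLR.TRE = 0). */
enum mem_type {
    MEM_STRONGLY_ORDERED,     /* TEX=000 C=0 B=0 */
    MEM_DEVICE_SHAREABLE,     /* TEX=000 C=0 B=1 */
    MEM_DEVICE_NONSHAREABLE   /* TEX=010 C=0 B=0 */
};

#define SECTION_TYPE    0x2u                    /* bits [1:0] = 0b10: 1MB section */
#define SECTION_B       (1u << 2)               /* B bit */
#define SECTION_XN      (1u << 4)               /* never execute from device memory */
#define SECTION_AP_RW   (3u << 10)              /* AP[2]=0, AP[1:0]=0b11: read/write */
#define SECTION_TEX(x)  ((uint32_t)(x) << 12)   /* TEX[2:0] in bits [14:12] */

/* Build a section descriptor mapping the 1MB region containing phys_addr.
 * Domain is left as 0 (an assumption; DACR must grant access to it). */
uint32_t section_desc(uint32_t phys_addr, enum mem_type type)
{
    uint32_t d = (phys_addr & 0xFFF00000u)
               | SECTION_TYPE | SECTION_XN | SECTION_AP_RW;

    switch (type) {
    case MEM_DEVICE_SHAREABLE:
        d |= SECTION_B;
        break;
    case MEM_DEVICE_NONSHAREABLE:
        d |= SECTION_TEX(2);
        break;
    case MEM_STRONGLY_ORDERED:
        break;
    }
    return d;
}
```

For an illustrative peripheral base of 0x44E00000, the shareable-device entry comes out as 0x44E00C16 and the non-shareable one as 0x44E02C12, so the two configurations differ only in the B bit and the TEX field.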
>> A specific example: if you are experiencing device lockups when enabling
>> the MMU, try changing the device type attributes in the paging table for
>> the peripheral region.
>
> I assume you mean beyond the obvious "make sure the page is wired down",
> cacheability setting, etc.?
>
Oh, yes. I went through all those (and more) before discovering the solution. I still cannot find anything which explains why the above is required on the AM3359 but not on the A10s.
> Said another way (for all of the above), when you discover(ed) the
> source of the problem, did you slap your head and utter "D'oh!"
> (i.e., "damn, I should have known better!") *or* did you find
> yourself uncomfortably wondering why *that* fixed the problem?
>
> [The former I can deal with; the latter would leave me anxious!]
>
The latter. I could not find anything in the ARM architecture manuals, the AM3359 TRM or other documents about why two Cortex-A8 MCUs behave so differently. That makes me nervous.
>> 2) If you are using the MMU on a device with Security Extensions enabled,
>> don't forget that some register bits which are otherwise R/W become R/O
>> in Non-Secure mode.
>
> OK
>
>> 3) Don't forget that on ARMv7 class devices, some register updates may
>> be posted across a bus meaning they are not updated immediately. When
>> you turn on instruction and data caching, interrupt handling code can
>> run fast enough that you get a race condition with the interrupt hardware
>> firing the interrupt for a second time unless you use the usual DSB
>> instructions.
>>
>> I've seen this happen on the AM3359.
>
> I'm not sure I understand your point. Can you embellish with an example?
>
This is on the AM3359 with my own interrupt wrapper written in ARM
assembly which is executed when the IRQ exception vector is triggered.

The IRQ interrupt wrapper determines which interrupt handler to call
(UART, timer, etc) and calls it.

In the interrupt handler you write to a peripheral (say timer) register
to say you have handled the interrupt and then return back to the IRQ
interrupt wrapper.

The IRQ interrupt wrapper writes to the AM3359 interrupt registers
telling the interrupt controller it can search for a new interrupt.

When both instruction and data caching are turned on, my code runs
sufficiently fast that the write to the timer interrupt acknowledge
register is still making its way across the bus and the interrupt
controller thinks the interrupt is still pending because there's no
longer a coherent view of resources.

The solution is to use a Data Synchronisation Barrier (DSB) instruction
sometime between writing the timer interrupt acknowledge register and
telling the interrupt controller it can look for a new interrupt.

If you read the AM3359 Technical Reference Manual, you will see the use
of a DSB is discussed in relation to writing to the above mentioned
interrupt controller register and the same reasoning can apply to the
peripheral interrupt acknowledge registers as well.
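
The ordering described above can be sketched in C roughly as follows. This is a sketch under assumptions, not the actual wrapper (which is in ARM assembly): the function and parameter names are hypothetical, the register addresses and bit values are left to the caller, and the inline assembly uses GCC/Clang syntax. On a non-ARM host the barrier degrades to a compiler barrier so the sequence can still be compiled and exercised.

```c
#include <stdint.h>

/* Data Synchronisation Barrier: waits until earlier memory accesses,
 * including posted device writes, have completed. */
static inline void data_sync_barrier(void)
{
#if defined(__arm__) || defined(__aarch64__)
    __asm__ volatile("dsb sy" ::: "memory");
#else
    __asm__ volatile("" ::: "memory");   /* host build: compiler barrier only */
#endif
}

/* Tail end of IRQ handling.
 *   periph_ack   - peripheral's interrupt acknowledge/status register
 *   ack_bits     - value that clears the interrupt at the peripheral
 *   intc_control - interrupt controller register that re-arms IRQ search
 *   new_irq_bits - value telling the controller to look for a new IRQ
 * All four are supplied by the caller; nothing here is device specific. */
void irq_finish(volatile uint32_t *periph_ack, uint32_t ack_bits,
                volatile uint32_t *intc_control, uint32_t new_irq_bits)
{
    *periph_ack = ack_bits;        /* 1. clear the source at the peripheral */
    data_sync_barrier();           /* 2. make sure that write has landed    */
    *intc_control = new_irq_bits;  /* 3. let the controller search again    */
    data_sync_barrier();           /* 4. same reasoning for this write      */
}
```

Without step 2, the controller can re-evaluate its inputs while the peripheral is still asserting the old request, which shows up as the same interrupt firing twice.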
>> 4) There's no longer any way to invalidate the whole data cache in one
>> go. You now have to do it by MVA or set/way.
>
> Hmmm.... this could be annoying. OTOH, there are few cases where
> I would need to invalidate more than just a cache line, "typically".
> So, the extra cost/complexity may disappear in practice.
>
>>> Also, any pointers to particular silicon to avoid/favor
>>> in terms of potential problems in the MMU implementation?
>>
>> The AM3359 in the Beaglebone Black caused me way more trouble than the
>> Allwinner A10s did. However, the AM3359 is heavily documented (unlike
>> the Chinese jobs... :-()
>
> Was the "trouble" attributable to "learning curve"? I.e., did the
> A10 benefit from "previous experience" on the BB?
>
The A10s, with its poor documentation, came first for me.

I managed to figure things out on the A10s even with this poor
documentation and I still got tripped up on the BBB when I later
started playing with that.

Older ARM MCUs used to have such nice predictable behaviour...

Simon.

--
Simon Clubley, clubley@remove_me.eisner.decus.org-Earth.UFP
Microsoft: Bringing you 1980s technology to a 21st century world
Hi Simon,

On 5/7/2014 1:24 PM, Simon Clubley wrote:

>>> I've found the configuration on memory/device regions is much more
>>> sensitive/fragile than it is with ARMv5 devices when the MMU is enabled.
>>
>> Is this just a case of "doing extra homework" (i.e., making sure you
>> understand the repercussions of each flag setting)? Or, do certain
>> targets behave differently (thus *requiring* different settings)?
>
> The latter.
>
> When I took the perfectly working settings from the A10s for the
> peripheral memory region to the BBB, the BBB locked up solid every time
> the MMU was enabled.
Sorry, my bad. :< By "targets" I meant "regions of memory" (i.e., different I/O devices in the same system). It *appears* that the settings you eventually came up with work *universally* for all "(I/O) devices" within a given "MCU target" -- but, that the settings for MCU target #1 differ from those for MCU target #2. Is this a correct assessment?
> Turns out that on the AM3359, the peripheral memory region must be marked
> as shareable device or it simply will not work. Marking the region as
> non-shareable device caused a solid lockup every time. This was not an
> issue on the A10s.
Do all of the "(I/O) devices" on that part fit in a single page/map? I.e., do you *replicate* the settings for the devices that reside at one part of the address space to devices that reside at other parts of the address space? (or, do you throw them all in a "section")
>>> A specific example: if you are experiencing device lockups when enabling
>>> the MMU, try changing the device type attributes in the paging table for
>>> the peripheral region.
>>
>> I assume you mean beyond the obvious "make sure the page is wired down",
>> cacheability setting, etc.?
>
> Oh, yes. I went through all those (and more) before discovering the
> solution. I still cannot find anything which explains why the above
> is required on the AM3359 but not on the A10s.
<frown> And, not likely you are going to have N other MCUs to compare against (to determine *which* of these is the "exception"). :< No help from manufacturer? Forums? Will the A10 "behave" if configured as the AM3359? Or, does your code make assumptions that require it to be configured thusly? What are the design consequences of each configuration?
>> Said another way (for all of the above), when you discover(ed) the
>> source of the problem, did you slap your head and utter "D'oh!"
>> (i.e., "damn, I should have known better!") *or* did you find
>> yourself uncomfortably wondering why *that* fixed the problem?
>>
>> [The former I can deal with; the latter would leave me anxious!]
>
> The latter. I could not find anything in the ARM architecture manuals,
> the AM3359 TRM or other documents about why two Cortex-A8 MCUs behave
> so differently. That makes me nervous.
Agreed. At the very least, have it documented as a "bug"/anomaly so you can at least know that "they" are aware of it -- and, will either act to preserve this behavior *or* alert folks to any *changes* to it.
>>> 3) Don't forget that on ARMv7 class devices, some register updates may
>>> be posted across a bus meaning they are not updated immediately. When
>>> you turn on instruction and data caching, interrupt handling code can
>>> run fast enough that you get a race condition with the interrupt hardware
>>> firing the interrupt for a second time unless you use the usual DSB
>>> instructions.
>>>
>>> I've seen this happen on the AM3359.
>>
>> I'm not sure I understand your point. Can you embellish with an example?
>
> This is on the AM3359 with my own interrupt wrapper written in ARM
> assembly which is executed when the IRQ exception vector is triggered.
>
> The IRQ interrupt wrapper determines which interrupt handler to call
> (UART, timer, etc) and calls it.
>
> In the interrupt handler you write to a peripheral (say timer) register
> to say you have handled the interrupt and then return back to the IRQ
> interrupt wrapper.
OK.
> The IRQ interrupt wrapper writes to the AM3359 interrupt registers
> telling the interrupt controller it can search for a new interrupt.
Ah... also makes sense.
> When both instruction and data caching are turned on, my code runs
> sufficiently fast that the write to the timer interrupt acknowledge
> register is still making its way across the bus and the interrupt
> controller thinks the interrupt is still pending because there's no
> longer a coherent view of resources.
>
> The solution is to use a Data Synchronisation Barrier (DSB) instruction
> sometime between writing the timer interrupt acknowledge register and
> telling the interrupt controller it can look for a new interrupt.
Logical choice (all else being equal) is to do so in the dispatcher (as it allows the most time for any previous code to "complete")
> If you read the AM3359 Technical Reference Manual, you will see the use
> of a DSB is discussed in relation to writing to the above mentioned
> interrupt controller register and the same reasoning can apply to the
> peripheral interrupt acknowledge registers as well.
>>>> Also, any pointers to particular silicon to avoid/favor
>>>> in terms of potential problems in the MMU implementation?
>>>
>>> The AM3359 in the Beaglebone Black caused me way more trouble than the
>>> Allwinner A10s did. However, the AM3359 is heavily documented (unlike
>>> the Chinese jobs... :-()
>>
>> Was the "trouble" attributable to "learning curve"? I.e., did the
>> A10 benefit from "previous experience" on the BB?
>
> The A10s, with its poor documentation, came first for me.
>
> I managed to figure things out on the A10s even with this poor
> documentation and I still got tripped up on the BBB when I later
> started playing with that.
>
> Older ARM MCUs used to have such nice predictable behaviour...
Yes. I think the Cortex-A's are suffering from a desire to
follow the "path" of other "big" (complex) processors (e.g., x86)
along with all their cruft.

One other question: is your use of the MMU largely "static"
(i.e., set it and forget it); somewhat dynamic (using it to
create individual protection domains for different processes);
or even more "esoteric"?

The intent of this question being to see how likely other
"races" and anomalies are likely to have been stumbled upon
in your codebase.

Thanks!
--don
