ARM's v7 MMU

Hi,

Any pointers as to idiosyncrasies in ARM's v7 MMU?  (no, I'm
not looking for information as to how to *use* it; rather,
pointers regarding any "unexpected behaviors" that I might
encounter -- especially when mixing page sizes, etc.)

Also, any pointers to particular silicon to avoid/favor
in terms of potential problems in the MMU implementation?

Thx!
--don

Reply by Simon Clubley ●May 6, 2014

On 2014-05-06, Don Y <this@is.not.me.com> wrote:
> Hi,
>
> Any pointers as to idiosyncrasies in ARM's v7 MMU?  (no, I'm
> not looking for information as to how to *use* it; rather,
> pointers regarding any "unexpected behaviors" that I might
> encounter -- especially when mixing page sizes, etc.)
>

There are 4 indirect things I've come across:

1) As you will be aware, the whole caching/buffering subsystem was totally
reworked for ARMv6 and the ARMv5 subsystem is no longer supported in ARMv7.
I've found the configuration on memory/device regions is much more
sensitive/fragile than it is with ARMv5 devices when the MMU is enabled.

A specific example: if you are experiencing device lockups when enabling
the MMU, try changing the device type attributes in the paging table for
the peripheral region.

2) If you are using the MMU on a device with Security Extensions enabled,
don't forget that some register bits which are otherwise R/W become R/O
in Non-Secure mode.

3) Don't forget that on ARMv7 class devices, some register updates may
be posted across a bus meaning they are not updated immediately. When
you turn on instruction and data caching, interrupt handling code can
run fast enough that you get a race condition with the interrupt hardware
firing the interrupt for a second time unless you use the usual DSB
instructions.

I've seen this happen on the AM3359.

4) There's no longer any way to invalidate the whole data cache in one
go. You now have to do it by MVA or set/way.
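
A minimal sketch of what "by set/way" involves, assuming the ARMv7-A set/way operand layout (way in the top bits, set above the line-offset bits, cache level in bits [3:1]). The function names and the example geometry are hypothetical; on real hardware the geometry would be read from CCSIDR, and the callback would issue the actual CP15 operation (for example DCISW, `mcr p15, 0, Rt, c7, c6, 2`).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ceil(log2(n)) for small n >= 1 (line sizes, way counts). */
static unsigned ceil_log2(uint32_t n)
{
    unsigned bits = 0;
    assert(n >= 1 && n <= 1024);
    while ((1u << bits) < n)
        bits++;
    return bits;
}

/* Build the register operand for one set/way cache operation.
 * level is 0-based (0 = L1), line_bytes is the cache line length. */
uint32_t dc_setway_operand(unsigned level, uint32_t set, uint32_t way,
                           uint32_t line_bytes, uint32_t num_ways)
{
    unsigned l = ceil_log2(line_bytes);  /* set field starts at bit L */
    unsigned a = ceil_log2(num_ways);    /* way field is the top A bits */
    uint32_t way_field = (a == 0) ? 0u : (way << (32 - a));

    return way_field | (set << l) | ((uint32_t)level << 1);
}

/* Hypothetical callback type: on target this would wrap the MCR. */
typedef void (*setway_op)(uint32_t operand);

/* Walk every line of one cache level; op may be NULL for a dry run.
 * Returns the number of operations issued. */
uint32_t dcache_for_each_line(unsigned level, uint32_t num_sets,
                              uint32_t num_ways, uint32_t line_bytes,
                              setway_op op)
{
    uint32_t count = 0;

    for (uint32_t way = 0; way < num_ways; way++) {
        for (uint32_t set = 0; set < num_sets; set++) {
            uint32_t operand =
                dc_setway_operand(level, set, way, line_bytes, num_ways);
            if (op != NULL)
                op(operand);
            count++;
        }
    }
    return count;
}
```

With an example geometry of 4 ways, 128 sets and 64-byte lines, way 3 / set 5 at L1 encodes as 0xC0000140, and a full walk issues 512 operations. A DSB after the loop is the usual way to make sure the operations have completed.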

> Also, any pointers to particular silicon to avoid/favor
> in terms of potential problems in the MMU implementation?
>

The AM3359 in the Beaglebone Black caused me way more trouble than the
Allwinner A10s did. However, the AM3359 is heavily documented (unlike
the Chinese jobs... :-()

Simon.

-- 
Simon Clubley, clubley@remove_me.eisner.decus.org-Earth.UFP
Microsoft: Bringing you 1980s technology to a 21st century world

Reply by Dimiter_Popoff ●May 7, 2014

On 06.5.2014 г. 22:27, Don Y wrote:
> Hi,
>
> Any pointers as to idiosyncrasies in ARM's v7 MMU?  (no, I'm
> not looking for information as to how to *use* it; rather,
> pointers regarding any "unexpected behaviors" that I might
> encounter -- especially when mixing page sizes, etc.)
>
> Also, any pointers to particular silicon to avoid/favor
> in terms of potential problems in the MMU implementation?
>
> Thx!
> --don

Hi Don,

If that version still has that ridiculous MMU which tags
pages by logical address - so you have to flush all caches etc. on
task switch - maybe your best chance is to simply disable it,
if you have the option (I don't know ARM). Or switch to a Power
architecture processor, their MMUs work OK.

Dimiter

------------------------------------------------------
Dimiter Popoff, TGI             http://www.tgi-sci.com
------------------------------------------------------
http://www.flickr.com/photos/didi_tgi/sets/72157600228621276/

Reply by Simon Clubley ●May 7, 2014

On 2014-05-07, Dimiter_Popoff <dp@tgi-sci.com> wrote:
>
> Hi Don,
>
> if on that version they still have that ridiculous MMU tagging
> pages by logical address - so you have to flush all caches etc. on
> task switch - may be your best chance is to simply disable it,

The problem is that the data cache on ARM doesn't work unless the MMU is
enabled (with the MMU off, data accesses are treated as strongly-ordered).

> if you have the option (I don't know ARM). Or switch to a power
> architecture processor, their MMUs work OK.
>

Can you pick up capable battery operated small Power boards for
about 20-30 British pounds ?

You can for ARM but you can't for Power (at least the last time
I checked) which is why there's a hobbyist and experimenting
ecosystem around ARM but there isn't around Power.

Simon.

-- 
Simon Clubley, clubley@remove_me.eisner.decus.org-Earth.UFP
Microsoft: Bringing you 1980s technology to a 21st century world

Reply by Dimiter_Popoff ●May 7, 2014

On 07.5.2014 г. 15:03, Simon Clubley wrote:
> ...
> Can you pick up capable battery operated small Power boards for
> about 20-30 British pounds ?
>
> You can for ARM but you can't for Power (at least the last time
> I checked) which is why there's a hobbyist and experimenting
> ecosystem around ARM but there isn't around Power.
>
> Simon.
>

I know, for whatever reason Power is kept out of reach for the
hobbyist market. There are (very) powerful chips which allow sub-$100
boards but that's about all.

However, I don't think that would stop Don; my guess is he is
just looking for the cheapest hardware which will do the job
for him - which may well be ARM based, but then maybe not.

Dimiter

Reply by Don Y ●May 7, 2014

Hi Simon,

On 5/6/2014 7:36 PM, Simon Clubley wrote:
> On 2014-05-06, Don Y<this@is.not.me.com>  wrote:
>
>> Any pointers as to idiosyncrasies in ARM's v7 MMU?  (no, I'm
>> not looking for information as to how to *use* it; rather,
>> pointers regarding any "unexpected behaviors" that I might
>> encounter -- especially when mixing page sizes, etc.)
>
> There are 4 indirect things I've come across:
>
> 1) As you will be aware, the whole caching/buffering subsystem was totally
> reworked for ARMv6 and the ARMv5 subsystem is no longer supported in ARMv7.

Yeah, I really would have liked the "tiny" page size (actually,
even tiny *quarterpages*!)  Aside from the (slight) performance
gain, the sections/supersections I'd gladly trade away in that
case!

> I've found the configuration on memory/device regions is much more
> sensitive/fragile than it is with ARMv5 devices when the MMU is enabled.

Is this just a case of "doing extra homework" (i.e., making sure you
understand the repercussions of each flag setting)?  Or, do certain
targets behave differently (thus *requiring* different settings)?

> A specific example: if you are experiencing device lockups when enabling
> the MMU, try changing the device type attributes in the paging table for
> the peripheral region.

I assume you mean beyond the obvious "make sure the page is wired down",
cacheability setting, etc.?

Said another way (for all of the above), when you discover(ed) the
source of the problem, did you slap your head and utter "D'oh!"
(i.e., "damn, I should have known better!") *or* did you find
yourself uncomfortably wondering why *that* fixed the problem?

[The former I can deal with; the latter would leave me anxious!]

> 2) If you are using the MMU on a device with Security Extensions enabled,
> don't forget that some register bits which are otherwise R/W become R/O
> in Non-Secure mode.

OK

> 3) Don't forget that on ARMv7 class devices, some register updates may
> be posted across a bus meaning they are not updated immediately. When
> you turn on instruction and data caching, interrupt handling code can
> run fast enough that you get a race condition with the interrupt hardware
> firing the interrupt for a second time unless you use the usual DSB
> instructions.
>
> I've seen this happen on the AM3359.

I'm not sure I understand your point.  Can you embellish an example?

> 4) There's no longer any way to invalidate the whole data cache in one
> go. You now have to do it by MVA or set/way.

Hmmm.... this could be annoying.  OTOH, there are few cases where
I would need to invalidate more than just a cache line, "typically".
So, the extra cost/complexity may disappear in practice.

>> Also, any pointers to particular silicon to avoid/favor
>> in terms of potential problems in the MMU implementation?
>
> The AM3359 in the Beaglebone Black caused me way more trouble than the
> Allwinner A10s did. However, the AM3359 is heavily documented (unlike
> the Chinese jobs... :-()

Was the "trouble" attributable to "learning curve"?  I.e., did the
A10 benefit from "previous experience" on the BB?

Thanks!
--don

Reply by Don Y ●May 7, 2014

Hi Simon & Dimiter,

On 5/7/2014 5:03 AM, Simon Clubley wrote:
> On 2014-05-07, Dimiter_Popoff<dp@tgi-sci.com>  wrote:

>> if you have the option (I don't know ARM). Or switch to a power
>> architecture processor, their MMUs work OK.
>
> Can you pick up capable battery operated small Power boards for
> about 20-30 British pounds ?

I'm not really interested in (ready-made) "boards" but the
point is the same -- I want inexpensive and low power (that
also tends to suggest a high level of integration).

My power budget per node (including all "I/O loads") is ~10W.
In some cases, much of that 10W is I/O so the processor needs
to be in the 1-2W ballpark.

> You can for ARM but you can't for Power (at least the last time
> I checked) which is why there's a hobbyist and experimenting
> ecosystem around ARM but there isn't around Power.

There's (I think) also a bigger selection (vendors, configurations)
with ARM.

--don

Reply by Don Y ●May 7, 2014

Hi Dimiter,

On 5/7/2014 5:27 AM, Dimiter_Popoff wrote:
> On 07.5.2014 г. 15:03, Simon Clubley wrote:
>> ...
>> Can you pick up capable battery operated small Power boards for
>> about 20-30 British pounds ?
>>
>> You can for ARM but you can't for Power (at least the last time
>> I checked) which is why there's a hobbyist and experimenting
>> ecosystem around ARM but there isn't around Power.
>
> I know, for whatever reason Power is kept out of reach for the
> hobbyist market. There are (very) powerful chips which allow sub-$100
> boards but that's about all.
>
> However, I don't think that would stop Don, my guess is he is
> just looking for the cheapest hardware which will do the job
> for him - which may well be ARM based but then may be not.

Yes, I'd like to keep cost and power requirements down.
OTOH, I am (now) trying to cut some (development) corners.

My original design would have required me to create *three*
different RTOS's with compatible features/capabilities as
they would execute on targets at different price/complexity
points (e.g., "Intel", Cortex-A and Cortex-M).  Hard to
get such a heterogeneous system to "play nice together"  :<

OTOH, if I can 86 (ha!) the Intel targets, that gives me another
degree of freedom in the design (and, *forces* me to steer clear
of that ever-changing platform!).

Now, I'm trying to rationalize replacing the Cortex-M devices
with (more expensive) Cortex-A's... just to eliminate yet another
variation and have a truly homogeneous system!  Size may prove
to be a problem...

--don

Reply by Simon Clubley ●May 7, 2014

On 2014-05-07, Don Y <this@is.not.me.com> wrote:
> Hi Simon,
>
> On 5/6/2014 7:36 PM, Simon Clubley wrote:
>> On 2014-05-06, Don Y<this@is.not.me.com>  wrote:
>>
>>> Any pointers as to idiosyncrasies in ARM's v7 MMU?  (no, I'm
>>> not looking for information as to how to *use* it; rather,
>>> pointers regarding any "unexpected behaviors" that I might
>>> encounter -- especially when mixing page sizes, etc.)
>>
>> There are 4 indirect things I've come across:
>>
>> 1) As you will be aware, the whole caching/buffering subsystem was totally
>> reworked for ARMv6 and the ARMv5 subsystem is no longer supported in ARMv7.
>
> Yeah, I really would have liked the "tiny" page size (actually,
> even tiny *quarterpages*!)  Aside from the (slight) performance
> gain, the sections/supersections I'd gladly trade away in that
> case!
>
>> I've found the configuration on memory/device regions is much more
>> sensitive/fragile than it is with ARMv5 devices when the MMU is enabled.
>
> Is this just a case of "doing extra homework" (i.e., making sure you
> understand the repercussions of each flag setting)?  Or, do certain
> targets behave differently (thus *requiring* different settings)?
>

The latter.

When I took the perfectly working settings from the A10s for the
peripheral memory region to the BBB, the BBB locked up solid every time
the MMU was enabled.

It turns out that on the AM3359, the peripheral memory region must be
marked as shareable device or it simply will not work. Marking the region
as non-shareable device caused a solid lockup every time. This was not an
issue on the A10s.
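
For reference, a minimal sketch of how the two device attribute settings differ in a first-level section descriptor. It assumes the ARMv7-A short-descriptor format with TEX remap disabled (SCTLR.TRE = 0), domain 0, and privileged-only access; the function names and the base address used below are hypothetical and not taken from either part's documentation.

```c
#include <assert.h>
#include <stdint.h>

/* ARMv7-A short-descriptor first-level "section" entry (maps 1 MB).
 * Assumes TEX remap is disabled (SCTLR.TRE = 0) and domain 0 is in use. */
#define SECTION_TYPE     0x2u          /* bits [1:0] = 0b10: section */
#define SECTION_B        (1u << 2)     /* B bit */
#define SECTION_C        (1u << 3)     /* C bit (0 for both Device encodings) */
#define SECTION_XN       (1u << 4)     /* execute-never: no fetches from I/O */
#define SECTION_AP_PRIV  (0x1u << 10)  /* AP[2:0] = 0b001: privileged R/W only */
#define SECTION_TEX(x)   ((uint32_t)(x) << 12)

/* Combine a 1 MB-aligned physical base with the memory-type attributes. */
uint32_t section_entry(uint32_t phys_base, uint32_t attrs)
{
    assert((phys_base & 0x000FFFFFu) == 0);  /* sections are 1 MB aligned */
    return phys_base | attrs | SECTION_AP_PRIV | SECTION_XN | SECTION_TYPE;
}

/* Shareable Device: TEX = 0b000, C = 0, B = 1. */
uint32_t section_device_shareable(uint32_t phys_base)
{
    return section_entry(phys_base, SECTION_TEX(0) | SECTION_B);
}

/* Non-shareable Device: TEX = 0b010, C = 0, B = 0. */
uint32_t section_device_nonshareable(uint32_t phys_base)
{
    return section_entry(phys_base, SECTION_TEX(2));
}
```

For a hypothetical peripheral window at 0x40000000, the shareable variant comes out as 0x40000416 and the non-shareable one as 0x40002412; the two differ only in the TEX and B bits, which is why this is easy to get wrong when carrying a table from one part to another.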

>> A specific example: if you are experiencing device lockups when enabling
>> the MMU, try changing the device type attributes in the paging table for
>> the peripheral region.
>
> I assume you mean beyond the obvious "make sure the page is wired down",
> cacheability setting, etc.?
>

Oh, yes. I went through all those (and more) before discovering the
solution. I still cannot find anything which explains why the above
is required on the AM3359 but not on the A10s.

> Said another way (for all of the above), when you discover(ed) the
> source of the problem, did you slap your head and utter "D'oh!"
> (i.e., "damn, I should have known better!") *or* did you find
> yourself uncomfortably wondering why *that* fixed the problem?
>
> [The former I can deal with; the latter would leave me anxious!]
>

The latter. I could not find anything in the ARM architecture manuals,
the AM3359 TRM or other documents about why two Cortex-A8 MCUs behave
so differently. That makes me nervous.

>> 2) If you are using the MMU on a device with Security Extensions enabled,
>> don't forget that some register bits which are otherwise R/W become R/O
>> in Non-Secure mode.
>
> OK
>
>> 3) Don't forget that on ARMv7 class devices, some register updates may
>> be posted across a bus meaning they are not updated immediately. When
>> you turn on instruction and data caching, interrupt handling code can
>> run fast enough that you get a race condition with the interrupt hardware
>> firing the interrupt for a second time unless you use the usual DSB
>> instructions.
>>
>> I've seen this happen on the AM3359.
>
> I'm not sure I understand your point.  Can you embellish an example?
>

This is on the AM3359 with my own interrupt wrapper written in ARM
assembly which is executed when the IRQ exception vector is triggered.

The IRQ interrupt wrapper determines which interrupt handler to call
(UART, timer, etc) and calls it.

In the interrupt handler you write to a peripheral (say timer) register
to say you have handled the interrupt and then return back to the IRQ
interrupt wrapper.

The IRQ interrupt wrapper writes to the AM3359 interrupt registers
telling the interrupt controller it can search for a new interrupt.

When both instruction and data caching are turned on, my code runs
sufficiently fast that the write to the timer interrupt acknowledge
register is still making its way across the bus and the interrupt
controller thinks the interrupt is still pending because there's no
longer a coherent view of resources.

The solution is to use a Data Synchronisation Barrier (DSB) instruction
sometime between writing the timer interrupt acknowledge register and
telling the interrupt controller it can look for a new interrupt.

If you read the AM3359 Technical Reference Manual, you will see the use
of a DSB is discussed in relation to writing to the above mentioned
interrupt controller register and the same reasoning can apply to the
peripheral interrupt acknowledge registers as well.
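
A minimal sketch of that ordering in C, assuming GCC-style inline assembly. The register pointers and bit values are hypothetical parameters rather than names from the AM3359 TRM, and the non-ARM branch exists only so the sketch compiles off-target.

```c
#include <stdint.h>

/* Data Synchronisation Barrier: wait for outstanding memory accesses,
 * including posted device writes, to complete before continuing. */
static inline void dsb(void)
{
#if defined(__arm__) || defined(__aarch64__)
    __asm__ volatile ("dsb sy" ::: "memory");
#else
    __sync_synchronize();  /* host stand-in, not equivalent to a real DSB */
#endif
}

/* Peripheral handler: acknowledge the interrupt at its source.
 * ack_reg and ack_bits are hypothetical (e.g. a timer status register). */
void timer_irq_handler(volatile uint32_t *ack_reg, uint32_t ack_bits)
{
    *ack_reg = ack_bits;
}

/* Tail of the IRQ wrapper: only after the acknowledge write has completed
 * is the interrupt controller told it may look for a new interrupt.
 * intc_reg and new_irq_bits are hypothetical as well. */
void irq_wrapper_finish(volatile uint32_t *intc_reg, uint32_t new_irq_bits)
{
    dsb();                     /* drain the handler's acknowledge write */
    *intc_reg = new_irq_bits;  /* allow the next interrupt to be selected */
    dsb();                     /* complete this write before IRQs are unmasked */
}
```

Putting the first barrier in the wrapper rather than in each handler means every handler is covered by a single DSB, at the cost of the wrapper not knowing whether a given handler actually needed it.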

>> 4) There's no longer any way to invalidate the whole data cache in one
>> go. You now have to do it by MVA or set/way.
>
> Hmmm.... this could be annoying.  OTOH, there are few cases where
> I would need to invalidate more than just a cache line, "typically".
> So, the extra cost/complexity may disappear in practice.
>
>>> Also, any pointers to particular silicon to avoid/favor
>>> in terms of potential problems in the MMU implementation?
>>
>> The AM3359 in the Beaglebone Black caused me way more trouble than the
>> Allwinner A10s did. However, the AM3359 is heavily documented (unlike
>> the Chinese jobs... :-()
>
> Was the "trouble" attributable to "learning curve"?  I.e., did the
> A10 benefit from "previous experience" on the BB?
>

The A10s, with its poor documentation, came first for me.

I managed to figure things out on the A10s even with this poor
documentation and I still got tripped up on the BBB when I later
started playing with that.

Older ARM MCUs used to have such nice predictable behaviour...

Simon.

-- 
Simon Clubley, clubley@remove_me.eisner.decus.org-Earth.UFP
Microsoft: Bringing you 1980s technology to a 21st century world

Reply by Don Y ●May 8, 2014

Hi Simon,

On 5/7/2014 1:24 PM, Simon Clubley wrote:

>>> I've found the configuration on memory/device regions is much more
>>> sensitive/fragile than it is with ARMv5 devices when the MMU is enabled.
>>
>> Is this just a case of "doing extra homework" (i.e., making sure you
>> understand the repercussions of each flag setting)?  Or, do certain
>> targets behave differently (thus *requiring* different settings)?
>
> The latter.
>
> When I took the perfectly working settings from the A10s for the
> peripheral memory region to the BBB, the BBB locked up solid every time
> the MMU was enabled.

Sorry, my bad.  :<  By "targets" I meant "regions of memory" (i.e.,
different I/O devices in the same system).  It *appears* that the
settings you eventually came up with work *universally* for all
"(I/O) devices" within a given "MCU target" -- but, that the
settings for MCU target #1 differ from those for MCU target #2.

Is this a correct assessment?

> Turns out that on the AM3359, the peripheral memory region must be marked
> as shareable device or it simply will not work. Marking the region as
> non-shareable device caused a solid lockup every time. This was not an
> issue on the A10s.

Do all of the "(I/O) devices" on that part fit in a single page/map?
I.e., do you *replicate* the settings for the devices that reside
at one part of the address space to devices that reside at other
parts of the address space?   (or, do you throw them all in a "section")

>>> A specific example: if you are experiencing device lockups when enabling
>>> the MMU, try changing the device type attributes in the paging table for
>>> the peripheral region.
>>
>> I assume you mean beyond the obvious "make sure the page is wired down",
>> cacheability setting, etc.?
>
> Oh, yes. I went through all those (and more) before discovering the
> solution. I still cannot find anything which explains why the above
> is required on the AM3359 but not on the A10s.

<frown>  And, not likely you are going to have N other MCUs to compare
against (to determine *which* of these is the "exception").  :<

No help from manufacturer?  Forums?

Will the A10 "behave" if configured as the AM3359?  Or, does your
code make assumptions that require it to be configured thusly?

What are the design consequences of each configuration?

>> Said another way (for all of the above), when you discover(ed) the
>> source of the problem, did you slap your head and utter "D'oh!"
>> (i.e., "damn, I should have known better!") *or* did you find
>> yourself uncomfortably wondering why *that* fixed the problem?
>>
>> [The former I can deal with; the latter would leave me anxious!]
>
> The latter. I could not find anything in the ARM architecture manuals,
> the AM3359 TRM or other documents about why two Cortex-A8 MCUs behave
> so differently. That makes me nervous.

Agreed.  At the very least, have it documented as a "bug"/anomaly so
you can at least know that "they" are aware of it -- and, will either
act to preserve this behavior *or* alert folks to any *changes* to it.

>>> 3) Don't forget that on ARMv7 class devices, some register updates may
>>> be posted across a bus meaning they are not updated immediately. When
>>> you turn on instruction and data caching, interrupt handling code can
>>> run fast enough that you get a race condition with the interrupt hardware
>>> firing the interrupt for a second time unless you use the usual DSB
>>> instructions.
>>>
>>> I've seen this happen on the AM3359.
>>
>> I'm not sure I understand your point.  Can you embellish an example?
>
> This is on the AM3359 with my own interrupt wrapper written in ARM
> assembly which is executed when the IRQ exception vector is triggered.
>
> The IRQ interrupt wrapper determines which interrupt handler to call
> (UART, timer, etc) and calls it.
>
> In the interrupt handler you write to a peripheral (say timer) register
> to say you have handled the interrupt and then return back to the IRQ
> interrupt wrapper.

OK.

> The IRQ interrupt wrapper writes to the AM3359 interrupt registers
> telling the interrupt controller it can search for a new interrupt.

Ah... also makes sense.

> When both instruction and data caching are turned on, my code runs
> sufficiently fast that the write to the timer interrupt acknowledge
> register is still making its way across the bus and the interrupt
> controller thinks the interrupt is still pending because there's no
> longer a coherent view of resources.
>
> The solution is to use a Data Synchronisation Barrier (DSB) instruction
> sometime between writing the timer interrupt acknowledge register and
> telling the interrupt controller it can look for a new interrupt.

The logical choice (all else being equal) is to do so in the dispatcher,
as that allows the most time for any previous code to "complete".

> If you read the AM3359 Technical Reference Manual, you will see the use
> of a DSB is discussed in relation to writing to the above mentioned
> interrupt controller register and the same reasoning can apply to the
> peripheral interrupt acknowledge registers as well.

>>>> Also, any pointers to particular silicon to avoid/favor
>>>> in terms of potential problems in the MMU implementation?
>>>
>>> The AM3359 in the Beaglebone Black caused me way more trouble than the
>>> Allwinner A10s did. However, the AM3359 is heavily documented (unlike
>>> the Chinese jobs... :-()
>>
>> Was the "trouble" attributable to "learning curve"?  I.e., did the
>> A10 benefit from "previous experience" on the BB?
>
> The A10s, with its poor documentation, came first for me.
>
> I managed to figure things out on the A10s even with this poor
> documentation and I still got tripped up on the BBB when I later
> started playing with that.
>
> Older ARM MCUs used to have such nice predictable behaviour...

Yes.  I think the Cortex-A's are suffering from a desire to follow
the "path" of other "big" (complex) processors (e.g., x86) along
with all their cruft.

One other question:  is your use of the MMU largely "static"
(i.e., set it and forget it); somewhat dynamic (using it to
create individual protection domains for different processes);
or even more "esoteric"?  The intent of this question being to
see how likely other "races" and anomalies are likely to have
been stumbled upon in your codebase.

Thanks!
--don
