VME Auto system controller ID issue

We are using Motorola MVME5100 boards and VxWorks 6.3. We modified the 
delivered BSP to allow us to have shared memory windows across all processor 
cards. Our problem appears to be that the auto syscon feature of the 
board is not working properly. That is to say, if any controller not in 
slot 0 is present and has its auto-config jumper set to AUTO, then we see the 
following behavior:

The system controller in slot 0 can read/write into slave card N with no 
problems. Then slave N can read/write into the slot 0 shared memory. So far 
so good. But after a slave access of the slot 0 shared memory, any access 
across the VME bus results in a hang, i.e. the system controller in slot 0 
can no longer read/write to slot N.

If the jumpers on all cards (other than slot 0) are set to NO SYSCON, then all 
accesses across the VME bus appear to function properly and there are no 
hangs. This indicates a hardware issue to us, but we are not 100% certain.

We verified the same behavior across multiple 5100 cards (with various RAM 
amounts) and got the same result each time. We also verified the same 
behavior with slot 0 being an MVME6100 card and slot 1 being an MVME5100 card.

So... the questions are:

1) is this a known HW issue with MVME5100 cards?
2) if not, is there any possibility that the VxWorks BSP could cause the 
behavior?
3) can we conclude that both/all boards think they are system controller?
4) is there a SW fix to make the auto-config jumper work as intended?


Thanks,

Bo

Reply by William Dennen ●January 12, 2007

On Mon, 08 Jan 2007 10:57:04 -0600, Bo opined:


> So... the questions are:
> 
> 1) is this a known HW issue with MVME5100 cards?
> 2) if not, is there any possibility that the VxWorks BSP could cause the 
> behavior?
> 3) can we conclude that both/all boards think they are system controller?
> 4) is there a SW fix to make the auto-config jumper work as intended?

I rather much wonder if this isn't some issue with the RMW mechanism being
used within VxWorks resulting in a lock on the local bus.  Such nasty
behavior isn't seen if the transfers are not into shared memory spaces. 
My recollection on the auto-config jumper is that it's sensed by the
Universe at initialization to determine if it needs to provide bus
arbitration and isn't used afterwards.  That this problem comes and goes
depending on whether auto-configuration is selected does suggest
otherwise, but I doubt the root cause is the jumper setting.
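
Question 3 could be checked directly rather than inferred. What follows is
only a minimal sketch: it assumes the Universe II MISC_CTL register sits at
offset 0x404 with SYSCON as bit 17 (both recalled from memory and worth
verifying against the Tundra manual), and that the raw register value has
already been read on each board with whatever accessor the BSP provides. The
function names are made up for illustration. The Universe registers are
little-endian on PCI, so on a PowerPC the value may need a byte swap before
it is decoded.

```c
#include <stdbool.h>
#include <stdint.h>

/* ASSUMPTION: Universe II register layout, recalled from memory.
 * Verify both values against the Tundra Universe II manual. */
#define UNIV_MISC_CTL_OFFSET 0x404u       /* MISC_CTL offset in the register block */
#define UNIV_MISC_CTL_SYSCON (1u << 17)   /* set when the chip acts as system controller */

/* Hypothetical helper: true if a raw (already byte-swapped) MISC_CTL value
 * says this board is acting as system controller. */
bool univ_is_syscon(uint32_t misc_ctl)
{
    return (misc_ctl & UNIV_MISC_CTL_SYSCON) != 0;
}

/* Hypothetical helper: count how many boards in a set of MISC_CTL dumps
 * claim to be system controller. More than one suggests a conflict. */
unsigned count_syscon(const uint32_t *misc_ctl, unsigned n_boards)
{
    unsigned count = 0;
    for (unsigned i = 0; i < n_boards; i++) {
        if (univ_is_syscon(misc_ctl[i])) {
            count++;
        }
    }
    return count;
}
```

If count_syscon() comes back greater than 1 with the jumpers on AUTO and
exactly 1 with them on NO SYSCON, that would answer question 3 fairly
convincingly.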

Regards
-- 
>@<
Bill Dennen		wdennen@gmail.com
Cluelessness:  There are no stupid questions,
but there are a LOT of inquisitive idiots.
(despair.com)

Reply by Bo ●January 15, 2007

"William Dennen" <wdennen@gmail.com> wrote in message 
news:eo901m$vbs$1@aioe.org...
> On Mon, 08 Jan 2007 10:57:04 -0600, Bo opined:
>
>
>> So... the questions are:
>>
>> 1) is this a known HW issue with MVME5100 cards?
>> 2) if not, is there any possibility that the VxWorks BSP could cause the
>> behavior?
>> 3) can we conclude that both/all boards think they are system controller?
>> 4) is there a SW fix to make the auto-config jumper work as intended?
>
> I rather much wonder if this isn't some issue with the RMW mechanism being
> used within VxWorks resulting in a lock on the local bus.

Good point Bill. Do you know how I can test/change VxWorks to confirm it is 
or isn't a RMW issue?


> Such nasty
> behavior isn't seen if the transfers are not into shared memory spaces.
> My recollection on the auto-config jumper is that it's sensed by the
> Universe at initialization to determine if it needs to provide bus
> arbitration and isn't used afterwards.

This is what I thought as well. I do recall at a previous employer we had 
similar issues with the same Tundra chip---and the workaround was extra crap 
that the BSP had to perform during initialization.... but I really don't 
want to go that route again if avoidable.

Thanks,

Bo

>That this problem comes and goes
> depending whether auto-configuration or not is selected does suggest
> otherwise; but I doubt the root cause is the jumper setting.
>
> Regards
> -- 
>>@<
> Bill Dennen wdennen@gmail.com
> Cluelessness:  There are no stupid questions,
> but there are a LOT of inquisitive idiots.
> (despair.com)

Reply by William Dennen ●January 16, 2007

On Mon, 15 Jan 2007 10:54:28 -0600, Bo queried:

> 
> Good point Bill. Do you know how I can test/change VxWorks to confirm it is 
> or isn't a RMW issue?
> 

I wish I did; still don't have a good handle on how shared memory is
_really_ implemented in spite of mucking with it on and off for a number
of years.  The point is that if the memory spaces weren't shared the
transactions would succeed, otherwise Tundra wouldn't be able to sell chip
one.  I suspect your implementation is drawing out a latent defect in the
implementation of shared memory; I'm aware of another who encountered a
similar hang using a more standard configuration (but totally weird in
other ways).  That too is unresolved as far as I know.

Regards
-- 
>@<
Bill Dennen				wdennen@gmail.com
Cluelessness:  There are no stupid questions, but there are a LOT of inquisitive idiots.
(despair.com)

Reply by CBFalconer ●January 16, 2007

William Dennen wrote:
> 
... snip ...
> 
> I wish I did; still don't have a good handle on how shared memory
> is _really_ implemented in spite of mucking with it on and off for
> a number of years.  The point is that if the memory spaces weren't
> shared the transactions would succeed, otherwise Tundra wouldn't be
> able to sell chip one.  I suspect your implementation is drawing
> out a latent defect in the implementation of shared memory; I'm
> aware of another who encountered a similar hang using a more
> standard configuration (but totally weird in other ways).  That
> too is unresolved as far as I know.

I have no idea whether this is applicable to the OP's problem, but
in general memory is shared as long as it is not written.  If a
process wants to write in it, the page table for that process is
modified to remap that portion, a copy of the original made, and
the write then proceeds.  That portion of the memory is then no
longer shared.

If the memory is truly shared, so that one process's writes show up
in other processes' memory space, then various synchronization
protocols must be used.  This can involve semaphores, monitors,
critical sections, etc.

Threads are generally lightweight processes, using memory shared
with other threads in the same process, and will need the
synchronization primitives to access it.

-- 
Chuck F (cbfalconer at maineline dot net)
   Available for consulting/temporary embedded and systems.
   <http://cbfalconer.home.att.net>

Reply by Bo ●January 17, 2007

"CBFalconer" <cbfalconer@yahoo.com> wrote in message 
news:45AD929D.651D0282@yahoo.com...
>
> I have no idea whether this is applicable to the OP's problem, but
> in general memory is shared as long as it is not written.  If a
> process wants to write in it, the page table for that process is
> modified to remap that portion, a copy of the original made, and
> the write then proceeds.  That portion of the memory is then no
> longer shared.
>
> If the memory is truly shared, so that one process's writes show up
> in other processes' memory space, then various synchronization
> protocols must be used.  This can involve semaphores, monitors,
> critical sections, etc.
>
> Threads are generally lightweight processes, using memory shared
> with other threads in the same process, and will need the
> synchronization primitives to access it.
>
> -- 
> Chuck F (cbfalconer at maineline dot net)
>   Available for consulting/temporary embedded and systems.
>   <http://cbfalconer.home.att.net>
>

Yes, the memory is truly shared. However, it is my understanding that the RMW 
protection scheme across a VME backplane is generally implemented in 
hardware, and that on any HW that does not support RMW the protection scheme 
must be emulated by SW, resulting in a much slower transaction. In my 
particular case, it seems that the physical option jumper causes the HW to 
work or not work depending on its position, which seems divorced from SW in 
my view. That is, if it were a SW issue, the problem would exist regardless 
of the HW jumper position.
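
For anyone following along, here is the test-and-set idea in miniature. This
is only a sketch of the semantics using C11 atomics on a single CPU; the
type and function names are invented for illustration. It is not how the BSP
does it: across the backplane the atomicity has to come from an indivisible
RMW cycle generated by the VME bridge (VxWorks BSPs wrap that in
sysBusTas()), and that bus-level part is exactly what is in question here.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lock living in a shared memory window. */
typedef struct {
    atomic_flag held;
} sm_lock_t;

/* Put the lock into the released state. */
void sm_lock_init(sm_lock_t *lock)
{
    atomic_flag_clear(&lock->held);
}

/* Test-and-set: atomically read the old flag value and set the flag.
 * Returns true only if the flag was clear, i.e. the caller now owns it. */
bool sm_try_lock(sm_lock_t *lock)
{
    return !atomic_flag_test_and_set(&lock->held);
}

/* Release the lock so the next test-and-set succeeds. */
void sm_unlock(sm_lock_t *lock)
{
    atomic_flag_clear(&lock->held);
}
```

The read and the write in sm_try_lock() must not be separable. If the HW
cannot guarantee that across the bus, SW has to build the guarantee some
other way, which is where the slowdown comes from.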

I do find it odd that earlier board models (using the same Tundra chip) do 
not exhibit the problem.

Bo

Reply by William Dennen ●January 18, 2007

Bo
I've an idea, if you've got sufficient hardware, that may shed some light
on where the problem is.  You need 4 boards, 2 5100s and 2 anything VME. 
Call the 5100s A & B, the others C & D.  Set up C & D so they can
read/write each other's memory and also either A or B.  NO shared memory
configured for these two.  You've got A & B already set up.  Create the
hang condition and then:
(1) can C & D still read/write each other?
(2) can either C or D read/write to either A or B?
(3) can either A or B read/write to either C or D?

The essence of what you're trying to determine is whether the system
controller function is hosed or not.  IF C & D can still read/write then
it is not and the hang condition is local to A/B.
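
Written out as a decision table, the interpretation I have in mind looks
roughly like the sketch below. The enum and function names are invented for
illustration, and only the "local to A/B" rule is the one stated above; the
bus-wide and inconclusive cases are my assumption about how to read the
other outcomes.

```c
#include <stdbool.h>

/* Hypothetical classification of where the hang lives. */
typedef enum {
    HANG_NONE,          /* everything still works, hang not reproduced */
    HANG_LOCAL_TO_AB,   /* bus and system controller alive, A/B stuck  */
    HANG_BUS_WIDE,      /* nobody can move data: system controller or bus is hosed */
    HANG_INCONCLUSIVE   /* mixed results, needs a bus trace */
} hang_scope_t;

/* Inputs are the answers to tests (1), (2) and (3) after the hang:
 * cd_ok       - C and D can still read/write each other
 * cd_to_ab_ok - C or D can read/write to A or B
 * ab_to_cd_ok - A or B can read/write to C or D */
hang_scope_t classify_hang(bool cd_ok, bool cd_to_ab_ok, bool ab_to_cd_ok)
{
    if (cd_ok && cd_to_ab_ok && ab_to_cd_ok) {
        return HANG_NONE;
    }
    if (cd_ok) {
        /* Uninvolved boards still get the bus, so arbitration is working. */
        return HANG_LOCAL_TO_AB;
    }
    if (!cd_to_ab_ok && !ab_to_cd_ok) {
        /* ASSUMPTION: no traffic at all means the whole bus is locked. */
        return HANG_BUS_WIDE;
    }
    return HANG_INCONCLUSIVE;
}
```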

Would I be correct in assuming that you've left the BSP configured for
a hardware TAS? (I can't remember the #define precisely, but if you mucked
with it, you know the one I mean).

Regards
-- 
>@<
Bill Dennen				wdennen@gmail.com
Cluelessness:  There are no stupid questions, but there are a LOT of inquisitive idiots.
(despair.com)

Reply by Bo ●January 23, 2007

"William Dennen" <wdennen@gmail.com> wrote in message 
news:eopafk$s74$1@aioe.org...
> Bo
> I've an idea, if you've got sufficient hardware, that may shed some light
> on where the problem is.  You need 4 boards, 2 5100s and 2 anything VME.
> Call the 5100s A & B, the others C & D.  Set up C & D so they can
> read/write each other's memory and also either A or B.  NO shared memory
> configured for these two.  You've got A & B already set up.  Create the
> hang condition and then:
> (1) can C & D still read/write each other?
> (2) can either C or D read/write to either A or B?
> (3) can either A or B read/write to either C or D?
>
> The essence of what you're trying to determine is whether the system
> controller function is hosed or not.  IF C & D can still read/write then
> it is not and the hang condition is local to A/B.

1) C&D cannot read/write.
2) no
3) no

i.e. it 'appears' to be an honest-to-God hardware lock-up, from which only a 
power cycle will recover. Scary, huh?

>
> Would I be correct in assuming that you've left the BSP configured for
> a hardware TAS? (I can't remember the #define precisely, but if you mucked
> with it, you know the one I mean).

I don't think that TAS has been changed---at least not by me.

Thanks for the suggestions and help,

Bo

Reply by William Dennen ●January 25, 2007

On Tue, 23 Jan 2007 12:46:58 -0600, Bo opined:

> "William Dennen" <wdennen@gmail.com> wrote in message 
> news:eopafk$s74$1@aioe.org...
>> Bo
>> I've an idea, if you've got sufficient hardware, that may shed some light
>> on where the problem is.  You need 4 boards, 2 5100s and 2 anything VME.
>> Call the 5100s A & B, the others C & D.  Set up C & D so they can
>> read/write each other's memory and also either A or B.  NO shared memory
>> configured for these two.  You've got A & B already set up.  Create the
>> hang condition and then:
>> (1) can C & D still read/write each other?
>> (2) can either C or D read/write to either A or B?
>> (3) can either A or B read/write to either C or D?
>>
>> The essence of what you're trying to determine is whether the system
>> controller function is hosed or not.  IF C & D can still read/write then
>> it is not and the hang condition is local to A/B.
> 
> 1) C&D cannot read/write.
> 2) no
> 3) no
> 
> ie it 'appears' to be an honest-to-God hardware lock-up--from which only a 
> power cycle will recover. Scary, huh?
> 
> Bo

Indeed it's scary and smells of an erratum; it appears that the system
controller has left the scene.  I would recommend getting Tundra to look
at the issue.  I'm sure they'll want a dump of the Universe and a trace if
you've got the capability.  You can initiate the dialog at
http://www.tundra.com/support.aspx?bid=481&id=962.  Hopefully they can
simulate the sequence ...

Regards
-- 
>@<
Bill Dennen				wdennen@gmail.com
Cluelessness:  There are no stupid questions, but there are a LOT of inquisitive idiots.
(despair.com)
