Cortex-M3 vs PIC32 divide instruction

I've finally been considering a project to use either a
Cortex-M3 or a PIC32 processor and I've a technical question
unrelated to any "business issues" between these options --
the divide instruction operation.  Both of these cores
include one but I'm interested in any remarkable technical
details between them, including cycle counts but not limited
to that (load-store time is fair game.)

From what I've been able to garner from skimming the docs,
the Cortex-M3's MDU executes an SDIV or UDIV in anywhere from
2 to 12 clock cycles, but with a comment suggesting that it
takes less time when the operand sizes are similar.  Which
doesn't tell me what the typical time may be.  Also, it's
been a bit of a pain searching for good assembler docs on the
Cortex-M3.  But I've only been at it for about an hour or so,
so it's likely I am just slow and ignorant -- not that there
aren't good caches out there I should have found.

On the PIC32, the docs are clearer.  It's "one bit per clock"
and it includes an "early detection" of sign/zero bits in the
upper bytes to help goose that along where 7, 15, or 23 bits
worth might be skipped.  Worst case, it says, is 35 clocks.
It also stalls the 5-stage pipe if another division is issued
before the earlier one completes.
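
For anyone who wants to poke at the "one bit per clock" behaviour, here is a minimal behavioural sketch in C of a bit-serial restoring divider with an early-out on leading zero bytes. It is only a model under stated assumptions, not Microchip's or MIPS's actual logic: the names `skipped_bits`, `serial_udiv` and `div_result` are made up for illustration, the early-out is assumed to look at the dividend only, it handles unsigned operands only, and it counts quotient-bit iterations without the fixed setup overhead that the quoted 35-clock worst case presumably includes.

```c
#include <assert.h>
#include <stdint.h>

/* Result of the modelled divide plus the number of 1-bit steps it took. */
typedef struct {
    uint32_t quot;
    uint32_t rem;
    unsigned iterations;
} div_result;

/* Assumption: the early-out inspects whole zero bytes at the top of the
   dividend and skips 23, 15 or 7 iterations, matching the figures quoted
   from the docs.  Which operand is really inspected is not stated above. */
static unsigned skipped_bits(uint32_t dividend)
{
    if ((dividend & 0xFFFFFF00u) == 0) return 23;
    if ((dividend & 0xFFFF0000u) == 0) return 15;
    if ((dividend & 0xFF000000u) == 0) return 7;
    return 0;
}

/* Unsigned restoring division, one quotient bit per loop iteration. */
static div_result serial_udiv(uint32_t n, uint32_t d)
{
    assert(d != 0);                       /* divide by zero is not modelled */

    unsigned bits = 32u - skipped_bits(n);
    uint64_t rem = 0;                     /* 64-bit so the shift cannot overflow */
    uint32_t quot = 0;

    for (unsigned i = bits; i-- > 0; ) {
        rem = (rem << 1) | ((n >> i) & 1u);   /* bring down the next dividend bit */
        quot <<= 1;
        if (rem >= d) {                       /* trial subtract succeeds */
            rem -= d;
            quot |= 1u;
        }
    }

    div_result r = { quot, (uint32_t)rem, bits };
    return r;
}
```

Calling `serial_udiv(1000, 7)` in this model takes 17 iterations (the top two bytes of the dividend are zero, so 15 are skipped), while a dividend with a non-zero top byte always takes the full 32.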

I am wondering if anyone has had direct experience playing
with either of these in the area of writing floating point
libraries and has had a chance to compare their relative
utility for that purpose and might comment on any relatively
significant details related to that effort -- speed being the
main question here.

At first blush, I'd say <=12 clocks is better than <=35.  But
there may be other issues.  And while I already know how the
PIC32 approach must be done internally, I'm curious about
exactly what method the Cortex-M3 uses for its division
operation -- it's not clear to me.
(VHDL or Verilog code would make that very clear to me, if
anyone has it or a pseudo version of it.)

Jon

Reply by David Brown ●September 6, 2011

There are many tricks that can be employed with hardware division to 
make it faster in all or some cases - there is no good way to guess how 
they are implemented in these two cpu's.  But there will not be any 
"hidden issues" - the division instructions on both architectures work, 
they are both slow, and the time varies depending on the operands in a 
way that is difficult to predict and virtually impossible to utilise. 
And in both cases, the timing of the divide instruction will be only a
small part of a software floating point division routine - the
variations between different toolchains' floating point routines will be
much higher than the variation between run-times for divide on either
processor.

I don't know what more you are looking for.  If you want to divide
unknown integers, use the cpu's divide instruction.  If you want to
divide by a known constant integer, let the compiler handle it - either 
it will use the hardware divide instruction, or it will do something 
fancier like multiplying by the reciprocal scaled by a power of two. 
Knowing the nasty details of the hardware division implementation will 
not change that.
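
As a concrete illustration of "multiplying by the reciprocal scaled by a power of two", here is a minimal sketch in C for one fixed divisor, 10. The function names `div10` and `mod10` are hypothetical; the constant 0xCCCCCCCD is ceil(2^35 / 10), and the sketch assumes unsigned 32-bit operands with a 64-bit intermediate product. A compiler may pick a different constant, shift, or instruction sequence for the same job.

```c
#include <stdint.h>

/* floor(n / 10) without a divide instruction.
   0xCCCCCCCD == ceil(2^35 / 10); taking the 64-bit product and shifting
   right by 35 gives the exact quotient for every 32-bit unsigned n. */
static inline uint32_t div10(uint32_t n)
{
    return (uint32_t)(((uint64_t)n * 0xCCCCCCCDu) >> 35);
}

/* Remainder recovered from the quotient with one multiply and one subtract. */
static inline uint32_t mod10(uint32_t n)
{
    return n - div10(n) * 10u;
}
```

On a core with a 32x32->64 multiply this costs a multiply and a shift regardless of the operand value, which is why the run time of the hardware divider does not matter in the constant-divisor case.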

If you want to do very fast floating point, get a processor that has 
hardware floating point (Cortex-M4 will be available soon, there are 
real MIPS cpu's available instead of PIC32, there are plenty of 
PPC-based microcontrollers with hardware floating point, etc.).

Reply by Jon Kirwan ●September 6, 2011

I have other reasons that factor into this decision that
preclude any other choice, right now.  I'm not looking for
the fastest FP, anyway.  So that's not the primary goal here.
I am curious about the details.  That's all.  And I'd like to
make my _own_ judgment, not simply compare other peoples' FP
packages that already exist.  I'm looking at gaining a deep
understanding of these two processors' approaches in the
NARROW case of these particular instructions.

I do not need an education about "time varies" and "let the
compiler handle it."  You should know me well enough by now
for that.  I'm already prepared to examine flash, sram, and
cache issues.  I need to know the specific details here. Part
of where I may be going is into things you may not think to
consider, such as interrupt latency, for example, or simply
for self-education about how the Cortex-M3 does it (I already
_know_ how the PIC32 does it internally.)  Don't presume too
much about my purposes -- they are not run of the mill at the
very least.

I simply need very detailed information.  I've been having a
little difficulty laying hands on it in the Cortex-M3 case.
I'm hoping someone can point me well.

But thanks for the time.  It is appreciated.

Jon

Reply by Arlet Ottens ●September 6, 2011

On 09/06/2011 09:39 AM, Jon Kirwan wrote:
> I've finally been considering a project to use either a
> Cortex-M3 or a PIC32 processor and I've a technical question
> unrelated to any "business issues" between these options --
> the divide instruction operation.  Both of these cores
> include one but I'm interested in any remarkable technical
> details between them, including cycle counts but not limited
> to that (load-store time is fair game.)
>
>  From what I've been able to garner from skimming the docs,
> the Cortex-M3's MDU executes an SDIV or UDIV in anywhere from
> 2 to 12 clock cycles, but with a comment suggesting that it
> takes less time when the operand sizes are similar.  Which
> doesn't tell me what the typical time may be.  Also, it's
> been a bit of a pain searching for good assembler docs on the
> Cortex-M3.  But I've only been at it for about an hour or so,
> so it's likely I am just slow and ignorant -- not that there
> aren't good caches out there I should have found.
>
> On the PIC32, the docs are clearer.  It's "one bit per clock"
> and it includes an "early detection" of sign/zero bits in the
> upper bytes to help goose that along where 7, 15, or 23 bits
> worth might be skipped.  Worst case, it says, is 35 clocks.
> It also stalls the 5-stage pipe if another division is issued
> before the earlier one completes.
>

In the ARM reference there's the following comment: "Division operations 
use early termination to minimize the number of cycles required based on 
the number of leading ones and zeroes in the input operands."

That looks similar to what the PIC32 does, but with more bits/cycle.
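
Why leading ones and zeroes matter can be sketched in C: the quotient of n / d can never need more bits than the difference in operand bit lengths plus one, so a divider that skips the guaranteed-zero quotient bits finishes sooner when the operands are of similar size. The names below are hypothetical, the sketch covers unsigned operands only, and `bits_per_cycle` is purely an illustrative parameter; nothing quoted here says how many bits per cycle the Cortex-M3 actually retires.

```c
#include <stdint.h>

/* Position of the highest set bit plus one; 0 for x == 0. */
static unsigned bit_length(uint32_t x)
{
    unsigned n = 0;
    while (x) {
        ++n;
        x >>= 1;
    }
    return n;
}

/* Upper bound on the number of significant quotient bits of n / d, d != 0.
   Since n < 2^ln and d >= 2^(ld-1), the quotient is below 2^(ln-ld+1). */
static unsigned quotient_bits_bound(uint32_t n, uint32_t d)
{
    unsigned ln = bit_length(n);
    unsigned ld = bit_length(d);
    return (ln < ld) ? 0u : ln - ld + 1u;
}

/* Hypothetical step count for a divider producing bits_per_cycle quotient
   bits per clock (bits_per_cycle must be non-zero); fixed overhead ignored. */
static unsigned iterations_at(unsigned qbits, unsigned bits_per_cycle)
{
    return (qbits + bits_per_cycle - 1u) / bits_per_cycle;
}
```

For example, `quotient_bits_bound(1000, 7)` is 8 (and 1000 / 7 = 142 does need 8 bits), while two operands of equal bit length give a bound of 1, which fits the comment that the divide takes less time when the operand sizes are similar.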

Reply by David Brown ●September 6, 2011

On 06/09/2011 11:45, Jon Kirwan wrote:
> I have other reasons that factor into this decision that
> preclude any other choice, right now.  I'm not looking for
> the fastest FP, anyway.  So that's not the primary goal here.
> I am curious about the details.  That's all.  And I'd like to
> make my _own_ judgment, not simply compare other peoples' FP
> packages that already exist.  I'm looking at gaining a deep
> understanding of these two processors' approaches in the
> NARROW case of these particular instructions.
>
> I do not need an education about "time varies" and "let the
> compiler handle it."  You should know me well enough by now
> for that.

Yes, I know that - that made it a particularly odd question from you.

>  I'm already prepared to examine flash, sram, and
> cache issues.  I need to know the specific details here. Part
> of where I may be going is into things you may not think to
> consider, such as interrupt latency, for example, or simply
> for self-education about how the Cortex-M3 does it (I already
> _know_ how the PIC32 does it internally.)  Don't presume too
> much about my purposes -- they are not run of the mill at the
> very least.
>

When you ask for unusual information like this, the real purpose is 
important - otherwise I can only guess that it is /pure/ curiosity (and 
I can understand that as a reason, and wish I could help you there).

> I simply need very detailed information.  I've been having a
> little difficulty laying hands on it in the Cortex-M3 case.
> I'm hoping someone can point me well.

I would be surprised if you can get the detailed information you would 
like - such implementation details tend to be well hidden from mere mortals.

One thing you might be able to find out about is how the division 
affects pipelining - but on an M3, with its short pipeline, that won't 
make a big difference.

Regarding interrupts, AFAIK instructions on the M3 (and MIPS) are not
interruptible (unlike some m68k cpus, for example), so maximum interrupt
latency will be affected by division instructions.

>
> But thanks for the time.  It is appreciated.
>
> Jon

Reply by ●September 6, 2011

Jon Kirwan <jonk@infinitefactors.org> wrote:
> Also, it's been a bit of a pain searching for good assembler docs on the
> Cortex-M3.  But I've only been at it for about an hour or so,
> so it's likely I am just slow and ignorant -- not that there
> aren't good caches out there I should have found.

You want the ARMv7-M Architecture Reference Manual off of ARM's website.

-a

Reply by Jon Kirwan ●September 6, 2011

On Tue, 6 Sep 2011 10:32:53 +0000 (UTC),
Anders.Montonen@kapsi.spam.stop.fi.invalid wrote:

>Jon Kirwan <jonk@infinitefactors.org> wrote:
>> Also, it's been a bit of a pain searching for good assembler docs on the
>> Cortex-M3.  But I've only been at it for about an hour or so,
>> so it's likely I am just slow and ignorant -- not that there
>> aren't good caches out there I should have found.
>
>You want the ARMv7-M Architecture Reference Manual off of ARM's website.

I think I have that for the assembly part of things.  If you
are referring to the Appendices near the end, then I'm
already aware of those sections (B, C, F, G, H.)  I
did also look at the timing information in Chapter 18-1, for
example, of DDI0337 on the Cortex-M3 for r1p1, r2p0, and
r2p1.  Though perhaps I haven't read it well enough.

I think I have been there.  But I may have missed something,
too, and I appreciate the suggestion.

Jon

Reply by Jon Kirwan ●September 6, 2011

On Tue, 06 Sep 2011 12:21:08 +0200, David Brown
<david@westcontrol.removethisbit.com> wrote:

>On 06/09/2011 11:45, Jon Kirwan wrote:
>> I have other reasons that factor into this decision that
>> preclude any other choice, right now.  I'm not looking for
>> the fastest FP, anyway.  So that's not the primary goal here.
>> I am curious about the details.  That's all.  And I'd like to
>> make my _own_ judgment, not simply compare other peoples' FP
>> packages that already exist.  I'm looking at gaining a deep
>> understanding of these two processors' approaches in the
>> NARROW case of these particular instructions.
>>
>> I do not need an education about "time varies" and "let the
>> compiler handle it."  You should know me well enough by now
>> for that.
>
>Yes, I know that - that made it a particularly odd question from you.
>
>>  I'm already prepared to examine flash, sram, and
>> cache issues.  I need to know the specific details here. Part
>> of where I may be going is into things you may not think to
>> consider, such as interrupt latency, for example, or simply
>> for self-education about how the Cortex-M3 does it (I already
>> _know_ how the PIC32 does it internally.)  Don't presume too
>> much about my purposes -- they are not run of the mill at the
>> very least.
>
>When you ask for unusual information like this, the real purpose is 
>important - otherwise I can only guess that it is /pure/ curiosity (and 
>I can understand that as a reason, and wish I could help you there).

The purpose is due diligence and to illuminate speculations I
may yet develop.  It's not a crystal clear process that I can
readily explain.  But I do know _what_ I want to know.

If it helps, imagine that I'd like to develop a cycle-
accurate simulator.

>> I simply need very detailed information.  I've been having a
>> little difficulty laying hands on it in the Cortex-M3 case.
>> I'm hoping someone can point me well.
>
>I would be surprised if you can get the detailed information you would 
>like - such implementation details tend to be well hidden from mere mortals.

Appears to be hidden from me, tonight.  So maybe you are
right.

I _am_ able to garner better information from the M4k.  I
still need to find out if the DIV can be interrupted.

>One thing you might be able to find out about is how the division 
>affects pipelining - but on an M3, with its short pipeline, that won't 
>make a big difference.

Yes, 3 stage vs 5 stage on the M4k.  I also took note that
Microchip licensed the M14k, too.

>Regarding interrupts, AFAIK instructions on the M3 (and MIPS) are not
>interruptible (unlike some m68k cpus, for example), so maximum interrupt
>latency will be affected by division instructions.

Yes, that is one of several considerations I have in mind.
Only one of them.  But an important one.  I am not yet
certain about the M4k on this point.

Anyway, thanks for the thoughts.  I will see what I can find
out there.  It is an omen that you don't know.  So that
suggests your earlier point about the difficulty here may be
correct.

Jon

>>
>> But thanks for the time.  It is appreciated.
>>
>> Jon

Reply by Jon Kirwan ●September 6, 2011

On Tue, 06 Sep 2011 12:21:08 +0200, David Brown
<david@westcontrol.removethisbit.com> wrote:

><snip>
>Regarding interrupts, AFAIK instructions on the M3 (and MIPS) are not
>interruptible
><snip>

So far, I've found the phrase "Autonomous multiply/divide
unit" in the datasheet for the 5xx, 6xx, and 7xx units from
Microchip.  Their dual bus choice also supports transaction
aborts to improve interrupt latency.  I already know that
issuing another MDU instruction before an earlier divide has
completed will result in an "IU pipeline stall."  But this
doesn't make it clear what happens if another MDU instruction
is NOT issued in the interrupt routine, for example.  It may
be possible that the "autonomous" unit works in parallel, so
long as no attempt is made to access the MDU until it is
done.  If so, that would be fine to learn.

I'll write Microchip on this point to get clarification.  You
may be right about all this.  Might as well dot that i, cross
that t.

BTW, I am also considering porting my own O/S to either the
Cortex-M3 or the PIC32.  But again, this is only one facet of
what I'm thinking about.  It is NOT the totality.  But this
question is germane here, too.

Jon

Reply by ●September 6, 2011

Jon Kirwan <jonk@infinitefactors.org> wrote:

> I still need to find out if the DIV can be interrupted.

Footnote e to table 18-1 in the Cortex-M3 r2p0 TRM states that
"DIV is interruptible (abandoned/restarted), with worst case latency of
one cycle."

-a
