On Dec 31 2008, 8:53 pm, eliben <eli...@gmail.com> wrote:
> > I need an MCU with 4 UART (@ 38.4 KBaud each) and several IOs. The 4
> > UARTs is a problem, because the simplest MCUs (PIC, AVR) don't have
> > chips with this amount (AFAIK).
>
> Sorry about the self reply, but I've just found that AVR have the
> 640/1280/2560 families, which seem to have 4 UARTs. Does anyone have
> experience using these chips ?

You could also look at the ATXMEGA128A1-AU, which is now sampling.

The XMEGA series shows EIGHT (!) UARTs at 100 pins,
7 UARTs at 64 pins and 5 UARTs at 44 pins.

-jg

On Jan 3, 2:40 pm, David Brown
<david.br...@hesbynett.removethisbit.no> wrote:
> Rocky wrote:
> > On Jan 2, 10:23 pm, David Brown
> > <david.br...@hesbynett.removethisbit.no> wrote:
> >> Jeff Fox wrote:
> >>> On Dec 31 2008, 8:23 am, Vladimir Vassilevsky
> >>> <antispam_bo...@hotmail.com> wrote:
> >>>> 5) Fast AVR should be able to handle 4 independent UARTs at 38400 as the
> >>>> software bit banging.
> >>> What do you think is the upper baud limit for 1 to 4 software bit
> >>> banging
> >>> UART on a fast AVR before a hardware UART is needed?
>
> > <snip>
>
> >> I wrote a 38.4 kbaud software UART on an AVR at 7.37 MHz with 4 times
> >> oversampling.  That meant a timer running at 153.6 kHz, with 48
> >> processor clocks between ticks.  That's not a lot of time, but easily
> >> enough for the software UART written in assembly.
>
> > 3 times oversampling actually gives better results than 4 times. I
> > know it seems weird, but it actually gives sampling that is closer to
> > the bit center than 4 times oversampling. It also has less processor
> > overhead and works fine with a 7372800 Hz clock.
>
> I'm not sure that's correct - but it's certainly worth thinking about.
>
> The key synchronisation point is the start bit - from when the line
> drops at the start of the start bit, your ideal sampling point is then
> half a baud time later.
>
> If you are sampling at four times the baud rate, then your sampling
> point becomes 2 Q after you first detect the start of the start bit (you
> can also sample the start bit after 1 Q as well, as extra noise
> resistance).  The true start bit started somewhere between -1 Q and 0 Q,
> depending on the exact synchronisation, so the ideal sampling point is
> somewhere between 1 Q and 2 Q.  Thus sampling at 2 Q is the best you can do.
>
> If you are sampling at three times the baud rate, the ideal sampling
> point will be between 0.5 Q and 1.5 Q.  Sampling at the point 1 Q is
> then in the middle of the true ideal sampling point range.  I believe
> this is what you are thinking about as being a better sample point.
>
> I am far from convinced that this is a better idea - I think you are
> better sampling late (between 0 and 1 Q late) rather than risk sampling
> early (between -0.5 Q and +0.5 Q - or using the same time scale as the
> four-times oversampling, between -0.67 Q and +0.67 Q).  The reason for
> this is that any instability in the sampling is much more likely to
> occur in the early part of the bit, rather than the late part.
> Consider, for example, if the driver, line capacitance and termination
> of the line is such that driving the line to 0 is faster than driving
> the line to 1 (this is the case for CAN drivers, for example - even
> though they are not normally used without a CAN controller, the
> principle is the same).  In this case, you would see the 0 values early,
> and the 1 values late - your 3-times oversampling may well miss the
> first 1 after a 0 as it takes longer to propagate.
>
Bear in mind the baud rates are much slower than CAN, so effects that
come from reflections and from open-collector driven lines are not a
big issue.

With 4x you can choose to sample the data at between 25% and 50% of
the bit period or between 50% and 75%.
If you use 3x the sample point moves to be between 33% and 67% of the
bit period. In the worst case this is about 8% of a bit period closer
to the bit center.
It makes it a little more tolerant of bit (CPU) clock error, but the big
gain is the drop in processor overhead.

In practice it has worked well.
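
The percentages above can be checked with a few lines of arithmetic. This is only a sketch in Python, assuming the true start edge lies anywhere within the one sample period before the tick on which it is first seen, and that the data sample is taken a fixed number of ticks after detection. The function names are made up for illustration.

```python
from fractions import Fraction

def sample_window(oversample, ticks_after_detect):
    """Earliest and latest sample position, as a fraction of the bit period.

    Assumption: the true start edge lies up to one tick before the tick
    on which it is first seen, so the sample lands between
    ticks_after_detect and ticks_after_detect + 1 ticks into the bit.
    """
    earliest = Fraction(ticks_after_detect, oversample)
    latest = Fraction(ticks_after_detect + 1, oversample)
    return earliest, latest

def worst_case_offset(oversample, ticks_after_detect):
    """Largest distance between the sample point and the bit center."""
    earliest, latest = sample_window(oversample, ticks_after_detect)
    center = Fraction(1, 2)
    return max(abs(earliest - center), abs(latest - center))

# 4x sampling one or two ticks after detection, and 3x one tick after.
for n, k in [(4, 1), (4, 2), (3, 1)]:
    lo, hi = sample_window(n, k)
    print(f"{n}x, sample {k} tick(s) after detect: "
          f"{float(lo):.0%} to {float(hi):.0%} of the bit, "
          f"worst case {float(worst_case_offset(n, k)):.1%} from center")
```

Under these assumptions the worst case is 1/4 of a bit from center for 4x and 1/6 of a bit for 3x. The difference is 1/12 of a bit, which is where "about 8%" comes from.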

Ben Bradley wrote:
>    Perhaps the "Propeller" thing deserves its own thread. I've looked
snip
> Is each core the "standard" Von Neumann architecture (program and data
> inhabit different areas of the same address space), or is it Harvard
> (program and data are on separate busses and thus can be accessed
> simultaneously for greater speed, like DSP's, the AVR and several
> other microcontrollers)?

Quick answers? That is a little hard to do but I'll try :)

Each core or cog is a tiny 32-bit CPU with its own RAM and I/O ports.
The Von Neumann architecture is the standard combined memory for program
and data, which is what the Propeller cogs use. However, unlike
conventional CPUs the Propeller does not have any CPU registers;
instead the cog RAM combines the program, data, and "registers". That
doesn't slow things down though, as most instructions take 4 clock cycles,
so they run at 20 MIPS on a standard 80MHz clock (5MHz x16).
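
As a quick check of that figure, here is a small Python sketch. It uses only the numbers quoted above (5MHz crystal, x16 PLL, 4 clocks per instruction); the function name is just for illustration.

```python
def instructions_per_second(xin_hz, pll_multiplier, clocks_per_instruction=4):
    """Instruction rate of one cog for instructions that take 4 clocks."""
    system_clock_hz = xin_hz * pll_multiplier
    return system_clock_hz // clocks_per_instruction

# 5 MHz crystal with the PLL at 16x gives an 80 MHz system clock.
cog_rate = instructions_per_second(5_000_000, 16)
print(cog_rate / 1e6, "MIPS per cog")
```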

The only instructions that can be really slow are the hub operations,
which can take from 7 to 22 clock cycles depending on whether they miss
their access slot. This can be optimized in assembler loops by squeezing
a couple of instructions between each hub operation, but most cogs only
need to access the hub RAM to update global buffers and variables;
otherwise they can operate totally internally.
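
The 7 to 22 range and the "couple of instructions" can be reproduced with simple arithmetic. This Python sketch assumes each cog gets a hub access slot once every 16 system clocks. That figure is my reading of the Propeller datasheet, not something stated in this post, so treat it as an assumption.

```python
HUB_PERIOD = 16       # assumption: clocks between one cog's hub access slots
HUB_OP_BEST = 7       # from the post: best-case hub instruction time
CLOCKS_PER_INSTR = 4  # from the post: typical instruction time

def hub_op_worst_case():
    """A hub op that just missed its slot waits almost a full rotation."""
    return HUB_OP_BEST + (HUB_PERIOD - 1)

def instructions_between_hub_ops():
    """Whole 4-clock instructions that fit between back-to-back hub ops."""
    return (HUB_PERIOD - HUB_OP_BEST) // CLOCKS_PER_INSTR

print(hub_op_worst_case(), "clocks worst case")
print(instructions_between_hub_ops(), "instructions fit between hub ops")
```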

There are absolutely no interrupts, and why should there be anyway?
You have cogs which you can dedicate to the functions that
normally could only be handled by interrupts on conventional CPUs.
A powerful result of this is that each cog can be temporally deterministic
and not at all encumbered with such kludges as "interrupts". Any load that
an individual cog bears is not one that other cogs have to bear unless you
want them to. I find it so much easier to debug and validate the software
now that I do not have to deal with interrupts and the strange things
that can happen as a result of nondeterministic clashes.

The "UART" cog receives, transmits, and processes the data and handles the
buffers in hub memory all in a transparent manner to the other cogs, so
then why would you need or want interrupts? I have used my "UART" cogs at
speeds of up to 2M baud with no negative impact at all on the other cogs.
Objects have been written that permit 8Mbps comms between Propeller chips.

>    What's the programming language? Is there an assembler? What's the
> architecture of each core (or "cog") look like (number of registers,
> how they can be used in different instructions and addressing modes,
> approximate average cycles per instruction, etc.)? 

Most of the code is written in a Pascal-like language called Spin. The Spin
IDE and compiler are free.
Here is a very basic code sample from the Propeller Manual tutorial that
can be compiled, loaded, and running within a second or so with one keystroke.

****begin code****

{{ Output.spin }}

PUB Toggle
   dira[16]~~                       'set I/O 16 as an output in this cog (short form)
   repeat                           'start an infinite loop (indentation IS code)
     !outa[16]                      'toggle output 16
     waitcnt(3_000_000 + cnt)       'wait until system clock has advanced by 3000000

****end code****

Note that this code example will run, but it doesn't tell the Propeller what to do
with the clock, which will by default revert to the internal 12MHz RC clock.

This is the header normally used to set the clock.

****begin code****

CON
   _clkmode = xtal1 + pll16x     ' use low-speed crystal with the PLL set to 16x
   _xinfreq = 5_000_000          ' 5MHz crystal x16 = 80MHz

****end code****
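
To see what the clock setting does to the earlier Toggle example, here is a rough Python sketch of how long waitcnt(3_000_000 + cnt) waits at each clock. It takes the 12MHz RC figure as nominal (an assumption; an RC oscillator will not be exact), and the helper name is made up.

```python
def waitcnt_delay_seconds(ticks, clock_hz):
    """Time for the system counter to advance by the given number of ticks."""
    return ticks / clock_hz

TICKS = 3_000_000  # the value used in the Toggle example

rc_delay = waitcnt_delay_seconds(TICKS, 12_000_000)    # nominal internal RC clock
xtal_delay = waitcnt_delay_seconds(TICKS, 80_000_000)  # 5MHz crystal x16

print(f"internal RC: {rc_delay * 1000:.1f} ms between toggles")
print(f"80MHz:       {xtal_delay * 1000:.1f} ms between toggles")
```

So the same source toggles the pin roughly every quarter second on the default clock, and several times faster once the header is added.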

The assembler code doesn't actually run in the same cog that is running SPIN code
but is loaded into cogs as is the case with objects such as the UART function.

Here is a snippet of PASM code that is part of the TV object:-

****begin code****
                         mov     screen,_screen          'point to first tile (upper-leftmost)
                         mov     y,_vt                   'set vertical tiles (y=vt)
:line                   mov     vx,_vx                  'set vertical expand
:vert   if_z            xor     interlace,#1            'interlace skip?
         if_z            tjz     interlace,#:skip
                         call    #hsync                  'do hsync

****and****

         if_z            xor     interlace,#1    wz      'get interlace and field1 into z
                         test    _mode,#%0001    wc      'do visible front porch lines
                         mov     x,vf
         if_nz_and_c     add     x,#1
                         call    #blank_lines

****end code****

Now screen, _screen, y, _vt etc. are variables located in the cog memory map, which is
always 32 bits wide and addressed with the 9-bit source and destination fields embedded in each
instruction. This means that each cog's memory is limited to 512 32-bit words, but
the cog can directly address all of them without further reference.
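
The 512-word limit follows directly from the 9-bit fields. This Python sketch does the arithmetic and also splits a 32-bit instruction word into its two address fields. The bit positions (source in bits 8..0, destination in bits 17..9) are my assumption about the instruction layout and are not given in this post.

```python
ADDRESS_BITS = 9   # from the post: 9-bit source and destination fields
WORD_BYTES = 4     # each cog location is 32 bits wide

cog_words = 2 ** ADDRESS_BITS        # locations reachable by a 9-bit field
cog_bytes = cog_words * WORD_BYTES   # total cog RAM in bytes

def split_instruction(instr):
    """Return (dest, src) address fields of a 32-bit instruction word.

    Assumed layout: src in bits 8..0, dest in bits 17..9.
    """
    src = instr & 0x1FF
    dest = (instr >> 9) & 0x1FF
    return dest, src

print(cog_words, "words,", cog_bytes, "bytes per cog")
print(split_instruction((0x1F0 << 9) | 0x005))
```

The 2048-byte result matches the "2 K bytes each" figure for processor RAM on the Parallax page.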

Once again, given the compactness of the conditional and modifiable (yes) instructions,
it turns out that this is plenty of program memory for each cog.

The high-level Spin language itself, as opposed to the PASM code (which may be embedded
in sections of the Spin code), actually compiles to byte tokens that reside in the 32K hub memory
and are executed by an interpreter loaded into one or more of the cogs. The 32K ROM has a lot
of the higher-level functions that are called by the Spin code. It all sounds strange, but
it works and you don't have to be concerned with these details anyway.

> Is there a C
> compiler for it? Might there be a GCC port for it? (others might want
> to ask about C++, but for me it's a little hard to imagine using C++
> for a microcontroller).

Imagecraft have come out with a C compiler.
http://www.imagecraft.com/devtools_Propeller.html

>    I did read some blurb on the Parallax site by the designer, that it
> is what it is, that he hand-designed the thing rather than using the
> usual hardware-design tools and HDL's, and there won't be a bunch of
> slightly different versions with different peripherals and such. Here
> it is:
> http://www.parallax.com/Portals/0/Downloads/docs/article/WhythePropellerWorks.pdf

Yes, and isn't it a beautifully crafted piece of silicon, rather than one of those Lego-block chips
with peripherals that try to do everything except what you really need them to do?

The thing to remember is that, although each cog has a video register and two counter timers,
the cog itself is the peripheral and/or the CPU. Now that I mention the counter timers, do you know
that they can be set up with simple Rs and Cs as a DAC or an ADC? Not to mention the frequency
synthesis up to 128MHz. It is so easy for the Prop to generate the clocks for other chips in a
larger system.

Anyway, have a look at the Parallax website for further information. 
http://www.parallax.com/tabid/407/Default.aspx

This is a link to the object exchange for source code examples of the wide variety of
ingenious tasks that the Propeller (or its cogs) have been put to.
http://obex.parallax.com/

Remember, this is an inexpensive little 40 pin chip.

*Peter*

Rocky wrote:
> On Jan 2, 10:23 pm, David Brown
> <david.br...@hesbynett.removethisbit.no> wrote:
>> Jeff Fox wrote:
>>> On Dec 31 2008, 8:23 am, Vladimir Vassilevsky
>>> <antispam_bo...@hotmail.com> wrote:
>>>> 5) Fast AVR should be able to handle 4 independent UARTs at 38400 as the
>>>> software bit banging.
>>> What do you think is the upper baud limit for 1 to 4 software bit
>>> banging
>>> UART on a fast AVR before a hardware UART is needed?
> 
> <snip>
> 
>> I wrote a 38.4 kbaud software UART on an AVR at 7.37 MHz with 4 times
>> oversampling.  That meant a timer running at 153.6 kHz, with 48
>> processor clocks between ticks.  That's not a lot of time, but easily
>> enough for the software UART written in assembly.
> 
> 3 times oversampling actually gives better results than 4 times. I
> know it seems weird, but it actually gives sampling that is closer to
> the bit center than 4 times oversampling. It also has less processor
> overhead and works fine with a 7372800 Hz clock.
> 

I'm not sure that's correct - but it's certainly worth thinking about.

The key synchronisation point is the start bit - from when the line 
drops at the start of the start bit, your ideal sampling point is then 
half a baud time later.

If you are sampling at four times the baud rate, then your sampling 
point becomes 2 Q after you first detect the start of the start bit (you 
can also sample the start bit after 1 Q as well, as extra noise 
resistance).  The true start bit started somewhere between -1 Q and 0 Q, 
depending on the exact synchronisation, so the ideal sampling point is 
somewhere between 1 Q and 2 Q.  Thus sampling at 2 Q is the best you can do.

If you are sampling at three times the baud rate, the ideal sampling 
point will be between 0.5 Q and 1.5 Q.  Sampling at the point 1 Q is 
then in the middle of the true ideal sampling point range.  I believe 
this is what you are thinking about as being a better sample point.

I am far from convinced that this is a better idea - I think you are
better sampling late (between 0 and 1 Q late) rather than risk sampling
early (between -0.5 Q and +0.5 Q - or using the same time scale as the
four-times oversampling, between -0.67 Q and +0.67 Q).  The reason for
this is that any instability in the sampling is much more likely to
occur in the early part of the bit, rather than the late part.
Consider, for example, if the driver, line capacitance and termination
of the line is such that driving the line to 0 is faster than driving
the line to 1 (this is the case for CAN drivers, for example - even
though they are not normally used without a CAN controller, the
principle is the same).  In this case, you would see the 0 values early,
and the 1 values late - your 3-times oversampling may well miss the
first 1 after a 0 as it takes longer to propagate.

mvh.,

David

   Perhaps the "Propeller" thing deserves its own thread. I've looked
into this "Propeller" thing a little bit, as someone brought a small
"Propeller" protoboard to a recent robot club meeting (I forget if it
was the plug-in breadboard or the DIP board; it had a small USB connector
on it) and a 9V battery for power. The statement "it has eight 32-bit
cores" got my attention.
   Maybe you can give quick answers to my questions below before I
dive deeply into the documentation.

On Thu, 01 Jan 2009 01:13:34 GMT, Peter Jakacki
<peterjakacki@gmail.com> wrote:

>valwn@silvtrc.org wrote:
>> How is core-2-core communication?, complicated?, cpu intensive? trivial?
>
>Actually there is no "core-2-core" communications. Each core or cog is 
>absolutely identical and they share the same I/O pins as well but they 
>are all connected to a HUB-like RAM with a simple "rotary commutator" or 
>"Propeller" scheme which guarantees equal access.

   How many bytes is this shared RAM? Can one processor generate an
interrupt on another one (without using up one of the common I/O
pins)?
   How much RAM and program memory does each core have? Wait, I found
it here:
http://www.parallax.com/tabid/407/Default.aspx
Global RAM/ROM:  64 K bytes; 32 K RAM / 32 K ROM
So I guess that's shared among the 8 processors.
Processor RAM:   2 K bytes each

Is each core the "standard" Von Neumann architecture (program and data
inhabit different areas of the same address space), or is it Harvard
(program and data are on separate busses and thus can be accessed
simultaneously for greater speed, like DSP's, the AVR and several
other microcontrollers)?

   What's the programming language? Is there an assembler? What's the
architecture of each core (or "cog") look like (number of registers,
how they can be used in different instructions and addressing modes,
approximate average cycles per instruction, etc.)? Is there a C
compiler for it? Might there be a GCC port for it? (others might want
to ask about C++, but for me it's a little hard to imagine using C++
for a microcontroller).
   I did read some blurb on the Parallax site by the designer, that it
is what it is, that he hand-designed the thing rather than using the
usual hardware-design tools and HDL's, and there won't be a bunch of
slightly different versions with different peripherals and such. Here
it is:
http://www.parallax.com/Portals/0/Downloads/docs/article/WhythePropellerWorks.pdf

>
>So then there is this core-to-hub communication using reads and writes 
>between the COG and common HUB RAM. It's a bit like having an 8-port RAM 
>with 8 independent CPUs connected to it and with each CPU having 32 I/O 
>ports all "wired OR" together to 32 I/O pins.
>
>Simple and surprisingly very effective. Note that there is a Prop II 
>being designed that runs a lot faster, 

   The above page says this thing runs up to 80MHz, that's not too
shabby, depending on what you're comparing it to. How much faster is
"a lot?"  ;)

>has more memory, I/O etc and may 
>have other enhancements as well. But for now I know I can use this 
>simple little Prop to do complicated tasks simply and cheaply.
>
>*Peter*
>

Jeff Fox wrote:
> On Dec 31 2008, 8:23 am, Vladimir Vassilevsky
> <antispam_bo...@hotmail.com> wrote:
> 
>>5) Fast AVR should be able to handle 4 independent UARTs at 38400 as the
>>software bit banging.
> 
> 
> What do you think is the upper baud limit for 1 to 4 software bit
> banging UART on a fast AVR before a hardware UART is needed?

That mainly depends on the interrupt latency. With no other interrupts,
the transmit part is no problem up to hundreds of kbps. The receive part is
more difficult. For one software UART, an interrupt on the start bit
can be used, which also allows speeds of hundreds of kbps. For many
UARTs, sampling at least two times per bit is required. That sets
the upper limit at about 100 kbps.
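
One way to see where a limit of that order comes from is to turn a per-tick cycle budget into a baud rate. This Python sketch assumes a 20MHz AVR and a hypothetical budget of 100 CPU cycles per timer tick to service all the software UARTs. The 100-cycle figure is an illustrative assumption of mine, not a measurement.

```python
def max_baud(f_cpu_hz, samples_per_bit, cycles_per_tick):
    """Highest baud rate for which each sampling tick gets cycles_per_tick
    CPU cycles, when every bit is sampled samples_per_bit times."""
    return f_cpu_hz / (samples_per_bit * cycles_per_tick)

# Assumed: 20 MHz clock, 2 samples per bit, 100 cycles available per tick.
limit = max_baud(20_000_000, 2, 100)
print(f"{limit / 1000:.0f} kbaud")
```

A tighter sampling routine or fewer UARTs would raise the figure, and any other interrupt load would lower it.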

> What are the upper limits using the hardware UART on a fast AVR?

The max. speed of the AVR hardware UART is CLK/8, i.e. 2.5Mbaud at 
20MHz. I have actually used the AVR UART at 2.048M; it works as expected.
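
The CLK/8 figure corresponds to the AVR USART in double-speed mode with the baud rate register at zero, where baud = f_osc / (8 * (UBRR + 1)). Below is a small Python sketch of that formula. The 16.384MHz clock in the second call is a hypothetical value that happens to give 2.048M exactly; the clock actually used for that rate is not stated here.

```python
def avr_baud(f_cpu_hz, ubrr, double_speed=True):
    """Baud rate of the AVR USART in asynchronous mode.

    Divisor is 8 in double-speed mode (U2X set), 16 in normal mode.
    """
    divisor = 8 if double_speed else 16
    return f_cpu_hz / (divisor * (ubrr + 1))

print(avr_baud(20_000_000, 0))  # maximum rate at 20 MHz
print(avr_baud(16_384_000, 0))  # hypothetical clock giving 2.048 Mbaud
```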

Vladimir Vassilevsky
DSP and Mixed Signal Design Consultant
http://www.abvolt.com

>1) Use a different MCU (Renesas, Freescale), etc. I'm reluctant to do
>it, because I'm familiar with AVR and PIC and they're very simple to
>program. Ideally, the programming for this application should take a
>few days, and I wouldn't want to learn a new MCU/compiler/toolchain.
>

I appreciate that you don't want to learn a new micro, but for what it is
worth, the PSoC CY8C29xxx series can be configured with 4 UARTs. The PSoC
comes with configurable digital and analog blocks that can be set up in a
multitude of ways within the constraints of the resources. 

Also possible with the PSoC or with external gating on another processor
is to have one UART communicating over multiplexed pins provided only one
channel is operating at any given time.

-Aubrey

On Jan 2, 10:23 pm, David Brown
<david.br...@hesbynett.removethisbit.no> wrote:
> Jeff Fox wrote:
> > On Dec 31 2008, 8:23 am, Vladimir Vassilevsky
> > <antispam_bo...@hotmail.com> wrote:
> >> 5) Fast AVR should be able to handle 4 independent UARTs at 38400 as the
> >> software bit banging.
>
> > What do you think is the upper baud limit for 1 to 4 software bit
> > banging
> > UART on a fast AVR before a hardware UART is needed?
>

<snip>

> I wrote a 38.4 kbaud software UART on an AVR at 7.37 MHz with 4 times
> oversampling.  That meant a timer running at 153.6 kHz, with 48
> processor clocks between ticks.  That's not a lot of time, but easily
> enough for the software UART written in assembly.

3 times oversampling actually gives better results than 4 times. I
know it seems weird, but it actually gives sampling that is closer to
the bit center than 4 times oversampling. It also has less processor
overhead and works fine with a 7372800 Hz clock.

Jeff Fox wrote:
> On Dec 31 2008, 8:23 am, Vladimir Vassilevsky
> <antispam_bo...@hotmail.com> wrote:
>> 5) Fast AVR should be able to handle 4 independent UARTs at 38400 as the
>> software bit banging.
> 
> What do you think is the upper baud limit for 1 to 4 software bit
> banging
> UART on a fast AVR before a hardware UART is needed?
> 

That's very much an "it depends" question.  It depends on things like
whether the UARTs are duplex (sending is much easier than receiving),
whether they are all active at the same time, whether they are
synchronized, how accurately the baud rates are known, what the noise
environment is like (that affects the need for oversampling), whether
you need to write it all in C or if you can use assembly (this is one of
the cases where hand-crafted assembly can be *much* faster), and what
else the processor is doing.

I wrote a 38.4 kbaud software UART on an AVR at 7.37 MHz with 4 times 
oversampling.  That meant a timer running at 153.6 kHz, with 48 
processor clocks between ticks.  That's not a lot of time, but easily 
enough for the software UART written in assembly.
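
The timer numbers can be reproduced as follows. This Python sketch takes "7.37 MHz" to mean the common 7372800 Hz crystal (an assumption, though that value is quoted elsewhere in the thread), and the helper name is made up for illustration.

```python
def timer_budget(f_cpu_hz, baud, oversample):
    """Return (timer tick rate in Hz, CPU clocks per tick, leftover clocks).

    A leftover of zero means the tick rate divides the CPU clock exactly,
    so the timer introduces no baud rate error.
    """
    tick_hz = baud * oversample
    clocks, leftover = divmod(f_cpu_hz, tick_hz)
    return tick_hz, clocks, leftover

print(timer_budget(7_372_800, 38_400, 4))  # 4x oversampling, as described
print(timer_budget(7_372_800, 38_400, 3))  # 3x oversampling, for comparison
```

With this crystal both oversampling rates divide evenly; 3x leaves 64 clocks per tick instead of 48.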

> What are the upper limits using the hardware UART on a fast AVR?
> 
> Best Wishes

On Dec 31 2008, 8:23 am, Vladimir Vassilevsky
<antispam_bo...@hotmail.com> wrote:
> 5) Fast AVR should be able to handle 4 independent UARTs at 38400 as the
> software bit banging.

What do you think is the upper baud limit for 1 to 4 software bit
banging
UART on a fast AVR before a hardware UART is needed?

What are the upper limits using the hardware UART on a fast AVR?

Best Wishes