Reply by -jg January 8, 2009
On Dec 31 2008, 8:53 pm, eliben <eli...@gmail.com> wrote:
> > I need an MCU with 4 UART (@ 38.4 KBaud each) and several IOs. The 4 > > UARTs is a problem, because the simplest MCUs (PIC, AVR) don't have > > chips with this amount (AFAIK). > > Sorry about the self reply, but I've just found that AVR have the > 640/1280/2560 families, which seem to have 4 UARTs. Does anyone have > experience using these chips ?
You could also look at the now-sampling ATXMEGA128A1-AU. The xMega series shows EIGHT (!) UARTs @ 100 pins, 7 UARTs @ 64 pins and 5 UARTs @ 44 pins. -jg
Reply by Rocky January 3, 2009
On Jan 3, 2:40 pm, David Brown
<david.br...@hesbynett.removethisbit.no> wrote:
> Rocky wrote: > > On Jan 2, 10:23 pm, David Brown > > <david.br...@hesbynett.removethisbit.no> wrote: > >> Jeff Fox wrote: > >>> On Dec 31 2008, 8:23 am, Vladimir Vassilevsky > >>> <antispam_bo...@hotmail.com> wrote: > >>>> 5) Fast AVR should be able to handle 4 independent UARTs at 38400 as the > >>>> software bit banging. > >>> What do you think is the upper baud limit for 1 to 4 software bit > >>> banging > >>> UART on a fast AVR before a hardware UART is needed? > > > <snip> > > >> I wrote a 38.4 kbaud software UART on an AVR at 7.37 MHz with 4 times > >> oversampling. That meant a timer running at 153.6 kHz, with 48 > >> processor clocks between ticks. That's not a lot of time, but easily > >> enough for the software UART written in assembly. > > > 3 times oversampling actually gives better results than 4 times. I > > know it seems wierd, but it actually gives sampling that is closer to > > the bit center than 4 times oversampling. It also has less processor > > overhead and works fine with a 7372800 Hz clock. > > I'm not sure that's correct - but it's certainly worth thinking about. > > The key synchronisation point is the start bit - from when the line > drops at the start of the start bit, your ideal sampling point is then > half a baud time later. > > If you are sampling at four times the baud rate, then your sampling > point becomes 2 Q after you first detect the start of the start bit (you > can also sample the start bit after 1 Q as well, as extra noise > resistance). The true start bit started somewhere between -1 Q and 0 Q, > depending on the exact synchronisation, so the ideal sampling point is > somewhere between 1 Q and 2 Q. Thus sampling at 2 Q is the best you can do. > > If you are sampling at three times the baud rate, the ideal sampling > point will be between 0.5 Q and 1.5 Q. Sampling at the point 1 Q is > then in the middle of the true ideal sampling point range. I believe > this is what you are thinking about as being a better sample point. > > I am far from convinced that this is a better idea - I think you are > better sampling late (between 0 and 1 Q late rather than risk sampling > early (between -0.5 Q and +0.5 Q - or using the same time scale as the > four-times oversampling, between -0.67 Q and +0.67 Q). The reason for > this is that any instability in the sampling is much more likely to > occur in the early part of the bit, rather than the late part. > Consider, for example, if the driver, line capacitance and termination > of the line is such that driving the line to 0 is faster than driving > the line to 1 (this is the case for CAN drivers, for example - even > though they are not normally used without a CAN controller, the > principle is the same). In this case, you would see the 0 values early, > and the 1 values late - your 3-times oversampling may will miss the > first 1 after a 0 as it takes longer to propagate. >
Bear in mind that the baud rates are much slower than CAN, so effects that come from reflections and from open-collector driven lines are not a big issue. With 4x you can choose to sample the data at between 25% and 50% of the bit period, or between 50% and 75%. With 3x the sample point moves to between 33% and 67% of the bit period, so the worst case is about 8% of a bit period closer to the bit center. That makes it a little more tolerant of bit (CPU) clock error, but the big gain is the drop in processor overhead. In practice it has worked well.
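The percentages above can be checked with a few lines of Python. This is only a sketch: the function names are made up for illustration, and it assumes the receiver notices the start edge at the first timer tick after it happens and then samples a fixed number of ticks later.

```python
from fractions import Fraction

def sample_window(oversample, tick):
    """Range of sample positions, as a fraction of the bit period, when the
    receiver samples `tick` ticks after first seeing the start edge.

    The edge is only noticed at a tick, so it really happened somewhere
    between 0 and 1 tick before it was detected.
    """
    early = Fraction(tick, oversample)      # edge fell right on the detecting tick
    late = Fraction(tick + 1, oversample)   # edge fell just after the previous tick
    return early, late

def worst_case_offset(oversample, tick):
    """Largest distance between the sample point and the bit center."""
    early, late = sample_window(oversample, tick)
    center = Fraction(1, 2)
    return max(abs(early - center), abs(late - center))

for oversample, tick in [(4, 1), (4, 2), (3, 1)]:
    early, late = sample_window(oversample, tick)
    print(f"{oversample}x, tick {tick}: {float(early):.0%} to {float(late):.0%}, "
          f"worst case {float(worst_case_offset(oversample, tick)):.1%} off center")
```

Under these assumptions the worst case is 1/4 of a bit for 4x and 1/6 of a bit for 3x, a difference of 1/12 of a bit, which is where the "about 8%" comes from.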
Reply by Peter Jakacki January 3, 2009
Ben Bradley wrote:
> Perhaps the "Propellor" thing deserves it's own thread. I've looked
snip
> Is each core the "standard" Von Neumann architecture (program and data > inabit different areas of the same address space), or is it Harvard > (program and data are on separate busses and thus can be accessed > simultaneously for greater speed, like DSP's, the AVR and several > other microcontrollers)?
Quick answers? That is a little hard to do but I'll try :)

Each core or cog is a tiny 32-bit CPU with its own RAM and I/O ports. The Von Neumann architecture is the standard combined memory for program and data, which is what the Propeller cogs are. However, unlike conventional CPUs, the Propeller does not have any CPU registers; instead the cog RAM combines the program, data, and "registers". That doesn't slow things down though, as most instructions take 4 clock cycles, so they run at 20 MIPS on a standard 80MHz clock (5MHz x16). The only instructions that can be really slow are the hub operations, which can take from 7 to 22 clock cycles in case they miss their access slot. This can be optimized in assembler loops to squeeze a couple of instructions between each hub operation, but most cogs only need to access the hub RAM to update global buffers and variables; otherwise they can operate totally internally.

There are absolutely no interrupts, and why should there be anyway? You have cogs which you can dedicate to the functions that normally could only be handled by interrupts on conventional CPUs. A powerful result of this is that each cog can be temporally deterministic and not at all encumbered with such kludges as "interrupts". Any load that an individual cog bears is not one that other cogs have to bear unless you want them to. I find it so much easier to debug and validate the software now that I do not have to deal with interrupts and the strange things that can happen as a result of non-deterministic clashes.

The "UART" cog receives, transmits, and processes the data and handles the buffers in hub memory, all in a manner transparent to the other cogs, so why would you need or want interrupts? I have used my "UART" cogs at speeds of up to 2M baud with no negative impact at all on the other cogs. Objects have been written that permit 8Mbps comms between Propeller chips.
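As a rough cross-check of those figures, here is a back-of-envelope sketch in Python. It is an illustration only: the function names are invented, and it ignores hub operations, which the post says can take 7 to 22 clocks.

```python
def clocks_per_bit(sys_clock_hz, baud):
    """System clock cycles that elapse during one serial bit."""
    return sys_clock_hz / baud

def instructions_per_bit(sys_clock_hz, baud, clocks_per_instruction=4):
    """Instruction budget per serial bit, assuming every instruction takes
    the typical 4 clocks quoted above (hub operations are ignored)."""
    return clocks_per_bit(sys_clock_hz, baud) / clocks_per_instruction

SYS_CLOCK_HZ = 5_000_000 * 16   # 5 MHz crystal with the 16x PLL

print(SYS_CLOCK_HZ // 4)                               # instructions per second
print(instructions_per_bit(SYS_CLOCK_HZ, 2_000_000))   # budget per bit at 2M baud
```

With these assumptions the clock comes to 80 MHz, the instruction rate to 20 million per second, and a 2M baud link leaves roughly 10 instructions per bit for the cog that handles it.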
> What's the programming language? Is there an assembler? What's the > architecture of each core (or "cog") look like (number of registers, > how they can be used in different instructions and addressing modes, > approximate average cycles per instruction, etc.)?
Most of the code is written in a Pascal-like syntax called SPIN. The SPIN IDE and compiler are free. Here is a very basic code sample from the Propeller Manual tutorial that can be compiled, loaded, and running within a second or so with one keystroke.
****begin code****
{{ Output.spin }}

PUB Toggle
  dira[16]~~                  'set I/O 16 as an output in this cog (short-form)
  repeat                      'start an infinite loop (indentation IS code)
    !outa[16]                 'toggle output 16
    waitcnt(3_000_000 + cnt)  'wait until system clock has advanced by 3000000
****end code****
Note that this code example will run, but it doesn't tell the Propeller what to do with the clock, which will by default revert to the internal 12MHz RC clock. This is the header normally used to set the clock.
****begin code****
CON
  _clkmode = xtal1 + pll16x   ' use low-speed crystal with the PLL set to 16x
  _xinfreq = 5_000_000        ' 5MHz crystal x16 = 80MHz
****end code****
The assembler code doesn't actually run in the same cog that is running SPIN code but is loaded into other cogs, as is the case with objects such as the UART function. Here is a snippet of PASM code that is part of the TV object:
****begin code****
                mov     screen,_screen          'point to first tile (upper-leftmost)
                mov     y,_vt                   'set vertical tiles (y=vt)
:line           mov     vx,_vx                  'set vertical expand
:vert   if_z    xor     interlace,#1            'interlace skip?
        if_z    tjz     interlace,#:skip
                call    #hsync                  'do hsync
****and****
        if_z    xor     interlace,#1    wz      'get interlace and field1 into z
                test    _mode,#%0001    wc      'do visible front porch lines
                mov     x,vf
  if_nz_and_c   add     x,#1
                call    #blank_lines
****end code****
Now screen, _screen, y, vt etc. are variables located in the cog memory map, which is always 32 bits wide and addressed with 9-bit source and destination fields embedded in each instruction. This means that each cog's memory is limited to 512 32-bit words, but that it can address these directly without further reference.
Once again, given the compactness of the conditional and modifiable (yes) instructions, it turns out that this is plenty of program memory for each cog. The high-level Spin language itself, as opposed to the PASM code (which may be embedded in sections of the Spin code), actually compiles to byte tokens that reside in the 32K hub memory and are executed by an interpreter loaded into one or more of the cogs. The 32K ROM has a lot of the higher-level functions that are called by the Spin code. It all sounds strange, but it works and you don't have to be concerned with these details anyway.
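The cog memory limit follows directly from the 9-bit address fields, and a short Python sketch makes the arithmetic explicit. The constant and function names are made up for illustration. The exact bit positions of the two fields are not given in this post; the ones used below (source in bits 8..0, destination in bits 17..9) are an assumption to be checked against the Propeller datasheet.

```python
ADDRESS_BITS = 9      # width of each source/destination field (from the post)
BYTES_PER_LONG = 4    # cog memory is 32 bits wide

COG_LONGS = 1 << ADDRESS_BITS             # how many longs 9 bits can address
COG_BYTES = COG_LONGS * BYTES_PER_LONG    # the same memory measured in bytes

def split_fields(instruction):
    """Return (destination, source) cog addresses from a 32-bit instruction.

    ASSUMPTION: source occupies bits 8..0 and destination bits 17..9.
    """
    mask = COG_LONGS - 1
    return (instruction >> ADDRESS_BITS) & mask, instruction & mask

print(COG_LONGS, COG_BYTES)
```

This gives 512 longs, i.e. 2048 bytes, which agrees with the "Processor RAM 2 K bytes each" figure quoted from the Parallax page elsewhere in this thread.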
> Is there a C > compiler for it? Might there be a GCC port for it? (others might want > to ask about C++, but for me it's a little hard to imagine using C++ > for a microcontroller).
Imagecraft have come out with a C compiler. http://www.imagecraft.com/devtools_Propeller.html
> I did read some blurb on the Parallax site by the designer, that it > is what it is, that he hand-designed the thing rather than using the > usual hardware-design tools and HDL's, and there won't be a bunch of > slightly different verssions with different peripherals and such. Here > it is: > http://www.parallax.com/Portals/0/Downloads/docs/article/WhythePropellerWorks.pdf
Yes, and isn't it a beautifully crafted piece of silicon, rather than one of those Lego-block chips with peripherals that try to do everything except what you really need them to do? The thing to remember is that although each cog has a video register and two counter timers, the cog itself is the peripheral and/or the CPU. Now that I mention the counter timers, did you know that they can be set up with simple Rs and Cs as a DAC or an ADC? Not to mention the frequency synthesis up to 128MHz. It is so easy for the Prop to generate the clocks for other chips in a larger system.

Anyway, have a look at the Parallax website for further information.
http://www.parallax.com/tabid/407/Default.aspx

This is a link to the object exchange for source code examples of the wide variety of ingenious tasks that the Propeller (or its cogs) have been put to.
http://obex.parallax.com/

Remember, this is an inexpensive little 40 pin chip.

*Peter*
Reply by David Brown January 3, 2009
Rocky wrote:
> On Jan 2, 10:23 pm, David Brown > <david.br...@hesbynett.removethisbit.no> wrote: >> Jeff Fox wrote: >>> On Dec 31 2008, 8:23 am, Vladimir Vassilevsky >>> <antispam_bo...@hotmail.com> wrote: >>>> 5) Fast AVR should be able to handle 4 independent UARTs at 38400 as the >>>> software bit banging. >>> What do you think is the upper baud limit for 1 to 4 software bit >>> banging >>> UART on a fast AVR before a hardware UART is needed? > > <snip> > >> I wrote a 38.4 kbaud software UART on an AVR at 7.37 MHz with 4 times >> oversampling. That meant a timer running at 153.6 kHz, with 48 >> processor clocks between ticks. That's not a lot of time, but easily >> enough for the software UART written in assembly. > > 3 times oversampling actually gives better results than 4 times. I > know it seems wierd, but it actually gives sampling that is closer to > the bit center than 4 times oversampling. It also has less processor > overhead and works fine with a 7372800 Hz clock. >
I'm not sure that's correct - but it's certainly worth thinking about.

The key synchronisation point is the start bit - from when the line drops at the start of the start bit, your ideal sampling point is then half a baud time later.

If you are sampling at four times the baud rate, then your sampling point becomes 2 Q after you first detect the start of the start bit (you can also sample the start bit after 1 Q as well, as extra noise resistance). The true start bit started somewhere between -1 Q and 0 Q, depending on the exact synchronisation, so the ideal sampling point is somewhere between 1 Q and 2 Q. Thus sampling at 2 Q is the best you can do.

If you are sampling at three times the baud rate, the ideal sampling point will be between 0.5 Q and 1.5 Q. Sampling at the point 1 Q is then in the middle of the true ideal sampling point range. I believe this is what you are thinking about as being a better sample point.

I am far from convinced that this is a better idea - I think you are better sampling late (between 0 and 1 Q late) rather than risk sampling early (between -0.5 Q and +0.5 Q - or using the same time scale as the four-times oversampling, between -0.67 Q and +0.67 Q). The reason for this is that any instability in the sampling is much more likely to occur in the early part of the bit, rather than the late part. Consider, for example, if the driver, line capacitance and termination of the line are such that driving the line to 0 is faster than driving the line to 1 (this is the case for CAN drivers, for example - even though they are not normally used without a CAN controller, the principle is the same). In this case, you would see the 0 values early, and the 1 values late - your 3-times oversampling may well miss the first 1 after a 0, as it takes longer to propagate.

mvh.,

David
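The argument about slow rising edges can be tried out with a small simulation. The Python sketch below is an illustration under stated assumptions, not anyone's actual UART code: all names are hypothetical, the line is modelled as falling instantly but taking `rise_delay` bit periods to read as 1 again, and times are exact fractions of a bit period.

```python
from fractions import Fraction
from math import floor

def frame_bits(byte):
    """8N1 frame: start bit (0), eight data bits LSB first, stop bit (1)."""
    return [0] + [(byte >> i) & 1 for i in range(8)] + [1]

def line_level(t, start, bits, rise_delay):
    """Line level at time t (in bit periods); the line idles high.

    Assumed model: falling edges are instant, rising edges only show up
    `rise_delay` bit periods late (rise_delay must be below one bit).
    """
    def ideal(u):
        i = floor(u - start)
        return bits[i] if 0 <= i < len(bits) else 1
    if ideal(t) == 1 and ideal(t - rise_delay) == 0:
        return 0            # the line has not finished rising yet
    return ideal(t)

def receive(byte, oversample, sample_tick, phase, rise_delay=Fraction(0)):
    """Decode one frame with a tick-driven receiver.

    phase: how long before a tick the start edge falls, in ticks (0 <= phase < 1).
    sample_tick: ticks from detecting the start bit to the sample point.
    """
    bits = frame_bits(byte)
    tick = Fraction(1, oversample)
    start = (4 - phase) * tick                 # start edge shortly before tick 4
    n = 0
    while line_level(n * tick, start, bits, rise_delay) == 1:
        n += 1                                 # poll each tick until the line drops
    value = 0
    for k in range(8):                         # one sample per data bit
        t = (n + sample_tick + oversample * (k + 1)) * tick
        value |= line_level(t, start, bits, rise_delay) << k
    return value

slow = Fraction(2, 5)                          # rising edge 0.4 bit late (made-up value)
print(hex(receive(0x55, 4, 2, Fraction(0), slow)))   # 4x, sampling at 2 Q
print(hex(receive(0x55, 3, 1, Fraction(0), slow)))   # 3x, sampling at 1 Q
```

With a clean line both schemes decode correctly at every phase. With the made-up 0.4-bit rise delay, the 4x receiver sampling at 2 Q still decodes 0x55, while the 3x receiver sampling at 1 Q misreads it when the start edge lands right on a tick, which is the early-sampling risk described above. How large the rise delay is on a real line at 38.4 kbaud is a separate question that this sketch does not answer.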
Reply by Ben Bradley January 3, 2009
   Perhaps the "Propeller" thing deserves its own thread. I've looked
into this "Propeller" thing a little bit, as someone brought a small
"Propeller" protoboard to a recent robot club meeting (I forget whether
it was the plug-in breadboard or the DIP board; it had a small USB
connector on it) and a 9V battery for power. The statement "it has
eight 32-bit cores" got my attention.
   Maybe you can give quick answers to my questions below before I
dive deeply into the documentation.

On Thu, 01 Jan 2009 01:13:34 GMT, Peter Jakacki
<peterjakacki@gmail.com> wrote:

>valwn@silvtrc.org wrote: >> How is core-2-core communication?, complicated?, cpu intensive? trivial? > >Actually there is no "core-2-core" communications. Each core or cog is >absolutely identical and they share the same I/O pins as well but they >are all connected to a HUB-like RAM with a simple "rotary commutator" or >"Propeller" scheme which guarantees equal access.
How many bytes is this shared RAM? Can one processor generate an interrupt on another one (without using up one of the common I/O pins)? How much RAM and program memory does each core have?

Wait, I found it here:
http://www.parallax.com/tabid/407/Default.aspx

Global RAM/ROM: 64 K bytes; 32K RAM / 32 K ROM
So I guess that's shared among the 8 processors.
Processor RAM: 2 K bytes each

Is each core the "standard" Von Neumann architecture (program and data inhabit different areas of the same address space), or is it Harvard (program and data are on separate busses and thus can be accessed simultaneously for greater speed, like DSPs, the AVR and several other microcontrollers)?

What's the programming language? Is there an assembler? What does the architecture of each core (or "cog") look like (number of registers, how they can be used in different instructions and addressing modes, approximate average cycles per instruction, etc.)?

Is there a C compiler for it? Might there be a GCC port for it? (Others might want to ask about C++, but for me it's a little hard to imagine using C++ for a microcontroller.)

I did read some blurb on the Parallax site by the designer, that it is what it is, that he hand-designed the thing rather than using the usual hardware-design tools and HDLs, and there won't be a bunch of slightly different versions with different peripherals and such. Here it is:
http://www.parallax.com/Portals/0/Downloads/docs/article/WhythePropellerWorks.pdf
> >So then there is this core-to-hub communication using reads and writes >between the COG and common HUB RAM. It's a bit like having an 8-port RAM >with 8 independent CPUs connected to it and with each CPU having 32 I/O >ports all "wired OR" together to 32 I/O pins. > >Simple and surprisingly very effective. Note that there is a Prop II >being designed that runs a lot faster,
The above page says this thing runs up to 80MHz, that's not too shabby, depending on what you're comparing it to. How much faster is "a lot?" ;)
>has more memory, I/O etc and may >have other enhancements as well. But for now I know I can use this >simple little Prop to do complicated tasks simply and cheaply. > >*Peter* >
Reply by Vladimir Vassilevsky January 2, 2009

Jeff Fox wrote:
> On Dec 31 2008, 8:23 am, Vladimir Vassilevsky > <antispam_bo...@hotmail.com> wrote: > >>5) Fast AVR should be able to handle 4 independent UARTs at 38400 as the >>software bit banging. > > > What do you think is the upper baud limit for 1 to 4 software bit > banging UART on a fast AVR before a hardware UART is needed?
That mainly depends on the interrupt latency. With no other interrupts, the transmit part is no problem up to hundreds of kbps. The receive part is more difficult. For a single software UART, an interrupt on the start bit can be used, which also allows speeds of hundreds of kbps. For several UARTs, each line has to be sampled at least twice per bit, which sets the upper limit at about 100 kbps.
> What are the upper limits using the hardware UART on a fast AVR?
The max. speed of the AVR hardware UART is CLK/8, i.e. 2.5 Mbaud at 20 MHz. I have actually used the AVR UART at 2.048 Mbaud; it works as expected.

Vladimir Vassilevsky
DSP and Mixed Signal Design Consultant
http://www.abvolt.com
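For reference, the CLK/8 limit corresponds to the double-speed asynchronous mode described in the AVR datasheets, where baud = f_CPU / (8 x (UBRR + 1)); normal mode divides by 16 instead. A minimal Python sketch of that formula follows. The function names are invented for illustration, and the 16.384 MHz crystal in the example is an assumption, since the post does not say which clock was used for 2.048 Mbaud.

```python
def avr_baud(f_cpu_hz, ubrr, double_speed=False):
    """Baud rate produced by a given UBRR value in asynchronous mode."""
    divisor = 8 if double_speed else 16
    return f_cpu_hz / (divisor * (ubrr + 1))

def avr_ubrr(f_cpu_hz, baud, double_speed=False):
    """Nearest UBRR setting for a requested baud rate."""
    divisor = 8 if double_speed else 16
    return round(f_cpu_hz / (divisor * baud)) - 1

print(avr_baud(20_000_000, 0, double_speed=True))   # fastest rate at 20 MHz
print(avr_baud(16_384_000, 0, double_speed=True))   # hypothetical 16.384 MHz crystal
print(avr_ubrr(7_372_800, 38_400))                  # setting for 38400 at 7.3728 MHz
```

The second function rounds to the nearest setting, so for clocks that do not divide evenly the resulting baud error should be checked with `avr_baud` before relying on it.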
Reply by antedeluvian51 January 2, 2009
>1) Use a different MCU (Renesas, Freescale), etc. I'm reluctant to do >it, because I'm familiar with AVR and PIC and they're very simple to >program. Ideally, the programming for this application should take a >few days, and I wouldn't want to learn a new MCU/compiler/toolchain. >
I appreciate that you don't want to learn a new micro, but for what it is worth, the PSoC CY8C29xxx series can be configured with 4 UARTs. The PSoC comes with configurable digital and analog blocks that can be set up in a multitude of ways within the constraints of the resources. Another option, with the PSoC or with external gating on another processor, is to have one UART communicating over multiplexed pins, provided only one channel is operating at any given time.
-Aubrey
Reply by Rocky January 2, 2009
On Jan 2, 10:23 pm, David Brown
<david.br...@hesbynett.removethisbit.no> wrote:
> Jeff Fox wrote: > > On Dec 31 2008, 8:23 am, Vladimir Vassilevsky > > <antispam_bo...@hotmail.com> wrote: > >> 5) Fast AVR should be able to handle 4 independent UARTs at 38400 as the > >> software bit banging. > > > What do you think is the upper baud limit for 1 to 4 software bit > > banging > > UART on a fast AVR before a hardware UART is needed? >
<snip>
> I wrote a 38.4 kbaud software UART on an AVR at 7.37 MHz with 4 times > oversampling. That meant a timer running at 153.6 kHz, with 48 > processor clocks between ticks. That's not a lot of time, but easily > enough for the software UART written in assembly.
3 times oversampling actually gives better results than 4 times. I know it seems weird, but it gives sampling that is closer to the bit center than 4 times oversampling. It also has less processor overhead and works fine with a 7372800 Hz clock.
Reply by David Brown January 2, 2009
Jeff Fox wrote:
> On Dec 31 2008, 8:23 am, Vladimir Vassilevsky > <antispam_bo...@hotmail.com> wrote: >> 5) Fast AVR should be able to handle 4 independent UARTs at 38400 as the >> software bit banging. > > What do you think is the upper baud limit for 1 to 4 software bit > banging > UART on a fast AVR before a hardware UART is needed? >
That's very much an "it depends" question. It depends on things like whether the UARTs are duplex (sending is much easier than receiving), whether they are all active at the same time, whether they are synchronized, how accurately the baud rates are known, what the noise environment is like (that affects the need for oversampling), whether you need to write it all in C or if you can use assembly (this is one of the cases where hand-crafted assembly can be *much* faster), and what else the processor is doing.

I wrote a 38.4 kbaud software UART on an AVR at 7.37 MHz with 4 times oversampling. That meant a timer running at 153.6 kHz, with 48 processor clocks between ticks. That's not a lot of time, but easily enough for the software UART written in assembly.
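The timing budget quoted here is simple to reproduce. The Python sketch below uses an invented function name and takes the exact crystal value of 7372800 Hz that is mentioned elsewhere in this thread for the "7.37 MHz" clock.

```python
def tick_budget(f_cpu_hz, baud, oversample):
    """Return (timer rate in Hz, whole CPU clocks per tick, leftover clocks).

    A non-zero leftover means the timer cannot hit the tick rate exactly,
    so the software UART would run with a small baud rate error.
    """
    tick_hz = baud * oversample
    clocks, leftover = divmod(f_cpu_hz, tick_hz)
    return tick_hz, clocks, leftover

print(tick_budget(7_372_800, 38_400, 4))   # the 4x case described above
print(tick_budget(7_372_800, 38_400, 3))   # the same clock with 3x oversampling
```

At 4x this gives a 153600 Hz timer with exactly 48 clocks per tick; at 3x the same crystal gives 115200 Hz and 64 clocks per tick. All the work for every software UART serviced by that timer has to fit inside that many clocks.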
> What are the upper limits using the hardware UART on a fast AVR? > > Best Wishes
Reply by Jeff Fox January 2, 2009
On Dec 31 2008, 8:23 am, Vladimir Vassilevsky
<antispam_bo...@hotmail.com> wrote:
> 5) Fast AVR should be able to handle 4 independent UARTs at 38400 as the > software bit banging.
What do you think is the upper baud limit for 1 to 4 software bit banging UART on a fast AVR before a hardware UART is needed? What are the upper limits using the hardware UART on a fast AVR? Best Wishes