
Parallax Propeller

Started by Peter Jakacki January 13, 2013
On Thu, 24 Jan 2013 23:03:32 -0800 (PST), Mark Wills wrote:

> On Jan 25, 6:55 am, Hugh Aguilar <hughaguila...@yahoo.com> wrote:
>> On Jan 24, 3:03 pm, Mark Wills <markrobertwi...@yahoo.co.uk> wrote:
>>> The TMS99xx family of processors (very old) has 16 prioritised
>>> cascading interrupts. Probably inherited from mini-computer
>>> architecture. Very very powerful for its day. Since they were
>>> prioritised, a lower level interrupt would not interrupt a higher
>>> level interrupt until the higher level ISR terminated. Makes serving
>>> multiple interrupts an absolute doddle. Not bad for 1976.
>>
>> Doddle? I've never heard that word before. Is a doddle good or bad?
>
> doddle = extremely simple/easy
>
> "Did you manage to fix that bug?"
> "Yeah, it was a doddle!"
>
> :-)
As we say: "een fluitje van een cent" (literally, a one-cent whistle).
Such a whistle costs nearly nothing and can be made for nearly nothing.
There is a herb (Anthriscus sylvestris) we call Fluitekruid. Due to
near-French purism (hash-tag vs. mot-dièse) we must write Fluitenkruid,
as if it were plural.

--
Coos

CHForth, 16 bit DOS applications
http://home.hccnet.nl/j.j.haak/forth.html
rickman <gnuarm@gmail.com> writes:
> How do you know the display "fuzziness" was due to software timing? I
> would expect software timing on a clocked processor to be on par with
> other means of timing. There are other aspects of design that could
> cause fuzziness or timing ambiguities in the signal.
If you look at the inner loop driving the output pin, you can do a min/max
skew calculation which ends up with quite a bit of jitter on the table.
The product is the PockeTerm, you can pick one up at:

    http://www.brielcomputers.com/wordpress/?cat=25

It's open source, VGA_HiRes_Text.spin is the low level driver for VGA
output. Note it actually uses *two* CPUs, and is some pretty darn cool
assembly code--written by the president of the Propeller company!

Andy Valencia
Home page: http://www.vsta.org/andy/
To contact me: http://www.vsta.org/contact/andy.html
On 1/25/2013 4:57 PM, None wrote:
> rickman <gnuarm@gmail.com> writes:
>> How do you know the display "fuzziness" was due to software timing? I
>> would expect software timing on a clocked processor to be on par with
>> other means of timing. There are other aspects of design that could
>> cause fuzziness or timing ambiguities in the signal.
>
> If you look at the inner loop driving the output pin, you can do a min/max
> skew calculation which ends up with quite a bit of jitter on the table.
> The product is the PockeTerm, you can pick one up at:
>
>     http://www.brielcomputers.com/wordpress/?cat=25
>
> It's open source, VGA_HiRes_Text.spin is the low level driver for
> VGA output. Note it actually uses *two* CPUs, and is some pretty darn
> cool assembly code--written by the president of the Propeller company!
I don't follow what causes the skew you mention. Instruction timings are
deterministic, no? If not, trying to time using code is hopeless. If the
timings are deterministic, the skew should not be cumulative since they
are all based on the CPU clock. Is the CPU clock from an accurate
oscillator like a crystal? If it is using an internal RC clock, again
timing to sufficient accuracy is hopeless.

Rick
rickman <gnuarm@gmail.com> writes:
> I don't follow what causes the skew you mention. Instruction timings
> are deterministic, no?
The chip has a lower level bit stream engine which the higher level CPU
("cog") is feeding. Well, a pair of cogs. Each cog has local memory and
then a really expensive path through a central arbiter ("hub"). It fills
its image of the scanlines from the shared memory, then has to feed it
via waitvid into the lower level. Note that it's a bit stream engine
*per cog*, so you also have to worry about their sync.

So yes, instruction timings are deterministic (although your shared
memory accesses will vary modulo the hub round-robin count). You need to
reach the waitvid before it's your turn to supply the next value. But
given that, this is much like the old wait state sync feeding bytes to a
floppy controller. PLL and waitvid sync are achieved with magic
incantations from Parallax, and it is not 100%.
> If the timings are deterministic, the skew should not be cumulative
> since they are all based on the CPU clock. Is the CPU clock from an
> accurate oscillator like a crystal? If it is using an internal RC
> clock, again timing to sufficient accuracy is hopeless.
The board has a CPU clock from which the PLL derives the video output
frequency. I recall the CPU clock being based on a crystal, but not one
with any consideration for video intervals. And the PLLs are per cog,
again my comment about (potential lack of) global sync.

Anyway, you should buy one and check it out. I'd be curious to hear if
(1) you also observe the same video quality, and (2) if you think it's
the waitvid mechanism, more the PLL->SVGA generation, or the sync issues
of the paired video generators. They even supply the schematic, FWIW.

Andy Valencia
Home page: http://www.vsta.org/andy/
To contact me: http://www.vsta.org/contact/andy.html
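To make the min/max skew calculation concrete, here is a minimal C
sketch. It assumes the commonly quoted P8X32A figures (4 clocks per
ordinary cog instruction, 8 to 23 clocks per hub access depending on
where the round-robin window falls, an 80 MHz system clock); the
instruction counts are illustrative and not taken from
VGA_HiRes_Text.spin.

/* Back-of-envelope jitter estimate for a cog inner loop that does one
 * hub access per iteration. All cycle counts are assumptions, not
 * measurements from the PockeTerm. */
#include <stdio.h>

int main(void) {
    const double clk_hz  = 80e6;   /* assumed 5 MHz crystal x 16 PLL      */
    const int    fixed   = 5 * 4;  /* say, five ordinary 4-clock ops      */
    const int    hub_min = 8;      /* hub op: window hit perfectly        */
    const int    hub_max = 23;     /* hub op: just missed the window      */

    double t_min = (fixed + hub_min) / clk_hz;
    double t_max = (fixed + hub_max) / clk_hz;

    printf("loop: %.0f..%.0f ns, worst-case skew %.0f ns per iteration\n",
           t_min * 1e9, t_max * 1e9, (t_max - t_min) * 1e9);
    return 0;
}

At a 25 MHz-ish VGA dot clock a pixel is roughly 40 ns wide, so under
these assumptions one hub-phase miss is already a visible fraction of a
pixel, which fits the fuzziness being discussed.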
On Jan 26, 11:07 pm, rickman <gnu...@gmail.com> wrote:
> On 1/25/2013 4:57 PM, None wrote:
>> If you look at the inner loop driving the output pin, you can do a
>> min/max skew calculation which ends up with quite a bit of jitter on
>> the table.
>
> I don't follow what causes the skew you mention. Instruction timings
> are deterministic, no? If not, trying to time using code is hopeless.
> If the timings are deterministic, the skew should not be cumulative
> since they are all based on the CPU clock. Is the CPU clock from an
> accurate oscillator like a crystal? If it is using an internal RC
> clock, again timing to sufficient accuracy is hopeless.
>
> Rick
The instruction times are deterministic (presumably; I've never written
code on the Propeller), but when generating video in software, *per scan
line* all possible code paths have to add up to the same number of
cycles in order to completely avoid jitter. That's very hard to do.

Consider a single scan line that contains text interspersed with spaces.
For the current horizontal position the software has to:

* Determine if background or foreground (i.e. a pixel of text colour)
  should be drawn
* If background
  * select background colour to video output register
* If foreground
  * determine character under current horizontal position
  * determine offset (in pixels) into the current line of the character
  * is a pixel to be drawn?
    * If yes, load pixel colour
    * otherwise, load background colour

The second code path is a lot more complex, containing many more
instructions, yet both code paths have to balance in terms of execution
time (a sketch of the balancing trick follows below). This is just one
example.

This is how video is done on the original Atari VCS console. 100%
software, with the hardware only providing horizontal interrupts (one
per scan line) and VBLANK interrupts, IIRC.

Caveat: The above assumes that there is no interrupt per horizontal
pixel. With interrupts, it's much easier. The Propeller doesn't have any
interrupts, so software video generation would be non-trivial to say the
least. The easiest way would be to provide a pixel clock and use an I/O
pin to sync to, as Chuck found out for himself when implementing video
on the GA144.
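Here is a sketch of that balancing trick, with C standing in for
hand-counted assembly. The buffer names, glyph layout, and cycle counts
are invented for illustration; on real hardware each path is counted
instruction by instruction and the short path is padded until both cost
the same.

/* Cycle-balanced per-pixel loop, structure only. A C compiler gives no
 * cycle guarantees; this just shows the shape of the two paths. */
#include <stdint.h>

#define PIXELS 320
#define CHARS  (PIXELS / 8)

static uint8_t text_row[CHARS];   /* hypothetical one-row text buffer */
static uint8_t font[128][8];      /* hypothetical 8x8 glyph table     */
static uint8_t fg_colour, bg_colour;

static void pad_cycles(int n) { while (n--) __asm__ volatile(""); }
static void emit(uint8_t c)   { (void)c; /* would write video register */ }

void scanline(int glyph_row)
{
    for (int x = 0; x < PIXELS; x++) {
        uint8_t colour;
        uint8_t ch = text_row[x / 8];
        if (ch == ' ') {                  /* background: short path     */
            colour = bg_colour;
            pad_cycles(6);                /* pad to match the long path */
        } else {                          /* foreground: long path      */
            uint8_t bits = font[ch & 127][glyph_row & 7];
            colour = ((bits >> (7 - (x & 7))) & 1) ? fg_colour : bg_colour;
        }
        emit(colour);                     /* both paths arrive in step  */
    }
}

On a real part the padding would be NOPs counted by hand against the
longest path, and every branch in the loop gets the same treatment.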
In comp.lang.forth Mark Wills <markrobertwills@yahoo.co.uk> wrote:
> Caveat: The above assumes that there is no interrupt per horizontal
> pixel. With interrupts, it's much easier.
I really don't understand why you say this. You need to be able to sync
to a timing pulse; whether this is done with interrupts doesn't matter.

Andrew.
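In code, the polled equivalent of a sync interrupt is just a pair of
busy-waits on the pin. read_sync_pin() is a hypothetical GPIO read; on a
real part it would be a register access.

/* Wait for the rising edge of a sync pulse by polling; no interrupt
 * controller needed, only a readable input pin. */
extern int read_sync_pin(void);

void wait_for_hsync(void)
{
    while (read_sync_pin())  ;  /* wait out any pulse in progress */
    while (!read_sync_pin()) ;  /* rising edge = start of line    */
}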
On 27-Jan-13 9:50, Mark Wills wrote:
> On Jan 26, 11:07 pm, rickman <gnu...@gmail.com> wrote:
>> I don't follow what causes the skew you mention. Instruction timings
>> are deterministic, no? If not, trying to time using code is hopeless.
>
> The instruction times are deterministic (presumably; never written
> code on the propeller), but when generating video in software, *per
> scan line* all possible code-paths have to add up to the same number
> of cycles in order to completely avoid jitter. That's very hard to do.
>
> [...]
>
> This is how video is done on the original Atari VCS console. 100%
> software, with the hardware only providing horizontal interrupts (one
> per scan line) and VBLNK interrupts, IIRC.
On the Atari VCS the software did not have to send out the individual
pixels. The TIA chip had memory for a single scan line, which it
converted to a video signal autonomously. The software just had to make
sure that the right data was loaded into the TIA chip in time for each
scan line; it could finish doing that before the end of the scan line,
but not after. The TIA chip also has a function to stall the CPU until
the start of the next scan line. In other words, the software had to be
fast enough for each possible execution flow, but did not have to
complete in the exact same number of cycles.
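The shape of that pattern, sketched in C: the per-line work may take a
variable number of cycles as long as it beats the deadline, and the
stall absorbs the slack. load_tia_line() and strobe_wsync() are
hypothetical stand-ins; the real TIA is programmed by writing its
registers, and strobing WSYNC halts the 6502 until the next scan line
begins.

/* Race-the-beam with slack: any code path is fine as long as it
 * finishes within one scan line; the WSYNC stall eats the remainder. */
#include <stdint.h>

extern void load_tia_line(const uint8_t *gfx); /* set up the line's data */
extern void strobe_wsync(void);                /* stall until next line  */

void frame(const uint8_t gfx[192][16])
{
    for (int line = 0; line < 192; line++) {
        load_tia_line(gfx[line]); /* must finish before the line ends, */
        strobe_wsync();           /* but need not take a fixed time    */
    }
}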
On Jan 25, 12:03 am, Mark Wills <forthfr...@gmail.com> wrote:
> On Jan 25, 6:55 am, Hugh Aguilar <hughaguila...@yahoo.com> wrote:
>> Doddle? I've never heard that word before. Is a doddle good or bad?
>
> doddle = extremely simple/easy
>
> "Did you manage to fix that bug?"
> "Yeah, it was a doddle!"
>
> :-)
Maybe the reason why we don't have "doddle" or any similar word in
America is because we never do anything the simple/easy way here! :-)
On 1/27/2013 3:50 AM, Mark Wills wrote:
> On Jan 26, 11:07 pm, rickman <gnu...@gmail.com> wrote:
>> I don't follow what causes the skew you mention. Instruction timings
>> are deterministic, no? If not, trying to time using code is hopeless.
>
> The instruction times are deterministic (presumably; never written
> code on the propeller), but when generating video in software, *per
> scan line* all possible code-paths have to add up to the same number
> of cycles in order to completely avoid jitter. That's very hard to do.
>
> [...]
>
> The second code path is a lot more complex, containing many more
> instructions, yet both code paths have to balance in terms of
> execution time. This is just one example.
>
> This is how video is done on the original Atari VCS console. 100%
> software, with the hardware only providing horizontal interrupts (one
> per scan line) and VBLNK interrupts, IIRC.
I'm not getting it. I guess the software had to be done this way to
optimize the CPU utilization. The "proper" way to time in software is to
have the video data already calculated in a frame buffer and use spin
loops to time when pixels are shifted out. That way you don't have lots
of processing to figure out the timing for. But you spend most of your
processing time in spin loops.

Why was it done this way? To save a few bucks on video hardware? That's
just not an issue nowadays... unless you are really obsessive about not
using hardware where hardware is warranted.
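A minimal sketch of that frame-buffer-plus-spin-loop shape, assuming a
free-running cycle counter and a pixel output routine (cycle_count() and
out_pixel() are hypothetical; on a real MCU they would be a timer read
and a GPIO or DAC write):

/* Pixels are precomputed in fb[]; the loop's only job is to release
 * each one at a fixed cycle interval. */
#include <stdint.h>

extern uint32_t cycle_count(void);    /* free-running cycle counter */
extern void     out_pixel(uint8_t c); /* drive the video output     */

void shift_line(const uint8_t *fb, int npixels, uint32_t cycles_per_pixel)
{
    uint32_t next = cycle_count();
    for (int i = 0; i < npixels; i++) {
        next += cycles_per_pixel;
        while ((int32_t)(cycle_count() - next) < 0)
            ;                         /* spin: the only "work" is waiting */
        out_pixel(fb[i]);
    }
}

All the intelligence moves into preparing the frame buffer; the output
loop makes no data-dependent decisions, which is exactly why it is easy
to keep jitter-free, and also why it burns a CPU doing nothing.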
> Caveat: The above assumes that there is no interrupt per horizontal
> pixel. With interrupts, it's much easier. The Propeller doesn't have
> any interrupts so software video generation would be non-trivial to
> say the least. The easiest way would be to provide a pixel clock and
> use an I/O pin to sync to, as Chuck found out for himself when
> implementing video on the GA144.
I would have to go back and reread the web pages, but I think Chuck's
original attempt was to time the *entire* frame in software with NO
hardware timing at all. He found the timings drifted too much with
temperature (that's what async processors do, after all; they are timed
by the silicon delays, which vary with temp), so that with the monitor
he was using it would stop working once the board warmed up. I'm
surprised he had to build it to find that out. But I guess he didn't
have specs on the monitor.

His "compromise" to hardware timing was to use a horizontal *line*
interrupt (with a casual use of the word "interrupt"; it is really a
wait for a signal) which was driven from the 10 MHz oscillator node,
like you described the Atari VCS. He still did the pixel timing in a
software loop. With 144 processors it is no big deal to do that...
*OR* he could have sprinkled a few counters around the chip to be used
for *really* low power timing. Each CPU core uses 5 mW when it is
running a simple timing loop. One of the big goals of the chip is to be
low power, and software timing is the antithesis of low power in my
opinion. But then you would need an oscillator and a clock tree...

I think there is an optimal compromise between a chip with fully async
CPUs, with teeny tiny memories, no clocks, no peripherals (including
nearly no real memory interface), and a chip with a very small number
of huge CPUs, major clock trees running at very high clock rates,
massive memories (multiple types), a plethora of hardware peripherals
and a maximal bandwidth memory interface. How about an array of many
small CPUs, much like the F18 (or an F32, which rumor has it is under
development), each one with a few kB of memory, with a dedicated idle
timer connected to lower speed clock trees (is one or two small clock
trees a real power problem?), some real hardware peripherals for the
higher speed I/O standards like 100/1000 Mbps Ethernet, real USB
(including USB 3.0), some amount of on-chip block RAM and some *real*
memory interface which works at 200 or 300 MHz clock rates?

I get where Chuck is coming from with the minimal CPU thing. I have
said before that I think it is a useful chip in many ways. But so far I
haven't been able to use it. One project faced the memory interface
limitation and another found the chip to be too hard to use in the low
power modes it is supposed to be capable of, just not when you need to
do real time stuff at real low power. It only needs a few small
improvements, including *real* I/O that can work at a number of
voltages rather than just the core voltage.

Oh yeah, some real documentation on the development system would be
useful too. I think you have to read some three or more documents just
to get started with the tools. I know it was pretty hard to figure it
all out, not that I *actually* figured it out.

Rick
Weird, your posts all show up in my reader as replies to your own 
messages rather than replies to my posts.  The trimming made it hard for 
me to figure out just what we were talking about with the odd 
connections in my reader.


On 1/26/2013 10:09 PM, None wrote:
> rickman <gnuarm@gmail.com> writes:
>> I don't follow what causes the skew you mention. Instruction timings
>> are deterministic, no?
>
> The chip has a lower level bit stream engine which the higher level CPU
> ("cog") is feeding. Well, a pair of cogs. Each cog has local memory
> and then a really expensive path through a central arbiter ("hub"). It
> fills its image of the scanlines from the shared memory, then has to
> feed it via waitvid into the lower level. Note that it's a bit stream
> engine *per cog*, so you also have to worry about their sync.
I can't picture the processing with this description. I don't know about the higher level and lower level CPUs you describe. Are you saying there is some sort of dedicated hardware in each CPU for video? Or is this separate from the CPUs? Why a *pair* of COGs? I assume a COG is the Propeller term for a CPU?
> So yes, instruction timings are deterministic (although your shared
> memory accesses will vary modulo the hub round-robin count). You need
> to reach the waitvid before it's your turn to supply the next value.
> But given that, this is much like the old wait state sync feeding
> bytes to a floppy controller. PLL and waitvid sync are achieved with
> magic incantations from Parallax, and it is not 100%.
Not 100%? What does that mean? Magic? I guess this is the magic smoke you want to keep from getting out of the chip?
>> If the timings are deterministic, the skew should not be cumulative
>> since they are all based on the CPU clock. Is the CPU clock from an
>> accurate oscillator like a crystal? If it is using an internal RC
>> clock, again timing to sufficient accuracy is hopeless.
>
> The board has a CPU clock from which the PLL derives the video output
> frequency. I recall the CPU clock being based on a crystal, but not one
> with any consideration for video intervals. And the PLLs are per cog,
> again my comment about (potential lack of) global sync.
I still don't know enough about the architecture to know what this means. I don't care if the CPUs are not coordinated closely. If you have a video engine providing the clock timing, why would the CPU timing matter?
> Anyway, you should buy one and check it out. I'd be curious to hear
> if (1) you also observe the same video quality, and (2) if you think
> it's the waitvid mechanism, more the PLL->SVGA generation, or the sync
> issues of the paired video generators. They even supply the schematic,
> FWIW.
I appreciate your enthusiasm, but I have my own goals and projects. I am
currently oriented towards absurdly low power levels in digital designs
and am working on a design that will require no explicit power source;
it will scavenge power from the environment. I don't think a Propeller
is suitable for such a task, is it?

Rick