
ARM/Atmel buses, architecture - Naimi

Started by Veek. M June 10, 2016
On 11.6.2016 г. 05:35, rickman wrote:
> On 6/10/2016 9:07 PM, George Neuner wrote:
>> On Fri, 10 Jun 2016 12:22:10 -0500, Tim Wescott
>> <seemywebsite@myfooter.really> wrote:
>>
>>> A "pure" (or maybe "old-time") Harvard architecture is one where
>>> instructions and data are _entirely_ separate
>>
>> The most commonly seen Harvard CPUs are the "modified" variant which
>> allows code and data to be together in a common memory, but *cached*
>> separately. [DSPs obviously have other ideas].
>
> I guess my question would be, what is the point of drawing a distinction
> between Harvard, modified Harvard and von Neumann? Sure there are a few
> advantages to separating code and data cache, but I consider that to be
> an issue of cache design. I have never even given any thought to which
> of the three basic architectures a given CPU used.
I guess none of us gives much thought to what an architecture is called,
of course. Dwelling on which 1940s term to use for a post-2000 CPU seems
somewhat out of place anyway. We just get to the details and use the
parts as they are.

George separates the DSPs for a good reason. The one I know well is the
5420 from TI (have made stuff with it, wrote an assembler for it). Well,
it does have separate program and data buses - and memories -
internally. Some of the data RAM was "dual access" per cycle; in fact it
could do 3 data accesses and one program access per cycle (how else is
it supposed to do a MAC per cycle anyway). No caches to speak of, it is
a small device - 2 cores were consuming just about 300mW IIRC, this at
100 MHz clock - not bad at all for an end-of-the-90s processor.

And I don't remember encountering the word "Harvard" while I was at it,
not that it would not fit. Or maybe I have but did not notice it.

Dimiter

------------------------------------------------------
Dimiter Popoff, TGI  http://www.tgi-sci.com
------------------------------------------------------
http://www.flickr.com/photos/didi_tgi/
On Fri, 10 Jun 2016 22:35:27 -0400, rickman <gnuarm@gmail.com> wrote:

>On 6/10/2016 9:07 PM, George Neuner wrote:
>> On Fri, 10 Jun 2016 12:22:10 -0500, Tim Wescott
>> <seemywebsite@myfooter.really> wrote:
>>
>>> A "pure" (or maybe "old-time") Harvard architecture is one where
>>> instructions and data are _entirely_ separate
>>
>> The most commonly seen Harvard CPUs are the "modified" variant which
>> allows code and data to be together in a common memory, but *cached*
>> separately. [DSPs obviously have other ideas].
>
>I guess my question would be, what is the point of drawing a distinction
>between Harvard, modified Harvard and von Neumann? Sure there are a few
>advantages to separating code and data cache, but I consider that to be
>an issue of cache design. I have never even given any thought to which
>of the three basic architectures a given CPU used.
The essence of the von Neumann architecture is the idea that the
program is stored in memory, as opposed to being hard-wired. In this
respect, Harvard is merely a variant of von Neumann.

However, classic von Neumann acknowledges that code itself is also
data for other code, and thus it allows for the idea that the program
can modify itself.

Self-modifying code is a serious issue for deep pipelines: the old
version of a modified instruction may already have been fetched.
Detecting and dealing with this is very expensive: partial results
must be discarded, (at least parts of) the pipeline must be flushed,
and the modified instruction stream must be restarted.

The Harvard design became popular because it actively discourages
trying to write self-modifying code. But carrying separation of code
and data all the way to disjoint memory is expensive with large
memories, so CPUs tend to have unified memory with disjoint caching
[the "modified" Harvard design].

George
On 11.6.2016 г. 22:51, George Neuner wrote:
> ....
> Self-modifying code is a serious issue for deep pipelines: the old
> version of a modified instruction may already have been fetched.
> ....

Not exactly this, but in this line of thought something got me some 15
years ago when I was dealing with the 5420, in a way I still remember;
wasted me a day perhaps.

I had written some self-modifying code - not for use in the end device,
just a one-time utility for me. Ran it on the 5420 (it was
computationally intensive and this was my fastest option at the time;
then I had made a new toolchain for it so I wanted to use it etc.).

Something pretty simple did not work; the self-modifying code seemed
not to get modified or something. But when I looked with the monitor
(I had done that for the 5420, too) it was OK. Yet it ran as if it
was not.

Turned out I had not read the entire errata sheet. There was something
about writing to program memory which never got initiated unless a
write to some data memory took place after it (or something of the
sort)... So when the monitor trapped, it obviously did write to data
memory - stack etc. - and I saw the correct program memory...

Still remember it, could even locate the source (just found the
comment "hopefully some write has begun....", did not try to get into
the details again of course).

Dimiter

------------------------------------------------------
Dimiter Popoff, TGI  http://www.tgi-sci.com
------------------------------------------------------
http://www.flickr.com/photos/didi_tgi/
On 12/06/16 05:51, George Neuner wrote:
> On Fri, 10 Jun 2016 22:35:27 -0400, rickman <gnuarm@gmail.com> wrote:
>
>> On 6/10/2016 9:07 PM, George Neuner wrote:
>>> On Fri, 10 Jun 2016 12:22:10 -0500, Tim Wescott
>>> <seemywebsite@myfooter.really> wrote:
>>>
>>>> A "pure" (or maybe "old-time") Harvard architecture is one where
>>>> instructions and data are _entirely_ separate
>>>
>>> The most commonly seen Harvard CPUs are the "modified" variant which
>>> allows code and data to be together in a common memory, but *cached*
>>> separately. [DSPs obviously have other ideas].
>>
>> I guess my question would be, what is the point of drawing a distinction
>> between Harvard, modified Harvard and von Neumann? Sure there are a few
>> advantages to separating code and data cache, but I consider that to be
>> an issue of cache design. I have never even given any thought to which
>> of the three basic architectures a given CPU used.
>
> The essence of the von Neumann architecture is the idea that the
> program is stored in memory, as opposed to being hard-wired. In this
> respect, Harvard is merely a variant of von Neumann.
>
> However, classic von Neumann acknowledges that code itself is also
> data for other code, and thus it allows for the idea that the program
> can modify itself.
>
> Self-modifying code is a serious issue for deep pipelines:
And yet, "self-modifying code" is exactly what the Java Hot-spot JIT
compiler is. The first and second time through, the byte-code is
interpreted, and after the second it's on a queue of things to compile
to native code. When a compile thread completes that task, that code
path diverges to the newly-generated native code. The first task of that
code is often to decide whether the branches further down this path are
what this code has been optimised for, and to return to the interpreter
if the assumptions are not met.

My point is, self-modifying code, under the right conditions, is not
only possible, but is the right solution for some problems.
On Sun, 12 Jun 2016 02:13:35 +0300, Dimiter_Popoff <dp@tgi-sci.com>
wrote:

>On 11.6.2016 г. 22:51, George Neuner wrote:
>> ....
>> Self-modifying code is a serious issue for deep pipelines: the old
>> version of a modified instruction may already have been fetched.
>
> ....
>
>Not exactly this but in this line of thought something got me some 15
>years ago when I was dealing with the 5420 in a way I still remember,
>wasted me a day perhaps.
>I had written some self modifying code - not for use in the end device,
>just a one time utility for me. Ran it on the 5420 (was computationally
>intensive and this was my fastest option at the time, then I had made
>a new toolchain for it so I wanted to use it etc.).
>Something pretty simple did not work; the self modifying code seemed
>not to get modified or something. But when I looked with the monitor
>(I had done that for the 5420, too) it was OK. Yet it ran as if it
>was not.
>Turned out I had not read the entire errata sheet. There was something
>about writing to program memory which got never initiated unless a write
>to some data memory took place after it (or something of the sort)...
>So when the monitor trapped it obviously did write to data
>memory - stack etc. - and I saw the correct program memory...
>
>Still remember it, could even locate the source (just found the
>comment "hopefully some write has begun....", did not try to
>get into the details again of course).
>
>Dimiter
It may be that you didn't flush the write(s) ???

AFAIK, starting from the Pentium Pro, all the Intel chips have snooped
data writes and automatically invalidated corresponding code cache
addresses. They also monitor the code cache and flush the pipeline if
any in-flight instruction is invalidated.

Modern chips snoop the unified L2 cache so cross-core writes can be
seen earlier [before they go all the way to memory]. I would have
thought that the E5420 would be in that class, but some of the old
chips only monitored actual memory writes.

Still, the write wouldn't be seen until it hit at least the L2 cache.
The E5420 had write-back L1, so the writes would have needed to be
flushed explicitly [or you would have needed to wait until the
modified lines were replaced.]

Intel is rather inexplicably nice to self-modifying code: generally
the worst that happens is that it will have poor performance. Many
other manufacturers are actively hostile: a lot of chips make even
writing _correct_ self-modifying code a challenge.

On most non-Intel chips, in addition to flushing code-modifying writes
all the way to memory, you must deliberately invalidate the modified
addresses in the code cache. On many chips you must also deliberately
invalidate branch predictions.

And then, with some chips, you still must time everything correctly
because the chip will continue to execute already fetched instructions
regardless of whether they have been invalidated in cache.

George
On Sun, 12 Jun 2016 11:01:16 +1000, Clifford Heath
<no.spam@please.net> wrote:

>On 12/06/16 05:51, George Neuner wrote:
>
>> Self-modifying code is a serious issue for deep pipelines:
>
>And yet, "self-modifying code" is exactly what the Java Hot-spot JIT
>compiler is. The first and second time through, the byte-code is
>interpreted, and after the second it's on a queue of things to compile
>to native code. When a compile thread completes that task, that code
>path diverges to the newly-generated native code. The first task of that
>code is often to decide whether the branches further down this path are
>what this code has been optimised for, and to return to the interpreter
>if the assumptions are not met.
>
>My point is, self-modifying code, under the right conditions, is not
>only possible, but is the right solution for some problems.
JIT generation is not really applicable. It is "self-modifying" in a
broad sense, but not in a way that is detrimental to code correctness.
[performance is a different issue]

Although it is called "Just In Time", the reality is that JIT code
isn't being rewritten *AS* it is being executed: that is, a block of
code is generated, and only when it is complete is the CPU permitted
to enter it. There is no concern that old (incorrect) instructions
will be fetched and executed before they are overwritten with new
(correct) instructions.

Most Harvard chips have no real issues dealing with JIT code because
new code typically is at a different address than the old code it
replaces. All the JIT systems I am familiar with indirect calls
through a jump table rather than patching call sites directly, so that
they are free to replace code blocks at will.

Which is not to say that old code addresses, branch prediction
targets, etc. should not also be invalidated [and also external things
like changing page protections]: these are needed so that code
generation buffers can be reused. But typically reuse happens at
timescales that ensure any cache traces would have been aged out and
gone anyway.

George
On Sun, 12 Jun 2016 11:01:16 +1000, Clifford Heath
<no.spam@please.net> wrote:

>My point is, self-modifying code, under the right conditions, is not
>only possible, but is the right solution for some problems.
While self-modifying code was necessary in old computers without index
registers, I am surprised it is still used today.

In those old computers the low end of the instruction word was the data
address. These bits in the instruction word were modified and the
instruction re-executed to access a different element in an array.
Sequentially accessing an array was simple: incrementing the whole
instruction word incremented the address part by one.

Of course, you needed an array limit check comparing the actual
instruction word with the last instruction word. Failing to do this,
sooner or later the instruction word would become a completely different
instruction :-)
On 12.6.2016 г. 05:40, George Neuner wrote:
> On Sun, 12 Jun 2016 02:13:35 +0300, Dimiter_Popoff <dp@tgi-sci.com>
> wrote:
>
>> On 11.6.2016 г. 22:51, George Neuner wrote:
>>> ....
>>> Self-modifying code is a serious issue for deep pipelines: the old
>>> version of a modified instruction may already have been fetched.
>>> ....
>>
>> Not exactly this but in this line of thought something got me some 15
>> years ago when I was dealing with the 5420 in a way I still remember,
>> wasted me a day perhaps.
>> I had written some self modifying code - not for use in the end device,
>> just a one time utility for me. Ran it on the 5420 (was computationally
>> intensive and this was my fastest option at the time, then I had made
>> a new toolchain for it so I wanted to use it etc.).
>> Something pretty simple did not work; the self modifying code seemed
>> not to get modified or something. But when I looked with the monitor
>> (I had done that for the 5420, too) it was OK. Yet it ran as if it
>> was not.
>> Turned out I had not read the entire errata sheet. There was something
>> about writing to program memory which got never initiated unless a write
>> to some data memory took place after it (or something of the sort)...
>> So when the monitor trapped it obviously did write to data
>> memory - stack etc. - and I saw the correct program memory...
>>
>> Still remember it, could even locate the source (just found the
>> comment "hopefully some write has begun....", did not try to
>> get into the details again of course).
>>
>> Dimiter
>
> It may be that you didn't flush the write(s) ???
>
> AFAIK, starting from the Pentium Pro, all the Intel chips have snooped
> data writes and automatically invalidated corresponding code cache
> addresses. They also monitor the code cache and flush the pipeline if
> any in-flight instruction is invalidated.
>
> Modern chips snoop the unified L2 cache so cross-core writes can be
> seen earlier [before they go all the way to memory]. I would have
> thought that the E5420 would be in that class, but some of the old
> chips only monitored actual memory writes.
>
> Still, the write wouldn't be seen until it hit at least the L2 cache.
> The E5420 had write-back L1, so the writes would have needed to be
> flushed explicitly [or you would have needed to wait until the
> modified lines were replaced.]
>
> Intel is rather inexplicably nice to self-modifying code: generally
> the worst that happens is that it will have poor performance. Many
> other manufacturers are actively hostile: a lot of chips make even
> writing _correct_ self-modifying code a challenge.
>
> On most non-Intel chips, in addition to flushing code modifying writes
> all the way to memory, you must deliberately invalidate the modified
> addresses in the code cache. On many chips you must also deliberately
> invalidate branch predictions.
>
> And then, with some chips, you still must time everything correctly
> because the chip will continue to execute already fetched instructions
> regardless of whether they have been invalidated in cache.
>
> George
Hah, there is a nice misunderstanding :). It was not an Intel part, and
I did not know the 5420 number also applied to one of theirs.

It was a TI TMS-whatever-5420 DSP (of their C54xx series).

I certainly have had my share of forgetting to invalidate the I-cache
on power (PPC) while I was porting DPS to it (well over 10 years ago),
but these were just routine errors, easy to catch. Nowhere near as
nasty as the one I remember for the (my...) 5420, which took reading
the errata sheet to comprehend (the 5420 DSP has no caches etc., no
sync, dcbf, icbi etc. opcodes at all; they had messed something up
internally - not critical as long as one was aware of it).

Dimiter

------------------------------------------------------
Dimiter Popoff, TGI  http://www.tgi-sci.com
------------------------------------------------------
http://www.flickr.com/photos/didi_tgi/
On Sun, 12 Jun 2016 10:22:34 +0300, Dimiter_Popoff <dp@tgi-sci.com>
wrote:

>On 12.6.2016 г. 05:40, George Neuner wrote:
[some stuff about the Intel E5420 ]
>Hah, there is a nice misunderstanding :). It was not an Intel part
>and I did not know 5420 did apply to one of theirs.
>
>It was a TI TMS-whatever-5420 dsp (of their C54xx series).
Ah. I never worked with TI - all my DSP work was with ADI chips.

George