EmbeddedRelated.com
Forums

PIC vs ARM assembler (no flamewar please)

Started by Unknown February 14, 2007
Wilco Dijkstra wrote:
> "Jim Granville" <no.spam@designtools.maps.co.nz> wrote in message > news:45db4806$1@clear.net.nz... > >>rickman wrote: > > >>>The TMS9900 used a pointer register (that's right, registers did not >>>go away) to point to the first register in memory. An ADD would then >>>take three memory accesses to complete rather than one clock cycle. >>>Even if you put the memory on chip, you either have to limit the >>>location of the registers to a special bank of fast, multiport memory >>>(register bank) or you have to accept multiple memory cycles for a >>>single instruction, even when working in registers. >> >>Sounds like a poor example of how anyone would do this today. >> >>Look at the XC166, and eZ8, for examples of how you can do >>very efficent memory overlays. >> >>In a uC, you are talking of a few K's of memory, so speed should >>not be an issue at all. > > > These are not examples of a RAM mapped register file, just of a > hardware assisted context switch. So the contents of the RAM are > copied to/from the register file but are not kept in sync until the next > context switch.
Which are not? Perhaps you are talking about the TMS9900?

If you meant the eZ8, then reading up on the Register Pointer operation
would help. In the eZ8, the Register Pointer is combined with the 4-bit
register operand, mapping/overlaying those 16 working registers anywhere
in a 12-bit RAM address space.
> > Even a few KB of SRAM is much slower than a register file.
Slower, yes. 'Much slower' is moot - given that the bottleneck in most
CPUs/uCs is code access from flash, and that on-chip SRAM is MUCH FASTER
than flash, the SRAM is not the speed-determining path. There seems to be
no practical speed impact from this when you look at the MHz ratings of
real devices like the ST10/XC166 cores?

-jg
Jim Granville wrote:
> Wilco Dijkstra wrote:
>> These are not examples of a RAM mapped register file, just of a
>> hardware assisted context switch. So the contents of the RAM are
>> copied to/from the register file but are not kept in sync until the next
>> context switch.
>
> <snip>
>
> If you meant the eZ8, then perhaps reading up on the Register Pointer
> operation would assist. In the eZ8, the register pointer adds to the
> 4 bit register operand, to map/overlay those 16 registers, into up to 12
> bits of RAM
Indeed. The Register Pointer is made up of two separate parts, and those
parts are combined with the register operand, as JG said. That gives you
4-bit addressing (a group of 16 working registers, with the full RP
supplying the rest of the address), 8-bit addressing (a page, with only
half of the RP being used), or the absolute 12-bit address. Throw in
compatibility with older code from when the Z8s could only address 2^8
bytes of RAM (or register file), and you've got a pretty good blend of
power and low code size. It's very logical and intuitive once you think
about it.

>> Even a few KB of SRAM is much slower than a register file.
>
> Slower, yes.

<snip>

Regards,
D.
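A minimal C sketch of the Register Pointer mapping described above. The
exact bit layout is an assumption inferred from this thread (including
which half of RP acts as the page for 8-bit operands), not a transcription
of the Zilog eZ8 documentation, so check the manual before relying on it.

    /* Illustrative only: models an eZ8-style Register Pointer (RP)
     * overlay.  Bit positions are assumptions, not datasheet values. */
    #include <stdint.h>
    #include <stdio.h>

    /* 4-bit working-register operand: the full 8-bit RP supplies the
     * upper 8 bits of the 12-bit register-file address. */
    static uint16_t ez8_addr_r4(uint8_t rp, uint8_t r4)
    {
        return ((uint16_t)rp << 4) | (r4 & 0x0Fu);
    }

    /* 8-bit operand: only half of RP (assumed here to be the low
     * nibble, acting as a page number) supplies the upper 4 bits. */
    static uint16_t ez8_addr_r8(uint8_t rp, uint8_t r8)
    {
        return ((uint16_t)(rp & 0x0Fu) << 8) | r8;
    }

    /* 12-bit operand: bypasses RP and addresses the file directly. */
    static uint16_t ez8_addr_r12(uint16_t r12)
    {
        return r12 & 0x0FFFu;
    }

    int main(void)
    {
        uint8_t rp = 0x70;                 /* hypothetical RP value */
        printf("R3   -> 0x%03X\n", (unsigned)ez8_addr_r4(rp, 3));
        printf("E8h  -> 0x%03X\n", (unsigned)ez8_addr_r8(rp, 0xE8));
        printf("7FFh -> 0x%03X\n", (unsigned)ez8_addr_r12(0x7FF));
        return 0;
    }

Under this model, writing a new value to RP re-points all 16 working
registers at a different block of RAM in one move, which is the overlay
behaviour Jim and David describe.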
"David Brown" <david@westcontrol.removethisbit.com> wrote in message 
news:45db1975$0$31521$8404b019@news.wineasy.se...

> However, there is no fixed distinction between RISC and CISC. The two
> terms refer to a range of characteristics commonly associated with RISC
> cpus and CISC cpus. Some chips clearly fall into one camp or the other,
> but most have at least slightly mixed characteristics.
RISC and CISC are about instruction set architecture, not implementation (although it does have an effect on the implementation).
> The ColdFire core is very much such a mixed chip - in terms of the ISA, it
> is noticeably more RISCy than the 68k (especially the later cores with
> their more complex addressing modes), and in terms of its implementation,
> it is even more so. Even the original 68k, with its multiple registers and
> (mostly) orthogonal instruction set is pretty RISCy.
Well, let's look at 10 features that are typical for most RISCs today:

* large uniform register file: no (8 data + 8 address registers)
* load/store architecture: no
* naturally aligned load/store: no
* simple addressing modes: no (9 variants, yes for ColdFire?)
* fixed instruction sizes: no
* simple instructions: no (yes for ColdFire)
* calls place return address in a register: no
* 3 operand ALU instructions: no
* ALU instructions do not corrupt flags: no
* delayed branch: no

So that is 0 for 68K, 2 for ColdFire. ARM scores 8, Thumb scores 6,
Thumb-2 7. MIPS scores 10 (very pure). This clearly shows 68K and
ColdFire are CISCs, while the rest are RISCs.
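The checklist lends itself to a mechanical tally. The short C sketch below
encodes only the two profiles spelled out explicitly in this post (68K and
ColdFire) and reproduces their scores of 0 and 2; the other figures quoted
above are not re-derived here.

    /* Tally of the 10-point RISC checklist above, for the two ISAs
     * whose answers are given explicitly in the post.  Illustrative
     * only; the feature answers are copied from the text. */
    #include <stdio.h>

    #define NUM_FEATURES 10

    static const char *features[NUM_FEATURES] = {
        "large uniform register file",  "load/store architecture",
        "naturally aligned load/store", "simple addressing modes",
        "fixed instruction sizes",      "simple instructions",
        "return address in a register", "3 operand ALU instructions",
        "ALU ops do not corrupt flags", "delayed branch",
    };

    /* 1 = has the RISC trait, 0 = does not (per the post above). */
    static const int m68k[NUM_FEATURES]     = {0,0,0,0,0,0,0,0,0,0};
    static const int coldfire[NUM_FEATURES] = {0,0,0,1,0,1,0,0,0,0};

    int main(void)
    {
        int s68k = 0, scf = 0;
        for (int i = 0; i < NUM_FEATURES; i++) {
            printf("%-30s 68K:%d ColdFire:%d\n",
                   features[i], m68k[i], coldfire[i]);
            s68k += m68k[i];
            scf  += coldfire[i];
        }
        printf("totals: 68K %d/10, ColdFire %d/10\n", s68k, scf);
        return 0;
    }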
> So the ARM is moving from a fairly pure RISC architecture, through the
> Thumb (with its more CISCy smaller register set and more specialised
> register usage) and now Thumb-2 (with variable length instructions). It's
> gaining CISC attributes in a move to improve code density at the expense
> of more complex instruction decoding.
Yes, RISCs have become more complex. However, that doesn't make them CISC!
Although ARM is not a pure RISC to start with, Thumb-1 and Thumb-2 are
only slightly more complex and still have most of the RISC characteristics.
> The ColdFire, on the other hand, has moved from the original 68k to a more
> RISCy core, with a much greater emphasis on single-cycle
> register-to-register instructions and a simpler and more efficient core,
> in order to improve performance and lead to a smaller implementation.
Indeed, it has gained 2 points by removing some of the complex microcoded
instructions and addressing modes, thus allowing a simpler, more pipelined
implementation. But that clearly doesn't make it a RISC, as the marketing
people would like us to believe...
> There are still plenty of differences between the architectures, but there
> is no doubt that there are a lot more similarities between the ARM Thumb-2
> and the ColdFire than between the original ARM and the original 68k.
I'd say any similarities exist only on a superficial level. For example,
the variable-length instructions in Thumb-2 are easier to decode than
those of the 68K or ColdFire.
>> There are few RISCs with variable length instructions.
>
> The AVR? I can't think of any others.
Hitachi SH and ARC for example. Wilco
>> Even a few KB of SRAM is much slower than a register file.
>
> Slower, yes. ....
Generally true, but there are exceptions. TI's 54xx DSPs have some
registers memory-addressable (not all - not the accumulators, for example,
just the so-called "auxiliary registers"). Whether they are really memory
addresses or not I don't know; the RAM is on-chip at address 0 (about
where these registers are), and this RAM allows 2 accesses per cycle, so
there is no slowdown from that. But given that this architecture allows 3
RAM accesses per cycle (or was it 4?), this is hardly surprising - it is
designed not to have a memory bottleneck.

Dimiter
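As a rough illustration of the idea being debated here - CPU registers
that are also reachable at ordinary data-memory addresses - a toy C model
follows. The base address, word width and register count are invented for
the example and are not taken from any TI (or other) datasheet.

    /* Toy model of a RAM-mapped register file.  Everything here is
     * illustrative: "registers" are simply a fixed window in the same
     * array that models on-chip data RAM, so a register write and a
     * memory write to the aliased address hit the same storage. */
    #include <stdint.h>
    #include <stdio.h>

    #define RAM_WORDS 4096
    #define REG_BASE  0x0010u   /* hypothetical register window base */
    #define NUM_REGS  8

    static uint16_t ram[RAM_WORDS];          /* unified on-chip data RAM */

    /* "Register" access is just RAM access at a fixed offset. */
    static uint16_t reg_read(unsigned r)           { return ram[REG_BASE + r]; }
    static void     reg_write(unsigned r, uint16_t v) { ram[REG_BASE + r] = v; }

    int main(void)
    {
        reg_write(3, 0x1234);                 /* write via the register view */
        ram[REG_BASE + 3] += 1;               /* ...and via the memory view  */
        printf("AR3 = 0x%04X\n", (unsigned)reg_read(3)); /* same cell        */
        return 0;
    }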
"Jim Granville" <no.spam@designtools.maps.co.nz> wrote in message 
news:45db7d1e$1@clear.net.nz...
> Wilco Dijkstra wrote:
>> These are not examples of a RAM mapped register file, just of a
>> hardware assisted context switch. So the contents of the RAM are
>> copied to/from the register file but are not kept in sync until the next
>> context switch.
>
> Which are not ? - perhaps you are talking about the TMS9900 ?
No, I meant the XC166 (SPARC, AMD29K, etc.) register windows.
> If you meant the eZ8, then perhaps reading up on the Register Pointer
> operation would assist. In the eZ8, the register pointer adds to the
> 4 bit register operand, to map/overlay those 16 registers, into up to 12
> bits of RAM
The eZ8 is really weird indeed: you can either call it a CPU with a large
register file or a CPU with no registers and direct memory addressing.
The instruction cycle timings are pretty slow, so it's either fetch speed
or the register access that is holding it back.
>> Even a few KB of SRAM is much slower than a register file.
>
> Slower, yes. 'Much slower' is moot - given that the bottle neck in
> most CPUs/uC is code access from FLASH, and that on-chip SRAM speeds
> are MUCH FASTER than Flash speeds, so it's not looking like the
> determining-speed path.
While SRAM is faster than flash, it wouldn't be fast enough to be used like a register in a simple MCU. On ARM7 for example, register read, ALU operation and register write all happen within one clock cycle. With SRAM the cycle time would become 3-4 times as long (not to mention power consumption).
> There seems to be no practical speed impact from this, when you
> look at the Mhz speeds of real devices like the St10/XC166 cores ?
That's because the XC166 uses registers and not RAM.

Wilco
Wilco Dijkstra wrote:
> While SRAM is faster than flash, it wouldn't be fast enough to be
> used like a register in a simple MCU. On ARM7 for example,
> register read, ALU operation and register write all happen within
> one clock cycle. With SRAM the cycle time would become 3-4
> times as long (not to mention power consumption).
To get a handle on what small on-chip RAMs can achieve in real silicon,
look at the FPGA block sync RAMs - those are smallish blocks, dual-ported,
and plenty fast enough to keep up with the cycle times of a CPU. I don't
see FPGA CPUs being held back by their 'slow SRAM', as you claim?
RAM-based DSPs are now pushing 1 GHz, and that's with larger chunks of RAM
than are needed for register-mapped memory.

-jg
On Wed, 21 Feb 2007 14:56:59 +1300, Jim Granville
<no.spam@designtools.maps.co.nz> wrote:

> Wilco Dijkstra wrote:
>> While SRAM is faster than flash, it wouldn't be fast enough to be
>> used like a register in a simple MCU.
>
> <snip>
>
> I don't see FPGA CPUs being held back by their 'slow SRAM', as you claim?
> RAM-based DSPs are now pushing 1 GHz, and that's with larger chunks of
> RAM than are needed for register-mapped memory.
I just dumped my message in progress on this -- you said what I wanted to
say very clearly. I use such DSPs.

I think Wilco must be stuck thinking in terms of external bus drivers,
where what is connected is unknown and the bus interface designer must
work to worst cases. Too much ARM, perhaps?

Jon
On Wed, 21 Feb 2007 00:02:27 GMT, "Wilco Dijkstra"
<Wilco_dot_Dijkstra@ntlworld.com> wrote:

>"David Brown" <david@westcontrol.removethisbit.com> wrote in message >news:45db1975$0$31521$8404b019@news.wineasy.se... > >> However, there is no fixed distinction between RISC and CISC. The two >> terms refer to a range of characteristics commonly associated with RISC >> cpus and CISC cpus. Some chips clearly fall into one camp or the other, >> but most have at least slightly mixed characteristics. > >RISC and CISC are about instruction set architecture, not implementation >(although it does have an effect on the implementation). ><snip>
I respect your knowledge and skill, Wilco, but based on my experience I
cannot agree with this as I understand you to be writing it here.

I spent 1-on-1 time with Hennessy and listened to the reasoning he used.
RISC was all about thinking in detailed terms of practical implementation.
They were faced with access to lower-technology FABs (larger feature
sizes, fewer transmission gates and inverters, etc.) and wanted to achieve
more with less. Doing that was everything about implementation, and the
instruction set architecture was allowed to go where it must. That this
worked out to being a 'reduced instruction set' was something that came
out of achieving competitive performance from lower-tech FAB capability
than folks like Intel or Motorola had available for their flagship lines
of the day.

There was a design philosophy based upon theory -- simply the realization
that many of the things that slowed down a CISC were also a matter of
perceived convenience for programmers, so the policy was to get rid of
anything and everything that slowed down the clock rate without paying
_well_ for that delay. A focus on throughput. The fact that removing
barriers to speed also happened to reduce the need for transistor
equivalents was the happy coincidence that fueled the initiative. The
instructions were a result of focusing on implementation details -- not
some instruction set theory under which the implementation then followed.
If higher level features were cheap to implement and paid for themselves
in performance, they were simply kept. A very practical, hard-nosed
approach.

If you ever listened to such a lecture by those actually doing the work,
you'd see this narrow focus. The register flags that signalled whether or
not a register was in use as a destination were tossed as too expensive --
they required infrastructure in order to delay the processor, and the
combinatorial worst-case path of all that meant additional __delay__ in
each clock cycle, whether or not the interlock was useful from instruction
to instruction. You paid for it on every cycle, need it or not. So out it
went. No interlocks. Sorry. Similar thinking was involved in the Alpha's
refusal to do 'lane changes,' for example.

Hennessy had a huge blow-up of the 68020 CPU in one room at MIPS (which
was quite near Weitek at the time), when I visited. He would go through
each and every detail of the implementation there and talk about it, at
length, and explain why it was worthwhile... or not... and what the exact
quantitative cost was in each cycle's timing and over the broader arc of
an application.

Some of the difficulties were the higher memory bandwidths required once
you started tossing out stuff like register interlocks, microstore and its
associated sequencing overhead, lane changing, etc. But if that could be
satisfied -- and that was kind of possible at the time with some static
RAM from Performance Semi -- it would perform like a bat out of hell. So
to speak.

But the focus was on implementation on lower-tech FABs and, while doing
that, still competing with CISC and beating it.

Of course, FABs got a lot better, access to high-tech FAB resources became
increasingly brokered to keep them running 24/7, and the driving need for
lower-tech feature sizes relaxed. Also, CISC-looking external designs
could now be built around internal RISC cores, with built-in TLBs,
re-order buffers, reservation stations with multiple functional units to
share, jump prediction... so much so that Intel started putting L1 cache
memory on-die. There was so much excess available that they ran out of
nifty ideas, and the best they knew to do with it was soak up die space
with cache memory.

So the RISC drive relaxed. At least in the consumer market.

But for those making cheap embedded controllers, I suspect that die size
and making effective use of somewhat lower FAB technology remain useful.
So the low-transistor-count approaches that were once the much-lauded
domain of RISC remain important.

Jon
Jonathan Kirwan wrote:
> On Wed, 21 Feb 2007 00:02:27 GMT, "Wilco Dijkstra"
> <Wilco_dot_Dijkstra@ntlworld.com> wrote:
>
>> RISC and CISC are about instruction set architecture, not implementation
>> (although it does have an effect on the implementation).
>
> <snip>
>
> But for those making cheap embedded controllers, I suspect that die size
> and making effective use of somewhat lower FAB technology remain useful.
> So the low-transistor-count approaches that were once the much-lauded
> domain of RISC remain important.
All one can really derive in meaning from RISC is 'Reduced Instruction Set
Computer' - any other assertions are in the eye of the beholder or, worse,
spin doctoring - so there is little point in slicing and dicing the
details of what is, or is not, RISC.

-jg
On Wed, 21 Feb 2007 16:39:37 +1300, Jim Granville
<no.spam@designtools.maps.co.nz> wrote:

> Jonathan Kirwan wrote:
>
> <snip>
>
> All one can really derive in meaning from RISC is 'Reduced Instruction
> Set Computer' - any other assertions are in the eye of the beholder or,
> worse, spin doctoring - so there is little point in slicing and dicing
> the details of what is, or is not, RISC.
Real meaning is found in the details of how things work, not in some
banner or ideology. Which is, I suppose, about what I said.

Thanks,
Jon