EmbeddedRelated.com

intel 386 et al

Started by Hul Tytus September 11, 2019
On Mon, 16 Sep 2019 14:57:38 -0400, George Neuner
<gneuner2@comcast.net> wrote:

>On Thu, 12 Sep 2019 10:37:40 +0200, David Brown
><david.brown@hesbynett.no> wrote:
>
>>On 11/09/2019 23:27, Paul Rubin wrote:
>>> Hul Tytus <ht@panix.com> writes:
>>>> Anyone know of a good handbook describing the machine code of Intel's
>>>> 386's? A page describing each instruction along with some text about
>>
>>I presume this is for some sort of history project?
>>
>>> I don't know if it's exactly the format you wanted, but I liked
>>> "Programming the 80386" by John Crawford and Patrick Gelsinger, who were
>>> involved with the 386's design. It did a good job of explaining how
>>> memory mapping, the protected mode segment registers, call gates for
>>> crossing privilege domains etc. all worked. I still don't understand
>>> why today's OS's don't use those features. They would also allow
>>> application programs to be set up like miniature OS's with protected
>>> memory regions, for things like in-memory databases.
>>>
>>
>>It is a /long/ time since I have read details of the 386 - you are
>>talking about a processor that was outdated over 25 years ago.
>>
>>However, if my memory and understanding is correct, many of these
>>advanced protection features were overly complex and extremely slow.
>
>That's true ... validating segment descriptors and segment limits when
>loading the segment selector took > 1000 cycles. This was so onerous
>that the i486 and later included a small cache of validated
>descriptors. But the cache never was large enough to help programs
>that needed to use many segments - IIRC, it held only 6 entries - and
>as time went on it shrank to just 2 entries.

Descriptor loading was slow, but I'm pretty sure it wasn't *that* slow. It's hard to imagine what it could be doing for that long. I remember times more like 40 clocks on a 286.
On Tue, 17 Sep 2019 01:01:22 -0500, Robert Wessel
<robertwessel2@yahoo.com> wrote:

>On Mon, 16 Sep 2019 14:57:38 -0400, George Neuner
><gneuner2@comcast.net> wrote:
>
>>That's true ... validating segment descriptors and segment limits when
>>loading the segment selector took > 1000 cycles.

That sounds more like the iAPX 432, in which some instructions were really slow.

>>This was so onerous
>>that the i486 and later included a small cache of validated
>>descriptors. But the cache never was large enough to help programs
>>that needed to use many segments - IIRC, it held only 6 entries - and
>>as time went on it shrank to just 2 entries.
>
>Descriptor loading was slow, but I'm pretty sure it wasn't *that*
>slow. It's hard to imagine what it could be doing for that long. I
>remember times more like 40 clocks on a 286.

On Tue, 17 Sep 2019 01:01:22 -0500, Robert Wessel
<robertwessel2@yahoo.com> wrote:

>On Mon, 16 Sep 2019 14:57:38 -0400, George Neuner
><gneuner2@comcast.net> wrote:
>
>>That's true ... validating segment descriptors and segment limits when
>>loading the segment selector took > 1000 cycles. This was so onerous
>>that the i486 and later included a small cache of validated
>>descriptors. But the cache never was large enough to help programs
>>that needed to use many segments - IIRC, it held only 6 entries - and
>>as time went on it shrank to just 2 entries.
>
>Descriptor loading was slow, but I'm pretty sure it wasn't *that*
>slow. It's hard to imagine what it could be doing for that long. I
>remember times more like 40 clocks on a 286.

Protected mode segment switching on the i286 was fairly quick, but the i386 (and later) behaved very differently.

Loading a segment register in protected mode could result in a long sequence if the descriptor was not already in the descriptor cache:

 - read the descriptor from memory into the cache
 - validate the descriptor contents
And on the i386 and later
 - set the "Accessed" bit in the descriptor
 - write back the modified descriptor to memory

The i386 additionally performed an unnecessary limit check on the current offset value, but did not throw any faults when the check was done as part of the descriptor load - it just wasted additional cycles.

Validating the descriptor was done in microcode and could take hundreds of cycles. The i286 did not define or check many of the descriptor's control bits, whereas the i386 and later defined and checked all the bits.

The i286 did /not/ modify and write back the descriptor - the "Accessed" bit was defined for the i286 but was not set by the hardware [if the OS used it, it had to deal with it manually]. The i386 and later automatically set the bit on load and wrote back the descriptor to memory. [The i486 and later had data caches to absorb the write, but the i386 did not.]

The i286 and i386 had only one descriptor cache line per segment register, so they took the full descriptor load hit every time the register was modified. The caches became multi-way in later chips so as to (try to) keep already-validated descriptors available in case they were needed again.

I know that the slowness of protected mode segment switching has been discussed at length in the past - if not here, then in the arch or x86 forums. Unfortunately I can't easily locate an online reference for segment switch times. I figured Agner Fog would have something, but he doesn't seem to have benchmarked the system instructions. [Or if he has, I stupidly can't seem to find his results].

George

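For readers who want to see the load sequence from the post above in concrete terms, here is a rough software model in C. It is only a sketch, not a description of the actual microcode: the names `seg_cache_t` and `load_data_segment` are made up for illustration, the table is passed in by the caller (so the selector's TI bit is ignored), privilege checks (CPL/RPL against DPL) are left out, and the write-back is done unconditionally as an assumption. The byte layout is the standard 8-byte 386 code/data descriptor.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bits of the access byte (byte 5) of a code/data segment descriptor. */
#define ACC_ACCESSED 0x01u /* "Accessed" bit */
#define ACC_READABLE 0x02u /* code segments: readable */
#define ACC_CODE     0x08u /* 1 = code segment, 0 = data segment */
#define ACC_S        0x10u /* 1 = code/data, 0 = system descriptor */
#define ACC_PRESENT  0x80u /* segment present */

/* Hypothetical model of the hidden (cached) part of a segment register. */
typedef struct {
    uint32_t base;
    uint32_t limit;  /* highest valid byte offset */
    uint8_t  access;
    bool     valid;
} seg_cache_t;

/* Models a load of a data-segment register (e.g. MOV DS, AX).
 * Returns 0 on success, -1 where real hardware would raise a fault.
 * Privilege checks are omitted in this sketch. */
int load_data_segment(uint8_t *table, size_t table_bytes,
                      uint16_t selector, seg_cache_t *out)
{
    /* A null selector may be loaded; the register just becomes unusable. */
    if ((selector & 0xFFFCu) == 0) {
        out->base = 0;
        out->limit = 0;
        out->access = 0;
        out->valid = false;
        return 0;
    }

    /* Selector bits 15..3 are the index; each descriptor is 8 bytes. */
    size_t off = (size_t)(selector & 0xFFF8u);
    if (off + 8 > table_bytes)
        return -1; /* beyond the table limit */
    uint8_t *d = table + off;

    /* Step 1: read the descriptor from memory. */
    uint8_t access = d[5];
    uint32_t limit = (uint32_t)d[0] | ((uint32_t)d[1] << 8) |
                     ((uint32_t)(d[6] & 0x0Fu) << 16);
    uint32_t base = (uint32_t)d[2] | ((uint32_t)d[3] << 8) |
                    ((uint32_t)d[4] << 16) | ((uint32_t)d[7] << 24);
    if (d[6] & 0x80u)                  /* G bit: limit counts 4 KiB pages */
        limit = (limit << 12) | 0xFFFu;

    /* Step 2: validate the descriptor contents. */
    if (!(access & ACC_S))
        return -1; /* system descriptor */
    if ((access & ACC_CODE) && !(access & ACC_READABLE))
        return -1; /* execute-only code segment */
    if (!(access & ACC_PRESENT))
        return -1; /* segment not present */

    /* Steps 3 and 4: set "Accessed" and write the descriptor back. */
    d[5] = (uint8_t)(access | ACC_ACCESSED);

    out->base = base;
    out->limit = limit;
    out->access = d[5];
    out->valid = true;
    return 0;
}
```

Calling it with a small byte array standing in for the GDT and selector 0x0008 fills in the cache entry and leaves the descriptor's access byte with bit 0 set, which is the memory write the post refers to.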
On Tue, 17 Sep 2019 17:51:01 -0400, George Neuner
<gneuner2@comcast.net> wrote:

>On Tue, 17 Sep 2019 01:01:22 -0500, Robert Wessel
><robertwessel2@yahoo.com> wrote:
>
>>Descriptor loading was slow, but I'm pretty sure it wasn't *that*
>>slow. It's hard to imagine what it could be doing for that long. I
>>remember times more like 40 clocks on a 286.
>
>Protected mode segment switching on the i286 was fairly quick, but the
>i386 (and later) behaved very differently.
>
>
>Loading a segment register in protected mode could result in a long
>sequence if the descriptor was not already in the descriptor cache:
>
> - read the descriptor from memory into the cache
> - validate the descriptor contents
>And on the i386 and later
> - set the "Accessed" bit in the descriptor
> - write back the modified descriptor to memory
>
>The i386 additionally performed an unnecessary limit check on the
>current offset value, but did not throw any faults when the check was
>done as part of the descriptor load - it just wasted additional
>cycles.
>
>Validating the descriptor was done in microcode and could take
>hundreds of cycles. The i286 did not define or check many of the
>descriptor's control bits, whereas the i386 and later defined and
>checked all the bits.
>
>The i286 did /not/ modify and write back the descriptor - the
>"Accessed" bit was defined for the i286 but was not set by the
>hardware [if the OS used it, it had to deal with it manually].
>The i386 and later automatically set the bit on load and wrote back
>the descriptor to memory. [The i486 and later had data caches to
>absorb the write, but the i386 did not.]
>
>The i286 and i386 had only one descriptor cache line per segment
>register, so they took the full descriptor load hit every time the
>register was modified. The caches became multi-way in later chips so
>as to (try to) keep already-validated descriptors available in case
>they were needed again.
>
>
>I know that the slowness of protected mode segment switching has been
>discussed at length in the past - if not here, then in the arch or x86
>forums. Unfortunately I can't easily locate an online reference for
>segment switch times. I figured Agner Fog would have something, but
>he doesn't seem to have benchmarked the system instructions. [Or if
>he has, I stupidly can't seem to find his results].

The Intel 386 reference (copy available on Bitsavers) says 18 or 19 clocks for a MOV to a segment register. As usual, that figure is quite optimistic and based on zero memory wait states, so in practice it would be rather longer than that. The Intel 486 manual says 9 clocks. Perhaps you're thinking of a task switch via a task gate, which could definitely be multiple hundreds of clocks.
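
To put a rough number on "rather longer than that", here is a back-of-the-envelope estimate in C. It is a sketch under stated assumptions, not a figure from any manual: the helper `seg_load_clocks` is hypothetical, and it simply assumes that each bus access made during the load costs the configured number of extra wait-state clocks on top of the zero-wait-state figure.

```c
/* Hypothetical estimate: documented zero-wait-state clock count plus
 * one penalty of `wait_states` clocks for each bus access made while
 * loading the descriptor. This is an assumed model, not measured data. */
unsigned seg_load_clocks(unsigned base_clocks,
                         unsigned bus_accesses,
                         unsigned wait_states)
{
    return base_clocks + bus_accesses * wait_states;
}
```

For example, assuming the 8-byte descriptor takes two 32-bit reads plus one write for the "Accessed" bit, `seg_load_clocks(18, 3, 2)` gives 24 clocks. Under this simple model even slow memory adds tens of clocks to the documented figure, not hundreds.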