EmbeddedRelated.com
Forums

Direction of Stack Growth

Started by karthikbalaguru October 21, 2007
Jerry Avins wrote:

> karthikbalaguru wrote:
>> Reading through this and thinking about the heap mechanism, I have a
>> query: why do we need the stack, when the heap appears more flexible?
> Manipulating a heap requires software. Some stacks are implemented
> entirely in hardware (PUSH, POP; CALL, RETURN). It would be very hard to
> create or maintain a heap on a machine with no stack.
IBM S/360 and S/370 don't have a stack.  CALL and RETURN are done
through a linked list that also includes space to save the registers.
Called routines are expected to save and restore most of the general
registers.  The calling convention used by OS/360 and many of the
supported high-level languages has the calling routine supply a 72-byte
save area addressed in R13.  R14 is the return address, and R15 is the
address of the routine being called.  The called routine then saves
registers 14 through 12 in the save area addressed by R13, and arranges
its own save area as the next element in the linked list.
Traditionally, many assembly and Fortran programs used a static save
area (no recursion).  Languages that allow recursion dynamically
allocate a save area along with space for local variables.

One inconvenience of a stack is that someone has to create the stack,
so that first routine has to be different.  For OS/360, the calling
convention described is similar to that used by the OS to call your
program.  It is then more convenient than on other systems to load a
program into memory and call it, with no question of who owns the
stack.  For small allocations, though, the stack likely has less
overhead.

-- glen
glen herrmannsfeldt <gah@ugcs.caltech.edu> writes:
> IBM S/360 and S/370 don't have a stack.  CALL and RETURN are done
> through a linked list that also includes space to save the registers.
> Called routines are expected to save and restore most of the general
> registers.  The calling convention used by OS/360 and many of the
> supported high-level languages has the calling routine supply a 72
> byte save area addressed in R13.  R14 is the return address, and R15
> is the address of the routine being called.  The called routine then
> saves registers 14 through 12 in the save area addressed by R13,
> arranges its own save area as the next element in a linked list.
> Traditionally many assembly and Fortran programs used a static save
> area (no recursion).  Languages that allow recursion dynamically
> allocate a save area along with space for local variables.
q&d conversion of old greencard ios3270 to html
http://www.garlic.com/~lynn/gcard.html

call/save/return conventions
http://www.garlic.com/~lynn/gcard.html#50
On Oct 23, 12:10 pm, karthikbalaguru <karthikbalagur...@gmail.com>
wrote:
> On Oct 23, 3:05 am, glen herrmannsfeldt <g...@ugcs.caltech.edu> wrote:
> > Nick Maclaren wrote:
> > (snip)
> > > Witness that System/360 stacks were normally upwards growing in the
> > > 1960s and 1970s, and changed as the new generation of people with a
> > > DEC background moved in during the 1980s.
> > The S/360 that I used, running OS/360 and descendants, used
> > linked lists for subroutine calling, register saving, and
> > argument lists.
> I am eager to understand more about this.
> Linked lists for subroutine calling.
> Is it a good method or a bad method of design?
> Where can I get more information about this?
The standard S/360 calling convention used a linked list of register
save areas.  Mind you, this was designed before languages with small
subroutines and default external linkage became popular, so it was
really intended for fairly heavyweight program-to-program calls.  It
is not a particularly good design by today's standards, and the
convention was almost never followed by compilers (or assembler
programmers) within a program, just at the external calls and entries.

In short, every program was responsible for allocating an 18-word save
area for the callee to use.  During a call, R1 pointed to a list of
pointers, one for each parameter (everything was pass by reference),
with the high bit set on the last pointer.  Optionally, R1 might point
directly to a single parameter, but this was rare (usually it was just
R1 pointing to a single pointer).  R13 pointed to the register save
area from the caller (R15 was the called address, R14 the return).

The called program would save registers in the passed save area,
typically with a "STM R14,R12,12(R13)", which would store R14, R15,
R0..R12 starting in the fourth word of the save area.  The called
program would then allocate a new save area, and store the address of
the prior save area in the second word of the new area.  That second
word is how you did the "pop" to get back to the caller.  The other
two words in the save area had uses in some cases, but in most
programs were ignored.  See chapter 2 of:

http://publibz.boulder.ibm.com/cgi-bin/bookmgr_OS390/BOOKS/IEA2A670/CCONTENTS?SHELF=IEA2BK80&DN=SA22-7605-08&DT=20070426031033

If you wanted to do a linked list of activation records today, you'd
do it a bit differently, and you'd almost certainly have something
that looked a lot like a conventional stack-allocated activation
record, just allocated from the heap.
Perhaps something like:

   +0 : back link
   +4 : return address
   +8 : local variable area for callee
   +n : parameters (after local area)

In that scheme, the called routine needs to publish not just its entry
point, but also the required size of its local area.  This obviously
has some overhead, namely the allocation of the activation record from
the heap rather than the trivial allocation done by adding (or
subtracting) to the stack pointer.  OTOH, some of that can be
optimized fairly easily.  For example, if you know, or can limit, the
size of the activation record(s) needed by a second-level callee, the
caller can just preallocate it inside the first callee's activation
record.  Obviously things like recursive routines need to not do that
sort of thing.  Pool allocators might also work well in many cases,
especially since most activation records are fairly small.

An interesting benefit of the linked list approach is that it does not
have the problem of consuming large quantities of address space with
unused stack, which eliminates one of the major problems with the
thread-per-connection approach to servers (there are others, but
that's a biggie).
> > I haven't looked at what Linux/390 does, though.
>
> What does Linux/390 do?
> Where can I get more information about this?
On zLinux (64 bit) or Linux/390 (31 bit), it's more like the
conventional C-like calling conventions you see on other platforms,
although a little weird.  There's a downward-growing stack, pointed to
by R15, and a defined stack frame format (basically there's a back
chain, some preallocated register save areas in which the callee can
save callee-saved registers, if necessary), and then the usual local
and parameter areas.  R14 is the return address (and is one of the
callee-saved regs).  The first five integer parameters are passed in
R2..R6, the first two FP parms in F0 and F2 (for zLinux it's four FP
parms in F0, F2, F4, F6), return in R2 or F0, with some fairly
conventional special handling for structures and whatnot.  There are
some other minor differences between the 31 and 64 bit platforms (for
example, 64 bit parameters are passed in single registers in zLinux,
in register pairs in Linux/390).  Starting at about page nine of:

http://www.tachyonsoft.com/s8139db.pdf
robertwessel2@yahoo.com wrote:

(I wrote)

>>Note that IBM was consistently big-endian for S/370.
>>Though there are no bit addressing instructions, all the documentation
>>numbers bits with the least significant bit as zero.
>>It does get interesting with the extension to 64 bit z/architecture,
>>now the bits of a 32 bit register are numbered 32 to 63.
>>(That is, 32 bit instructions use bits 32 to 63 of the
>>64 bit registers defined by the architecture.)
> Your second statement is correct, I assume the first is a typo.  In
> IBM S/360 (and many other IBM designed ISAs, like PPC), the *most*
> significant bit of a register or word is zero.
Sometimes different words come out on the screen than the ones you
think you are typing.  Yes, I meant to say the most significant bit is
bit zero.  (Otherwise it would not have been consistent.)

Consistency seems worthwhile; otherwise you might end up like VAX,
with middle-endian floating point formats carried over from the PDP-11
and continued on (as one option) in Alpha.

-- glen
In article <cOednV5P0eG-74PanZ2dnUVZ_s6mnZ2d@comcast.com>,
glen herrmannsfeldt <gah@ugcs.caltech.edu> writes:
|> Jerry Avins wrote:
|> > karthikbalaguru wrote:
|> 
|> >> Reading through this and thinking about the heap mechanism, I have a
|> >> query: why do we need the stack, when the heap appears more flexible?
|> 
|> > Manipulating a heap requires software. Some stacks are implemented 
|> > entirely in hardware (PUSH, POP; CALL, RETURN). It would be very hard to 
|> > create or maintain a heap on a machine with no stack.
|> 
|> IBM S/360 and S/370 don't have a stack.  CALL and RETURN are done
|> through a linked list that also includes space to save the registers.

Specifically, several compilers generated code that used 'GETMAIN R'
to allocate the stack frame on procedure entry.  Those compilers
used an underlying 'heap' mechanism for the call stack.

Been there - done that :-)


Regards,
Nick Maclaren.
> Specifically, several compilers generated code that used 'GETMAIN R'
> to allocate the stack frame on procedure entry.  Those compilers
> used an underlying 'heap' mechanism for the call stack.
>
> Been there - done that :-)
Wow, was that slow. I know that Algol F did that, at least until you installed the patches from (as I recall) Princeton that suballocated bigger GETMAIN areas. What else did?
Nick Maclaren wrote:

> In article <cOednV5P0eG-74PanZ2dnUVZ_s6mnZ2d@comcast.com>,
> glen herrmannsfeldt <gah@ugcs.caltech.edu> writes:
(snip)
> |> IBM S/360 and S/370 don't have a stack.  CALL and RETURN are done
> |> through a linked list that also includes space to save the registers.
> Specifically, several compilers generated code that used 'GETMAIN R'
> to allocate the stack frame on procedure entry.  Those compilers
> used an underlying 'heap' mechanism for the call stack.
It is supposed to be best to have one routine do larger GETMAIN calls and parcel them out. Otherwise, direct calls to GETMAIN should work just fine. Most of my early assembly was Fortran callable so I didn't have to worry about such things. (Static save areas!)
> Been there - done that :-)
-- glen
glen herrmannsfeldt <gah@ugcs.caltech.edu> writes:
> IBM S/360 and S/370 don't have a stack.  CALL and RETURN are done
> through a linked list that also includes space to save the registers.
> Called routines are expected to save and restore most of the
> general registers.  The calling convention used by OS/360 and
> many of the supported high-level languages has the calling routine
> supply a 72 byte save area addressed in R13.  R14 is the return
> address, and R15 is the address of the routine being called.
> The called routine then saves registers 14 through 12 in the save
> area addressed by R13, arranges its own save area as the next
> element in a linked list.  Traditionally many assembly and Fortran
> programs used a static save area (no recursion).  Languages that
> allow recursion dynamically allocate a save area along with space
> for local variables.
re:
http://www.garlic.com/~lynn/2007q.html#65 Direction of Stack Growth

in the os/360 convention, the called routine only needs its own
savearea if it is planning on calling other routine(s). a routine that
doesn't make any calls won't need its own savearea. a routine is only
allocating/chaining a savearea (on entry) when it is calling other
routine(s) (for their use).

the original/early cp67 kernel allocated 100 (contiguous) "saveareas"
at boot, initialized/managed as a push/pop list. all internal kernel
call/returns were done via svc (supervisor call) interrupt, which would
allocate/deallocate a savearea (from the push/pop list). the called
routine was still responsible for "saving" registers in the (recently)
allocated savearea. when the 100 "saveareas" were exhausted, the system
failed. on entry, the pointer to the "active" savearea was in register
13, as per os/360 convention.

early on (as undergraduate) in the 60s, i made a number of cp67 kernel
modifications. several "called" kernel routines were "closed", i.e.
they didn't call any other routines. for these i changed the calling
sequence from a supervisor call interrupt to the (more familiar) BALR,
and these called routines saved the (calling routine's) registers (on
entry) in a single, kernel-wide savearea (actually in "page zero", so
it was a per-processor, kernel-wide savearea, one for every processor
in a multiprocessor configuration). also, if the initial 100 saveareas
were exhausted (actually any time available saveareas were exhausted),
the kernel would scavenge some storage and replenish the list with
another block of saveareas.

the change to dynamically extend kernel saveareas then complicated
standard cp67 kernel failure analysis, since it had previously been
possible to examine all extant active and pending process indications
by examining the fixed (location), preallocated, contiguous 100
saveareas. these changes eventually shipped in the standard cp67
product.
also as undergraduate, i did an enhancement that moved some number of
kernel routines out of fixed kernel storage and made them pageable.
this required a little sleight of hand: while cp67 supported virtual
memory and ran all of its virtual machines in virtual address spaces,
the cp67 kernel itself ran in "real addressing" (non-virtual
addressing mode). so while the "pageable" kernel routines were fetched
into memory and removed from memory via standard paging facilities,
the routines ran in "real addressing" mode w/o address translation
turned on, and couldn't depend on virtual address translation and/or
hardware page faults.

this didn't ship in the standard cp67 product, but was incorporated
into the product as part of the morph from cp67 to vm370.

for a little topic drift, some recent posts about the "new", 40+ yr
old virtual machine technology:
http://www.garlic.com/~lynn/2007b.html#23 How many 36-bit Unix ports in the old days?
http://www.garlic.com/~lynn/2007b.html#26 How many 36-bit Unix ports in the old days?
http://www.garlic.com/~lynn/2007l.html#23 John W. Backus, 82, Fortran developer, dies
http://www.garlic.com/~lynn/2007p.html#7 what does xp do when system is copying
http://www.garlic.com/~lynn/2007q.html#3 Virtualization: Don't Ask, Don't Tell
http://www.garlic.com/~lynn/2007q.html#22 Enterprise: Accelerating the Progress of Linux
http://www.garlic.com/~lynn/2007q.html#25 VMware: New King Of The Data Center?
http://www.garlic.com/~lynn/2007q.html#49 Slimmed Down Windows Offers Glimpse Into Microsoft's Virtualization Ambitions
http://www.garlic.com/~lynn/2007q.html#59 Virtualization: Everybody's Doing It, but Few Know How
http://www.garlic.com/~lynn/2007q.html#64 Virtual Browsers: Disposable Security
"Everett M. Greene" <mojaveg@mojaveg.lsan.mdsg-pacwest.com> wrote in message 
news:20071023.79D6AD0.9422@mojaveg.lsan.mdsg-pacwest.com...

> I'd forgotten about those machines [IBM 704 etc] when I said something
> about some processors only being able to use positive index offsets.
> The negative indexing was the origin of the BES symbol?
Yes, it was. Dennis
johnl@iecc.com (John L) writes:

> The 704 had three index registers numbered 1, 2, and 4.  If you
> specified a register number with more than one bit set, it or'ed them
> before subtracting.  Cool though this feature was, in later
> computers they came to their senses and provided 7 separate
> registers.
Yes, but even on the 7094 mod II one could execute the instruction
"Enter Multiple Tag Mode", which would carefully replicate this
bizarre behavior for compatibility.

I don't think this was a matter of coming to their senses -- it was
simply far too expensive to build 7 registers at the time of the 704.
The ORing behavior was also a direct result of the implementation of
the registers, in which the gating of the data onto a common bus with
an OR rewrote the flip-flops of the register.  It was not an
intentional design "feature".  This is similar to the PDP-4 and PDP-7
opcode which shifted both directions at once, resulting in the OR of
the two shifted results, for similar accidental implementation
reasons.