EmbeddedRelated.com
Forums

Direction of Stack Growth

Started by karthikbalaguru October 21, 2007
Jerry Avins wrote:

> karthikbalaguru wrote:
>> Reading through this and thinking about the heap mechanism, I have a
>> query: why do we need the stack, when the heap appears more flexible?
> Manipulating a heap requires software. Some stacks are implemented
> entirely in hardware (PUSH, POP; CALL, RETURN). It would be very hard to
> create or maintain a heap on a machine with no stack.
IBM S/360 and S/370 don't have a stack.  CALL and RETURN are done
through a linked list that also includes space to save the registers.
Called routines are expected to save and restore most of the general
registers.  The calling convention used by OS/360 and many of the
supported high-level languages has the calling routine supply a 72-byte
save area addressed in R13.  R14 is the return address, and R15 is the
address of the routine being called.  The called routine then saves
registers 14 through 12 in the save area addressed by R13, and arranges
its own save area as the next element in the linked list.
Traditionally, many assembly and Fortran programs used a static save
area (no recursion).  Languages that allow recursion dynamically
allocate a save area along with space for local variables.

One inconvenience of a stack is that someone has to create the stack,
so that first routine has to be different.  For OS/360, the calling
convention described is similar to that used by the OS to call your
program.  It is then more convenient than on other systems to load a
program into memory and call it, with no question of who owns the
stack.  For small allocations, though, the stack likely has less
overhead.

-- glen
glen herrmannsfeldt <gah@ugcs.caltech.edu> writes:
> IBM S/360 and S/370 don't have a stack.  CALL and RETURN are done
> through a linked list that also includes space to save the registers.
> Called routines are expected to save and restore most of the general
> registers.  The calling convention used by OS/360 and many of the
> supported high-level languages has the calling routine supply a 72
> byte save area addressed in R13.  R14 is the return address, and R15
> is the address of the routine being called.  The called routine then
> saves registers 14 through 12 in the save area addressed by R13,
> arranges its own save area as the next element in a linked list.
> Traditionally many assembly and Fortran programs used a static save
> area (no recursion).  Languages that allow recursion dynamically
> allocate a save area along with space for local variables.
q&d conversion of old greencard ios3270 to html
http://www.garlic.com/~lynn/gcard.html

call/save/return conventions
http://www.garlic.com/~lynn/gcard.html#50
On Oct 23, 12:10 pm, karthikbalaguru <karthikbalagur...@gmail.com>
wrote:
> On Oct 23, 3:05 am, glen herrmannsfeldt <g...@ugcs.caltech.edu> wrote:
> > Nick Maclaren wrote:
> > (snip)
> > > Witness that System/360 stacks were normally upwards growing in the
> > > 1960s and 1970s, and changed as the new generation of people with a
> > > DEC background moved in during the 1980s.
> > The S/360 that I used, running OS/360 and descendants, used
> > linked lists for subroutine calling, register saving, and
> > argument lists.
> I am eager to understand more about this.
> Linked lists for subroutine calling.
> Is it a good method or a bad method of design?
> Where can I get more information about this?
The standard S/360 calling convention used a linked list of register
save areas.  Mind you, this was designed before languages with small
subroutines and default external linkage became popular, so it was
really intended for fairly heavyweight program-to-program calls.  It
is not a particularly good design by today's standards, and the
convention was almost never followed by compilers (or assembler
programmers) within a program, just at the external calls and entries.

In short, every program was responsible for allocating an 18-word save
area for the callee to use.  During a call, R1 pointed to a list of
pointers, one for each parameter (everything was pass by reference),
with the high bit set on the last pointer.  Optionally, R1 might point
directly to a single parameter, but this was rare (usually it was just
R1 pointing to a single pointer).  R13 pointed to the register save
area from the caller (R15 was the called address, R14 the return).

The called program would save registers in the passed save area,
typically with a "STM R14,R12,12(R13)", which would store R14, R15,
R0..R12 starting in the fourth word of the save area.  The called
program would then allocate a new save area, and store the address of
the prior save area in the second word of the new area.  That second
word is how you did the "pop" to get back to the caller.  The other
two words in the save area had uses in some cases, but in most
programs were ignored.  See chapter 2 of:

http://publibz.boulder.ibm.com/cgi-bin/bookmgr_OS390/BOOKS/IEA2A670/CCONTENTS?SHELF=IEA2BK80&DN=SA22-7605-08&DT=20070426031033

If you wanted to do a linked list of activation records today, you'd
do it a bit differently, and you'd almost certainly have something
that looked a lot like a conventional stack-allocated activation
record, just allocated from the heap.
Perhaps something like:

   +0 : back link
   +4 : return address
   +8 : local variable area for callee
   +n : parameters (after local area)

In that scheme, the called routine needs to publish not just its entry
point, but also the required size of its local area.  This obviously
has some overhead, namely the allocation of the activation record from
the heap rather than the trivial allocation done by adding (or
subtracting) to the stack pointer.  OTOH, some of that can be
optimized fairly easily.  For example, if you know, or can limit, the
size of the activation record(s) needed by a second-level callee, the
caller can just preallocate it inside the first callee's activation
record.  Obviously things like recursive routines need to not do that
sort of thing.  Pool allocators might also work well in many cases,
especially since most activation records are fairly small.

An interesting benefit of the linked list approach is that it does not
have the problem of consuming large quantities of address space with
unused stack, which eliminates one of the major problems with the
thread-per-connection approach to servers (there are others, but
that's a biggie).
> > I haven't looked at what Linux/390 does, though.
>
> What does Linux/390 do?
> Where can I get more information about this?
On zLinux (64 bit) or Linux/390 (31 bit), it's more like the
conventional C-like calling conventions you see on other platforms,
although a little weird.  There's a downward-growing stack, pointed to
by R15, and a defined stack frame format (basically there's a back
chain, some preallocated register save areas in which the callee can
save callee-saved registers, if necessary), and then the usual local
and parameter areas.  R14 is the return address (and is one of the
callee-saved regs).  The first five integer parameters are passed in
R2..R6, the first two FP parms in F0 and F2 (for zLinux it's four FP
parms in F0, F2, F4, F6), return in R2 or F0, with some fairly
conventional special handling for structures and whatnot.  There are
some other minor differences between the 31 and 64 bit platforms (for
example, 64 bit parameters are passed in single registers in zLinux,
in register pairs in Linux/390).  Starting at about page nine of:

http://www.tachyonsoft.com/s8139db.pdf
robertwessel2@yahoo.com wrote:

(I wrote)

>>Note that IBM was consistently big-endian for S/370.
>>Though there are no bit addressing instructions, all the documentation
>>numbers bits with the least significant bit as zero.
>>It does get interesting with the extension to 64 bit z/architecture,
>>now the bits of a 32 bit register are numbered 32 to 63.
>>(That is, 32 bit instructions use bits 32 to 63 of the
>>64 bit registers defined by the architecture.)
> Your second statement is correct, I assume the first is a typo.  In
> IBM S/360 (and many other IBM designed ISAs, like PPC), the *most*
> significant bit of a register or word is zero.
Sometimes different words come out on the screen than the ones you
think you are typing.  Yes, I meant to say the most significant bit is
bit zero.  (Otherwise it would not have been consistent.)

Consistency seems worthwhile; otherwise you might end up like VAX,
with middle-endian floating point formats carried over from the PDP-11
and continued on (as one option) in Alpha.

-- glen
In article <cOednV5P0eG-74PanZ2dnUVZ_s6mnZ2d@comcast.com>,
glen herrmannsfeldt <gah@ugcs.caltech.edu> writes:
|> Jerry Avins wrote:
|> > karthikbalaguru wrote:
|> 
|> >> Reading through this and thinking about the heap mechanism, I have a
|> >> query: why do we need the stack, when the heap appears more flexible?
|> 
|> > Manipulating a heap requires software. Some stacks are implemented 
|> > entirely in hardware (PUSH, POP; CALL, RETURN). It would be very hard to 
|> > create or maintain a heap on a machine with no stack.
|> 
|> IBM S/360 and S/370 don't have a stack.  CALL and RETURN are done
|> through a linked list that also includes space to save the registers.

Specifically, several compilers generated code that used 'GETMAIN R'
to allocate the stack frame on procedure entry.  Those compilers
used an underlying 'heap' mechanism for the call stack.

Been there - done that :-)


Regards,
Nick Maclaren.
> Specifically, several compilers generated code that used 'GETMAIN R'
> to allocate the stack frame on procedure entry.  Those compilers
> used an underlying 'heap' mechanism for the call stack.
>
> Been there - done that :-)
Wow, was that slow. I know that Algol F did that, at least until you installed the patches from (as I recall) Princeton that suballocated bigger GETMAIN areas. What else did?
Nick Maclaren wrote:

> In article <cOednV5P0eG-74PanZ2dnUVZ_s6mnZ2d@comcast.com>,
> glen herrmannsfeldt <gah@ugcs.caltech.edu> writes:
(snip)
> |> IBM S/360 and S/370 don't have a stack.  CALL and RETURN are done
> |> through a linked list that also includes space to save the registers.
> Specifically, several compilers generated code that used 'GETMAIN R'
> to allocate the stack frame on procedure entry.  Those compilers
> used an underlying 'heap' mechanism for the call stack.
It is supposed to be best to have one routine do larger GETMAIN calls and parcel them out. Otherwise, direct calls to GETMAIN should work just fine. Most of my early assembly was Fortran callable so I didn't have to worry about such things. (Static save areas!)
> Been there - done that :-)
-- glen
glen herrmannsfeldt <gah@ugcs.caltech.edu> writes:
> IBM S/360 and S/370 don't have a stack.  CALL and RETURN are done
> through a linked list that also includes space to save the registers.
> Called routines are expected to save and restore most of the
> general registers.  The calling convention used by OS/360 and
> many of the supported high-level languages has the calling routine
> supply a 72 byte save area addressed in R13.  R14 is the return
> address, and R15 is the address of the routine being called.
> The called routine then saves registers 14 through 12 in the save
> area addressed by R13, arranges its own save area as the next
> element in a linked list.  Traditionally many assembly and Fortran
> programs used a static save area (no recursion).  Languages that
> allow recursion dynamically allocate a save area along with space
> for local variables.
re:
http://www.garlic.com/~lynn/2007q.html#65 Direction of Stack Growth

in the os/360 convention, the called routine only needs its own
savearea if it is planning on calling other routine(s). a routine that
doesn't make any calls won't need its own savearea. a routine is only
allocating/chaining a savearea (on entry) when it is calling other
routine(s) (for their use).

the original/early cp67 kernel allocated 100 (contiguous) "saveareas"
at boot, initialized/managed as a push/pop list. all internal kernel
call/returns were done via svc (supervisor call) interrupt, which would
allocate/deallocate a savearea (from the push/pop list). the called
routine was still responsible for "saving" registers in the (recently)
allocated savearea. when the 100 "saveareas" were exhausted, the system
failed. on entry, the pointer to the "active" savearea was in register
13, as per os/360 convention.

early on (as undergraduate) in the 60s, i made a number of cp67 kernel
modifications. several "called" kernel routines were "closed", i.e.
they didn't call any other routines. for these i changed the calling
sequence from a supervisor call interrupt to the (more familiar) BALR,
and these called routines saved the (calling routine's) registers (on
entry) in a single, kernel-wide savearea (actually in "page zero", so
it was a per-processor, kernel-wide savearea, one for every processor
in a multiprocessor configuration). also, if the initial 100 saveareas
were exhausted (actually any time available saveareas were exhausted),
the kernel would scavenge some storage and replenish the list with
another block of saveareas.

the change to dynamically extend kernel saveareas then complicated
standard cp67 kernel failure analysis, since it had previously been
possible to examine all extant active and pending process indications
by examining the fixed (location), preallocated, contiguous 100
saveareas. these changes eventually shipped in the standard cp67
product.
also as undergraduate, i did an enhancement that moved some number of
kernel routines out of fixed kernel storage and made them pageable.
this required a little sleight of hand: while cp67 supported virtual
memory and ran all of its virtual machines in virtual address spaces,
the cp67 kernel itself ran in "real addressing" (non-virtual
addressing mode). so while the "pageable" kernel routines were fetched
into memory and removed from memory via standard paging facilities,
the routines ran in "real addressing" mode w/o address translation
turned on, and couldn't depend on virtual address translation and/or
hardware page faults.

this didn't ship in the standard cp67 product, but was incorporated
into the product as part of the morph from cp67 to vm370.

for a little topic drift, some recent posts about the "new", 40+ yr
old virtual machine technology:
http://www.garlic.com/~lynn/2007b.html#23 How many 36-bit Unix ports in the old days?
http://www.garlic.com/~lynn/2007b.html#26 How many 36-bit Unix ports in the old days?
http://www.garlic.com/~lynn/2007l.html#23 John W. Backus, 82, Fortran developer, dies
http://www.garlic.com/~lynn/2007p.html#7 what does xp do when system is copying
http://www.garlic.com/~lynn/2007q.html#3 Virtualization: Don't Ask, Don't Tell
http://www.garlic.com/~lynn/2007q.html#22 Enterprise: Accelerating the Progress of Linux
http://www.garlic.com/~lynn/2007q.html#25 VMware: New King Of The Data Center?
http://www.garlic.com/~lynn/2007q.html#49 Slimmed Down Windows Offers Glimpse Into Microsoft's Virtualization Ambitions
http://www.garlic.com/~lynn/2007q.html#59 Virtualization: Everybody's Doing It, but Few Know How
http://www.garlic.com/~lynn/2007q.html#64 Virtual Browsers: Disposable Security
"Everett M. Greene" <mojaveg@mojaveg.lsan.mdsg-pacwest.com> wrote in message 
news:20071023.79D6AD0.9422@mojaveg.lsan.mdsg-pacwest.com...

> I'd forgotten about those machines [IBM 704 etc] when I said something
> about some processors only being able to use positive index offsets.
> The negative indexing was the origin of the BES symbol?
Yes, it was. Dennis
johnl@iecc.com (John L) writes:

> The 704 had three index registers numbered 1, 2, and 4.  If you
> specified a register number with more than one bit set, it or'ed them
> before subtracting.  Cool though this feature was, in later
> computers they came to their senses and provided 7 separate
> registers.
Yes, but even on the 7094 mod II one could execute the instruction
"Enter Multiple Tag Mode", which would carefully replicate this
bizarre behavior for compatibility.

I don't think this was a matter of coming to their senses -- it was
simply far too expensive to build 7 registers at the time of the 704.
The ORing behavior was also a direct result of the implementation of
the registers, in which the gating of the data onto a common bus with
an OR rewrote the flip-flops of the register.  It was not an
intentional design "feature".  This is similar to the PDP-4 and PDP-7
opcode which shifted both directions at once, resulting in the OR of
the two shifted results, for similar accidental implementation
reasons.