Boxis R700 User Manual Page 51

ATI R700 Technology
Conditional Execution 3-15
Copyright © 2009 Advanced Micro Devices, Inc. All rights reserved.
3.6.5 Stack Allocation
Each program type has a stack for maintaining branch and other program states.
The maximum number of available stack entries is controlled by a host-written
register or by the hardware implementation of the processor. The minimum
number of stack entries required to correctly execute a program is determined by
the deepest control-flow instruction.
Each stack entry contains a number of subentries. The number of subentries per
stack entry varies, based on the number of thread groups (simultaneously executing
threads on a SIMD pipeline) per program type that are supported by the target
processor. If a processor that supports 64 thread groups per program type is
configured logically to use only 48 thread groups per program type, the stack
requirement for a 64-item processor still applies. Table 3.5 shows the number of
subentries per stack entry, based on the physical thread-group width of the
processor.
The CALL*, LOOP_START*, and PUSH* instructions each consume a certain
number of stack entries or subentries. These entries are released when the
corresponding POP, LOOP_END, or RETURN instruction is executed. The additional
stack space required by each of these flow-control instructions is described in
Table 3.6.
At any point during the execution of a program, if A is the total number of full
entries in use, and B is the total number of subentries in use, then STACK_SIZE
must be large enough to satisfy:
A + B / (# of subentries per entry) <= STACK_SIZE
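The inequality above can be checked with integer arithmetic alone. The following is a minimal sketch, not part of the hardware specification; the function names stack_fits and min_stack_size are hypothetical, and the second function assumes that STACK_SIZE is a whole number of entries.

```python
def stack_fits(full_entries, subentries, subentries_per_entry, stack_size):
    """Check A + B / S <= STACK_SIZE without floating-point division."""
    # Multiply both sides by S (always positive) so everything stays an integer.
    lhs = full_entries * subentries_per_entry + subentries
    return lhs <= stack_size * subentries_per_entry


def min_stack_size(full_entries, subentries, subentries_per_entry):
    """Smallest whole-number STACK_SIZE satisfying the inequality.

    Assumption: STACK_SIZE counts whole entries, so a partially used
    entry is rounded up (ceiling division).
    """
    return full_entries + -(-subentries // subentries_per_entry)


# Example: 2 full entries and 3 subentries, with 4 subentries per entry.
print(stack_fits(2, 3, 4, 3))    # True, since 2 + 3/4 = 2.75 <= 3
print(min_stack_size(2, 3, 4))   # 3
```

Cross-multiplying avoids rounding questions in the comparison itself; rounding only enters when a single integer STACK_SIZE has to be chosen.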
Table 3.5 Stack Subentries
                        Physical Thread-Group Width of Processor
                        16    32    48    64
Subentries per Entry    8     8     4     4
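As an illustration of how Table 3.5 might be consulted in a tool, the sketch below stores the table as a lookup keyed by physical thread-group width. The names SUBENTRIES_PER_ENTRY and subentries_per_entry are hypothetical. Note that the lookup takes the physical width, not the logically configured one, as described above.

```python
# Values transcribed from Table 3.5 (physical thread-group width -> subentries per entry).
SUBENTRIES_PER_ENTRY = {16: 8, 32: 8, 48: 4, 64: 4}


def subentries_per_entry(physical_width):
    """Return the subentries per stack entry for a physical thread-group width."""
    try:
        return SUBENTRIES_PER_ENTRY[physical_width]
    except KeyError:
        raise ValueError(f"width {physical_width} is not listed in Table 3.5") from None


# A processor that is physically 64 wide but configured logically for 48
# is still looked up by its physical width.
print(subentries_per_entry(64))
```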
Table 3.6 Stack Space Required for Flow-Control Instructions
                                        Stack Size per Physical Thread-Group Width
Instruction                             16              32              48              64
PUSH, PUSH_ELSE when whole quad mode    one subentry    one subentry    one subentry    one subentry
  is not set, and ALU_PUSH_BEFORE
PUSH, PUSH_ELSE when whole quad mode    one entry       one entry       one entry       one entry
  is set
LOOP_START*                             one entry       one entry       one entry       one entry
CALL, CALL_FS                           two subentries  one subentry    one subentry    one subentry
Comments
PUSH, PUSH_ELSE when whole quad mode is not set, and ALU_PUSH_BEFORE: If a PUSH
instruction is invoked, two subentries on the stack must be reserved to hold the
current active (valid) masks.
CALL, CALL_FS: A 16-bit-wide processor needs two subentries because the program
counter has more than 16 bits.
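The two tables and the STACK_SIZE inequality can be combined to estimate the stack space a given control-flow sequence needs. The sketch below is a simplified model under stated assumptions, not a description of the hardware: the function names are hypothetical, each POP, LOOP_END, or RETURN is assumed to release exactly the most recent allocation, STACK_SIZE is assumed to be a whole number of entries, and the extra reservation mentioned in the PUSH comment of Table 3.6 is not modeled.

```python
SUBENTRIES_PER_ENTRY = {16: 8, 32: 8, 48: 4, 64: 4}  # Table 3.5
RELEASES = {"POP", "LOOP_END", "RETURN"}


def cost(instr, width, whole_quad_mode):
    """Return (full_entries, subentries) consumed by instr, per Table 3.6."""
    if instr.startswith("LOOP_START"):
        return (1, 0)
    if instr in ("PUSH", "PUSH_ELSE"):
        return (1, 0) if whole_quad_mode else (0, 1)
    if instr == "ALU_PUSH_BEFORE":
        return (0, 1)
    if instr in ("CALL", "CALL_FS"):
        return (0, 2) if width == 16 else (0, 1)
    return (0, 0)  # any other instruction: no stack space in this model


def peak_stack_entries(program, width, whole_quad_mode=False):
    """Largest value of A + ceil(B / S) reached while walking the program."""
    per_entry = SUBENTRIES_PER_ENTRY[width]
    allocations = []          # one (entries, subentries) record per allocation
    full = sub = peak = 0
    for instr in program:
        if instr in RELEASES:
            # Assumption: a release frees the most recent allocation.
            # An unbalanced program raises IndexError here.
            a, b = allocations.pop()
            full -= a
            sub -= b
        else:
            a, b = cost(instr, width, whole_quad_mode)
            if (a, b) != (0, 0):
                allocations.append((a, b))
                full += a
                sub += b
        # Round partially used entries up to a whole entry.
        peak = max(peak, full + -(-sub // per_entry))
    return peak


program = ["PUSH", "LOOP_START", "CALL", "RETURN", "LOOP_END", "POP"]
print(peak_stack_entries(program, 16))        # 2
print(peak_stack_entries(program, 64, True))  # 3
```

In the first call, the deepest point holds one full entry (the loop) and three subentries (one for PUSH, two for CALL on a 16-wide processor), which rounds up to two entries. With whole quad mode set, PUSH takes a full entry of its own, so the same sequence needs three.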