Boxis R700 User Manual Page 32

ATI R700 Technology
2-14 Data Sharing
Copyright © 2009 Advanced Micro Devices, Inc. All rights reserved.
this pool, a state register must be set up defining the number of registers
reserved for global usage.
The global GPRs are accessed through an index_mode (simd-global) in the
ALU instruction word. This new mode interprets the src or dest GPR address
as an absolute address in the range 0 to 127. This index mode works in
conjunction with the src-rel/dest-rel fields, allowing the instruction to mix
global and wavefront-local GPRs.
Additional index modes allow indexed addressing, where the address = GPR +
offset_from_instruction or INDEX_GLOBAL_AR_X (AR.X only; see Section 4.6.1,
“Relative Addressing,” page 4-6, as well as the opcode description for
ALU_WORD0, page 10-16). This allows inter-thread communication and kernel-
based addressing. (This requires using a MOVA* instruction to copy the index to
the AR.X register.)
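As a rough model of the addressing rules above, the following Python sketch resolves a global GPR address. It assumes one reading of the text: in plain simd-global mode the GPR field of the instruction is the absolute address, and in the AR.X-indexed mode the value of AR.X is added to it. The function name, the `ar_x` parameter, and the choice to raise an error for out-of-range addresses are illustrative assumptions, not documented hardware behavior.

```python
GLOBAL_ADDRESS_LIMIT = 128  # absolute global addresses run from 0 to 127


def resolve_global_gpr(gpr_field, ar_x=None):
    """Return the absolute global GPR address selected by an operand.

    gpr_field: GPR address encoded in the instruction (hypothetical name).
    ar_x:      None for plain simd-global mode, or the AR.X value
               (loaded earlier by a MOVA* instruction) for indexed mode.
    """
    address = gpr_field if ar_x is None else gpr_field + ar_x
    # Assumption: this model rejects out-of-range addresses; the manual
    # page does not say how the hardware treats them.
    if not 0 <= address < GLOBAL_ADDRESS_LIMIT:
        raise ValueError(f"global GPR address {address} outside 0..127")
    return address
```

Under this model, `resolve_global_gpr(8)` selects global register 8, while `resolve_global_gpr(8, ar_x=3)` selects register 11.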
This pool of global GPRs can be used to provide many powerful features,
including:
  • Atomic reduction variables per lane (the number depends on the number of
    GPRs), such as:
      - max, min, small histogram per lane,
      - software-based barriers or synchronization primitives.
  • A set of constants that is unique per lane. This prevents:
      - the overhead of repeated fetches, and
      - divergent thread execution due to constant look-up.
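To illustrate the per-lane reduction use case, here is a minimal Python sketch of a per-lane max. It models one global register per lane as a list slot shared by every wavefront. The function name and the list-of-lists input are hypothetical, and the model ignores timing, stalls, and the actual instruction encoding.

```python
def per_lane_max(wavefronts):
    """Reduce values lane by lane across wavefronts.

    wavefronts: list of wavefronts, each a list holding one value per lane.
    Returns one running maximum per lane, as a shared global register
    per lane would hold after every wavefront has updated it.
    """
    num_lanes = len(wavefronts[0])
    global_gpr = [float("-inf")] * num_lanes  # one shared register per lane
    for wavefront in wavefronts:
        for lane, value in enumerate(wavefront):
            # Each thread updates only the register belonging to its lane.
            global_gpr[lane] = max(global_gpr[lane], value)
    return global_gpr
```

Because each lane touches only its own register, no lane has to coordinate with another; only wavefronts sharing a lane contribute to the same value.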
2.6.1.2 Clause Temporary GPR Pool
The GPR pool can include partitions that hold clause temporary (temp) GPRs.
Clause temp GPRs prevent stalling and enable peak performance because they
are stored in two sections, one for the odd wavefront and one for the even
wavefront (see Figure 2.2). Because each wavefront executing on the SIMD has
its own section, reads and writes of clause temps by the even and odd
wavefronts never conflict. With global shared registers, by contrast, both
wavefronts map the registers into the same locations in memory, which can
cause a conflict and a stall. A write takes a full instruction to become
visible; thus, if a read and a write occur in the same instruction group but
from different wavefronts, there is a read/write conflict that the hardware
resolves by stalling one of the wavefronts until the write is visible to the
read.
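The difference between the two pools can be sketched as a small Python predicate. This is a simplified model, not a description of the hardware: the function name and the tuple encoding are hypothetical, and it assumes a conflict requires one read and one write of the same register within one instruction group.

```python
def needs_stall(even_access, odd_access, shared_pool):
    """Return True if the two wavefronts conflict in one instruction group.

    Each access is a (kind, register) tuple, kind being "read" or "write".
    shared_pool: True for global shared GPRs, False for clause temps.
    """
    if not shared_pool:
        # Clause temps: each wavefront has its own section, so no conflict.
        return False
    (even_kind, even_reg), (odd_kind, odd_reg) = even_access, odd_access
    # Assumption: a conflict needs one read and one write of the same register.
    return even_reg == odd_reg and {even_kind, odd_kind} == {"read", "write"}
```

In this model, a write by the even wavefront and a read by the odd wavefront of the same global shared register stalls, while the same pair of accesses to clause temps does not.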
The clause temp GPRs are accessed using the top GPR address locations. For
example, if four clause temp registers are enabled, GPR addresses 124, 125,
126, and 127 select clause temp registers 0, 1, 2, and 3, respectively.
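The mapping in this example can be written as a short Python sketch. It assumes a 128-entry GPR address space (inferred from the 0 to 127 range and the example addresses) and that clause temps occupy the highest addresses in ascending order. The function and constant names are hypothetical.

```python
GPR_ADDRESS_COUNT = 128  # assumed address space: GPR addresses 0..127


def clause_temp_index(gpr_address, num_clause_temps):
    """Map a GPR address to a clause temp register index.

    Returns the clause temp index if the address falls in the top
    num_clause_temps locations, or None for an ordinary GPR address.
    """
    first_temp_address = GPR_ADDRESS_COUNT - num_clause_temps
    if first_temp_address <= gpr_address < GPR_ADDRESS_COUNT:
        return gpr_address - first_temp_address
    return None
```

With four clause temps enabled, addresses 124 through 127 map to indices 0 through 3, and address 123 is an ordinary GPR.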
Clause temp registers can provide atomic (locked, uninterruptible) per-lane
reduction, which enables higher performance among all threads in a lane of a
SIMD for the wavefronts that execute on the even or odd instruction slot.