Boxis R700 User Manual Page 35

ATI R700 Technology
Data Sharing 2-17
Copyright © 2009 Advanced Micro Devices, Inc. All rights reserved.
bank_offset = (thread_id >> 2) * dst_stride + dst_index
thread_id - SIMD_WAVE_REL mode controls:
0 : absolute – relative to threads within a wavefront.
1 : relative to the thread at the beginning of each group.
dst_stride - Destination stride from instruction for write to shared memory, in
dwords. Legal values: 4,8,12,16,...64.
dst_index - Destination index from instruction for write to shared memory in
dwords. Legal values: 4,8,12,16,...60.
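
As an illustration only, the write-address formula above can be sketched in Python. The function name and the range checks are assumptions made for this sketch; the formula and the legal values are taken from the text above.

```python
def bank_offset(thread_id: int, dst_stride: int, dst_index: int) -> int:
    """Dword offset within a bank for a shared-memory write.

    Mirrors the formula in the text:
        bank_offset = (thread_id >> 2) * dst_stride + dst_index
    """
    if thread_id < 0:
        raise ValueError("thread_id must be non-negative")
    # Legal values as listed in the text: multiples of 4 dwords.
    if dst_stride not in range(4, 65, 4):
        raise ValueError("dst_stride must be one of 4, 8, 12, ..., 64")
    if dst_index not in range(4, 61, 4):
        raise ValueError("dst_index must be one of 4, 8, 12, ..., 60")
    # thread_id >> 2 drops the two low bits, so each run of four
    # consecutive threads gets the same offset.
    return (thread_id >> 2) * dst_stride + dst_index


if __name__ == "__main__":
    for t in range(8):
        print(t, bank_offset(t, dst_stride=8, dst_index=4))
```

With dst_stride = 8 and dst_index = 4, threads 0 to 3 get offset 4 and threads 4 to 7 get offset 12, because the shift by two groups threads in fours.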
The shader program provides addresses for reads, with a bank id and dword
bank offset. A maximum of four threads access the shared memory per clock
cycle in one of three addressing modes, selected by the mux_cntl field of
MEM_DSR_WORD1 (see the microcode description for MEM_DSR_WORD1, on
page 10-48).
DSR_MUX_NONE enables each thread to address any aligned four-dword
entry of the shared memory.
DSR_MUX_FFT_PERMUTE enables a 16-bank dword butterfly
read across the 512-bit output of the shared memory.
DSR_MUX_WORD_SELECT enables any single dword read of the shared
memory.
If the compiler cannot determine that a read of four threads does not contain
bank conflicts, the instruction is repeated four times, with one of four successive
threads enabled on each pass to prevent conflicts. These instruction iterations
are forced if the waterfall bit is set in the instruction.
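
The pass structure described above can be modelled with a small Python sketch. This is not the hardware or compiler logic: the function name, the tuple representation of a pass, and the conflict test (two of the four threads naming the same bank id) are all assumptions made for illustration.

```python
def schedule_read(bank_ids, waterfall=False):
    """Return the passes needed for one four-thread shared-memory read.

    Each pass is a tuple of the thread slots (0..3) enabled on that pass.
    """
    if len(bank_ids) != 4:
        raise ValueError("a read group consists of exactly four threads")
    # Assumed conflict model: two threads in the group target the same bank.
    conflict = len(set(bank_ids)) < 4
    if waterfall or conflict:
        # Repeat the instruction four times, one successive thread per pass.
        return [(slot,) for slot in range(4)]
    # No conflict and no waterfall bit: all four threads read in one pass.
    return [(0, 1, 2, 3)]


if __name__ == "__main__":
    print(schedule_read([0, 1, 2, 3]))
    print(schedule_read([0, 1, 1, 3]))
    print(schedule_read([0, 1, 2, 3], waterfall=True))
```

In this model, setting the waterfall flag always produces four single-thread passes, matching the statement that the iterations are forced when the waterfall bit is set.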
A special broadcast read mode can be enabled to do a fast read of one to four
dwords that can be returned either to all threads in a wavefront or to shared
registers. The broadcast read returns data to all GPRs within the wavefront in
four clock cycles. This mode writes all threads, regardless of active or predicate
masks; the address to be read must be stored in the src GPR of the first thread
of the wavefront.
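
A minimal behavioural sketch of the broadcast read, in Python, may help make the two rules above concrete: the address comes from the first thread only, and every thread is written regardless of masks. The function name, the list-based model of shared memory and GPRs, and the treatment of the address as a dword index are assumptions for this sketch; timing is not modelled.

```python
def broadcast_read(shared_mem, src_gprs, active_mask, count=1):
    """Model a broadcast read of `count` dwords to every thread.

    shared_mem  : list of dwords (hypothetical flat model of shared memory)
    src_gprs    : per-thread src GPR values; only element 0 is used
    active_mask : per-thread active flags; ignored on purpose
    """
    if not 1 <= count <= 4:
        raise ValueError("a broadcast read returns one to four dwords")
    # The address is taken from the first thread of the wavefront.
    addr = src_gprs[0]
    if addr < 0 or addr + count > len(shared_mem):
        raise IndexError("read falls outside the modelled shared memory")
    data = tuple(shared_mem[addr:addr + count])
    # All threads are written, whatever active_mask says.
    return [data for _ in src_gprs]


if __name__ == "__main__":
    mem = list(range(100, 164))
    result = broadcast_read(mem, [8] + [0] * 63, [False] * 64, count=2)
    print(len(result), result[0], result[63])
```

Note that the example passes an all-inactive mask and still receives data in every thread slot, which is the behaviour the text describes.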
In SIMD_WAVE_REL absolute addressing mode, a clause can be used with one
set of writes, followed by a set of reads, without using any barrier synchronization
mechanism, as long as the exchange operations are in one clause. The relative
modes require barrier synchronization.