Single instruction, multiple threads
Single instruction, multiple threads (SIMT) is an execution model used in parallel computing in which a single central "control unit" broadcasts an instruction to multiple "processing units" (PUs), each of which either executes that one instruction in lockstep on its own data or sits idle. Each PU has its own data and address registers and its own local memory, but no PU in the array has a program counter. In Flynn's 1972 taxonomy this arrangement is a variation of SIMD termed an array processor.
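The broadcast arrangement above can be sketched as a toy simulator. The instruction names and array sizes here are illustrative, not any real ISA: one control unit holds the only program counter and issues each instruction to every PE, while each PE keeps private registers and private memory.

```python
# Hypothetical toy SIMT machine: one PC in the control unit, none in the PEs.
NUM_PE = 4

class PE:
    """A processing unit: private register and private memory, no program counter."""
    def __init__(self, pid):
        self.reg = 0
        self.mem = [pid * 10 + i for i in range(4)]  # different data per PE

def run(program, lanes):
    # The control unit's fetch loop: the only program counter in the system.
    for instr in program:
        for pe in lanes:  # synchronous broadcast: every PE gets the same instruction
            if instr == "LOAD0":
                pe.reg = pe.mem[0]
            elif instr == "ADD1":
                pe.reg += 1
            elif instr == "STORE0":
                pe.mem[0] = pe.reg

lanes = [PE(p) for p in range(NUM_PE)]
run(["LOAD0", "ADD1", "STORE0"], lanes)
print([pe.mem[0] for pe in lanes])  # → [1, 11, 21, 31]
```

Although every PE executes the same three instructions in lockstep, each produces a different result because each operates on its own local data.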

The SIMT execution model has been implemented on several GPUs and is relevant to general-purpose computing on graphics processing units (GPGPU): some supercomputers combine CPUs with GPUs in much the same way that the ILLIAC IV paired its processor array with a front-end CPU, a Burroughs B6500.

The SIMT execution model is still only a way to present to the programmer what is fundamentally a predicated SIMD concept, and programs must be designed with predicated SIMD in mind. Because instruction issue (a synchronous broadcast) is handled by the single control unit, which holds the only program counter, SIMT cannot by design allow threads (PEs, lanes) to diverge by branching. Branching is therefore to be avoided where possible.
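One common way to avoid divergent branching, sketched below with made-up inputs, is to flatten an if/else into predicated execution: every lane computes both arms, and a per-lane predicate selects which result is kept, so no lane ever needs its own program counter.

```python
# Sketch: an if/else flattened into predicated SIMD-style execution.
data = [3, -7, 5, -2]          # one input value per lane (illustrative)

# Source-level intent, per lane:  y = x * 2 if x >= 0 else -x
pred = [x >= 0 for x in data]  # predicate bit per lane

then_result = [x * 2 for x in data]  # every lane runs the "then" arm
else_result = [-x for x in data]     # every lane also runs the "else" arm

# Predicated select: both arms were executed in lockstep; the predicate
# merely chooses which result each lane keeps.
y = [t if p else e for p, t, e in zip(pred, then_result, else_result)]
print(y)  # → [6, 7, 10, 2]
```

The cost is that every lane spends cycles on both arms, which is why deeply nested or unbalanced branches are expensive under SIMT.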

The simplest way to understand SIMT is to imagine a multi-core (MIMD) system in which each core has its own register file, its own ALUs (both SIMD and scalar) and its own data cache. Unlike a standard multi-core system, however, there are not multiple independent instruction caches, instruction decoders, or program counter registers: instructions are read using a single program counter, decoded by a single instruction decoder, and synchronously broadcast to all SIMT cores from a single control unit with a single instruction cache.

The key difference between SIMT and SIMD lanes is that each processing unit in the SIMT array has its own local memory and may have a completely different stack pointer (and thus perform computations on a completely different data set), whereas the ALUs in SIMD lanes know nothing about memory per se and have no register file. This is illustrated by the ILLIAC IV. Each of its SIMT cores was termed a processing element (PE), and each PE had its own separate memory (PEM). Each PE had an "index register" holding an address into its PEM. In the ILLIAC IV the Burroughs B6500 primarily handled I/O, but also sent instructions to the control unit (CU), which then handled broadcasting them to the PEs. Additionally, the B6500, in its role as an I/O processor, had access to all PEMs.
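The effect of per-PE index registers can be sketched as follows (the memory contents and register values are illustrative): a single broadcast "load via index register" instruction fetches a different element in every PE, because each PE applies its own index to its own PEM.

```python
# Sketch of per-PE index-register addressing into private memories (PEMs).
pems = [
    [10, 11, 12, 13],   # PE0's private memory
    [20, 21, 22, 23],   # PE1's private memory
    [30, 31, 32, 33],   # PE2's private memory
]
index_reg = [3, 0, 2]   # each PE's index register holds a different address

# One broadcast instruction: "load mem[index_reg] into the data register".
# All PEs execute it in lockstep, yet each fetches from a different location.
regs = [pem[idx] for pem, idx in zip(pems, index_reg)]
print(regs)  # → [13, 20, 32]
```

A plain SIMD lane could not do this on its own: it has no local memory or addressing state, so any such gather must be staged by the surrounding scalar core.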

Additionally, each PE may be made active or inactive. An inactive PE will not execute the instruction broadcast to it by the control unit; instead it sits idle until activated. Each PE can thus be said to be predicated.
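This activation mechanism amounts to an enable bit per PE, as in the following sketch (the mask and register values are made up for illustration):

```python
# Sketch of PE activation: an enable bit per PE; inactive PEs ignore the
# broadcast instruction and sit idle.
regs = [1, 2, 3, 4]                   # each PE's data register
active = [True, False, True, False]   # enable bit per PE

# The control unit broadcasts "add 100"; only active PEs execute it.
regs = [r + 100 if en else r for r, en in zip(regs, active)]
print(regs)  # → [101, 2, 103, 4]
```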

Also important to note is the difference between SIMT and SPMD (single program, multiple data). SPMD, like a standard multi-core system, has multiple program counters, whereas SIMT has only one: in the (single) control unit.

Flynn's original papers cite two historic examples of SIMT processors, termed "array processors": the SOLOMON and the ILLIAC IV. SIMT was introduced by Nvidia in the Tesla GPU microarchitecture with the G80 chip. ATI Technologies (now AMD) released a competing product slightly later, on May 14, 2007: the TeraScale 1-based "R600" GPU chip.
