Bitwise ternary logic instructions can implement every possible bitwise operation between three inputs (2^8 = 256 distinct functions). They take three registers as input plus an 8-bit immediate field. Each bit of the output is generated by using the three corresponding bits of the inputs as an index that selects one of the 8 positions in the 8-bit immediate, which acts as a lookup table. Since three bits admit only 8 combinations, this allows all possible 3-input bitwise operations to be performed. In mathematical terminology: each corresponding bit of the three inputs is a ternary Boolean function with a Hasse diagram of order n=8.[1] The eight input combinations are also known as minterms.
A full table showing all 256 possible 3-operand logical bitwise instructions may be found in the Power ISA description of xxeval.[2] An additional insight is that if the 8-bit immediate were an operand (register) then, in FPGA terminology, bitwise ternary logical instructions would implement an array of hardware LUT3s.
In pseudocode, the output for three single-bit inputs is obtained by using r2, r1 and r0 as the three binary digits of a 3-bit index, treating the 8-bit immediate as a lookup table, and simply returning the indexed bit:
    result := imm8((r2<<2) + (r1<<1) + r0)
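
Going the other way, the immediate for a given operation can be derived by evaluating that operation at each of the 8 index values and recording the result in the matching bit. The sketch below assumes the indexing of the pseudocode above (r2 is the most significant index bit); the helper name `imm8_from_function` is made up for illustration and is not part of any instruction set.

```python
def imm8_from_function(f):
    """Build the 8-bit lookup table for a 3-input Boolean function f(r2, r1, r0)."""
    imm8 = 0
    for index in range(8):
        # Split the 3-bit index back into its single-bit inputs.
        r2, r1, r0 = (index >> 2) & 1, (index >> 1) & 1, index & 1
        if f(r2, r1, r0):
            imm8 |= 1 << index
    return imm8

and3 = imm8_from_function(lambda r2, r1, r0: r2 & r1 & r0)
or3 = imm8_from_function(lambda r2, r1, r0: r2 | r1 | r0)
blend = imm8_from_function(lambda r2, r1, r0: r1 if r2 else r0)
print(hex(and3), hex(or3), hex(blend))  # prints: 0x80 0xfe 0xca
```
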
A readable Python implementation for three single-bit inputs (r0, r1 and r2) is shown below:
    def ternlut8(r0, r1, r2, imm8):
        """Implementation of a LUT3 (ternary lookup)"""
        # index will be in range 0 to 7
        lut_index = 0
        # r0 sets bit0, r1 bit1, and r2 bit2
        if r0: lut_index |= 1 << 0
        if r1: lut_index |= 1 << 1
        if r2: lut_index |= 1 << 2
        # return the requested indexed bit of imm8
        return (imm8 & (1 << lut_index)) != 0
If the input registers are 64-bit then the output is correspondingly 64-bit, and is constructed by using bit i of each of the three inputs to look up bit i of the output:
    def ternlut8_64bit(R0, R1, R2, imm8):
        """Implementation of a 64-bit ternary lookup instruction"""
        result = 0
        for i in range(64):
            m = 1 << i  # single mask bit of inputs
            r0, r1, r2 = (R0 & m), (R1 & m), (R2 & m)
            result |= ternlut8(r0, r1, r2, imm8) << i
        return result
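
The per-bit loop is easy to read but visits every bit position separately. As a hedged alternative, the same result can be computed on whole words as a sum of minterms: for each set bit of imm8, AND together each input or its complement, then OR the terms. This is only a sketch; the name `ternlog64` is hypothetical, and the argument order (R0, R1, R2) is assumed to match the function above.

```python
MASK64 = (1 << 64) - 1

def ternlog64(R0, R1, R2, imm8):
    """Word-wide ternary lookup evaluated as a sum of minterms."""
    result = 0
    for index in range(8):
        if not (imm8 >> index) & 1:
            continue  # this minterm is not part of the function
        # Use each input directly where its index bit is 1, inverted where 0.
        t0 = R0 if index & 1 else ~R0
        t1 = R1 if index & 2 else ~R1
        t2 = R2 if index & 4 else ~R2
        result |= t0 & t1 & t2
    # Python integers are unbounded, so trim the inverted terms to 64 bits.
    return result & MASK64

a, b, c = 0x0123456789ABCDEF, 0xFEDCBA9876543210, 0x0F0F0F0F0F0F0F0F
assert ternlog64(a, b, c, 0x80) == a & b & c
assert ternlog64(a, b, c, 0xFE) == a | b | c
assert ternlog64(a, b, c, 0x96) == a ^ b ^ c
```

At most eight terms are evaluated regardless of the word width, which is why the loop runs over the immediate rather than over the bit positions.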
The example table below shows just three of the 256 possible values of the 8-bit immediate: Double-AND, Double-OR and Bitwise-blend. The immediate (the 8-bit lookup table) is named imm8. Note that each column, read from the bottom row up to the top row, spells out the binary value given in its header: imm8=0xCA is binary 11001010 in the "Bitwise blend" column:
A0 | A1 | A2 | Double AND (imm8=0x80) | Double OR (imm8=0xFE) | Bitwise blend (imm8=0xCA) |
---|---|---|---|---|---|
0 | 0 | 0 | 0 | 0 | 0 |
0 | 0 | 1 | 0 | 1 | 1 |
0 | 1 | 0 | 0 | 1 | 0 |
0 | 1 | 1 | 0 | 1 | 1 |
1 | 0 | 0 | 0 | 1 | 0 |
1 | 0 | 1 | 0 | 1 | 0 |
1 | 1 | 0 | 0 | 1 | 1 |
1 | 1 | 1 | 1 | 1 | 1 |
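
The columns of the table can be regenerated from the immediates with a few lines of Python. This is a minimal sketch, assuming the table's row order (A0 is the most significant index bit, so row 000 is bit 0 of imm8 and row 111 is bit 7); `truth_column` is a hypothetical helper name.

```python
def truth_column(imm8):
    """Return the 8 output bits in table order (row 000 first, row 111 last)."""
    return [(imm8 >> index) & 1 for index in range(8)]

for name, imm8 in [("Double AND", 0x80), ("Double OR", 0xFE), ("Bitwise blend", 0xCA)]:
    print(f"{name:14} imm8=0x{imm8:02X} column={truth_column(imm8)}")

# Blend: A0 selects A1 when it is 1 and A2 when it is 0.
for index in range(8):
    a0, a1, a2 = (index >> 2) & 1, (index >> 1) & 1, index & 1
    assert truth_column(0xCA)[index] == (a1 if a0 else a2)
```
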
The number of uses is significant: the instruction applies anywhere an algorithm combines three operands with bitwise logical operations. Examples include carry-save addition, SHA-1, SHA-2, MD5, and the exactly-one and exactly-two bit counting used in Harley-Seal popcount.[3] vpternlog speeds up MD5 by 20%.[4]
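
To make two of these uses concrete, the sketch below shows a carry-save adder step (3-input XOR for the partial sum, majority for the carries) and the SHA-2 Ch and Maj functions, each expressed as a single lookup. The helper `ternlog` and the constant names are illustrative assumptions, as is the operand order (x supplies index bit 2, y bit 1, z bit 0); real instructions define their own operand ordering.

```python
def ternlog(x, y, z, imm8, width=64):
    """Per-bit lookup; x supplies index bit 2, y bit 1, z bit 0 (assumed order)."""
    result = 0
    for i in range(width):
        index = ((x >> i) & 1) << 2 | ((y >> i) & 1) << 1 | ((z >> i) & 1)
        result |= ((imm8 >> index) & 1) << i
    return result

XOR3, MAJ, CH = 0x96, 0xE8, 0xCA  # 3-input XOR, majority, choose (blend)

# Carry-save addition: three addends reduce to a partial sum plus shifted carries.
a, b, c = 0x1234, 0x0FF0, 0x00FF
partial_sum = ternlog(a, b, c, XOR3)
carries = ternlog(a, b, c, MAJ) << 1
assert partial_sum + carries == a + b + c

# SHA-2 round functions on 32-bit words.
M32 = 0xFFFFFFFF
x, y, z = 0x6A09E667, 0xBB67AE85, 0x3C6EF372
assert ternlog(x, y, z, CH, 32) == ((x & y) ^ (~x & z)) & M32
assert ternlog(x, y, z, MAJ, 32) == (x & y) ^ (x & z) ^ (y & z)
```

Each of these functions otherwise takes several two-operand instructions, which is where the saving comes from.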
Although unusual due to the high cost in hardware, this instruction is found in a number of instruction set architectures:
- vpternlog[6]
- xxeval[7]
- vpternlog: Tom Forsyth explains, amusingly, that the Intel test engineers were happy to have one instruction to test rather than 256.[8][9][10][11]