真值表归结为三元逻辑运算,vpternlog [英] Truth-table reduction to ternary logic operations, vpternlog
问题描述
我有很多包含多个变量的真值表(7个或更多),并且使用一种工具(例如,逻辑星期五1)简化逻辑公式.我可以手工完成,但这太容易出错了.然后,我将这些公式转换为编译器固有函数(例如, _mm_xor_epi32 )可以正常工作.
I have many truth-tables of many variables (7 or more) and I use a tool (eg logic friday 1) to simplify the logic formula. I could do that by hand but that is much too error prone. These formula I then translate to compiler intrinsics (eg _mm_xor_epi32) which works fine.
问题:使用 vpternlog ,我可以进行三元逻辑运算.但是我不知道有一种方法可以将我的真值表简化为(某种程度上)有效的vpternlog指令序列.
Question: with vpternlog I can make ternary logic operations. But I'm not aware of a method to simplify my truth-tables to sequences of vpternlog instructions that are (somewhat) efficient.
我并不是要问是否有人知道一种可以简化为任意三元逻辑运算的工具,尽管那会很棒,但我正在寻找一种实现这种简化的方法.
I'm not asking if someone knows a tool that simplifies to arbitrary ternary logic operations, although that would be great, I'm looking for a method to do such simplifications.
我在电气工程.
推荐答案
如何将真值表转换为 vpternlog
指令序列.
How to translate a truth table to a sequence of vpternlog
instructions.
- 将真值表转换为逻辑公式;例如使用Logic Friday.
- 以Synopsys方程格式(.eqn)存储逻辑公式.例如,我使用了一个网络,该网络具有6个输入节点A至F,两个输出节点F0和F1,以及一个较为复杂(非统一)的布尔函数.
BF_Q6.eqn的内容:
Content of BF_Q6.eqn:
INORDER = A B C D E F;
OUTORDER = F0 F1;
F0 = (!A*!B*!C*!D*!E*F) + (!A*!B*!C*!D*E*!F) + (!A*!B*!C*D*!E*!F) + (!A*!B*C*!D*!E*!F) + (!A*B*!C*!D*!E*!F) + (A*!B*!C*!D*!E*!F);
F1 = (!A*!B*!C*!D*E) + (!A*!B*!C*D*!E) + (!A*!B*C*!D*!E) + (!A*B*!C*!D*!E) + (A*!B*!C*!D*!E);
- 使用伯克利验证和综合研究中心的"ABC:用于顺序合成和验证的系统".我用的是Windows版本.在此处获取ABC.
- Use "ABC: A System for Sequential Synthesis and Verification" from the Berkeley Verification and Synthesis Research Center. I used the windows version. Get ABC here.
在ABC中,我运行:
abc 01> read_eqn BF_Q6.eqn
abc 02> choice; if -K 3; ps
abc 03> lutpack -N 3 -S 3; ps
abc 04> show
abc 05> write_bench BF_Q6.bench
您可能需要运行 choice;如果-K 3;ps
多次以获得更好的结果.
You may need to run choice; if -K 3; ps
multiple time to get better results.
生成的BF_Q6.bench包含一个FPGA的3-LUT:
The resulting BF_Q6.bench contains the 3-LUTs for an FPGA:
INPUT(A)
INPUT(B)
INPUT(C)
INPUT(D)
INPUT(E)
INPUT(F)
OUTPUT(F0)
OUTPUT(F1)
n11 = LUT 0x01 ( B, C, D )
n12 = LUT 0x1 ( A, E )
n14 = LUT 0x9 ( A, E )
n16 = LUT 0xe9 ( B, C, D )
n18 = LUT 0x2 ( n11, n14 )
F1 = LUT 0xae ( n18, n12, n16 )
n21 = LUT 0xd9 ( F, n11, n14 )
n22 = LUT 0xd9 ( F, n12, n16 )
F0 = LUT 0x95 ( F, n21, n22 )
4.可以将其机械翻译为C ++:
4. This can be translated to C++ mechanically:
__m512i not(__m512i a) { return _mm512_xor_si512(a, _mm512_set1_epi32(-1)); }
__m512i binary(__m512i a, __m512i b, int i) {
switch (i)
{
case 0: return _mm512_setzero_si512();
case 1: return not(_mm512_or_si512(a, b));
case 2: return _mm512_andnot_si512(a, b);
case 3: return not(a);
case 4: return _mm512_andnot_si512(b, a);
case 5: return not(b);
case 6: return _mm512_xor_si512(a, b);
case 7: return not(_mm512_and_si512(a, b));
case 8: return _mm512_and_si512(a, b);
case 9: return not(_mm512_xor_si512(a, b));
case 10: return b;
case 11: return _mm512_or_si512(not(a), b);
case 12: return a;
case 13: return mm512_or_si512(a, not(b));
case 14: return _mm512_or_si512(a, b);
case 15: return _mm512_set1_epi32(-1);
default: return _mm512_setzero_si512();
}
}
void BF_Q6(const __m512i A, const __m512i B, const __m512i C, const __m512i D, const __m512i E, const __m512i F, __m512i& F0, __m512i& F1) {
const auto n11 = _mm512_ternarylogic_epi64(B, C, D, 0x01);
const auto n12 = binary(A, E, 0x1);
const auto n14 = binary(A, E, 0x9);
const auto n16 = _mm512_ternarylogic_epi64(B, C, D, 0xe9);
const auto n18 = binary(n11, n14, 0x2);
F1 = _mm512_ternarylogic_epi64(n18, n12, n16, 0xae);
const auto n21 = _mm512_ternarylogic_epi64(F, n11, n14, 0xd9);
const auto n22 = _mm512_ternarylogic_epi64(F, n12, n16, 0xd9);
F0 = _mm512_ternarylogic_epi64(F, n21, n22, 0x95);
}
问题仍然是生成的C ++代码是否最佳.我认为这种方法(通常)不会产生最小的3-LUT网络,只是因为此问题对NP不利.此外,不可能向ABC通知指令并行性,并且不可能对变量的顺序进行优先排序,以使稍后将使用的变量不在LUT的第一位置(因为第一个源操作数被覆盖)结果).但是编译器可能足够聪明,可以进行此类优化.
The question remains whether the resulting C++ code is optimal. I don't think that this method (often) yields the smallest networks of 3-LUTs, simply because this problem is NP-hard. Additionally, it is not possible to inform ABC about instruction parallelism, and it is not possible to prioritize the order of variables such that variables that will be used later are not in the first position of the LUT (since the first source operand is overwritten with the result). But the compiler may be smart enough to do such optimizations.
ICC18进行以下组装:
ICC18 gives the following assembly:
00007FF75DCE1340 sub rsp,78h
00007FF75DCE1344 vmovups xmmword ptr [rsp+40h],xmm15
00007FF75DCE134A vmovups zmm2,zmmword ptr [r9]
00007FF75DCE1350 vmovups zmm1,zmmword ptr [r8]
00007FF75DCE1356 vmovups zmm5,zmmword ptr [rdx]
00007FF75DCE135C vmovups zmm4,zmmword ptr [rcx]
00007FF75DCE1362 vpternlogd zmm15, zmm15, zmm15, 0FFh
00007FF75DCE1369 vpternlogq zmm5, zmm1, zmm2, 0E9h
00007FF75DCE1370 vmovaps zmm3, zmm2
00007FF75DCE1376 mov rax, qword ptr[&E]
00007FF75DCE137E vpternlogq zmm3, zmm1, zmmword ptr[rdx], 1
00007FF75DCE1385 mov r11, qword ptr[&F]
00007FF75DCE138D mov rcx, qword ptr[F0]
00007FF75DCE1395 mov r10, qword ptr[F1]
00007FF75DCE139D vpord zmm0, zmm4, zmmword ptr[rax]
00007FF75DCE13A3 vpxord zmm4, zmm4, zmmword ptr[rax]
00007FF75DCE13A9 vpxord zmm0, zmm0, zmm15
00007FF75DCE13AF vpxord zmm15, zmm4, zmm15
00007FF75DCE13B5 vpandnd zmm1, zmm3, zmm15
00007FF75DCE13BB vpternlogq zmm1, zmm0, zmm5, 0AEh
00007FF75DCE13C2 vpternlogq zmm15, zmm3, zmmword ptr[r11], 0CBh
^^^^^ ^^^^^^^^^^^^^^^^
00007FF75DCE13C9 vpternlogq zmm5, zmm0, zmmword ptr[r11], 0CBh
00007FF75DCE13D0 vmovups zmmword ptr[r10], zmm1
00007FF75DCE13D6 vpternlogq zmm5, zmm15, zmmword ptr[r11], 87h
00007FF75DCE13DD vmovups zmmword ptr [rcx],zmm5
00007FF75DCE13E3 vzeroupper
00007FF75DCE13E6 vmovups xmm15,xmmword ptr [rsp+40h]
00007FF75DCE13EC add rsp,78h
00007FF75DCE13F0 ret
ICC18能够将 const auto n22 = _mm512_ternarylogic_epi64(F,n12,n16,0xd9);
的变量顺序更改为 vpternlogq zmm15,zmm3,zmmword ptr [r11],0CBh
,这样变量F不会被覆盖.(但是奇怪的是,从内存中提取了足够的3次...)
ICC18 is able to change the variable ordering in const auto n22 = _mm512_ternarylogic_epi64(F, n12, n16, 0xd9);
to vpternlogq zmm15, zmm3, zmmword ptr[r11], 0CBh
such that variable F is not overwritten. (But strangely enough fetched from memory 3 times...)
这篇关于真值表归结为三元逻辑运算,vpternlog的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!