真值表归结为三元逻辑运算,vpternlog [英] Truth-table reduction to ternary logic operations, vpternlog

查看:177
本文介绍了真值表归结为三元逻辑运算,vpternlog的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有很多包含多个变量的真值表(7个或更多),并且使用一种工具(例如,逻辑星期五1)简化逻辑公式.我可以手工完成,但这太容易出错了.然后,我将这些公式转换为编译器固有函数(例如, _mm_xor_epi32 )可以正常工作.

I have many truth-tables of many variables (7 or more) and I use a tool (eg logic friday 1) to simplify the logic formula. I could do that by hand but that is much too error prone. These formula I then translate to compiler intrinsics (eg _mm_xor_epi32) which works fine.

问题:使用 vpternlog ,我可以进行三元逻辑运算.但是我不知道有一种方法可以将我的真值表简化为(某种程度上)有效的vpternlog指令序列.

Question: with vpternlog I can make ternary logic operations. But I'm not aware of a method to simplify my truth-tables to sequences of vpternlog instructions that are (somewhat) efficient.

我并不是要问是否有人知道一种可以简化为任意三元逻辑运算的工具,尽管那会很棒,但我正在寻找一种实现这种简化的方法.

I'm not asking if someone knows a tool that simplifies to arbitrary ternary logic operations, although that would be great, I'm looking for a method to do such simplifications.

我在电气工程.

推荐答案

如何将真值表转换为 vpternlog 指令序列.

How to translate a truth table to a sequence of vpternlog instructions.

  1. 将真值表转换为逻辑公式;例如使用Logic Friday.
  2. 以Synopsys方程格式(.eqn)存储逻辑公式.例如,我使用了一个网络,该网络具有6个输入节点A至F,两个输出节点F0和F1,以及一个较为复杂(非统一)的布尔函数.

BF_Q6.eqn的内容:

Content of BF_Q6.eqn:

INORDER = A B C D E F; 
OUTORDER = F0 F1;
F0 = (!A*!B*!C*!D*!E*F) + (!A*!B*!C*!D*E*!F) + (!A*!B*!C*D*!E*!F) + (!A*!B*C*!D*!E*!F) + (!A*B*!C*!D*!E*!F) + (A*!B*!C*!D*!E*!F);
F1 = (!A*!B*!C*!D*E) + (!A*!B*!C*D*!E) + (!A*!B*C*!D*!E) + (!A*B*!C*!D*!E) + (A*!B*!C*!D*!E);

  1. 使用伯克利验证和综合研究中心的"ABC:用于顺序合成和验证的系统".我用的是Windows版本.在此处获取ABC.
  1. Use "ABC: A System for Sequential Synthesis and Verification" from the Berkeley Verification and Synthesis Research Center. I used the windows version. Get ABC here.

在ABC中,我运行:

abc 01> read_eqn BF_Q6.eqn
abc 02> choice; if -K 3; ps
abc 03> lutpack -N 3 -S 3; ps
abc 04> show
abc 05> write_bench BF_Q6.bench

您可能需要运行 choice;如果-K 3;ps 多次以获得更好的结果.

You may need to run choice; if -K 3; ps multiple time to get better results.

生成的BF_Q6.bench包含一个FPGA的3-LUT:

The resulting BF_Q6.bench contains the 3-LUTs for an FPGA:

INPUT(A)
INPUT(B)
INPUT(C)
INPUT(D)
INPUT(E)
INPUT(F)
OUTPUT(F0)
OUTPUT(F1)
n11         = LUT 0x01 ( B, C, D )
n12         = LUT 0x1 ( A, E )
n14         = LUT 0x9 ( A, E )
n16         = LUT 0xe9 ( B, C, D )
n18         = LUT 0x2 ( n11, n14 )
F1          = LUT 0xae ( n18, n12, n16 )
n21         = LUT 0xd9 ( F, n11, n14 )
n22         = LUT 0xd9 ( F, n12, n16 )
F0          = LUT 0x95 ( F, n21, n22 )

4.可以将其机械翻译为C ++:

4. This can be translated to C++ mechanically:

__m512i not(__m512i a) { return _mm512_xor_si512(a, _mm512_set1_epi32(-1)); }
__m512i binary(__m512i a, __m512i b, int i) {
    switch (i)
    {
        case 0: return _mm512_setzero_si512();
        case 1: return not(_mm512_or_si512(a, b));
        case 2: return _mm512_andnot_si512(a, b);
        case 3: return not(a);
        case 4: return _mm512_andnot_si512(b, a);
        case 5: return not(b);
        case 6: return _mm512_xor_si512(a, b);
        case 7: return not(_mm512_and_si512(a, b));
        case 8: return _mm512_and_si512(a, b);
        case 9: return not(_mm512_xor_si512(a, b));
        case 10: return b;
        case 11: return _mm512_or_si512(not(a), b);
        case 12: return a;
        case 13: return mm512_or_si512(a, not(b)); 
        case 14: return _mm512_or_si512(a, b);
        case 15: return _mm512_set1_epi32(-1);
        default: return _mm512_setzero_si512();
    }
}
void BF_Q6(const __m512i A, const __m512i B, const __m512i C, const __m512i D, const __m512i E, const __m512i F, __m512i& F0, __m512i& F1) {
    const auto n11 = _mm512_ternarylogic_epi64(B, C, D, 0x01);
    const auto n12 = binary(A, E, 0x1);
    const auto n14 = binary(A, E, 0x9);
    const auto n16 = _mm512_ternarylogic_epi64(B, C, D, 0xe9);
    const auto n18 = binary(n11, n14, 0x2);
    F1 = _mm512_ternarylogic_epi64(n18, n12, n16, 0xae);
    const auto n21 = _mm512_ternarylogic_epi64(F, n11, n14, 0xd9);
    const auto n22 = _mm512_ternarylogic_epi64(F, n12, n16, 0xd9);
    F0 = _mm512_ternarylogic_epi64(F, n21, n22, 0x95);
}

问题仍然是生成的C ++代码是否最佳.我认为这种方法(通常)不会产生最小的3-LUT网络,只是因为此问题对NP不利.此外,不可能向ABC通知指令并行性,并且不可能对变量的顺序进行优先排序,以使稍后将使用的变量不在LUT的第一位置(因为第一个源操作数被覆盖)结果).但是编译器可能足够聪明,可以进行此类优化.

The question remains whether the resulting C++ code is optimal. I don't think that this method (often) yields the smallest networks of 3-LUTs, simply because this problem is NP-hard. Additionally, it is not possible to inform ABC about instruction parallelism, and it is not possible to prioritize the order of variables such that variables that will be used later are not in the first position of the LUT (since the first source operand is overwritten with the result). But the compiler may be smart enough to do such optimizations.

ICC18进行以下组装:

ICC18 gives the following assembly:

00007FF75DCE1340  sub         rsp,78h
00007FF75DCE1344  vmovups     xmmword ptr [rsp+40h],xmm15
00007FF75DCE134A  vmovups     zmm2,zmmword ptr [r9]
00007FF75DCE1350  vmovups     zmm1,zmmword ptr [r8]
00007FF75DCE1356  vmovups     zmm5,zmmword ptr [rdx]
00007FF75DCE135C  vmovups     zmm4,zmmword ptr [rcx]

00007FF75DCE1362  vpternlogd  zmm15, zmm15, zmm15, 0FFh
00007FF75DCE1369  vpternlogq  zmm5, zmm1, zmm2, 0E9h
00007FF75DCE1370  vmovaps     zmm3, zmm2
00007FF75DCE1376  mov         rax, qword ptr[&E]
00007FF75DCE137E  vpternlogq  zmm3, zmm1, zmmword ptr[rdx], 1
00007FF75DCE1385  mov         r11, qword ptr[&F]
00007FF75DCE138D  mov         rcx, qword ptr[F0]
00007FF75DCE1395  mov         r10, qword ptr[F1]
00007FF75DCE139D  vpord       zmm0, zmm4, zmmword ptr[rax]
00007FF75DCE13A3  vpxord      zmm4, zmm4, zmmword ptr[rax]
00007FF75DCE13A9  vpxord      zmm0, zmm0, zmm15
00007FF75DCE13AF  vpxord      zmm15, zmm4, zmm15
00007FF75DCE13B5  vpandnd     zmm1, zmm3, zmm15
00007FF75DCE13BB  vpternlogq  zmm1, zmm0, zmm5, 0AEh
00007FF75DCE13C2  vpternlogq  zmm15, zmm3, zmmword ptr[r11], 0CBh
                              ^^^^^        ^^^^^^^^^^^^^^^^  
00007FF75DCE13C9  vpternlogq  zmm5, zmm0, zmmword ptr[r11], 0CBh
00007FF75DCE13D0  vmovups     zmmword ptr[r10], zmm1
00007FF75DCE13D6  vpternlogq  zmm5, zmm15, zmmword ptr[r11], 87h                                  
00007FF75DCE13DD  vmovups     zmmword ptr [rcx],zmm5

00007FF75DCE13E3  vzeroupper
00007FF75DCE13E6  vmovups     xmm15,xmmword ptr [rsp+40h]
00007FF75DCE13EC  add         rsp,78h
00007FF75DCE13F0  ret

ICC18能够将 const auto n22 = _mm512_ternarylogic_epi64(F,n12,n16,0xd9); 的变量顺序更改为 vpternlogq zmm15,zmm3,zmmword ptr [r11],0CBh,这样变量F不会被覆盖.(但是奇怪的是,从内存中提取了足够的3次...)

ICC18 is able to change the variable ordering in const auto n22 = _mm512_ternarylogic_epi64(F, n12, n16, 0xd9); to vpternlogq zmm15, zmm3, zmmword ptr[r11], 0CBh such that variable F is not overwritten. (But strangely enough fetched from memory 3 times...)

这篇关于真值表归结为三元逻辑运算,vpternlog的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆