Unresolved external symbol __aullshr when optimization is turned off

Views: 264   Published: 2020/7/30 21:04:05   Tags: c visual-c++ intrinsics bit-fields uefi


Problem description


I am compiling a piece of UEFI C code with Visual Studio 2015 C/C++ compiler.

The compiler is targeting IA32, not X64.

When optimization is turned on with "/O1", the build succeeds.

When optimization is turned off with "/Od", the build fails with the following error:

error LNK2001: unresolved external symbol __aullshr

According to this page, there's an explanation of why this kind of function can be called implicitly by the compiler:

It turns out that this function is one of several compiler support functions that are invoked explicitly by the Microsoft C/C++ compiler. In this case, this function is called whenever the 32-bit compiler needs to multiply two 64-bit integers together. The EDK does not link with Microsoft's libraries and does not provide this function.

Are there other functions like this one? Sure, several more for 64-bit division, remainder and shifting.

But according to this page:

...Compilers that implement intrinsic functions generally enable them only when a program requests optimization...

So how could such functions still be called when I explicitly turned off optimization with /Od?

ADD 1 - 2:32 PM 2/16/2019

It seems I am wrong about the __aullshr function.

It is not a compiler intrinsic function. According to this page, it turns out to be a runtime library function, whose implementation can be found in: C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\crt\src\intel\ullshr.asm or C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\crt\src\i386\ullshr.asm

Such VC runtime functions are brought in by the compiler for 32-bit applications to do 64-bit operations.

But I still don't understand why the /O1 build passes while the /Od build fails. It seems the optimization switch affects whether the VC runtime library is used.

ADD 2 - 4:59 PM 2/17/2019

I found the code that cause the build failure.

It turns out to be a C struct bit-field operation. There's a 64-bit C struct with a lot of bit fields backed by a single UINT64 variable. When I comment out the single line of code that accesses these bit fields, the build passes. It seems the __aullshr() function is used to access these bit fields when /Od is specified.

Since this is part of the firmware code, I am wondering whether it is good practice to turn off optimization with /Od.

ADD 3 - 9:33 AM 2/18/2019

I created the following minimal reproducible example for VS2015.

First, there's a static lib project:

(test.c)

typedef unsigned __int64    UINT64;

typedef union {
    struct {
        UINT64 field1 : 16;
        UINT64 field2 : 16;
        UINT64 field3 : 6;
        UINT64 field4 : 15;
        UINT64 field5 : 2;
        UINT64 field6 : 1;
        UINT64 field7 : 1;
        UINT64 field8 : 1;  //<=========
        UINT64 field9 : 1;
        UINT64 field10 : 1;
        UINT64 field11 : 1;
        UINT64 field12 : 1; //<=========
        UINT64 field13 : 1;
        UINT64 field14 : 1;
    } Bits;
    UINT64 Data;
} ISSUE_STRUCT;


int
Method1
(
    UINT64        Data
)
{
    ISSUE_STRUCT              IssueStruct;
    IssueStruct.Data = Data;

    if (IssueStruct.Bits.field8 == 1 && IssueStruct.Bits.field12 == 1) { // <==== HERE
        return 1;
    }
    else
    {
        return 0;
    }
}

Then a Windows DLL project:

(DllMain.c)

#include <Windows.h>
typedef unsigned __int64    UINT64;

int
Method1
(
    UINT64        Data
);

int __stdcall DllMethod1
(
    HINSTANCE hinstDLL,
    DWORD fdwReason,
    LPVOID lpReserved
)
{
    if (Method1(1234)) //<===== Use the Method1 from the test.obj
    {
        return 1;
    }
    return 2;
}

Build process:

First, compile the test.obj:

cl.exe /nologo /arch:IA32 /c /GS- /W4 /Gs32768 /D UNICODE /O1b2 /GL /EHs-c- /GR- /GF /Gy /Zi /Gm /Gw /Od /Zl test.c

Addendum: when compiling test.c, the VC++ 2015 compiler also gives the following warning:

warning C4214: nonstandard extension used: bit field types other than int

Then compile the DllMain.obj:

cl /nologo /arch:IA32 /c /GS- /W4 /Gs32768 /D UNICODE /O1b2 /GL /EHs-c- /GR- /GF /Gy /Zi /Gm /Gw /Od /Zl DllMain.c

Then link DllMain.obj with test.obj:

link DllMain.obj ..\aullshr\test.obj /NOLOGO /NODEFAULTLIB /IGNORE:4001 /OPT:REF /OPT:ICF=10 /MAP /ALIGN:32 /SECTION:.xdata,D /SECTION:.pdata,D /MACHINE:X86 /LTCG /SAFESEH:NO /DLL /ENTRY:DllMethod1 /DRIVER

This gives the following error:

Generating code
Finished generating code
test.obj : error LNK2001: unresolved external symbol __aullshr
DllMain.dll : fatal error LNK1120: 1 unresolved externals

If I remove the bit-field manipulation code at HERE in test.c, the link error disappears.
If I only remove /Od from the compile options for test.c, the link error disappears.

ADD 4 - 12:40 PM 2/18/2019

Thanks to a comment from @PeterCordes, there's an even simpler way to reproduce this issue. Just define the following function:

uint64_t shr(uint64_t a, unsigned c) { return a >> c; }

Then compile and link the source code with the following commands:

cl /nologo /arch:IA32 /c /GS- /W4 /Gs32768 /D UNICODE /O1b2 /GL /EHs-c- /GR- /GF /Gy /Zi /Gm /Gw /Od /Zl DllMain.c

link DllMain.obj /NOLOGO /NODEFAULTLIB /IGNORE:4001 /OPT:REF /OPT:ICF=10 /MAP /ALIGN:32 /SECTION:.xdata,D /SECTION:.pdata,D /MACHINE:X86 /LTCG /SAFESEH:NO /DLL /ENTRY:DllMethod1 /DRIVER

This issue can be reproduced for:

Microsoft (R) C/C++ Optimizing Compiler Version 18.00.40629 for x86 (VS2013)
Microsoft (R) C/C++ Optimizing Compiler Version 19.00.24210 for x86 (VS2015)
Microsoft (R) C/C++ Optimizing Compiler Version 19.00.24215.1 for x86 (VS2015)

As mandated in the UEFI coding standard 5.6.3.4 Bit Fields:

Bit fields may only be of type INT32, signed INT32, UINT32, or a typedef name defined as one of the three INT32 variants.

So my final solution is to modify the UEFI code to use UINT32 instead of UINT64.
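Here is a minimal sketch of that change applied to the repro's test.c. It assumes <stdint.h> types stand in for the EDK II UINT32/UINT64 typedefs so it builds standalone. Every field fits inside one 32-bit unit (16+16 in the first unit, then 6+15+2 plus nine 1-bit fields in the second), so the bit positions should stay the same as with UINT64. If a field ever straddled a 32-bit boundary, compilers would typically move it to the next unit and the layout would change, so it is worth checking before converting.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins for the EDK II typedefs (assumption, so this builds standalone). */
typedef uint32_t UINT32;
typedef uint64_t UINT64;

/* Same fields as ISSUE_STRUCT in the question, but every bit field is UINT32.
 * No field crosses a 32-bit boundary, so the layout should be unchanged. */
typedef union {
    struct {
        UINT32 field1 : 16;
        UINT32 field2 : 16;  /* first 32-bit unit is now full */
        UINT32 field3 : 6;
        UINT32 field4 : 15;
        UINT32 field5 : 2;
        UINT32 field6 : 1;
        UINT32 field7 : 1;
        UINT32 field8 : 1;   /* bit 57 of Data */
        UINT32 field9 : 1;
        UINT32 field10 : 1;
        UINT32 field11 : 1;
        UINT32 field12 : 1;  /* bit 61 of Data */
        UINT32 field13 : 1;
        UINT32 field14 : 1;  /* second 32-bit unit is now full */
    } Bits;
    UINT64 Data;
} ISSUE_STRUCT;

/* Catch accidental layout growth if a field is added or widened. */
static_assert(sizeof(ISSUE_STRUCT) == sizeof(UINT64), "bit fields must pack into 64 bits");

int Method1(UINT64 Data)
{
    ISSUE_STRUCT IssueStruct;
    IssueStruct.Data = Data;
    /* Each access now works on a 32-bit unit instead of a 64-bit one. */
    return IssueStruct.Bits.field8 == 1 && IssueStruct.Bits.field12 == 1;
}
```

The answer reports that MSVC does not need the helper at -Od once the bit-field members are unsigned. We have not rebuilt this exact file with the question's command line.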

Solution

Your build setup for creating UEFI applications omits the static library of helper functions that MSVC's code-gen expects to be available. MSVC's code-gen sometimes inserts calls to helper functions, the same way that gcc does for 64x64 multiply or divide on 32-bit platforms, or various other things. (e.g. popcount on targets without hardware popcnt.)
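The following is a rough C model of what such a helper has to compute. It builds a 64x64 multiply (low 64 bits of the product) from 32-bit pieces, which is the job helpers like MSVC's _allmul or libgcc's __muldi3 do on 32-bit targets. The function name is hypothetical, and this is not either library's source. It is illustrative only: compiled with MSVC /Od for IA32, the >> 32 and << 32 in it could themselves turn into helper calls.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: low 64 bits of a*b using only 32x32 multiplies. */
static uint64_t mul64_from_halves(uint64_t a, uint64_t b)
{
    uint32_t a_lo = (uint32_t)a, a_hi = (uint32_t)(a >> 32);
    uint32_t b_lo = (uint32_t)b, b_hi = (uint32_t)(b >> 32);

    /* Full 64-bit product of the low halves (a single MUL on x86). */
    uint64_t lo_lo = (uint64_t)a_lo * b_lo;

    /* The cross terms only contribute to bits 32..63. Anything above bit 63
     * is discarded, so truncating 32-bit multiplies are enough here. The
     * a_hi * b_hi term lands entirely above bit 63 and is skipped. */
    uint32_t cross = a_lo * b_hi + a_hi * b_lo;

    return lo_lo + ((uint64_t)cross << 32);
}
```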

In this case hand-holding MSVC into less-stupid code-gen (a good thing in itself) happens to remove all the uses of helper functions in your codebase. That's good, but it doesn't fix your build setup. It can break again if you add code in the future that needs a helper. uint64_t shr(uint64_t a, unsigned c) { return a >> c; } does compile to include a call to the helper function even at -O2.

Without optimization, even a shift by a constant uses __aullshr instead of being inlined as shrd / shr. This exact problem (broken -Od builds) would recur with something like uint64_t x; ... x >> 4 in your source.

(I don't know where MSVC keeps its library of helper functions. We think it's a static library that you could link without introducing a DLL dependency (impossible for UEFI), but we don't know if it's bundled with some CRT startup code that you need to avoid linking with for UEFI.)

The un-optimized vs. optimized issue is clear with this example. MSVC with optimization doesn't need the helper function, but its braindead -Od code does.

For bitfield access, MSVC apparently uses a right shift of the base type of the bitfield member. In your case, you made it a 64-bit type, and 32-bit x86 doesn't have a 64-bit integer shift (except using MMX or SSE2). With -Od, even for constant counts, it puts the data in EDX:EAX and the shift count in cl (just like for x86 shift instructions), then calls __aullshr.

__a = ??
ull = unsigned long long.
shr = shift right (like the x86 asm instruction of the same name).
it takes the shift count in cl, exactly like x86 shift instructions.
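These semantics can be modeled in C by passing the two register halves and the count as ordinary arguments. This is a hypothetical model (ullshr_model is our name), not the CRT's ullshr.asm, which uses the EDX:EAX / CL register convention directly. The branch for counts of 64 or more returning 0 is our assumption about the real routine; in C, a shift by 64 or more is undefined, so the model handles it explicitly.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of __aullshr: (edx:eax) >> cl for an unsigned 64-bit value. */
static uint64_t ullshr_model(uint32_t eax, uint32_t edx, unsigned char cl)
{
    uint32_t lo, hi;

    if (cl >= 64) {            /* everything is shifted out (assumed behavior) */
        lo = 0;
        hi = 0;
    } else if (cl >= 32) {     /* the new low half comes only from the old high half */
        lo = edx >> (cl - 32);
        hi = 0;
    } else if (cl == 0) {      /* avoid the undefined 32-bit shift by 32 below */
        lo = eax;
        hi = edx;
    } else {                   /* like SHRD eax, edx, cl followed by SHR edx, cl */
        lo = (eax >> cl) | (edx << (32 - cl));
        hi = edx >> cl;
    }
    /* Recombine into one 64-bit value for convenience in C. */
    return ((uint64_t)hi << 32) | lo;
}
```

For example, the field8 extraction in the listing corresponds to ullshr_model(low, high, 57) followed by & 1.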

From the Godbolt compiler explorer, x86 MSVC 19.16 -Od, with UINT64 as the bitfield member type.

;; from int Method1(unsigned __int64) PROC 
    ...
   ; extract IssueStruct.Bits.field8
    mov     eax, DWORD PTR _IssueStruct$[ebp]
    mov     edx, DWORD PTR _IssueStruct$[ebp+4]
    mov     cl, 57                                    ; 00000039H
    call    __aullshr        ; emulation of   shr  edx:eax,  cl
    and     eax, 1
    and     edx, 0
    ;; then store that to memory and cmp/jcc both halves.  Ultra braindead

Obviously for constant shift and access to only 1 bit, that's easy to optimize, so MSVC doesn't actually call the helper function at -O2. It's still pretty inefficient, though! It fails to fully optimize away the 64-bitness of the base type, even though none of the bitfields are wider than 32.

; x86 MSVC 19.16 -O2   with unsigned long long as the bitfield type
int Method1(unsigned __int64) PROC                              ; Method1, COMDAT
    mov     edx, DWORD PTR _Data$[esp]       ; load the high half of the input arg
    xor     eax, eax                         ; zero the low half?!?
    mov     ecx, edx                         ; copy the high half
    and     ecx, 33554432       ; 02000000H  ; isolate bit 57
    or      eax, ecx                         ; set flags from low |= high
    je      SHORT $LN2@Method1
    and     edx, 536870912      ; 20000000H   ; isolate bit 61
    xor     eax, eax                          ; re-materialize low=0 ?!?
    or      eax, edx                          ; set flags from low |= high
    je      SHORT $LN2@Method1
    mov     eax, 1
    ret     0
$LN2@Method1:
    xor     eax, eax
    ret     0
int Method1(unsigned __int64) ENDP                              ; Method1

Obviously, materializing a 0 for the low half instead of just ignoring it is really stupid. MSVC does much better if we change the bitfield member type to unsigned. (In the Godbolt link, I changed that to bf_t so I could use a typedef separate from UINT64, keeping UINT64 for the other union member.)

With a struct based on unsigned field : 1 bitfield members, MSVC doesn't need the helper at -Od.

And it even makes better code at -O2, so you should definitely do that in your real production code. MSVC apparently has a missed-optimization bug with 64-bit types as bitfield member types, so if you care about performance on MSVC, only use uint64_t or unsigned long long members for fields that need to be wider than 32 bits.

;; MSVC -O2 with plain  unsigned  (or uint32_t) bitfield members
int Method1(unsigned __int64) PROC                              ; Method1, COMDAT
    mov     eax, DWORD PTR _Data$[esp]
    test    eax, 33554432                     ; 02000000H
    je      SHORT $LN2@Method1
    test    eax, 536870912                    ; 20000000H
    je      SHORT $LN2@Method1
    mov     eax, 1
    ret     0
$LN2@Method1:
    xor     eax, eax
    ret     0
int Method1(unsigned __int64) ENDP                              ; Method1

I might have implemented it branchlessly like ((high >> 25) & (high >> 29)) & 1 with 2 shr instructions and 2 and instructions (and a mov). If it's really predictable, though, branching is reasonable and breaks the data dependency. clang does a nice job here, though, using not + test to test both bits at once. (And setcc to get the result as an integer again). This has better latency than my idea, especially on CPUs without mov-elimination. clang doesn't have a missed optimization for bitfields based on 64-bit types, either. We get the same code either way.

# clang7.0 -O3 -m32    regardless of bitfield member type
Method1(unsigned long long):                            # @Method1(unsigned long long)
    mov     ecx, dword ptr [esp + 8]
    xor     eax, eax           # prepare for setcc
    not     ecx
    test    ecx, 570425344     # 0x22000000
    sete    al
    ret
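In C, the two bit-tests discussed above look roughly like this. The sketch assumes the caller has already loaded the high 32 bits of Data: bits 57 and 61 of the 64-bit value are bits 25 and 29 of the high half, which is why the masks are 0x02000000 and 0x20000000 (combined: 0x22000000). The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* The answer's branchless idea: two shifts and two ANDs on the high half. */
static int method1_shifts(uint32_t high)
{
    return (int)(((high >> 25) & (high >> 29)) & 1u);
}

/* clang's approach: both bits are set iff ~high has no 1 bits under the
 * combined mask (NOT + TEST + SETE in the listing). */
static int method1_not_test(uint32_t high)
{
    return (~high & 0x22000000u) == 0;
}
```

Both return 1 only when both bits are set. The NOT + TEST form checks both bits with a single mask, which is where its latency advantage over the two-shift version comes from.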

UEFI coding standard:

The EDK II coding standard 5.6.3.4 Bit Fields says that:

Bit fields may only be of type INT32, signed INT32, UINT32, or a typedef name defined as one of the three INT32 variants.

I don't know why they make up these "INT32" names when C99 already has perfectly good int32_t. It's also unclear why they'd place this restriction. Perhaps because of the MSVC missed-optimization bug? Or maybe to aid human programmer comprehension by disallowing some "weird stuff".

gcc and clang don't warn about unsigned long long as a bitfield type, even in 32-bit mode and with -Wall -Wextra -Wpedantic, in C or C++ mode. I don't think ISO C or ISO C++ have a problem with it.

Further, the Q&A "Should use of bit-fields of type int be discouraged?" points out that plain int as a bitfield type should be discouraged because its signedness is implementation-defined, and that the ISO C++ standard discusses bitfield types from char to long long.
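Here is a small illustration of the signedness point. Writing signed int or unsigned int explicitly removes the implementation-defined choice that plain int leaves to the compiler for bit fields. The struct and function names are hypothetical.

```c
#include <assert.h>

/* Plain `int` bit fields may be signed or unsigned depending on the
 * implementation, so state the intent explicitly. */
struct example_flags {
    signed int   s : 2;   /* always signed: -2..1 on two's complement */
    unsigned int u : 2;   /* always unsigned: 0..3 */
};

static struct example_flags make_flags(void)
{
    struct example_flags f;
    f.s = -1;   /* in range for a 2-bit signed field */
    f.u = 3;    /* in range for a 2-bit unsigned field */
    return f;
}
```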

I think your MSVC warning about non-int bitfields must come from some kind of coding-standard-enforcement package, because normal MSVC on Godbolt doesn't emit it even with -Wall.

warning C4214: nonstandard extension used: bit field types other than int
