Clang - Compiling a C header to LLVM IR/bitcode

Question

Say I have the following trivial C header file:

// foo1.h
typedef int foo;

typedef struct {
  foo a;
  char const* b;
} bar;

bar baz(foo*, bar*, ...);

My goal is to take this file and produce an LLVM module that looks something like this:

%struct.bar = type { i32, i8* }
declare { i32, i8* } @baz(i32*, %struct.bar*, ...)

In other words, convert a C .h file with declarations into the equivalent LLVM IR, including type resolution, macro expansion, and so on.

Passing this through Clang to generate LLVM IR produces an empty module (as none of the definitions are actually used):

$ clang -cc1 -S -emit-llvm foo1.h -o - 
; ModuleID = 'foo1.h'
target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-apple-darwin13.3.0"

!llvm.ident = !{!0}

!0 = metadata !{metadata !"clang version 3.5 (trunk 200156) (llvm/trunk 200155)"}

My first instinct was to turn to Google, and I came across two related questions: one from a mailing list, and one from StackOverflow. Both suggested using the -femit-all-decls flag, so I tried that:

$ clang -cc1 -femit-all-decls -S -emit-llvm foo1.h -o -
; ModuleID = 'foo1.h'
target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-apple-darwin13.3.0"

!llvm.ident = !{!0}

!0 = metadata !{metadata !"clang version 3.5 (trunk 200156) (llvm/trunk 200155)"}

Same result.

I've also tried disabling optimizations (both with -O0 and -disable-llvm-optzns), but that made no difference for the output. Using the following variation did produce the desired IR:

// foo2.h
typedef int foo;

typedef struct {
  foo a;
  char const* b;
} bar;

bar baz(foo*, bar*, ...);

void doThings() {
  foo a = 0;
  bar myBar;
  baz(&a, &myBar);
}

And then running:

$ clang -cc1 -S -emit-llvm foo2.h -o -
; ModuleID = 'foo2.h'
target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-apple-darwin13.3.0"

%struct.bar = type { i32, i8* }

; Function Attrs: nounwind
define void @doThings() #0 {
entry:
  %a = alloca i32, align 4
  %myBar = alloca %struct.bar, align 8
  %coerce = alloca %struct.bar, align 8
  store i32 0, i32* %a, align 4
  %call = call { i32, i8* } (i32*, %struct.bar*, ...)* @baz(i32* %a, %struct.bar* %myBar)
  %0 = bitcast %struct.bar* %coerce to { i32, i8* }*
  %1 = getelementptr { i32, i8* }* %0, i32 0, i32 0
  %2 = extractvalue { i32, i8* } %call, 0
  store i32 %2, i32* %1, align 1
  %3 = getelementptr { i32, i8* }* %0, i32 0, i32 1
  %4 = extractvalue { i32, i8* } %call, 1
  store i8* %4, i8** %3, align 1
  ret void
}

declare { i32, i8* } @baz(i32*, %struct.bar*, ...) #1

attributes #0 = { nounwind "less-precise-fpmad"="false" "no-frame-pointer-elim"="false" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "no-realign-stack" "stack-protector-buffer-size"="8" "unsafe-fp-math"="false" "use-soft-float"="false" }
attributes #1 = { "less-precise-fpmad"="false" "no-frame-pointer-elim"="false" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "no-realign-stack" "stack-protector-buffer-size"="8" "unsafe-fp-math"="false" "use-soft-float"="false" }

!llvm.ident = !{!0}

!0 = metadata !{metadata !"clang version 3.5 (trunk 200156) (llvm/trunk 200155)"}

Besides the placeholder doThings, this is exactly what I want the output to look like! The problem is that this requires 1.) using a modified version of the header, and 2.) knowing the types of things in advance. Which leads me to...

Basically, I'm building an implementation for a language using LLVM to generate code. The implementation should support C interop by specifying C header files and associated libs only (no manual declarations), which will then be used by the compiler before link-time to ensure that function invocations match their signatures. Hence, I've narrowed the problem down to 2 possible solutions:

  1. Turn the header files into LLVM IR/bitcode, which can then get the type signature of each function
  2. Use libclang to parse the headers, then query the types from the resulting AST (my 'last resort' in case there is no sufficient answer for this question)

TL;DR

I need to take a C header file (such as the above foo1.h) and, without changing it, either generate the aforementioned expected LLVM IR using Clang, or find another way to get function signatures from C header files (preferably using libclang or building a C parser).

Recommended answer

Perhaps the less elegant solution, but staying with the idea of a doThings function that forces the compiler to emit IR because the definitions are used:

The two problems you identify with this approach are that it requires modifying the header, and that it requires a deeper understanding of the types involved in order to generate "uses" to put in the function. Both of these can be overcome relatively simply:

  1. Instead of compiling the header directly, #include it (or more likely, a preprocessed version of it, or multiple headers) from a .c file that contains all the "uses" code. Straightforward enough:

// foo.c
#include "foo.h"
void doThings(void) {
    ...
}

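  2. You don't need detailed type information to generate specific uses of the names, nor do you need to match struct instantiations to parameters and deal with all the complexity in the "uses" code above. You don't actually need to gather the function signatures yourself.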

    All you need is the list of the names themselves and to keep track of whether they're for a function or for an object type. You can then redefine your "uses" function to look like this:

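    void * doThings(void) {
        /* One uniform function-pointer type, and a union that can hold either kind of "use". */
        typedef void * (*vfun)(void);
        typedef union v { void * o; vfun f; } v;

        /* Dummy function: it is not meant to be called, it only makes bar and baz count as used. */
        return (v[]) {
            (v){ .o = &(bar){0} },    /* object type: instantiate it and take its address */
            (v){ .f = (vfun)baz },    /* function: cast its address to the uniform type */
        };
    }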

    This greatly simplifies the necessary "uses" of a name to either casting it to a uniform function type (and taking its pointer rather than calling it), or wrapping it in &( and ){0} (instantiating it regardless of what it is). This means you don't need to store actual type information at all, only the kind of context from which you extracted the name in the header.

    (obviously give the dummy function and the placeholder types extended unique names so they don't clash with the code you actually want to keep)
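Generating those "uses" mechanically could look something like the sketch below. It is only an illustration: `format_use` and `enum name_kind` are hypothetical names, not part of Clang or of the code above. Given a name and the kind of context it was found in, it writes one initializer line in the same shape as the table in doThings.

```c
#include <stdio.h>

/* Hypothetical classification of a name extracted from the header. */
enum name_kind { NAME_OBJECT_TYPE, NAME_FUNCTION };

/*
 * Writes one initializer line for the "uses" table into buf.
 * Functions are cast to the uniform vfun type; object types are
 * instantiated with a compound literal and have their address taken.
 * Returns whatever snprintf returns (negative on error).
 */
int format_use(char *buf, size_t size, enum name_kind kind, const char *name)
{
    if (kind == NAME_FUNCTION)
        return snprintf(buf, size, "    (v){ .f = (vfun)%s },\n", name);
    return snprintf(buf, size, "    (v){ .o = &(%s){0} },\n", name);
}
```

Calling `format_use(line, sizeof line, NAME_FUNCTION, "baz")` fills `line` with the baz entry, so the generator only ever needs a name and a kind, never a signature.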

    This simplifies the parsing step tremendously since you only have to recognise the context of a struct/union or function declaration, without actually needing to do very much with the surrounding information.

    A simple but hackish starting point (which I would probably use because I have low standards :D ) might be:

    • grep through the headers for #include directives that take an angle-bracketed argument (i.e. an installed header you don't want to also generate declarations for).
    • use this list to create a dummy include folder with all of the necessary include files present but empty
    • preprocess it in the hope that'll simplify the syntax (clang -E -I local-dummy-includes/ -D"__attribute__(...)=" foo.h > temp/foo_pp.h or something similar)
    • grep through for struct or union followed by a name, } followed by a name, or name (, and use this ridiculously simplified non-parse to build the list of uses in the dummy function, and emit the code for the .c file.

    It won't catch every possibility; but with a bit of tweaking and extension, it probably will actually deal with a large subset of realistic header code. You could replace this with a dedicated simplified parser (one built to only look at the patterns of the contexts you need) at a later stage.
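As a rough illustration of that kind of non-parse, here is a minimal sketch in C. It assumes the input is already preprocessed text and only recognises two of the patterns mentioned above: `} name` (treated as an object type) and `name (` (treated as a function). The names `scan_names`, `struct found_name` and `enum found_kind` are hypothetical, and anything like function-pointer typedefs, macros that survive preprocessing, or `struct name` tags is deliberately ignored.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result record: a name and the context it was seen in. */
enum found_kind { FOUND_OBJECT_TYPE, FOUND_FUNCTION };

struct found_name {
    char name[64];
    enum found_kind kind;
};

static int is_ident_start(int c) { return isalpha(c) || c == '_'; }
static int is_ident_char(int c)  { return isalnum(c) || c == '_'; }

/*
 * Scans src for identifiers that either directly follow a '}' or are
 * directly followed by a '(' (ignoring whitespace), and stores up to
 * max of them in out. Returns the number of names stored.
 */
size_t scan_names(const char *src, struct found_name *out, size_t max)
{
    size_t count = 0;
    char prev = '\0';   /* last non-space character before the current token */
    const char *p = src;

    while (*p != '\0' && count < max) {
        unsigned char c = (unsigned char)*p;
        if (isspace(c)) { p++; continue; }
        if (!is_ident_start(c)) { prev = *p++; continue; }

        /* Read one identifier. */
        const char *start = p;
        while (is_ident_char((unsigned char)*p)) p++;
        size_t len = (size_t)(p - start);

        /* Peek past whitespace at the character after the identifier. */
        const char *q = p;
        while (isspace((unsigned char)*q)) q++;

        int is_func = (*q == '(');
        int is_type = (prev == '}');
        if ((is_func || is_type) && len < sizeof out[count].name) {
            memcpy(out[count].name, start, len);
            out[count].name[len] = '\0';
            out[count].kind = is_func ? FOUND_FUNCTION : FOUND_OBJECT_TYPE;
            count++;
        }
        prev = p[-1];   /* last character of the identifier just read */
    }
    return count;
}
```

Fed the contents of foo1.h, this would report bar as an object type and baz as a function, which is all the dummy "uses" function needs. Inline function bodies would produce false positives (for example `sizeof (` or `if (`), so a keyword filter is an obvious next tweak.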
