Compiled Haskell program to LLVM IR is missing main


Question

Following this SO post on compiling Haskell programs to LLVM IR, I took the same Haskell program and tried to run the resulting LLVM IR code:

quicksort [] = []
quicksort (p:xs) = (quicksort lesser) ++ [p] ++ (quicksort greater)
  where
    lesser  = filter (<  p) xs
    greater = filter (>= p) xs

main = print(quicksort([5,2,1,0,8,3]))

I first compiled it to LLVM IR with:

$ ghc -keep-llvm-files main.hs

Then I transformed it to bitcode with:

$ llvm-as main.ll

However, when I tried to run it with lli, I got the following error about a missing main:

$ lli main.bc
'main' function not found in module.

Am I doing something wrong? Thanks.

Update, from the answer by K. A. Buhr: compiling the bitcode to an object file with llc and then linking it with ghc works:

$ ls -l main*
main.hs
$ ghc -keep-llvm-files main.hs
[1 of 1] Compiling Main             ( main.hs, main.o )
Linking main ...
$ ls -l main*
main
main.hi
main.hs
main.ll
main.o
$ rm main main.hi main.o
$ llvm-as main.ll
$ llc main.bc -filetype=obj -o main.o
$ ghc -o main main.o
$ ./main
[0,1,2,3,5,8]

Recommended answer

tl;dr. The entry point is (probably) named ZCMain_main_closure, and it's a data structure that references a block of code, rather than a block of code itself. Still, it's interpretable by the Haskell runtime, and it corresponds directly to the Haskell "value" of the function main :: IO () in your main.hs program.

The longer answer involves more than you ever wanted to know about linking programs, but here's the deal. When you take a C program like:

#include <stdio.h>
int main()
{
        printf("I like C!\n");
}

compile it to an object file with gcc:

$ gcc -Wall -c hello.c

and inspect the object file's symbol table:

$ nm hello.o
0000000000000000 T main
                 U printf

you will see that it contains a definition of the symbol main and an (undefined) reference to an external symbol printf.

Now, you might imagine that main is the "entry point" of this program. Hah hah hah! What a naive and silly thing for you to think!

In fact, real Linux gurus know that the entry point to your program isn't in the object file hello.o at all. Where is it? Well, it's in the "C runtime", a little file that gets linked in by gcc when you actually create your executable:

$ nm /usr/lib/x86_64-linux-gnu/crt1.o
0000000000000000 D __data_start
0000000000000000 W data_start
0000000000000000 R _IO_stdin_used
                 U __libc_csu_fini
                 U __libc_csu_init
                 U __libc_start_main
                 U main
0000000000000000 T _start
$

Note that this object file has an undefined reference to main, which will be linked to your so-called entry point in hello.o. It's this little stub that defines the real entry point, namely _start. You can tell this is the actual entry point because, if you link the program into an executable, you'll see that the location of the _start symbol and the ELF entry point (the address to which the kernel actually first transfers control when you execve() your program) coincide:

$ gcc -o hello hello.o
$ nm hello | egrep 'T _start'
0000000000400430 T _start
$ readelf -h hello | egrep Entry
Entry point address:               0x400430

All this is to say, the "entry point" of a program is actually a pretty complex concept.
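
To make the relationship between _start and main concrete, here is a minimal sketch in C. It is only a toy model: toy_start, toy_user_main, and the main_fn typedef are hypothetical names invented for illustration, and a real crt1.o stub is written in assembly and does far more setup than this.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Type of a C-style main function. */
typedef int (*main_fn)(int argc, char **argv);

/* Hypothetical stand-in for the crt1.o stub: this plays the role of
   the real entry point, and the user's main is just a function that
   it calls. */
int toy_start(main_fn user_main, int argc, char **argv)
{
    assert(user_main != NULL);
    /* A real _start would set up the stack and libc state here. */
    int status = user_main(argc, argv);
    /* A real _start would hand status to exit() and never return. */
    return status;
}

/* Plays the role of main in hello.o. */
int toy_user_main(int argc, char **argv)
{
    (void)argc;
    (void)argv;
    printf("I like C!\n");
    return 0;
}
```

Calling toy_start(toy_user_main, 0, NULL) prints the greeting and returns 0. The point of the sketch is only that the code a programmer calls "the entry point" is something the runtime stub invokes, not where execution begins.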

When you compile and run a C program with the LLVM toolchain instead of GCC, the situation is all pretty similar. That's by design to keep everything compatible with GCC. The so-called entry point in your hello.ll file is just the C function main, and it's not the real entry point of your program. That's still provided by the crt1.o stub.

Now, if we (finally) switch from talking about C to talking about Haskell, the Haskell runtime is, obviously, about a billion times more complicated than the C runtime, but it's been built on top of the C runtime. So, when you compile a Haskell program the normal way:

$ ghc main.hs
stack ghc -- main.hs
[1 of 1] Compiling Main             ( main.hs, main.o )
Linking main ...
$

you can see that the executable has an entry point named _start:

$ nm main | egrep 'T _start'
0000000000406560 T _start

which is actually the same C runtime stub as before that calls the C entry point:

$ nm main | egrep 'T main'
0000000000406dc4 T main
$ 

but this main is not your Haskell main. This main is a C main function in a program dynamically created by GHC at link time. You can look at such a program by running:

$ ghc -v -keep-tmp-files -fforce-recomp main.hs

and rummaging around for a file named ghc_4.c somewhere in a /tmp subdirectory:

$ cat /tmp/ghc10915_0/ghc_4.c
#include "Rts.h"
extern StgClosure ZCMain_main_closure;
int main(int argc, char *argv[])
{
 RtsConfig __conf = defaultRtsConfig;
 __conf.rts_opts_enabled = RtsOptsSafeOnly;
 __conf.rts_opts_suggestions = true;
 __conf.rts_hs_main = true;
 return hs_main(argc,argv,&ZCMain_main_closure,__conf);
}

Now, do you see that external reference to ZCMain_main_closure? That, believe it or not, is the Haskell entry point for your program, and you should find it in main.o, whether you compiled using the vanilla GHC pipeline or via the LLVM backend:

$ egrep ZCMain_main_closure main.ll
%ZCMain_main_closure_struct = type <{i64, i64, i64, i64}>
...

Now, it's not a "function". It's a specially formatted data structure (a closure) that the Haskell runtime system understands. The hs_main() function above (yet another entry point!) is the main entry point into the Haskell runtime:

$ nm ~/.stack/programs/x86_64-linux/ghc-8.4.3/lib/ghc-8.4.3/rts/libHSrts.a | egrep hs_main
0000000000000000 T hs_main
$

and it accepts a closure for a Haskell main function as the Haskell entry point to begin executing your program.
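
The difference between "a function" and "a data structure that references code" can be sketched in C as follows. This is a toy model under loud assumptions: toy_closure, toy_main_entry, toy_main_closure, and toy_hs_main are hypothetical names, and the struct layout is not GHC's actual StgClosure layout. It only illustrates why a module can contain a perfectly good program entry without containing any function named main.

```c
#include <assert.h>
#include <stddef.h>

/* Toy closure: a data structure whose first field points at code.
   This is NOT GHC's real StgClosure layout, only an illustration. */
typedef struct toy_closure {
    int (*entry)(struct toy_closure *self); /* code to run when entered */
    int payload;                            /* data carried along */
} toy_closure;

/* The block of code that the closure below refers to. */
int toy_main_entry(toy_closure *self)
{
    return self->payload * 2;
}

/* Plays the role of ZCMain_main_closure: a data object, not a function. */
toy_closure toy_main_closure = { toy_main_entry, 21 };

/* Plays the role of hs_main(): the runtime receives the closure and
   decides how to enter it. */
int toy_hs_main(toy_closure *closure)
{
    assert(closure != NULL && closure->entry != NULL);
    return closure->entry(closure);
}
```

Here toy_hs_main(&toy_main_closure) evaluates to 42. A tool that looks for a function symbol called main, as lli does, would find nothing to run in a module that holds only toy_main_closure and toy_main_entry; something has to play the role of the runtime that knows how to enter the closure.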

So, if you went through all this trouble in the hopes of isolating a Haskell program in an *.ll file that you could somehow run directly by jumping to its entry point, then I've got some bad news for you... ;)
