How can I implement a string data type in LLVM?

Question

I have been looking at LLVM lately, and I find it to be quite an interesting architecture. However, looking through the tutorial and the reference material, I can't see any examples of how I might implement a string data type.

There is a lot of documentation about integers, reals, and other number types, and even arrays, functions and structures, but AFAIK nothing about strings. Would I have to add a new data type to the backend? Is there a way to use built-in data types? Any insight would be appreciated.

Answer

What is a string? An array of characters.

What is a character? An integer.

So while I'm no LLVM expert by any means, I would guess that if, e.g., you wanted to represent some 8-bit character set, you'd use an array of i8 (8-bit integers), or a pointer to i8. And indeed, if we have a simple hello world C program:

#include <stdio.h>

int main() {
        puts("Hello, world!");
        return 0;
}

And we compile it using llvm-gcc and dump the generated LLVM assembly:

$ llvm-gcc -S -emit-llvm hello.c
$ cat hello.s
; ModuleID = 'hello.c'
target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128"
target triple = "x86_64-linux-gnu"
@.str = internal constant [14 x i8] c"Hello, world!\00"         ; <[14 x i8]*> [#uses=1]

define i32 @main() {
entry:
        %retval = alloca i32            ; <i32*> [#uses=2]
        %tmp = alloca i32               ; <i32*> [#uses=2]
        %"alloca point" = bitcast i32 0 to i32          ; <i32> [#uses=0]
        %tmp1 = getelementptr [14 x i8]* @.str, i32 0, i64 0            ; <i8*> [#uses=1]
        %tmp2 = call i32 @puts( i8* %tmp1 ) nounwind            ; <i32> [#uses=0]
        store i32 0, i32* %tmp, align 4
        %tmp3 = load i32* %tmp, align 4         ; <i32> [#uses=1]
        store i32 %tmp3, i32* %retval, align 4
        br label %return

return:         ; preds = %entry
        %retval4 = load i32* %retval            ; <i32> [#uses=1]
        ret i32 %retval4
}

declare i32 @puts(i8*)

Notice the reference to the puts function declared at the end of the file. In C, puts is declared as

int puts(const char *s)

In LLVM, it is

i32 @puts(i8*)

The correspondence should be clear: C's const char * becomes i8*, a pointer to 8-bit integers.

As an aside, the generated LLVM is very verbose here because I compiled without optimizations. If you turn those on, the unnecessary instructions disappear:

$ llvm-gcc -O2 -S -emit-llvm hello.c
$ cat hello.s 
; ModuleID = 'hello.c'
target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128"
target triple = "x86_64-linux-gnu"
@.str = internal constant [14 x i8] c"Hello, world!\00"         ; <[14 x i8]*> [#uses=1]

define i32 @main() nounwind  {
entry:
        %tmp2 = tail call i32 @puts( i8* getelementptr ([14 x i8]* @.str, i32 0, i64 0) ) nounwind              ; <i32> [#uses=0]
        ret i32 0
}

declare i32 @puts(i8*)
