如何实现数组-LLVM [英] How to implement arrays - llvm

查看:219
本文介绍了如何实现数组-LLVM的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在使用llvm c ++ api编写一种玩具语言,并且正在尝试实现数组.我尝试了几种不同的方法,但都效果不佳.

I am working on a toy language written using the llvm c++ api and I am trying to implement arrays. I have tried several different things none of which have worked very well.

这就是我要去的地方

  • 可以类似于数组的类型(可以使用结构,向量或数组)
  • 可以传递给函数并从中返回
  • 可以具有无限嵌套的数组(例如[8 x [8 x [8 x [...])
  • 可以重新分配
  • 数组都是相同的类型
  • 数组在创建时指定了有限的长度.
  • A type that can resemble an array (struct, vector or array would work)
  • Can be passed to and returned from functions
  • Can have infinitely nested arrays (eg. [8 x [8 x [8 x [...])
  • Can be re-assignable
  • Arrays are all of the same type
  • Arrays are of finite length specified on creation.

理想情况下,它们会迅速地类似于数组.

Ideally, they would resemble arrays in swift.

当前,我正在使用基本数组类型.这将检查所有框,除了可以将一维数组传递给函数并从函数返回之外,多维(嵌套)数组必须通过指针传递.这意味着我不仅必须将返回的指针比特转换为正确的类型,而且还必须遍历每个元素并将其存储在适当的位置.很快,这不仅变得非常凌乱,而且很慢.

Currently I am using the basic array type. This checks all of the boxes except that while a one dimensional array can be passed to and returned from a function, multidimensional (nested) arrays must be passed by pointer. This means that I have to not only bitcast the returned pointer to the proper type but then loop through every element and store it in the appropriate place. Quickly this becomes not only very messy but also slow.

据我所知,您不能有嵌套向量.例如.这不起作用:

As far as I can tell, you cannot have nested vectors. eg. this does not work:

<8 x <8 x i32>>

使用malloc

最后,我尝试使用malloc分配适当的空间,然后仅对所有内容使用指针.与我当前的解决方案有很多相同的问题.

Using malloc

Lastly, I tried using malloc to allocate the appropriate space then just use pointers for everything. This had many of the same problems as my current solution.

以下面的示例为例:

  %1 = alloca [5 x i32], align 16
  %2 = alloca i32*, align 8
  %3 = getelementptr inbounds [5 x i32], [5 x i32]* %1, i32 0, i32 0
  %4 = call i32* @_Z8getArrayPi(i32* %3)
  store i32* %4, i32** %2, align 8

无法将指针%4存储到%1中,因此必须创建新的分配.否则(使用嵌套数组)将需要复制每个元素,并且会出现与当前解决方案相同的问题.同样,理想情况下,编译器将处理所有指针,因此对于语言用户而言,它就像是一个数组.

There is no way to store the pointer %4 into %1 so a new allocation must be created. Otherwise (with nested arrays) each element would need to be copied and there would be the same problem as in my current solution. Also, ideally the compiler would deal with all the pointers, so it would just look like an array to the user of the language.

在不清楚的情况下-我的问题是如何实现数组,以便可以执行上面列出的所有操作(而不必复制每个元素).

In case it was not clear - my question is how do I implement arrays so that I can do all of the things listed above (without having to copy every element).

推荐答案

序言

我认为alloca是正确的方法,如果它们将保持固定大小,但是您必须写很多样板来实现有保证的(C ++风格)省略以优化函数返回.

I think alloca is the correct approach if they're going to be fixed size but you'll have to write quite a bit of boilerplate to implement guaranteed (C++ style) elision to optimize function returns.

现在问题出在所有权语义上,如果要完美地做到这一点,您将需要确定它们的生存期,并确定它们是否需要堆分配,或者只能放弃堆栈分配.但是,现在让我们忽略优化. Swift/ARC的实现在幕后非常混乱,并且使用了很多优化功能,因此您也不必对它进行建模,更不用说Swift/ObjC ARC代码和非托管"代码之间的空白了.

Now the problem is in ownership semantics, if you want to do it perfectly you will need to determine their lifetime and establish whether they need heap allocations or can get away with stack allocations only. But let's ignore optimization for now. The Swift/ARC implementation is quite messy behind the scenes and uses a lot of optimizations so you don't necessarily want to model it either, let alone the gaps between Swift/ObjC ARC code an "unmanaged" code.

简单的解决方案像Swift"

如果您不想处理该问题,假设您知道某个函数的所有可能退出路径,则可以在堆上使用某种形式的运行时分配数组,并提供侵入式引用计数语义.这就是Swift使用的模型(最优化之上)-您堆分配每个数组,并实现一个控制块来跟踪数组的引用计数和大小(例如,如果要进行动态边界检查).在这种情况下,您将使用简单的按引用传递语义,例如:

If you don't want to deal with that, assuming you know all the possible exit paths from a function, you can allocate your arrays using some form of a runtime on the heap and provide intrusive reference counting semantics. This is the model Swift uses (with a ton of optimizations on top) - you heap allocate every array and implement a control block to track the reference count and size of your array (if you want dynamic boundary checking for example). In this case, you would be going with simple pass-by-reference semantics, as such:

  • 只能 通过对堆对象的引用来表示数组.
  • 输入函数时,如果传入了特殊的数组类型(您可以注释LLVM的Value来跟踪它们),则会增加引用计数.
  • 在终止的基本块上,您发出对运行时函数的调用,该调用将对每个被跟踪的对象减少该函数内被跟踪对象的引用计数(那里有很多优化空间).
  • 请确保上面有一个例外,当从函数返回的数组不会减少引用计数以确保它在返回中仍然有效时,您必须确保在弹出调用帧时保持对它们的跟踪(为简单起见)假设您只能返回一个参考).
  • 最困难的部分是保留/释放语义,当您不能静态确定数组的生存期时,引用计数就变得很有用.每次将引用存储在任何类型的数据结构中时,都需要保留运行时(增加引用计数).所有者销毁后,其中的所有已引用对象都会被释放(如果没有引用,则递减refcount&释放).
  • 存储数组引用时,将保留它(增加refcount),这是假设存储对象(为了简化起见),存储对其他数组的引用的数组具有静态(在编译器中难以实现)或动态感知它存储的对象类型的数量,并且知道如果存储引用计数的对象,则在销毁引用对象时,它需要释放所有引用.
  • Arrays can only be represented by a reference to a heap object.
  • When you enter a function, if your special array type (you can annotate LLVM's Value to track those) is passed in, you bump up the refcount.
  • At the terminating basic block you emit a call to a runtime function that would say decrement the refcount of a tracked object within the function (a lot of room for optimization there) for every tracked object.
  • Make sure there's an exception to the above where when an array being returned from a function will not have the reference count decremented ensuring it survives the return, you have to ensure you keep track of those as you pop call frames (for simplicity let's assume you can only return a single reference).
  • Hardest part is retain/release semantics when you cannot statically determine the lifetime of the array, this is where reference counting becomes useful. Every time the reference is stored in any kind of a data structure, you need to have the runtime retain (increase the refcount). Upon destruction of the owner, any refcounted object in it is released (decrement refcount & release if no refs).
  • When your array reference is stored, it is retained (refcount increased), this assumes the storing object, to make it simple, an array storing references to other arrays has either static (harder to implement in the compiler) or dynamic awareness of the object types it stores and that it's aware that if it's storing reference counted objects, upon its destruction it needs to release all references.

这几乎是它的基础,并且是ARC工作原理的非常精简的版本.现在要考虑的事情:

That's pretty much the basics of it an is a very very dumbed down version of how ARC works. Now here's things to think about:

  • 非线性控制流(线程,exept,协程,闭包).放松特别有趣.
  • 在您的语言中,每种非平凡的类型都具有相似的特征(引用对象),只要您不允许使用原始指针,这会使事情变得容易得多.您可以用这种方式来处理闭包(事实上,这就是运行时块所做的事情.)
  • 当您开始拥有指针或通过非托管代码封送托管对象时,一切都会变得一团糟.
  • 根据您选择的设计方式,非所有的引用(指针)将需要特别注意.
  • 复制对象.

我建议从一个对象开始,该对象将是使用您的语言管理的所有内容的基础,包括您的数组类型,并附带一些类型信息. (就像libobjc4中的NSObject一样简单得多).

I would suggest starting off with a refcounted object that's going to be a base of everything managed in your language including your array types and carry some sort of type information with it. (like a far less complex version ofNSObject in libobjc4).

这篇关于如何实现数组-LLVM的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆