Is it possible to generate ANSI C functions with type information for a moving GC implementation?

Question

I am wondering what methods there are to add typing information to generated C methods. I'm transpiling a higher-level programming language to C and I'd like to add a moving garbage collector. However, to do that I need the method variables to have typing information; otherwise I could modify a primitive value that looks like a pointer.

An obvious approach would be to encapsulate all (primitive and non-primitive) variables in a struct that has an extra (enum) field for typing information. However, this would cause memory and performance overhead, and the transpiled code is meant for embedded platforms. If I were to accept the memory overhead, the obvious option would be to use a heap handle for all objects, and then I'd be able to move heap blocks freely. However, I'm wondering if there's a more efficient approach.
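To make the heap-handle option concrete, here is a minimal sketch of it. All names (`handle_t`, `gc_alloc`, `gc_deref`, `gc_free`, `gc_compact`) and the sizes are hypothetical, and alignment is ignored. Generated code would store only small handles, so compaction has to update the handle table and nothing else.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sizes and names; alignment is ignored in this sketch. */
enum { HEAP_BYTES = 256, MAX_HANDLES = 16 };

typedef uint16_t handle_t;            /* what generated code would store */

static uint8_t heap[HEAP_BYTES];      /* the movable heap                */
static size_t  heap_top = 0;          /* bump-allocation cursor          */
static struct { size_t offset, size; int used; } handles[MAX_HANDLES];

static handle_t gc_alloc(size_t size) {
    assert(size > 0 && heap_top + size <= HEAP_BYTES);
    for (handle_t h = 0; h < MAX_HANDLES; h++) {
        if (!handles[h].used) {
            handles[h].offset = heap_top;
            handles[h].size = size;
            handles[h].used = 1;
            heap_top += size;
            return h;
        }
    }
    assert(!"handle table full");
    return 0;
}

/* The returned address is only valid until the next compaction. */
static void *gc_deref(handle_t h) { return &heap[handles[h].offset]; }

static void gc_free(handle_t h) { handles[h].used = 0; }

/* Slide live blocks toward the start of the heap, lowest offset first.
   Only the handle table is updated; no stack scanning is needed. */
static void gc_compact(void) {
    size_t dst = 0, cursor = 0;
    for (;;) {
        int next = -1;
        for (int h = 0; h < MAX_HANDLES; h++)
            if (handles[h].used && handles[h].offset >= cursor &&
                (next < 0 || handles[h].offset < handles[next].offset))
                next = h;
        if (next < 0) break;
        cursor = handles[next].offset + handles[next].size;
        memmove(&heap[dst], &heap[handles[next].offset], handles[next].size);
        handles[next].offset = dst;
        dst += handles[next].size;
    }
    heap_top = dst;
}
```

The cost is the one mentioned above: every access pays an extra indirection through the table, and each object carries a table entry.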

I've come up with a potential solution: predeclare and group variables based on whether they're primitives or not (I can do that in the transpiler), and add an offset variable at the end of each method (I need to be able to find it accurately when scanning the stack area) that tells me where the non-primitive variables begin and end, so that I scan only those. This means that each method will use an additional 16 or 32 bits of memory (depending on the architecture); however, this should still be more memory efficient than the heap handle approach.

Example:

void my_func() {
  int i = 5;
  int z = 3;
  bool b = false;
  void* person;
  void* person_info = ...;
  .... // logic
  volatile int offset = 0x034;
}

My aim is for something that works universally across GCC compilers, thus my concerns are:

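  • Can the compiler reorder the variables from how they're declared in the source code?
  • Can I force the compiler to put some data in the method's stack frame (using volatile)?
  • Can I find the offset accurately when scanning the stack?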

I'd like to avoid assembly so that this approach works (by default) across multiple platforms; however, I'm open to methods that involve assembly if they're reliable.

Recommended answer

Typing information could be encoded in the C function name; this is done by C++ and other implementations and is called name mangling.

Actually, since all your C code is generated, you could decide to adopt a different convention: generate long C identifiers which are practically unique and sort-of random program-wide, such as tiziw_7oa7eIzzcxv03TmmZ, and keep their typing information elsewhere (e.g. in some database). On Linux, such an approach is friendly to both libbacktrace and dlsym(3) + dladdr(3) (and of course nm(1) or readelf(1) or gdb(1)), and it is used in both the bismon and RefPerSys projects.
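As an illustration of keeping the typing information elsewhere, the "database" can be as simple as a sorted table emitted by the transpiler next to the generated code. This is only a sketch: `struct fn_typeinfo`, `fn_table` and `find_typeinfo` are hypothetical names, the second identifier is made up, and the parameter counts and pointer masks are invented for the example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical side table, kept sorted by name by the transpiler. */
struct fn_typeinfo {
    const char *name;      /* generated C identifier                   */
    unsigned    nparams;   /* number of parameters (invented values)   */
    unsigned    ptr_mask;  /* bit k set: parameter k is a GC pointer   */
};

static const struct fn_typeinfo fn_table[] = {
    { "qzeb_4Hy9pothetiCalName0", 1, 0x1u },   /* made-up identifier   */
    { "tiziw_7oa7eIzzcxv03TmmZ",  2, 0x2u },
};

static int cmp_name(const void *key, const void *elem) {
    return strcmp((const char *)key,
                  ((const struct fn_typeinfo *)elem)->name);
}

/* Binary search by identifier; returns NULL for unknown names. */
static const struct fn_typeinfo *find_typeinfo(const char *name) {
    assert(name != NULL);
    return bsearch(name, fn_table, sizeof fn_table / sizeof fn_table[0],
                   sizeof fn_table[0], cmp_name);
}
```

On Linux, the name passed to such a lookup could come from dladdr(3) applied to a code address, which is one reason the unique-identifier convention is convenient there.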

Typing information is practically tied to calling conventions and ABIs. For example, the x86-64 ABI for Linux mandates different processor registers for passing floating-point values and pointers.

Read the Garbage Collection Handbook, or at least P. Wilson's Uniprocessor Garbage Collection Techniques survey. You could decide to use tagged integers instead of boxing them, and you could decide to have a conservative GC (e.g. Boehm's GC) instead of a precise one. In my old GCC MELT project I generated C or C++ code for a generational copying GC. Similar techniques are used both in Bismon and in RefPerSys.
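For readers unfamiliar with tagged integers, a common variant uses the lowest bit of a machine word as the tag: odd words are integers, even words are (aligned) pointers. The sketch below uses hypothetical names (`value_t`, `box_int`, and so on) and assumes GCC's documented behavior for converting to signed types and for right-shifting negative values, which ISO C leaves implementation-defined.

```c
#include <assert.h>
#include <stdint.h>

typedef uintptr_t value_t;   /* hypothetical: one machine word per value */

/* Low bit set: the word holds an integer, not a pointer. */
static int is_int(value_t v) { return (v & 1u) != 0; }

/* Integers lose one bit of range to make room for the tag. */
static value_t box_int(intptr_t n) { return ((uintptr_t)n << 1) | 1u; }

/* Assumes an arithmetic right shift for negative values (true for GCC). */
static intptr_t unbox_int(value_t v) { return (intptr_t)v >> 1; }

/* Heap pointers must be at least 2-byte aligned so the low bit is free. */
static value_t box_ptr(void *p) {
    assert(((uintptr_t)p & 1u) == 0);
    return (uintptr_t)p;
}

static void *unbox_ptr(value_t v) {
    assert(!is_int(v));
    return (void *)v;
}
```

With this encoding a precise collector can test `is_int` on any slot and never mistakes a primitive for a pointer, at the price of one bit of integer range and a shift on arithmetic.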

Since you are transpiling to C, consider also alternatives, such as libgccjit or LLVM. Look into libjit and asmjit.

Study also the implementation of other transpilers (compilers to C), including Chicken/Scheme and Bigloo.

Can the GCC compiler reorder the variables from how they're declared in the source code?

Of course yes, depending upon the optimizations you are asking for. Some variables won't even exist in the binary (e.g. those staying in registers).

Can I force the compiler to put some data in the method's stack frame (using volatile)?

It is better to generate a single struct variable containing all your language variables, and leave the optimizations to the compiler. You will be surprised (see this draft report).
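One way to read this advice is the "shadow stack" style sketched below: each generated function keeps its variables in one struct, groups the pointer slots in an array, and links the struct into a chain that the collector walks. This is an assumption about how it could be done, not a description of any particular project; `struct gc_frame`, `gc_top`, `gc_visit_roots`, `relocate`, `moved_from` and `moved_to` are hypothetical names, and `relocate` is a stand-in for a real forwarding mechanism.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical header present in every generated frame struct. */
struct gc_frame {
    struct gc_frame *prev;    /* caller's frame                    */
    void           **roots;   /* pointer slots of this frame       */
    size_t           nroots;  /* how many slots                    */
};

static struct gc_frame *gc_top = NULL;   /* per-thread in a real runtime */

/* Visit every pointer slot of every active frame; returns the count. */
static size_t gc_visit_roots(void (*visit)(void **slot)) {
    size_t n = 0;
    for (struct gc_frame *f = gc_top; f != NULL; f = f->prev)
        for (size_t i = 0; i < f->nroots; i++, n++)
            visit(&f->roots[i]);
    return n;
}

/* Stand-in for a moving collector: one object changes address. */
static void *moved_from, *moved_to;
static void relocate(void **slot) {
    if (*slot == moved_from) *slot = moved_to;
}

/* What the transpiler might emit for the question's my_func. */
static int my_func(void *person_info) {
    struct {
        struct gc_frame hdr;
        void *ptr[2];          /* [0] person, [1] person_info      */
        int i, z;              /* primitives: never scanned        */
    } fr = { { gc_top, NULL, 2 }, { NULL, person_info }, 5, 3 };
    fr.hdr.roots = fr.ptr;
    gc_top = &fr.hdr;          /* push the frame                   */

    size_t visited = gc_visit_roots(relocate);  /* simulated GC    */
    assert(visited >= 2);

    int ok = fr.ptr[1] == moved_to && fr.i == 5 && fr.z == 3;
    gc_top = fr.hdr.prev;      /* pop before every return          */
    return ok;
}
```

Because the address of the frame escapes through `gc_top`, the compiler has to keep the pointer slots in memory across calls without any use of volatile, while the primitive fields remain free to be optimized.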

Can I find the offset accurately when scanning the stack?

This is the most difficult part, and it depends a lot on compiler optimizations (e.g. whether you run gcc with -O1 or -O3 on the generated C code; in some cases a recent GCC, e.g. GCC 9 or GCC 10 on x86-64 for Linux, is capable of tail-call optimizations; check by compiling with gcc -O3 -S -fverbose-asm and then looking into the produced assembler code). If you accept some small tricks specific to the target processor and compiler, this is doable. Study the implementation of the OCaml compiler.

Send me (to basile@starynkevitch.net) an email for discussion. Please mention the URL of your question in it.

If you want to have an efficient generational copying GC with multi-threading, things become extremely tricky. The question is then how many years of development you can afford to spend.

If you have exceptions in your language, take great care as well. You could, with great caution, generate calls to longjmp.
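One reason for the caution: a longjmp skips the normal exit code of every abandoned function, so any per-frame GC bookkeeping has to be restored by hand. The sketch below assumes a linked chain of root frames (`gc_top`) and a chain of handlers (`handler_top`); these names, `throw_exception`, `try_call` and `pending_code` are all hypothetical.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

struct gc_frame { struct gc_frame *prev; };  /* hypothetical root frame */
static struct gc_frame *gc_top = NULL;

struct handler {                 /* one per active try block         */
    jmp_buf          env;
    struct handler  *prev;
    struct gc_frame *saved_top;  /* root chain at try entry          */
};
static struct handler *handler_top = NULL;

/* Kept in static storage: non-volatile locals of the function that
   called setjmp are indeterminate after longjmp if they were changed. */
static int pending_code = 0;

static void throw_exception(int code) {
    struct handler *h = handler_top;
    assert(h != NULL && code != 0);
    handler_top = h->prev;
    gc_top = h->saved_top;       /* drop frames of abandoned callees */
    pending_code = code;
    longjmp(h->env, 1);
}

static void callee(void) {
    struct gc_frame fr = { gc_top };
    gc_top = &fr;                /* push                             */
    throw_exception(7);          /* does not return; pop is skipped  */
    gc_top = fr.prev;
}

static int try_call(void) {
    struct handler h;
    h.prev = handler_top;
    h.saved_top = gc_top;
    handler_top = &h;
    if (setjmp(h.env) == 0) {    /* setjmp only in a comparison      */
        callee();
        handler_top = h.prev;    /* normal exit: unregister handler  */
        return 0;
    }
    return pending_code;         /* reached via longjmp              */
}
```

Without the `gc_top = h->saved_top` line, the collector would later walk a frame that lives in a dead part of the C stack.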

See also this answer of mine.

With transpiling techniques, the devil is in the details.

On Linux (specifically!) see also my manydl.c program. It demonstrates that on a Linux x86-64 laptop you could generate, in practice, hundreds of thousands of dlopen(3)-ed plugins. Then read How To Write Shared Libraries.

Study also the implementation of SBCL and of GNU Prolog, at least for inspiration.

PS. The dream of a totally architecture-neutral and operating-system independent transpiler is an illusion.
