How to create a lightweight C code sandbox?

Question

I'd like to build a C pre-processor/compiler that allows functions to be collected from local and online sources, i.e.:

#fetch MP3FileBuilder http://scripts.com/MP3Builder.gz
#fetch IpodDeviceReader http://apple.com/modules/MP3Builder.gz

void mymodule_main() {
  MP3FileBuilder(&some_data);
}

That's the easy part.

The hard part is that I need a reliable way to "sandbox" the imported code from direct or unrestricted access to disk or system resources (including memory allocation and the stack). I want a way to safely run small snippets of untrusted C code (modules) without the overhead of putting them in a separate process, VM or interpreter (a separate thread would be acceptable though).

Requirements


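  • I need to place quotas on access to data and resources, including CPU time.
  • I will block direct access to the standard libraries.
  • I want to stop malicious code that creates endless recursion.
  • I want to limit static and dynamic allocation to specific limits.
  • I want to catch all exceptions the module may raise (like divide by 0).
  • Modules may only interact with other modules via core interfaces.
  • Modules may only interact with the system (I/O etc.) via core interfaces.
  • Modules must allow bit ops, maths, arrays, enums, loops and branching.
  • Modules cannot use ASM.
  • I want to limit pointer and array access to memory reserved for the module (via a custom safe_malloc()).
  • Must support ANSI C or a subset (see below).
  • The system must be lightweight and cross-platform (including embedded systems).
  • The system must be GPL or LGPL compatible.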
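
To make the allocation-quota requirement concrete, here is a minimal sketch of how a core might charge every module allocation against a budget. The `module_quota` type and the exact signatures of `safe_malloc`/`safe_free` are assumptions for illustration (the question only names `safe_malloc()`), and the sketch leans on the hosted `malloc`, which an embedded core would probably replace with a fixed arena.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-module accounting record (not from the question). */
typedef struct {
    size_t limit; /* maximum bytes the module may hold at once */
    size_t used;  /* bytes currently charged to the module */
} module_quota;

/* Each block is prefixed with its size so safe_free can refund the quota.
   max_align_t (C11) keeps the payload suitably aligned. */
typedef union {
    size_t size;
    max_align_t align;
} block_header;

/* Returns NULL instead of allocating when the request exceeds the quota. */
void *safe_malloc(module_quota *q, size_t n)
{
    if (n > q->limit - q->used)      /* avoids overflow in used + n */
        return NULL;
    if (n > SIZE_MAX - sizeof(block_header))
        return NULL;
    block_header *h = malloc(sizeof *h + n);
    if (h == NULL)
        return NULL;
    h->size = n;
    q->used += n;
    return h + 1;                    /* payload starts after the header */
}

/* Releases a block obtained from safe_malloc and refunds its size. */
void safe_free(module_quota *q, void *p)
{
    if (p == NULL)
        return;
    block_header *h = (block_header *)p - 1;
    assert(q->used >= h->size);
    q->used -= h->size;
    free(h);
}
```

With a 100-byte quota, a 60-byte request succeeds and a following 50-byte request returns NULL until the first block is freed. Note that this only limits how much a module allocates; it does nothing to stop the module writing outside the blocks it was given.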

I'm happy to settle for a subset of C. I don't need things like templates or classes. I'm primarily interested in the things high-level languages don't do well, like fast maths, bit operations, and the searching and processing of binary data.

It is not the intention that existing C code can be reused without modification to create a module. The intention is that modules would be required to conform to a set of rules and limitations designed to limit the module to basic logic and transformation operations (such as video transcoding or compression).

The theoretical input to such a compiler/pre-processor would be a single ANSI C file (or safe subset) with a module_main function, NO includes or pre-processor directives, and no ASM. It would allow loops, branching, function calls, pointer maths (restricted to a range allocated to the module), bit-shifting, bitfields, casts, enums, arrays, ints, floats, strings and maths. Anything else is optional.

Example implementation

Here's a pseudo-code snippet to explain this better. Here a module exceeds its memory allocation quota and also creates an infinite loop.

buffer* transcodeToAVI_main( &in_buffer ) {
    int buffer[1000000000]; // allocation exceeding quota
    while(true) {} // infinite loop
    return buffer;
}

Here's a transformed version where our preprocessor has added watchpoints to check for memory usage and recursion and wrapped the whole thing in an exception handler.

buffer* transcodeToAVI_main( &in_buffer ) {
    try {
        core_funcStart(__FILE__,__FUNC__); // tell core we're executing this function
        buffer = core_newArray(1000000000, __FILE__, __FUNC__); // memory allocation from quota
        while(true) {
           if (core_checkLoop(__FILE__, __FUNC__, __LINE__)) break; // break loop on recursion limit
        } 
        core_moduleEnd(__FILE__,__FUNC__);
    } catch {
        core_exceptionHandler(__FILE__, __FUNC__);
    }
    return buffer;
}

I realise performing these checks impacts the module's performance, but I suspect it will still outperform high-level or VM languages for the tasks it is intended to solve. I'm not trying to stop modules doing dangerous things outright; I'm just trying to force those dangerous things to happen in a controlled way (like via user feedback), i.e.: "Module X has exceeded its memory allocation, continue or abort?".
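
ANSI C has no try/catch, so the transformed pseudo-code above needs some other way to get back to the core when a check fails. One option is `setjmp`/`longjmp`. The sketch below is only an illustration under assumptions: `core_ctx`, `core_run`, the fault codes and the single-argument `core_checkLoop` are invented here and differ from the signatures in the pseudo-code.

```c
#include <setjmp.h>

/* Hypothetical core state for one running module. */
typedef struct {
    jmp_buf on_fault;     /* where to resume when the module is aborted */
    unsigned long budget; /* loop iterations the module may still use */
} core_ctx;

enum { CORE_OK = 0, CORE_LOOP_LIMIT = 1 };

/* Inserted by the preprocessor at the top of every loop body. */
void core_checkLoop(core_ctx *ctx)
{
    if (ctx->budget == 0)
        longjmp(ctx->on_fault, CORE_LOOP_LIMIT); /* unwind back to the core */
    ctx->budget--;
}

/* An instrumented module whose loop never ends on its own. */
void runaway_main(core_ctx *ctx)
{
    for (;;) {
        core_checkLoop(ctx);
    }
}

/* An instrumented module that stays well inside its budget. */
void bounded_main(core_ctx *ctx)
{
    int i;
    for (i = 0; i < 10; i++) {
        core_checkLoop(ctx);
    }
}

/* Runs a module under an iteration budget and reports how it ended. */
int core_run(void (*module_main)(core_ctx *), unsigned long budget)
{
    core_ctx ctx;
    ctx.budget = budget;
    if (setjmp(ctx.on_fault) != 0)
        return CORE_LOOP_LIMIT;   /* reached via longjmp from core_checkLoop */
    module_main(&ctx);
    return CORE_OK;
}
```

Calling `core_run(runaway_main, 1000)` returns `CORE_LOOP_LIMIT`, at which point the core could ask the user whether to continue or abort. Hardware faults such as division by zero are not covered by this sketch; they would need explicit checks inserted before the operation or platform-specific signal handling.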

Update

The best I've got so far is to use a custom compiler (like a hacked TCC) with bounds checking and some custom function and looping code to catch recursion. I'd still like to hear thoughts on what else I need to check for or what solutions are out there. I imagine that removing ASM and checking pointers before use solves a lot of the concerns expressed in previous answers below. I added a bounty to pry some more feedback out of the SO community.
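
As a rough idea of what the "custom function code to catch recursion" could look like, the sketch below keeps a call-depth counter that instrumented functions update on entry and exit. It is an assumption-laden illustration: `CORE_MAX_DEPTH`, `core_depth` and `core_funcEnd` are invented names, and `core_funcStart` is simplified to take no arguments.

```c
#include <assert.h>

/* Hypothetical limit and counter; a real core would keep one per module or thread. */
enum { CORE_MAX_DEPTH = 64 };
int core_depth = 0;

/* Inserted at function entry: returns 0 when the call must be refused. */
int core_funcStart(void)
{
    if (core_depth >= CORE_MAX_DEPTH)
        return 0;
    core_depth++;
    return 1;
}

/* Inserted before every return of a function whose core_funcStart succeeded. */
void core_funcEnd(void)
{
    assert(core_depth > 0);
    core_depth--;
}

/* A module function after instrumentation: it recurses without a base case. */
int runaway(int n)
{
    int r;
    if (!core_funcStart())
        return -1;           /* depth limit hit: report failure to the caller */
    r = runaway(n + 1);
    core_funcEnd();
    return r;
}
```

Here `runaway(0)` unwinds with -1 once 64 frames are active instead of overflowing the stack. A depth counter bounds the number of frames but not their size, so large local arrays would still need a separate limit.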

For the bounty I'm looking for:


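  • Details of potential weaknesses in the theoretical system defined above
  • Possible optimisations over checking pointers on each access
  • Experimental open-source implementations of the concepts (like Google Native Client)
  • Solutions that support a wide range of OSes and devices (no OS/hardware-based solutions)
  • Solutions that support the most C operations, or even C++ (if that's possible)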

Extra credit for a method that can work with GCC (i.e., a pre-processor or small GCC patch).

I'll also give consideration to anyone who can conclusively prove that what I'm attempting cannot be done at all. You will need to be pretty convincing though, because none of the objections so far have really nailed the technical aspects of why they think it's impossible. In defence of those who said no, this question was originally posed as a way to safely run C++. I have now scaled back the requirement to a limited subset of C.

My understanding of C could be classed as "intermediate"; my understanding of PC hardware is maybe a step below "advanced". Try to pitch your answers at that level if you can. Since I'm no C expert, I'll be going largely on the votes given to an answer as well as how closely the answer comes to my requirements. You can assist by providing sufficient evidence for your claims (respondents) and by voting (everyone else). I'll assign an answer once the bounty countdown reaches 6 hours.

Finally, I believe solving this problem would be a major step towards maintaining C's relevance in an increasingly networked and paranoid world. As other languages close the gap performance-wise and computing power grows it will be harder and harder to justify the added risk of C development (as it is now with ASM). I believe your answers will have a much greater relevance than scoring a few SO points so please contribute what you can, even if the bounty has expired.

Recommended answer

Since the C standard is much too broad to be allowed, you would need to go the other way around: specify the minimum subset of C which you need, and try to implement that. Even ANSI C is already too complicated and allows unwanted behaviour.

The most problematic aspect of C is pointers: the C language requires pointer arithmetic, and it is not checked. For example:

char a[100];
printf("%p %p\n", (void *)&a[10], (void *)&10[a]);

will print the same address twice, since a[10] == 10[a] == *(10 + a) == *(a + 10).

All these pointer accesses cannot be checked at compile time. That's the same complexity as asking the compiler for 'all bugs in a program' which would require solving the halting problem.

Since you want this function to be able to run in the same process (potentially in a different thread) you share memory between your application and the 'safe' module since that's the whole point of having a thread: share data for faster access. However, this also means that both threads can read and write the same memory.

And since you cannot prove at compile time where pointers end up, you have to do that at runtime. This means that code like 'a[10]' has to be translated to something like 'get_byte(a + 10)', at which point I wouldn't call it C anymore.
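
A minimal sketch of what such a runtime-checked access could look like follows. The `module_region` type, the `set_byte` companion and the offset-based signatures are assumptions made for illustration. Taking an offset rather than a ready-made pointer is a deliberate choice here, because in C merely computing a pointer beyond one past the end of an array is already undefined behaviour.

```c
#include <stddef.h>

/* Hypothetical descriptor of the memory range reserved for one module. */
typedef struct {
    unsigned char *base;
    size_t size;
} module_region;

/* Checked read: stores the byte in *out and returns 1, or returns 0
   when the offset lies outside the module's region. */
int get_byte(const module_region *r, size_t offset, unsigned char *out)
{
    if (offset >= r->size)
        return 0;
    *out = r->base[offset];
    return 1;
}

/* Checked write with the same contract as get_byte. */
int set_byte(module_region *r, size_t offset, unsigned char value)
{
    if (offset >= r->size)
        return 0;
    r->base[offset] = value;
    return 1;
}
```

With a 100-byte region, `get_byte(&r, 10, &v)` succeeds while `get_byte(&r, 100, &v)` is refused. Every access now costs a compare and a branch, which is the overhead the question hopes to optimise away.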

Google Native Client

So if that's true, how does Google do it then? Well, in contrast to the requirements here (cross-platform, including embedded systems), Google concentrates on x86, which has segment registers in addition to paging with page protections. This allows it to create a sandbox in which another thread does not share memory in the same way: segmentation limits the sandbox to changing only its own memory range. Furthermore:


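  • a list of safe x86 assembly constructs is assembled
  • gcc is changed to emit those safe constructs
  • this list is constructed in a way that is verifiable
  • after loading a module, this verification is done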
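
The load-time verification step can be illustrated independently of x86. The sketch below is not Native Client's validator: it checks a made-up instruction list (the opcodes and the `insn` type are invented for this example) against a whitelist, and also makes sure every jump lands inside the module.

```c
#include <stddef.h>

/* Invented opcodes for illustration only; not x86 and not NaCl. */
enum { OP_ADD = 0, OP_LOAD_CHECKED = 1, OP_JUMP = 2, OP_SYSCALL = 3 };

typedef struct {
    int op;
    size_t target; /* jump destination (instruction index), used by OP_JUMP */
} insn;

/* Returns 1 only if every instruction is on the whitelist and every jump
   stays inside the module. Meant to run once, after loading and before
   the module is executed. */
int verify_module(const insn *code, size_t count)
{
    size_t i;
    for (i = 0; i < count; i++) {
        switch (code[i].op) {
        case OP_ADD:
        case OP_LOAD_CHECKED:
            break;
        case OP_JUMP:
            if (code[i].target >= count)
                return 0;
            break;
        default:
            return 0;  /* anything not explicitly allowed is rejected */
        }
    }
    return 1;
}
```

The point of the design is that the verifier is small and does not need to trust the compiler: a module produced by any tool is accepted as long as it passes, and rejected otherwise.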

So this is platform-specific and not a 'simple' solution, although it is a working one. Read more in their research paper: http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en/us/pubs/archive/34913.pdf

Conclusion

So whatever route you go, you need to start out with something new which is verifiable, and only then can you start adapting an existing compiler or generating a new one. However, trying to mimic ANSI C requires one to think about the pointer problem. Google modelled their sandbox not on ANSI C but on a subset of x86, which allowed them to use existing compilers to a great extent, with the disadvantage of being tied to x86.
