Instrumenting C/C++ codes using LLVM


Question

I just read about the LLVM project and learned that it can be used for static analysis of C/C++ code through Clang, the front end of LLVM. I want to know whether it is possible to use LLVM to extract all the accesses to memory (variables, local as well as global) in the source code.

Is there a built-in library in LLVM that I could use to extract this information? If not, please suggest how to write functions that do the same (existing source code, references, tutorials, examples...). My current idea is to first convert the source code into LLVM bitcode and then instrument it to do the analysis, but I don't know exactly how to do that.


I tried to figure out which IR I should use for my purpose (Clang's Abstract Syntax Tree (AST) or LLVM's SSA Intermediate Representation (IR)), but couldn't really decide. Here is what I am trying to do. Given any C/C++ program (like the one below), I want to insert calls to some function before and after every instruction that reads from or writes to memory. For example, consider the following C++ program (Account.cpp):

#include <stdio.h>

class Account {
  int balance;

public:
  Account(int b) {
    balance = b;
  }

  int read() {
    int r;
    r = balance;
    return r;
  }

  void deposit(int n) {
    balance = balance + n;
  }

  void withdraw(int n) {
    int r = read();
    balance = r - n;
  }
};

int main () {
  Account* a = new Account(10);
  a->deposit(1);
  a->withdraw(2);
  delete a;
}

So after the instrumentation my program should look like:

#include <stdio.h>

class Account {
  int balance;

public:
  Account(int b) {
    balance = b;
  }

  int read() {
    int r;
    foo();
    r = balance;
    foo();
    return r;
  }

  void deposit(int n) {
    foo();
    balance = balance + n;
    foo();
  }

  void withdraw(int n) {
    foo();
    int r = read();
    foo();
    foo();
    balance = r - n;
    foo();
  }
};

int main () {
  Account* a = new Account(10);
  a->deposit(1);
  a->withdraw(2);
  delete a;
}

where foo() may be any function, such as one that gets the current system time or increments a counter. I understand that to insert calls like the ones above I first have to get the IR and then run an instrumentation pass over it that inserts such calls, but I don't really know how to achieve this. Please suggest, with examples, how to go about it.

I also understand that once I compile the program into IR, it will be really difficult to get a 1:1 mapping between my original program and the instrumented IR. So, is it possible to reflect the changes made in the IR (because of instrumentation) back into the original program?

To get started with LLVM passes and learn how to write one of my own, I looked at an example of a pass that adds run-time checks to LLVM IR loads and stores: SAFECode's load/store instrumentation pass (http://llvm.org/viewvc/llvm-project/safecode/trunk/include/safecode/LoadStoreChecks.h?view=markup and http://llvm.org/viewvc/llvm-project/safecode/trunk/lib/InsertPoolChecks/LoadStoreChecks.cpp?view=markup). But I couldn't figure out how to run this pass. Please give me the steps to run this pass on some program, say the Account.cpp above.
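To make the target concrete, here is a minimal sketch of the instrumented program with foo() filled in as a call counter. The counter `g_accesses` and the helper `run_account_demo()` are hypothetical names chosen for this illustration; the class body follows the instrumented Account listing above.

```cpp
// Hypothetical instrumentation hook: counts how many times it is called.
static long g_accesses = 0;
void foo() { ++g_accesses; }

class Account {
  int balance;

public:
  explicit Account(int b) { balance = b; }  // left uninstrumented, as in the listing

  int read() {
    int r;
    foo();
    r = balance;
    foo();
    return r;
  }

  void deposit(int n) {
    foo();
    balance = balance + n;
    foo();
  }

  void withdraw(int n) {
    foo();
    int r = read();
    foo();
    foo();
    balance = r - n;
    foo();
  }
};

// Runs the same call sequence as main() in the question and
// returns the number of times foo() fired.
long run_account_demo() {
  g_accesses = 0;
  Account* a = new Account(10);
  a->deposit(1);
  a->withdraw(2);
  delete a;
  return g_accesses;
}
```

Calling `run_account_demo()` from main returns 8: two hooks from deposit, four from withdraw, and two more from the read() call nested inside withdraw.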

Solution

First off, you have to decide whether you want to work with clang or LLVM. They operate on very different data structures, each with its own advantages and disadvantages.

From your sparse description of the problem, I recommend going with optimization passes in LLVM. Working with the IR will make it much easier to sanitize, analyze and inject code because that's what it was designed for. The downside is that your project will depend on LLVM, which may or may not be a problem for you. You could output the result using the C backend, but that won't be readable by a human.

Another important downside when working with optimization passes is that you also lose all symbols from the original source code. Even if the Value class (more on that later) has a getName method, you should never rely on it to contain anything meaningful. It's meant to help you debug your passes and nothing else.

You will also have to have a basic understanding of compilers. For example, it's a bit of a requirement to know about basic blocks and static single assignment form. Fortunately they're not very difficult concepts to learn or understand (the Wikipedia articles should be adequate).

Before you can start coding, you first have to do some reading, so here are a few links to get you started:

  • Architecture Overview: A quick architectural overview of LLVM. Will give you a good idea of what you're working with and whether LLVM is the right tool for you.

  • Documentation Head: Where you can find all the links below and more. Refer to this if I missed anything.

  • LLVM's IR reference: This is the full description of the LLVM IR which is what you'll be manipulating. The language is relatively simple so there isn't too much to learn.

  • Programmer's manual: A quick overview of basic stuff you'll need to know when working with LLVM.

  • Writing Passes: Everything you need to know to write transformation or analysis passes.

  • LLVM Passes: A comprehensive list of all the passes provided by LLVM that you can and should use. These can really help clean up the code and make it easier to analyze. For example, when working with loops, the lcssa, loop-simplify and indvars passes will save your life.

  • Value Inheritance Tree: This is the doxygen page for the Value class. The important bit here is the inheritance tree that you can follow to get the documentation for all the instructions defined in the IR reference page. Just ignore the ungodly monstrosity that they call the collaboration diagram.

  • Type Inheritance Tree: Same as above but for types.

Once you understand all that, it's a piece of cake. To find memory accesses? Search for store and load instructions. To instrument? Just create what you need using the proper subclass of the Value class and insert it before or after the store and load instructions. Because your question is a bit too broad, I can't really help you more than this. (See correction below)

By the way, I had to do something similar a few weeks ago. In about 2-3 weeks I was able to learn all I needed about LLVM, create an analysis pass to find memory accesses (and more) within a loop, and instrument them with a transformation pass I created. There were no fancy algorithms involved (except the ones provided by LLVM) and everything was pretty straightforward. The moral of the story is that LLVM is easy to learn and work with.
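The "find the loads and stores, then insert a call around them" idea can be sketched on a toy instruction list. This is only a model in plain C++, not the LLVM API: the `Inst` struct, the `Op` enum and the `depositBody()` helper are all made up for illustration. In a real pass the same loop would walk the instructions of a function and test for LLVM's LoadInst and StoreInst classes instead.

```cpp
#include <string>
#include <vector>

// Toy stand-in for an IR instruction; this is NOT the LLVM API.
enum class Op { Load, Store, Add, Call, Ret };

struct Inst {
  Op op;
  std::string text;  // human-readable form, only for printing or debugging
};

// True for the instructions that touch memory in this toy IR.
bool accessesMemory(const Inst& i) {
  return i.op == Op::Load || i.op == Op::Store;
}

// Returns a copy of `block` with a call to `hook` before and after
// every load and store; all other instructions are copied unchanged.
std::vector<Inst> instrument(const std::vector<Inst>& block,
                             const std::string& hook) {
  std::vector<Inst> out;
  out.reserve(block.size());
  for (const Inst& i : block) {
    if (accessesMemory(i)) {
      out.push_back({Op::Call, "call " + hook + "()"});
      out.push_back(i);
      out.push_back({Op::Call, "call " + hook + "()"});
    } else {
      out.push_back(i);
    }
  }
  return out;
}

// Hypothetical body of Account::deposit written in the toy IR.
std::vector<Inst> depositBody() {
  return {
      {Op::Load, "%old = load balance"},
      {Op::Add, "%new = add %old, %n"},
      {Op::Store, "store %new, balance"},
      {Op::Ret, "ret"},
  };
}
```

Running `instrument(depositBody(), "foo")` turns the four toy instructions into eight. Note the difference from the source-level listing in the question: instrumenting per instruction wraps the load and the store separately, so deposit gets four hooks here rather than two.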


Correction: I made an error when I said that all you have to do is search for load and store instructions.

The load and store instructions only give you the accesses that are made through pointers, typically to the heap. To get all memory accesses you also have to look at the values that can represent a memory location on the stack. Whether such a value is written to the stack or kept in a register is decided during register allocation, which happens in the backend, so it is platform dependent and shouldn't be relied on.

Now unless you provide more information about what kind of memory accesses you're looking for, in what context, and how you intend to instrument them, I can't help you much more than this.
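A toy illustration of why a plain load/store scan can miss accesses to locals: the same function yields a different count depending on whether a local lives in a stack slot or in a register. The instruction lists below are hand-written for Account::read() and are not actual LLVM output; `ToyInst`, `Kind` and the helper names are hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Toy instruction kinds; illustration only, not the LLVM API.
enum class Kind { StackSlot, Load, Store, Ret };

struct ToyInst {
  Kind kind;
  std::string text;
};

// Counts the instructions that a load/store scan would report.
std::size_t countLoadsAndStores(const std::vector<ToyInst>& body) {
  std::size_t n = 0;
  for (const ToyInst& i : body) {
    if (i.kind == Kind::Load || i.kind == Kind::Store) ++n;
  }
  return n;
}

// Account::read() with the local `r` kept in a stack slot.
std::vector<ToyInst> readWithStackSlot() {
  return {
      {Kind::StackSlot, "%r.addr = stack slot for r"},
      {Kind::Load, "%0 = load balance"},
      {Kind::Store, "store %0, %r.addr"},
      {Kind::Load, "%1 = load %r.addr"},
      {Kind::Ret, "ret %1"},
  };
}

// The same function after `r` has been turned into a register value.
std::vector<ToyInst> readWithRegister() {
  return {
      {Kind::Load, "%0 = load balance"},
      {Kind::Ret, "ret %0"},
  };
}
```

Here `countLoadsAndStores(readWithStackSlot())` is 3 while `countLoadsAndStores(readWithRegister())` is 1, even though both describe the same source function. An instrumentation scheme that must see every access to a local therefore has to decide which form it runs on.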
