Get precise line/column debug info from LLVM IR

Question

I am trying to locate instructions in an LLVM pass by line and column number (as reported by a third-party tool) in order to instrument them. To do so, I compile my source files with clang -g -O0 -emit-llvm and look up the location in the debug metadata with this code:

const DebugLoc &location = instruction->getDebugLoc();
// location.getLine()
// location.getCol()

Unfortunately, this information is far too imprecise. Consider the following implementation of the Fibonacci function:

unsigned fib(unsigned n) {
    if (n < 2)
        return n;

    unsigned f = fib(n - 1) + fib(n - 2);
    return f;
}

I would like to locate the single LLVM instruction corresponding to the assignment unsigned f = ... in the resulting LLVM IR. I am not interested in all the calculations of the right-hand side. The generated LLVM block including relevant debug metadata is:

[...]

if.end:                                           ; preds = %entry
  call void @llvm.dbg.declare(metadata !{i32* %f}, metadata !17), !dbg !18
  %2 = load i32* %n.addr, align 4, !dbg !19
  %sub = sub i32 %2, 1, !dbg !19
  %call = call i32 @fib(i32 %sub), !dbg !19
  %3 = load i32* %n.addr, align 4, !dbg !20
  %sub1 = sub i32 %3, 2, !dbg !20
  %call2 = call i32 @fib(i32 %sub1), !dbg !20
  %add = add i32 %call, %call2, !dbg !20
  store i32 %add, i32* %f, align 4, !dbg !20
  %4 = load i32* %f, align 4, !dbg !21
  store i32 %4, i32* %retval, !dbg !21
  br label %return, !dbg !21

[...]

!17 = metadata !{i32 786688, metadata !4, metadata !"f", metadata !5, i32 5, metadata !8, i32 0, i32 0} ; [ DW_TAG_auto_variable ] [f] [line 5]
!18 = metadata !{i32 5, i32 11, metadata !4, null}
!19 = metadata !{i32 5, i32 15, metadata !4, null}
!20 = metadata !{i32 5, i32 28, metadata !4, null}
!21 = metadata !{i32 6, i32 2, metadata !4, null}
!22 = metadata !{i32 7, i32 1, metadata !4, null}

As you can see, the metadata !dbg !20 of the store instruction points to line 5 column 28, which is the call to fib(n - 2). Even worse, the add operation and the subtraction n - 2 both also point to that function call, identified by !dbg !20.
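To make the loss of precision concrete, here is a minimal sketch that groups the instructions of the if.end block by their reported location. It does not use the LLVM API: LocatedInstr, oldIfEndBlock and groupByLocation are hypothetical names introduced only for illustration, and the line/column values are copied by hand from !18 to !21 in the listing above.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical plain-data stand-in for one IR instruction and its !dbg location.
struct LocatedInstr {
    std::string text;  // abbreviated instruction text
    unsigned line;
    unsigned col;
};

// The if.end block from the listing above; locations copied from !18..!21.
std::vector<LocatedInstr> oldIfEndBlock() {
    return {
        {"call @llvm.dbg.declare", 5, 11},
        {"%2 = load", 5, 15},
        {"%sub = sub", 5, 15},
        {"%call = call @fib", 5, 15},
        {"%3 = load", 5, 28},
        {"%sub1 = sub", 5, 28},
        {"%call2 = call @fib", 5, 28},
        {"%add = add", 5, 28},
        {"store %add", 5, 28},
        {"%4 = load", 6, 2},
        {"store %4", 6, 2},
        {"br label %return", 6, 2},
    };
}

using Location = std::pair<unsigned, unsigned>;  // (line, column)

// Collect the instructions that share each (line, column) pair.
std::map<Location, std::vector<std::string>> groupByLocation(
        const std::vector<LocatedInstr>& block) {
    std::map<Location, std::vector<std::string>> groups;
    for (const LocatedInstr& instr : block)
        groups[Location(instr.line, instr.col)].push_back(instr.text);
    return groups;
}
```

Grouping the twelve instructions this way leaves only four distinct locations, and the group for line 5, column 28 contains five instructions, including the store. A lookup by line and column alone therefore cannot single out the assignment.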

Interestingly, the Clang AST emitted by clang -Xclang -ast-dump -fsyntax-only has all that information. Thus, I suspect that it is somehow lost during the code generation phase. It seems that during code generation Clang reaches some internal sequence point and associates all following instructions to that position until the next sequence point (e.g. function call) occurs. For completeness, here is the declaration statement in the AST:

|-DeclStmt 0x7ffec3869f48 <line:5:2, col:38>
| `-VarDecl 0x7ffec382d680 <col:2, col:37> col:11 used f 'unsigned int' cinit
|   `-BinaryOperator 0x7ffec3869f20 <col:15, col:37> 'unsigned int' '+'
|     |-CallExpr 0x7ffec382d7e0 <col:15, col:24> 'unsigned int'
|     | |-ImplicitCastExpr 0x7ffec382d7c8 <col:15> 'unsigned int (*)(unsigned int)' <FunctionToPointerDecay>
|     | | `-DeclRefExpr 0x7ffec382d6d8 <col:15> 'unsigned int (unsigned int)' Function 0x7ffec382d490 'fib' 'unsigned int (unsigned int)'
|     | `-BinaryOperator 0x7ffec382d778 <col:19, col:23> 'unsigned int' '-'
|     |   |-ImplicitCastExpr 0x7ffec382d748 <col:19> 'unsigned int' <LValueToRValue>
|     |   | `-DeclRefExpr 0x7ffec382d700 <col:19> 'unsigned int' lvalue ParmVar 0x7ffec382d3d0 'n' 'unsigned int'
|     |   `-ImplicitCastExpr 0x7ffec382d760 <col:23> 'unsigned int' <IntegralCast>
|     |     `-IntegerLiteral 0x7ffec382d728 <col:23> 'int' 1
|     `-CallExpr 0x7ffec3869ef0 <col:28, col:37> 'unsigned int'
|       |-ImplicitCastExpr 0x7ffec3869ed8 <col:28> 'unsigned int (*)(unsigned int)' <FunctionToPointerDecay>
|       | `-DeclRefExpr 0x7ffec3869e10 <col:28> 'unsigned int (unsigned int)' Function 0x7ffec382d490 'fib' 'unsigned int (unsigned int)'
|       `-BinaryOperator 0x7ffec3869eb0 <col:32, col:36> 'unsigned int' '-'
|         |-ImplicitCastExpr 0x7ffec3869e80 <col:32> 'unsigned int' <LValueToRValue>
|         | `-DeclRefExpr 0x7ffec3869e38 <col:32> 'unsigned int' lvalue ParmVar 0x7ffec382d3d0 'n' 'unsigned int'
|         `-ImplicitCastExpr 0x7ffec3869e98 <col:36> 'unsigned int' <IntegralCast>
|           `-IntegerLiteral 0x7ffec3869e60 <col:36> 'int' 2

Is it possible either to improve the accuracy of the debug metadata or to identify the corresponding instruction in some other way? Ideally, I would like to leave Clang untouched, i.e. not modify and recompile it.

Recommended answer

It turns out that this has been fixed with the introduction of MDLocation in LLVM release 3.6.0. At the time of writing, the clang compiler shipped with the Xcode Command Line Tools still generates the former "buggy" location information, even though its version string says Apple LLVM version 6.1.0 (clang-602.0.49) (based on LLVM 3.6.0svn). After downloading the pre-built binary of LLVM 3.6.0, the generated LLVM IR looks like this:

[...]

; <label>:7                                       ; preds = %0
  call void @llvm.dbg.declare(metadata i32* %f, metadata !21, metadata !14), !dbg !22
  %8 = load i32* %2, align 4, !dbg !23
  %9 = sub i32 %8, 1, !dbg !23
  %10 = call i32 @fib(i32 %9), !dbg !24
  %11 = load i32* %2, align 4, !dbg !25
  %12 = sub i32 %11, 2, !dbg !25
  %13 = call i32 @fib(i32 %12), !dbg !26
  %14 = add i32 %10, %13, !dbg !24
  store i32 %14, i32* %f, align 4, !dbg !22
  %15 = load i32* %f, align 4, !dbg !27
  store i32 %15, i32* %1, !dbg !28
  br label %16, !dbg !28


[...]

!22 = !MDLocation(line: 5, column: 14, scope: !4)
!23 = !MDLocation(line: 5, column: 22, scope: !4)
!24 = !MDLocation(line: 5, column: 18, scope: !4)
!25 = !MDLocation(line: 5, column: 35, scope: !4)
!26 = !MDLocation(line: 5, column: 31, scope: !4)
!27 = !MDLocation(line: 6, column: 12, scope: !4)
!28 = !MDLocation(line: 6, column: 5, scope: !4)

The location metadata always points to the beginning of an expression. For the assignment, for instance, this is the left-hand-side identifier f at line 5, column 14. Unfortunately, this can still be ambiguous: !dbg !24 is attached to both the first call to fib and the add, because both expressions begin at column 18.
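One way to cope with the remaining ambiguity is to match on the opcode in addition to the location. The following is only a sketch of that idea and does not use the LLVM API: InstrRecord, newBlock and findAt are hypothetical names, and the records are copied by hand from the listing above (!22 to !28). In a real pass, the same fields would presumably come from Instruction::getOpcodeName() and Instruction::getDebugLoc().

```cpp
#include <string>
#include <vector>

// Hypothetical plain-data stand-in for one IR instruction.
struct InstrRecord {
    std::string opcode;  // e.g. "store", "call", "add"
    std::string text;    // abbreviated instruction text
    unsigned line;
    unsigned col;
};

// The block from the listing above; locations copied from !22..!28.
std::vector<InstrRecord> newBlock() {
    return {
        {"call",  "call @llvm.dbg.declare", 5, 14},
        {"load",  "%8 = load",              5, 22},
        {"sub",   "%9 = sub",               5, 22},
        {"call",  "%10 = call @fib",        5, 18},
        {"load",  "%11 = load",             5, 35},
        {"sub",   "%12 = sub",              5, 35},
        {"call",  "%13 = call @fib",        5, 31},
        {"add",   "%14 = add",              5, 18},
        {"store", "store %14",              5, 14},
        {"load",  "%15 = load",             6, 12},
        {"store", "store %15",              6, 5},
        {"br",    "br label %16",           6, 5},
    };
}

// Return every instruction at (line, col). If opcode is non-empty,
// keep only the instructions with that opcode.
std::vector<InstrRecord> findAt(const std::vector<InstrRecord>& block,
                                unsigned line, unsigned col,
                                const std::string& opcode = "") {
    std::vector<InstrRecord> matches;
    for (const InstrRecord& instr : block) {
        const bool sameLocation = instr.line == line && instr.col == col;
        const bool sameOpcode = opcode.empty() || instr.opcode == opcode;
        if (sameLocation && sameOpcode)
            matches.push_back(instr);
    }
    return matches;
}
```

With these records, findAt(newBlock(), 5, 14) returns two instructions (the llvm.dbg.declare call and the store), while findAt(newBlock(), 5, 14, "store") returns only the store for the assignment. The same filter separates the add from the first fib call at column 18. Whether an opcode filter is sufficient depends on the code being instrumented; it is an assumption of this sketch, not something the debug metadata guarantees.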

There has been one more change: calls to getLine() and getCol() fail if no debug metadata is attached to the instruction. The DebugLoc class offers a convenient way to check for this:

const DebugLoc &location = instruction->getDebugLoc();
if (location) {
    // location.getLine()
    // location.getCol()
} else {
    // No location metadata available
}
