识别java字节代码中的循环 [英] Identify loops in java byte code

查看:124
本文介绍了识别java字节代码中的循环的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在尝试使用java字节代码。

I am trying to instrument java byte code.

我想识别进入和退出java循环,但我有发现循环的识别非常具有挑战性。
我花了几个小时看着 ASM 开源反编译器(我认为他们必须一直解决这个问题)但是,我来得很短。

I want to recognize the entry and exit of a java loop, but I have found the identification of loops to be quite challenging. I have spent a good few hours looking at ASM and open source de-compilers (whom I thought must solve this problem all the time), however, I came up short.

我正在扩充/扩展的工具正在使用ASM,所以理想情况下我想知道如何检测不同循环结构的进入和退出java via ASM 。但是,我也欢迎建议一个好的开源解编译器,因为显然它们可以解决同样的问题。

The tool I am augmenting / extending, is using ASM, so ideally I would like to know how to instrument the entry and exit of the different loop constructs in java via ASM. However, I would also welcome a recommendation for a good open source de-compiler, as clearly they would have solved the same problem.

推荐答案

编辑4 :有点背景/序言。


  • 唯一在代码中向后跳转的方法是通过一个循环。在Peter的回答中并不严格。你可以在没有它的情况下来回跳跃意味着它是一个循环。简化的情况是这样的:

  • "The only way to jump backward in the code is via a loop." in Peter's answer isn't strictly true. You could jump back and forth without it meaning it's a loop. A simplified case would be something like this:

0: goto 2
1: goto 3
2: goto 1

当然,这个特殊的例子非常人为,有点傻。但是,假设源到字节码编译器将如何表现可能会导致意外。正如彼得和我在各自的答案中所表明的那样,两个流行的编译器可以产生相当不同的输出(即使没有混淆)。它很少重要,因为当您执行代码时,所有这些都会被JIT编译器很好地优化。
这就是说,在绝大多数情况下,向后跳跃将是循环开始位置的合理指示。与其余部分相比,找出循环的入口点是简单部分。

Of course, this particular example is very artificial and a bit silly. However, making assumptions as to how the source-to-bytecode compiler is going to behave could lead to surprises. As Peter and I have shown in our respective answers, two popular compilers can produce a rather different output (even without obfuscation). It rarely matters, because all of this tends to be optimised rather well by the JIT compiler when you execute the code. This being said, in the vast majority of cases, jumping backwards will be a reasonable indication as to where a loop starts. Compared with the rest, finding out the entry point of a loop is the "easy" part.

在考虑任何循环启动/退出检测之前,你应该看一下进入,退出和继承者的定义。虽然循环只有一个入口点,但它可能有多个出口点和/或多个后继点,通常由 break 语句(有时带有标签)引起, return 语句和/或异常(明确捕获与否)。虽然您没有提供有关您正在调查的仪器类型的详细信息,但是当然您还需要考虑插入代码的位置(如果这是您想要做的)。通常,某些检测可能必须在每个退出语句之前完成,或者代替每个后继语句(在这种情况下,您必须移动原始语句)。

Before considering any loop start/exit instrumentation, you should look into the definitions of what entry, exit and successors are. Although a loop will only have one entry point, it may have multiple exit points and/or multiple successors, typically caused by break statements (sometimes with labels), return statements and/or exceptions (explicitly caught or not). While you haven't given details regarding the kind of instrumentations you're investigating, it's certainly worth considering where you want to insert code (if that's what you want to do). Typically, some instrumentation may have to be done before each exit statement or instead of each successor statement (in which case you'll have to move the original statement).

Soot 是一个这样做的好框架。它有许多中间表示,使字节码分析更方便(例如Jimple)。

Soot is a good framework to do this. It has a number of intermediate representations that make bytecode analysis more convenient (e.g. Jimple).

你可以构建一个 BlockGraph ,例如 ExceptionalBlockGraph 。一旦您将控制流图分解为这样的块图,从节点中,您应该能够识别支配者(即具有箭头的块返回它们)。这将为您提供循环的开始。

You can build a BlockGraph based on your method body, for example an ExceptionalBlockGraph. Once you've decomposed the control flow graph into such a block graph, from the nodes, you should be able to identity the dominators (i.e. blocks that have an arrow coming back to them). This will give you the start of the loop.

您可以在本文的第4.3至4.7节

编辑:

在与@Peter的讨论中回答了他的回答。说同样的例子:

Following the discussion with @Peter in comments to his answer. Talking the same example:

public int foo(int i, int j) {
    while (true) {
        try {
            while (i < j)
                i = j++ / i;
        } catch (RuntimeException re) {
            i = 10;
            continue;
        }
        break;
    }
    return j;
}

这一次,使用Eclipse编译器编译(没有特定选项:只需自动编译在IDE内)。
这段代码没有被混淆(除了代码不好,但这是另一回事)。
这是结果(来自 javap -c ):

This time, compiled with the Eclipse compiler (no specific option: simply autocompilation from within the IDE). This code hasn't been obfuscated (apart from being bad code, but that's a different matter). Here is the result (from javap -c):

public int foo(int, int);
  Code:
   0:   goto    10
   3:   iload_2
   4:   iinc    2, 1
   7:   iload_1
   8:   idiv
   9:   istore_1
   10:  iload_1
   11:  iload_2
   12:  if_icmplt   3
   15:  goto    25
   18:  astore_3
   19:  bipush  10
   21:  istore_1
   22:  goto    10
   25:  iload_2
   26:  ireturn
  Exception table:
   from   to  target type
     0    15    18   Class java/lang/RuntimeException

有一个介于3到12之间的循环(从10开始跳转)和另一个循环循环,由于从8到22的除零发生异常。
javac 编译器结果不同,可以猜测是否存在0到22之间的外循环和0到12之间的内循环,这里的嵌套不太明显。

There is a loop between 3 and 12 (jumped in starting a 10) and another loop, due to the exception occurring from the division by zero at 8 to 22. Unlike the javac compiler result, where one could make as guess that there was an outer loop between 0 and 22 and an inner loop between 0 and 12, the nesting is less obvious here.

编辑2:

用一个不那么尴尬的例子来说明你可能遇到的问题。这是一个相对简单的循环:

To illustrate the kind of problems you may get with a less awkward example. Here is a relatively simple loop:

public void foo2() {
    for (int i = 0; i < 5; i++) {
        System.out.println(i);
    }
}

在Eclipse中进行(正常)编译后, javap -c 给出:

After (normal) compilation within Eclipse, javap -c gives this:

public void foo2();
  Code:
   0:   iconst_0
   1:   istore_1
   2:   goto    15
   5:   getstatic   #25; //Field java/lang/System.out:Ljava/io/PrintStream;
   8:   iload_1
   9:   invokevirtual   #31; //Method java/io/PrintStream.println:(I)V
   12:  iinc    1, 1
   15:  iload_1
   16:  iconst_5
   17:  if_icmplt   5
   20:  return

在循环中执行任何操作之前,直接从2跳到15。到17是循环的标题(入口点)。有时,标头块可能包含更多指令,特别是如果退出条件涉及更多评估,或者如果它是 do {} while()循环。
循环的入口和退出的概念可能并不总是反映出你作为Java源代码理解的内容(包括你可以为的事实) $ c>循环为,而循环,例如)。使用 break 也可以导致多个退出点。

Before doing anything within the loop, you jump straight from 2 to 15. Block 15 to 17 is the header of the loop (the "entry point"). Sometimes, the header block could contain far more instructions, especially if the exit condition involves more evaluation, or if it's a do {} while() loop. The concept of "entry" and "exit" of a loop may not always reflect what you'd write sensibly as Java source code (including the fact that you can rewrite for loops as while loops, for example). Using break can also lead to multiple exit points.

顺便说一句,阻止,我的意思是字节码的序列,你不能跳到其中,你不能跳到中间:它们只是从第一行输入(不一定是从前一行输入,可能是从其他地方跳转)并退出从最后一个(不一定到下一行,它也可以跳到其他地方)。

By the way, by "block", I mean a sequence of bytecode into which you can't jump and out of which you can't jump in the middle: they're only entered from the first line (not necessarily from the previous line, possibly from a jump from somewhere else) and exited from the last (not necessarily to the following line, it can jump somewhere else too).

编辑3:

自从我上次查看Soot以来,似乎已经添加了分析循环的新类/方法,这使得它更方便。

It seems that new classes/methods to analyse loops have been added since last time I had looked at Soot, which make it a bit more convenient.

这是一个完整的例子。

要分析的类/方法( TestLoop.foo()

The class/method to analyse (TestLoop.foo())

public class TestLoop {
    public void foo() {
        for (int j = 0; j < 2; j++) {
            for (int i = 0; i < 5; i++) {
                System.out.println(i);
            }
        }
    }
}

何时由Eclipse编译器编译,生成此字节码( javap -c ):

When compiled by the Eclipse compiler, this produces this bytecode (javap -c):

public void foo();
  Code:
   0:   iconst_0
   1:   istore_1
   2:   goto    28
   5:   iconst_0
   6:   istore_2
   7:   goto    20
   10:  getstatic   #25; //Field java/lang/System.out:Ljava/io/PrintStream;
   13:  iload_2
   14:  invokevirtual   #31; //Method java/io/PrintStream.println:(I)V
   17:  iinc    2, 1
   20:  iload_2
   21:  iconst_5
   22:  if_icmplt   10
   25:  iinc    1, 1
   28:  iload_1
   29:  iconst_2
   30:  if_icmplt   5
   33:  return

这是一个程序,使用Soot加载类(假设它在类路径上)并显示其块和循环:

Here is a program that loads the class (assuming it's on the classpath here) using Soot and displays its blocks and loops:

import soot.Body;
import soot.Scene;
import soot.SootClass;
import soot.SootMethod;
import soot.jimple.toolkits.annotation.logic.Loop;
import soot.toolkits.graph.Block;
import soot.toolkits.graph.BlockGraph;
import soot.toolkits.graph.ExceptionalBlockGraph;
import soot.toolkits.graph.LoopNestTree;

public class DisplayLoops {
    public static void main(String[] args) throws Exception {
        SootClass sootClass = Scene.v().loadClassAndSupport("TestLoop");
        sootClass.setApplicationClass();

        Body body = null;
        for (SootMethod method : sootClass.getMethods()) {
            if (method.getName().equals("foo")) {
                if (method.isConcrete()) {
                    body = method.retrieveActiveBody();
                    break;
                }
            }
        }

        System.out.println("**** Body ****");
        System.out.println(body);
        System.out.println();

        System.out.println("**** Blocks ****");
        BlockGraph blockGraph = new ExceptionalBlockGraph(body);
        for (Block block : blockGraph.getBlocks()) {
            System.out.println(block);
        }
        System.out.println();

        System.out.println("**** Loops ****");
        LoopNestTree loopNestTree = new LoopNestTree(body);
        for (Loop loop : loopNestTree) {
            System.out.println("Found a loop with head: " + loop.getHead());
        }
    }
}

查看Soot文档了解更多信息有关如何加载类的详细信息。 Body 是循环体的模型,即从字节码生成的所有语句。这使用了中间的Jimple表示,它相当于字节码,但更容易分析和处理。

Check the Soot documentation for more details on how to load classes. The Body is a model for the body of the loop, i.e. all the statements made from the bytecode. This uses the intermediate Jimple representation, which is equivalent to the bytecode, but easier to analyse and process.

以下是该程序的输出:

正文:

    public void foo()
    {
        TestLoop r0;
        int i0, i1;
        java.io.PrintStream $r1;

        r0 := @this: TestLoop;
        i0 = 0;
        goto label3;

     label0:
        i1 = 0;
        goto label2;

     label1:
        $r1 = <java.lang.System: java.io.PrintStream out>;
        virtualinvoke $r1.<java.io.PrintStream: void println(int)>(i1);
        i1 = i1 + 1;

     label2:
        if i1 < 5 goto label1;

        i0 = i0 + 1;

     label3:
        if i0 < 2 goto label0;

        return;
    }

区块:

Block 0:
[preds: ] [succs: 5 ]
r0 := @this: TestLoop;
i0 = 0;
goto [?= (branch)];

Block 1:
[preds: 5 ] [succs: 3 ]
i1 = 0;
goto [?= (branch)];

Block 2:
[preds: 3 ] [succs: 3 ]
$r1 = <java.lang.System: java.io.PrintStream out>;
virtualinvoke $r1.<java.io.PrintStream: void println(int)>(i1);
i1 = i1 + 1;

Block 3:
[preds: 1 2 ] [succs: 4 2 ]
if i1 < 5 goto $r1 = <java.lang.System: java.io.PrintStream out>;

Block 4:
[preds: 3 ] [succs: 5 ]
i0 = i0 + 1;

Block 5:
[preds: 0 4 ] [succs: 6 1 ]
if i0 < 2 goto i1 = 0;

Block 6:
[preds: 5 ] [succs: ]
return;

循环:

Found a loop with head: if i1 < 5 goto $r1 = <java.lang.System: java.io.PrintStream out>
Found a loop with head: if i0 < 2 goto i1 = 0

LoopNestTree 使用 LoopF​​inder ,它使用 ExceptionalBlockGraph 构建块列表。
Loop class将为您提供entry语句和exit语句。如果您愿意,您应该能够添加额外的声明。 Jimple非常方便(它与字节码足够接近,但有一个稍高的级别,以免手动处理所有内容)。然后,您可以根据需要输出修改后的 .class 文件。 (请参阅Soot文档。)

LoopNestTree uses LoopFinder, which uses an ExceptionalBlockGraph to build the list of blocks. The Loop class will give you the entry statement and the exit statements. You should then be able to add extra statements if you wish. Jimple is quite convenient for this (it's close enough to the bytecode, but has a slightly higher level so as not to deal with everything manually). You can then output your modified .class file if needed. (See the Soot documentation for this.)

这篇关于识别java字节代码中的循环的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆