Is there a clever way to determine the length of Java bytecode instructions?

Question

I'm creating a static analysis tool for Java, and some information about the programs I'm analyzing will be easier to get if I can read it from the bytecode in .class files.

I don't care about every single one of the instructions that might be in the class file. E.g., I might only need to see if there are any getfield instructions.

The problem is that since instructions have variable length, it seems that in the general case I need to specify (in my code) the length of every single opcode before I can determine where (e.g.) the getfield instructions start and end.

For some other instruction sets (like x86), there are rules like "any opcode below 0x0F is 1 byte, anything equal to or greater than 0x0F is two bytes."

Is there any convenient pattern like this in the Java bytecode instructions?

Recommended answer

If you try to map instruction opcodes to instruction sizes, you'll get the following discouraging table:

Opcode       Size (including the opcode byte)
0 - 15       1 byte
16           2 bytes
17           3 bytes
18           2 bytes
19 - 20      3 bytes
21 - 25      2 bytes
26 - 53      1 byte
54 - 58      2 bytes
59 - 131     1 byte
132          3 bytes
133 - 152    1 byte
153 - 168    3 bytes
169          2 bytes
170 - 171    special handling
172 - 177    1 byte
178 - 184    3 bytes
185 - 186    5 bytes
187          3 bytes
188          2 bytes
189          3 bytes
190 - 191    1 byte
192 - 193    3 bytes
194 - 195    1 byte
196          special handling
197          4 bytes
198 - 199    3 bytes
200 - 201    5 bytes

In other words, no size information is encoded in the instruction's numeric value or in its bit pattern. There is another property, though, which you can consider some sort of pattern: out of the ~200 defined instructions, roughly 150 are one byte long, leaving only ~50 instructions that require any handling at all. Even this small group can be subdivided further into logical groups, the majority taking three bytes and the second biggest group taking two bytes.

So a method that rushes through the instructions might look like this:

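// Requires java.nio.ByteBuffer. The opcode names (BIPUSH, LDC, ...) are assumed
// to be int constants holding the opcode values, defined elsewhere.
static void readByteCode(ByteBuffer bb) {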
    while(bb.hasRemaining()) {
        switch(bb.get()&0xff) {
            case BIPUSH: // one byte embedded constant
            case LDC:    // one byte embedded constant pool index
            // follow-up: one byte embedded local variable index
            case ILOAD:  case LLOAD:  case FLOAD:  case DLOAD:  case ALOAD:
            case ISTORE: case LSTORE: case FSTORE: case DSTORE: case ASTORE: case RET:
            case NEWARRAY: // one byte embedded array type
                bb.get();
                break;

            case IINC: // one byte local variable index, another one for the constant
            case SIPUSH: // two bytes embedded constant
            case LDC_W: case LDC2_W: // two bytes embedded constant pool index
            // follow-up: two bytes embedded branch offset
            case IFEQ: case IFNE: case IFLT: case IFGE: case IFGT: case IFLE:
            case IF_ICMPEQ: case IF_ICMPNE: case IF_ICMPLT: case IF_ICMPGE:
            case IF_ICMPGT: case IF_ICMPLE: case IF_ACMPEQ: case IF_ACMPNE:
            case GOTO: case JSR: case IFNULL: case IFNONNULL:
            // follow-up: two bytes embedded constant pool index to member or type
            case GETSTATIC: case PUTSTATIC: case GETFIELD: case PUTFIELD:
            case INVOKEVIRTUAL: case INVOKESPECIAL: case INVOKESTATIC: case NEW:
            case ANEWARRAY: case CHECKCAST: case INSTANCEOF:
                bb.getShort();
                break;

            case MULTIANEWARRAY:// two bytes pool index, one byte dimension
                bb.getShort();
                bb.get();
                break;

            // follow-up: two bytes embedded constant pool index, then two more bytes
            // (count and zero for invokeinterface, two zero bytes for invokedynamic)
            case INVOKEINTERFACE: case INVOKEDYNAMIC:
                bb.getShort();
                bb.getShort();
                break;

            case GOTO_W: case JSR_W:// four bytes embedded branch offset
                bb.getInt();
                break;

            case LOOKUPSWITCH:
                // special handling left as an exercise for the reader...
                break;
            case TABLESWITCH:
                // special handling left as an exercise for the reader...
                break;
            case WIDE:
                int widened=bb.get()&0xff;
                bb.getShort(); // local variable index
                if(widened==IINC) {
                    bb.getShort(); // constant offset value
                }
                break;
            default: // one of the ~150 instructions taking one byte
        }
    }
}

I intentionally kept some instructions separate even though they have the same number of follow-up bytes, because those bytes have a different meaning. After all, you'll want to insert some actual logic at certain places, I guess.
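If no per-instruction logic is needed, the table above can also be encoded as a lookup array. The following is only a sketch: the class name `OpcodeLengths`, its method and the two marker constants are made up for illustration, and the lengths are copied from the table, so they include the opcode byte itself.

```java
import java.util.Arrays;

/** Hypothetical helper: instruction lengths copied from the table above. */
final class OpcodeLengths {
    /** Marker for tableswitch, lookupswitch and wide, whose length depends on context. */
    static final int VARIABLE = -1;
    /** Marker for opcode values above 201, which the table does not cover. */
    static final int UNDEFINED = 0;

    private static final int[] LENGTH = new int[256];

    static {
        fill(0, 201, 1);                  // start with the common case: one byte
        fill(16, 16, 2);   fill(17, 17, 3);   fill(18, 18, 2);   fill(19, 20, 3);
        fill(21, 25, 2);   fill(54, 58, 2);   fill(132, 132, 3);
        fill(153, 168, 3); fill(169, 169, 2);
        fill(170, 171, VARIABLE);         // tableswitch, lookupswitch
        fill(178, 184, 3); fill(185, 186, 5); fill(187, 187, 3);
        fill(188, 188, 2); fill(189, 189, 3); fill(192, 193, 3);
        fill(196, 196, VARIABLE);         // wide
        fill(197, 197, 4); fill(198, 199, 3); fill(200, 201, 5);
    }

    /** Sets the length for the inclusive opcode range from..to. */
    private static void fill(int from, int to, int length) {
        Arrays.fill(LENGTH, from, to + 1, length);
    }

    /** Returns the total instruction length, VARIABLE, or UNDEFINED. */
    static int lengthOf(int opcode) {
        return LENGTH[opcode & 0xff];
    }
}
```

With this, `OpcodeLengths.lengthOf(180)` (getfield) yields 3, while the two switch instructions and wide report `VARIABLE` and still need dedicated handling.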

Note that the handling of the two switch instructions is left out. They contain padding, and computing it requires knowing how the code is aligned within the buffer, which is under the caller's control. So that's up to your specific application. Refer to the documentation of lookupswitch and tableswitch.
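As a rough illustration of that special handling, here is a sketch that skips the operands of either switch instruction. It assumes the caller passes the buffer position at which the method's code begins (the hypothetical `codeStart` parameter), so that the padding can be computed relative to it, and that the buffer uses the default big-endian byte order. The class and method names are invented for this example.

```java
import java.nio.ByteBuffer;

/** Hypothetical helper for skipping tableswitch and lookupswitch operands. */
final class SwitchSkipper {
    static final int TABLESWITCH = 170;
    static final int LOOKUPSWITCH = 171;

    /**
     * Call right after the opcode byte has been read.
     * codeStart is the buffer position of the first byte of the method's code
     * (an assumption of this sketch; it is 0 if the buffer holds only the code).
     */
    static void skipSwitch(ByteBuffer bb, int opcode, int codeStart) {
        // Zero to three padding bytes align the operands to a multiple of four,
        // counted from the start of the code.
        int offset = bb.position() - codeStart;
        int padding = (4 - (offset & 3)) & 3;
        bb.position(bb.position() + padding);

        bb.getInt(); // default branch offset
        if (opcode == TABLESWITCH) {
            int low = bb.getInt();
            int high = bb.getInt();
            // one four-byte branch offset per value in low..high
            bb.position(bb.position() + 4 * (high - low + 1));
        } else {
            int npairs = bb.getInt();
            // each pair is a four-byte match value plus a four-byte branch offset
            bb.position(bb.position() + 8 * npairs);
        }
    }
}
```

In the loop above, the two switch cases could then call something like `SwitchSkipper.skipSwitch(bb, TABLESWITCH, codeStart)`. The sketch performs no sanity checks on `low`, `high` or `npairs`, so malformed input can end in an exception from `position`.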

Of course, handling all single-byte instructions in the default branch means that the code won't catch unknown or invalid instructions. If you want safety, you'll have to insert the cases…
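A cheaper partial check than listing every case is possible because the table above covers the opcode values 0 through 201 without gaps, so anything larger can be rejected up front. This is only a sketch with invented names, and it does not validate operands or the overall structure of the code.

```java
/** Hypothetical guard that rejects opcode values the table above does not cover. */
final class OpcodeCheck {
    /** Highest opcode value listed in the table (jsr_w). */
    static final int LAST_DEFINED_OPCODE = 201;

    /** Returns the opcode unchanged, or throws if it lies outside 0..201. */
    static int requireDefined(int opcode, int offset) {
        if (opcode < 0 || opcode > LAST_DEFINED_OPCODE) {
            throw new IllegalArgumentException(
                String.format("undefined opcode 0x%02x at offset %d", opcode, offset));
        }
        return opcode;
    }
}
```

The switch in the loop could then start with something like `switch (OpcodeCheck.requireDefined(bb.get() & 0xff, bb.position() - 1))`.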
