Compiler Trick?

Question


I have seen a very strange piece of code generated by the C# compiler
and was hoping someone could shed some light on it. I've reduced
the example to a bare minimum, and I am using the .NET 1.1 framework.
Here is the C# code:

using System;

namespace Example
{
    class SwitchTest
    {
        static void Main(string[] args)
        {
            switch (args[0])
            {
                case "A":
                    break;

                case "B":
                    break;
            }
        }
    }
}

And this is the IL generated (pretty much the same for debug and release):

.method private hidebysig static void Main(string[] args) cil managed
{
  .entrypoint
  // Code size 43 (0x2b)
  .maxstack 2
  .locals init (string V_0)
  IL_0000: ldstr "A"
  IL_0005: ldstr "B"
  IL_000a: leave.s IL_000c
  IL_000c: ldarg.0
  IL_000d: ldc.i4.0
  IL_000e: ldelem.ref
  IL_000f: dup
  IL_0010: stloc.0
  IL_0011: brfalse.s IL_002a
  IL_0013: ldloc.0
  IL_0014: call string [mscorlib]System.String::IsInterned(string)
  IL_0019: stloc.0
  IL_001a: ldloc.0
  IL_001b: ldstr "A"
  IL_0020: beq.s IL_002a
  IL_0022: ldloc.0
  IL_0023: ldstr "B"
  IL_0028: beq.s IL_002a
  IL_002a: ret
} // end of method SwitchTest::Main

QUESTION: What is the purpose of the two ldstr instructions for the case
statement values? The leave.s instruction clears the eval stack, so they
aren't used elsewhere. I thought at first that it might have something to do
with the interned string cache, but these are constants and so should already
be in the cache.

Any ideas?

Brian
br*********@yahoo.com

Solution

"Brian Tyler" <br*********@yahoo.com> wrote in
news:ur*************@TK2MSFTNGP12.phx.gbl:

<snip>

Hello Brian,
I'll try to explain this line by line (mainly not for you, since you seem
to know IL, but for the other readers of this newsgroup):

IL_0000: ldstr "A"
IL_0005: ldstr "B"

A and B are now loaded onto the execution stack.

[A][B] <--- stack, first in is on the left.

opcodes:
0x72 + token --> 1 byte + token: 4-byte index into the string pool
0x72 + token

The token is a user-defined string token whose RID portion is actually
an offset into the #US blob stream.

This is used to see whether all values that will be tested in the switch
statement comply with the switch type. If one of them were not a valid
Unicode string, the ldstr would fail and an exception would be thrown.
This is far more efficient than calling some code to check the type. The
exception would be handled internally in the framework as a first-chance
exception and would not be propagated to the user.

total: 10 bytes

IL_000a: leave.s IL_000c
Clear the stack and branch x bytes; in this case a label is used for
calculating the branch offset.

The leave instruction, or its short-parameter form, is used to exit a
guarded block (a try block) or an exception handler block. You cannot use
this instruction, however, to exit a filter, finally, or fault block.
So this leave.s will (if there has not been a first-chance exception that
imposed a fault block) step out of the try. Now we are sure that there
are only valid types for the case labels.

[]

opcode: 0xDE + 1 byte offset --> 2 bytes

total: 12 bytes
IL_000c: ldarg.0
Load the first argument, which is the array reference.

[array reference]

opcode: 0x02 --> 1 byte

total: 13 bytes

IL_000d: ldc.i4.0
Load the constant 0 as an int32.

[array reference][0]

opcode: 0x16 --> 1 byte

total: 14 bytes

IL_000e: ldelem.ref
Load the indexed item from the referenced array.

[item]

opcode: 0x9A --> 1 byte

total: 15 bytes

IL_000f: dup
Duplicate the item on the stack.

[item][item]

opcode: 0x25 --> 1 byte

total: 16 bytes

IL_0010: stloc.0
Store the topmost item into the first local variable.

[item]

opcode: 0x0A --> 1 byte

total: 17 bytes

IL_0011: brfalse.s IL_002a
Branch if the item on the stack (a reference to a string) is zero.

Therefore: if the reference is null, it branches straight to the ret at
IL_002a.

[]

opcode: 0x2C + 1 byte --> 2 bytes

total: 19 bytes

IL_0013: ldloc.0
Load the reference back from the local if there was a valid reference.
(You might think this is redundant, but it's necessary. Just think about
the execution stack: there would be one more item after the branch if this
one hadn't been stored to the local variable beforehand.)
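
For readers who want to follow the stack diagrams above, here is a minimal C# sketch that mimics IL_000c through IL_0011 with an explicit stack. It is only an illustration and not how the runtime evaluates IL internally; the names `EvalStackDemo` and `FallsThrough` are made up for this example.

```csharp
using System;
using System.Collections.Generic;

static class EvalStackDemo
{
    // Mimics IL_000c..IL_0011 with an explicit stack (illustrative only).
    // Returns true when brfalse.s would fall through, i.e. args[0] is non-null.
    public static bool FallsThrough(string[] args, out string local0, out int depthAfter)
    {
        var stack = new Stack<object>();

        stack.Push(args);                     // ldarg.0     -> [array reference]
        stack.Push(0);                        // ldc.i4.0    -> [array reference][0]
        int index = (int)stack.Pop();         // ldelem.ref pops the index and the
        var array = (string[])stack.Pop();    //             array, then pushes the
        stack.Push(array[index]);             //             element -> [item]
        stack.Push(stack.Peek());             // dup         -> [item][item]
        local0 = (string)stack.Pop();         // stloc.0     -> [item]
        bool nonNull = stack.Pop() != null;   // brfalse.s pops and tests -> []
        depthAfter = stack.Count;             // the stack is empty on both paths
        return nonNull;
    }

    static void Main()
    {
        string local0;
        int depth;
        bool fallsThrough = FallsThrough(new[] { "A" }, out local0, out depth);
        Console.WriteLine("falls through: {0}, V_0: {1}, depth: {2}", fallsThrough, local0, depth);
    }
}
```

Because brfalse.s consumes the duplicated reference, the stack depth is zero whether or not the branch is taken, which is why the value has to be reloaded from V_0 afterwards.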

[item]

opcode: 0x06 --> 1 byte

total: 20 bytes
IL_0014: call string [mscorlib]System.String::IsInterned(string)
Call the method IsInterned.

The common language runtime automatically maintains a table, called the
"intern pool", which contains a single instance of each unique literal
string constant declared in a program, as well as any unique instance of
String you add programmatically.

The intern pool conserves string storage. If you assign a literal string
constant to several variables, each variable is set to reference the same
constant in the intern pool instead of referencing several different
instances of String that have identical values.

This method looks up the string in the intern pool. If the string has
already been interned, a reference to that instance is returned; otherwise,
a null reference is returned.
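
The lookup behaviour described above can be tried out with a small C# sketch. `InternDemo` and its method names are hypothetical, and the sketch assumes a JIT-compiled runtime where string literals are interned once they have been loaded; precompiled images that relax literal interning may behave differently.

```csharp
using System;

static class InternDemo
{
    // True when String.IsInterned maps a runtime-built copy of "A" back to
    // the very same object as the literal "A".
    public static bool BuiltCopyResolvesToLiteral()
    {
        string literal = "A";                 // literal, loaded via ldstr
        string built = new string('A', 1);    // equal value, separate object
        string found = String.IsInterned(built);
        return !ReferenceEquals(built, literal) && ReferenceEquals(found, literal);
    }

    // True when a string that was never interned yields a null reference.
    public static bool UnknownStringYieldsNull()
    {
        string unique = Guid.NewGuid().ToString();   // built at run time, not a literal
        return String.IsInterned(unique) == null;
    }

    static void Main()
    {
        Console.WriteLine(BuiltCopyResolvesToLiteral());
        Console.WriteLine(UnknownStringYieldsNull());
    }
}
```

Unlike String.Intern, String.IsInterned never adds anything to the pool, so probing with an arbitrary argument string does not grow the table.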

[string reference]

opcode: 0x28 + token --> 5 bytes

total: 25 bytes
IL_0019: stloc.0
Store the reference in the local variable (for later reuse). dup can't be
used because the execution stack also has to be valid if the code
branches!

[]

opcode: 0x0A --> 1 byte

total: 26 bytes

IL_001a: ldloc.0
Load it back for comparing.

[string reference]

opcode: 0x06 --> 1 byte

total: 27 bytes

IL_001b: ldstr "A"
Load the value to compare against.

[string reference][A]

opcode: 0x72 + token --> 5 bytes

total: 32 bytes

IL_0020: beq.s IL_002a
Branch if equal (short form).
Branches to ret when the matching string is found.

[]

opcode: 0x2E + offset --> 2 bytes

total: 34 bytes

IL_0022: ldloc.0
Load it back for comparing.

[string reference]

opcode: 0x06 --> 1 byte

total: 35 bytes

IL_0023: ldstr "B"
Load the value to compare against.

[string reference][B]

opcode: 0x72 + token --> 5 bytes

total: 40 bytes

IL_0028: beq.s IL_002a
Branch if equal (short form).
Branches to ret when the matching string is found.

[]

opcode: 0x2E + offset --> 2 bytes

total: 42 bytes

IL_002a: ret
Returns from the method, passing the last item on the stack (if there is
one) as the return value. The stack is empty --> no return value --> OK,
it's a void method.

opcode: 0x2A --> 1 byte

TOTAL: 43 bytes
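
The running totals above can be double-checked with a few lines of C#. This is just a tally of the per-instruction sizes quoted in the walk-through; `CodeSizeTally` and `TotalSize` are names invented for the sketch.

```csharp
using System;

static class CodeSizeTally
{
    // Encoded size in bytes of each instruction in Main, in listing order.
    static readonly (string Op, int Size)[] Instructions =
    {
        ("ldstr", 5), ("ldstr", 5), ("leave.s", 2),
        ("ldarg.0", 1), ("ldc.i4.0", 1), ("ldelem.ref", 1),
        ("dup", 1), ("stloc.0", 1), ("brfalse.s", 2),
        ("ldloc.0", 1), ("call", 5), ("stloc.0", 1),
        ("ldloc.0", 1), ("ldstr", 5), ("beq.s", 2),
        ("ldloc.0", 1), ("ldstr", 5), ("beq.s", 2),
        ("ret", 1),
    };

    // Sums the sizes; the result should match the "Code size" comment in the listing.
    public static int TotalSize()
    {
        int total = 0;
        foreach (var instruction in Instructions)
            total += instruction.Size;
        return total;
    }

    static void Main()
    {
        Console.WriteLine("0x{0:x}", TotalSize());
    }
}
```

The sum is 43, which is 0x2b in hexadecimal and agrees with the "Code size 43 (0x2b)" comment in the disassembly.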
Hope this helps.

And yes, I'm a geek, and I even code my COM objects in raw x86
assembler!
(Have you ever seen fully working and valid DirectX startup code in just
2.272 bytes of executable size?) ;-)

Greets
Peter

--
------ooo---OOO---ooo------

Peter Koen - www.kema.at
MCAD CAI/RS CASE/RS IAT

------ooo---OOO---ooo------


Brian Tyler <br*********@yahoo.com> wrote:

<snip>

QUESTION: What is the purpose of the two ldstr for the case statement
values? The LEAVE instruction clears out the eval stack so they aren't used
elsewhere. I thought at first that it might have something to do with the
Interned string cache, but these are constants and so should already be in
the cache.

Any ideas?



They're used for testing reference equality. The first two (at the
start of the method) make sure that they're both interned (so that
IsInterned will do the right thing) and the last two are just testing
for reference equality.

It means that switch/case with strings never has to do a full .Equals
call on every case. It only has to find out whether the string value
has been interned, and if so, compare the reference to the interned
version of the string with the cases listed in the code.
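
Read that way, the IL corresponds roughly to the following C# sketch. It is an approximation written for illustration, not decompiler output; `SwitchLowering` and `Match` are hypothetical names, and returning the matched label is added only so the result can be observed.

```csharp
using System;

static class SwitchLowering
{
    // Returns the matched case label, or null when no case applies.
    public static string Match(string value)
    {
        string a = "A";                       // the two leading ldstr instructions:
        string b = "B";                       // the literals are loaded up front

        if (value == null) return null;       // brfalse.s IL_002a
        string interned = String.IsInterned(value);   // null if not in the pool

        // beq.s compares object references, not characters.
        if (ReferenceEquals(interned, a)) return a;
        if (ReferenceEquals(interned, b)) return b;
        return null;
    }

    static void Main(string[] args)
    {
        // Falls back to a runtime-built "B" when no argument is given.
        string input = args.Length > 0 ? args[0] : new string('B', 1);
        Console.WriteLine(Match(input) ?? "(no case matched)");
    }
}
```

A string that is not in the intern pool cannot equal any case label, so a null result from IsInterned simply fails both reference comparisons and no character-by-character comparison is needed.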

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too


Jon Skeet [C# MVP] <sk***@pobox.com> wrote in
news:MP************************@msnews.microsoft.com:

<snip>



Jon,

Where did you get this information from? I don't think that this is
really correct.

You are talking about executing those 3 lines twice, but they are executed
just once. Didn't you notice that all the other branches head for the
ret statement?

And how can a statement that does just a type-safe load and a leave of a
guarded block perform an equality test? Can't be... the compiler would have
to "know" these structures and make assumptions about algorithms the user
might have had in mind... Can't be!

And there is no need to make sure that they're both interned. They will
be interned for sure at this point.

Sorry Jon, but in my humble opinion you are completely wrong here. If you
are sure that you are right, then please show me where you got your
information about this...

greets
Peter
--
------ooo---OOO---ooo------

Peter Koen - www.kema.at
MCAD CAI/RS CASE/RS IAT

------ooo---OOO---ooo------

