C++ references and pointers at the compiler level
Question
I'm trying to learn how C++ compilers handle references and pointers, in preparation for a compiler class that I'm taking next semester. I'm specifically interested in how compilers handle references in C++.
The standard specifies that a reference is an "alias," but I don't know exactly what that means at the compiler level. I have two theories:
1. A non-reference variable has an entry in the symbol table. When a reference to that variable is created, the compiler simply creates another lexeme that "points" to the exact same entry in the symbol table (and not to the non-reference variable's location in memory).
2. When a reference to that variable is created, the compiler creates a pointer to that variable's location in memory. The limitations on references (no null values, etc.) are handled when parsing the context of the language. In other words, a reference is "syntactic sugar" for a dereferenced pointer.
Both solutions would create an "alias," as far as I can tell. Do compilers use one and not the other? Or is it compiler-dependent?
As an aside, I'm aware that at the machine-language level, both are "pointers" (pretty much everything other than an integer is a "pointer" at the machine level). I'm interested in what the compiler does before the machine code is generated.
Part of the reason I am curious is because PHP uses method #1, and I'm wondering if C++ compilers work the same way. Java certainly does not use method #1, and their "references" are in fact dereferenced pointers; see this article by Scott Stanchfield.
Recommended answer
I will try to explain how references are implemented by the g++ compiler.
#include <iostream>
using namespace std;
int main()
{
    int i = 10;
    int *ptrToI = &i;
    int &refToI = i;
    cout << "i = " << i << "\n";
    cout << "&i = " << &i << "\n";
    cout << "ptrToI = " << ptrToI << "\n";
    cout << "*ptrToI = " << *ptrToI << "\n";
    cout << "&ptrToI = " << &ptrToI << "\n";
    cout << "refToNum = " << refToI << "\n";
    //cout << "*refToNum = " << *refToI << "\n";
    cout << "&refToNum = " << &refToI << "\n";
    return 0;
}
The output of this code looks like this:
i = 10
&i = 0xbf9e52f8
ptrToI = 0xbf9e52f8
*ptrToI = 10
&ptrToI = 0xbf9e52f4
refToNum = 10
&refToNum = 0xbf9e52f8
Let's look at the disassembly. (I used GDB for this; 8, 9 and 10 here are line numbers in the source file.)
8 int i = 10;
0x08048698 <main()+18>: movl $0xa,-0x10(%ebp)
Here $0xa is the 10 (decimal) that we are assigning to i. -0x10(%ebp) here means the contents of the ebp register minus 16 (decimal), so -0x10(%ebp) is the address of i on the stack.
9 int *ptrToI = &i;
0x0804869f <main()+25>: lea -0x10(%ebp),%eax
0x080486a2 <main()+28>: mov %eax,-0x14(%ebp)
This assigns the address of i to ptrToI. ptrToI is itself on the stack, at address -0x14(%ebp), that is, ebp - 20 (decimal).
10 int &refToI = i;
0x080486a5 <main()+31>: lea -0x10(%ebp),%eax
0x080486a8 <main()+34>: mov %eax,-0xc(%ebp)
Now here is the catch! Compare the disassembly of lines 9 and 10 and you will observe that -0x14(%ebp) is replaced by -0xc(%ebp) in line 10. -0xc(%ebp) is the address of refToI itself. It is allocated on the stack. But you will never be able to get this address from your code, because you are not required to know it.
So a reference does occupy memory, at least in this build. Here it is stack memory, since we declared it as a local variable. How much memory does it occupy? As much as a pointer occupies. (The C++ standard leaves it unspecified whether a reference requires storage, so an optimizing compiler is free to eliminate it altogether.)
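One way to observe this from inside the language is sketched below. sizeof applied to a reference type reports the size of the referred-to type, so the sketch wraps the reference in a struct and measures the struct instead. HoldsPointer, HoldsReference and the two helper functions are hypothetical names introduced for illustration; they are not part of the answer's program. On mainstream compilers such as g++ the two sizes come out equal, but the standard does not guarantee this.

```cpp
#include <cstddef>

// Hypothetical types for illustration only.
struct HoldsPointer {
    int* p;
};

struct HoldsReference {
    int& r;  // a reference member needs per-object storage in practice
};

// sizeof(int&) would just report sizeof(int), so measure the wrappers.
constexpr std::size_t pointerHolderSize() { return sizeof(HoldsPointer); }
constexpr std::size_t referenceHolderSize() { return sizeof(HoldsReference); }
```

Comparing referenceHolderSize() with pointerHolderSize() on your own toolchain shows whether the reference member is stored as a pointer-sized slot there.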
Now let's see how we access the reference and the pointer. For simplicity, I have shown only part of the assembly.
16 cout << "*ptrToI = " << *ptrToI << "\n";
0x08048746 <main()+192>: mov -0x14(%ebp),%eax
0x08048749 <main()+195>: mov (%eax),%ebx
19 cout << "refToNum = " << refToI << "\n";
0x080487b0 <main()+298>: mov -0xc(%ebp),%eax
0x080487b3 <main()+301>: mov (%eax),%ebx
Now compare the two snippets above; you will see a striking similarity. -0xc(%ebp) is the actual address of refToI, which is never accessible to you.
In simple terms, if you think of a reference as a normal pointer, then accessing a reference is like fetching the value at the address the reference points to. This means the two lines of code below will give you the same result:
cout << "Value of i = " << *ptrToI << "\n";
cout << "Value of i = " << refToI << "\n";
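The same equivalence holds for writes. As a minimal sketch (addOneViaPointer, addOneViaReference and applyBoth are hypothetical helpers, not taken from the program above), both functions below modify the caller's variable, and a compiler that implements references the way shown in this answer would be expected to generate very similar code for the two:

```cpp
// Hypothetical helpers for illustration only.

// Explicit dereference: the caller passes an address.
void addOneViaPointer(int* p) { *p += 1; }

// Implicit dereference: the compiler passes the address for us.
void addOneViaReference(int& r) { r += 1; }

int applyBoth() {
    int i = 10;
    addOneViaPointer(&i);   // i becomes 11
    addOneViaReference(i);  // i becomes 12
    return i;
}
```

The only difference visible in the source is at the call site: the pointer version needs &i and *p, while the reference version hides both.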
Now compare:
15 cout << "ptrToI = " << ptrToI << "\n";
0x08048713 <main()+141>: mov -0x14(%ebp),%ebx
21 cout << "&refToNum = " << &refToI << "\n";
0x080487fb <main()+373>: mov -0xc(%ebp),%eax
I guess you can spot what is happening here.
If you ask for &refToI, the contents of the location -0xc(%ebp) are returned. -0xc(%ebp) is where refToI resides, and its contents are nothing but the address of i.
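This behaviour can be checked directly in code. The sketch below uses a hypothetical helper, sameObject, which is not part of the original program; it reuses the answer's variable names to show that taking the address of the reference yields the address of i, and that a write through the reference lands in i.

```cpp
#include <cassert>

// Hypothetical helper for illustration only: returns true when the
// variable, the pointer and the reference all designate the same int.
bool sameObject() {
    int i = 10;
    int* ptrToI = &i;
    int& refToI = i;

    refToI = 20;      // writes through the reference, so i changes
    assert(i == 20);

    // &refToI is the address of i, never the address of the slot
    // in which the compiler may have stored the reference itself.
    return &refToI == &i && ptrToI == &refToI && *ptrToI == 20;
}
```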
One last thing: why is this line commented out?
//cout << "*refToNum = " << *refToI << "\n";
Because *refToI is not permitted: refToI denotes an int, not a pointer, so applying * to it gives a compile-time error.
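For contrast, the * operator is accepted when the reference is bound to a pointer, because the expression then has pointer type. A minimal sketch follows; refToPtr and readThroughReferenceToPointer are hypothetical names that do not appear in the original program.

```cpp
// Hypothetical helper for illustration only.
int readThroughReferenceToPointer() {
    int i = 10;
    int* ptrToI = &i;
    int*& refToPtr = ptrToI;  // a reference to a pointer, not to an int

    // refToPtr denotes an int*, so unary * applies and yields i's value.
    return *refToPtr;
}
```

So the error in the commented-out line comes from the type of the referred-to object, not from any special rule about references.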