Why does the arrow (->) operator in C exist?


Question

The dot (.) operator is used to access a member of a struct, while the arrow operator (->) in C is used to access a member of a struct that is referenced by a pointer.

The pointer itself has no members that could be accessed with the dot operator (it is really just a number describing a location in virtual memory, so it has no members). So there would be no ambiguity if the dot operator were simply defined to dereference the pointer automatically when applied to one (information that, as far as I know, the compiler has at compile time).

So why did the language creators decide to make things more complicated by adding this seemingly unnecessary operator? What was the big design decision?

Recommended answer

I'll interpret your question as two questions: 1) why -> even exists, and 2) why . does not automatically dereference the pointer. The answers to both questions have historical roots.

Why does -> even exist?

In one of the very first versions of the C language (which I will refer to as CRM, for "C Reference Manual", which came with 6th Edition Unix in May 1975), the -> operator had a very exclusive meaning, not synonymous with the combination of * and .

The C language described by CRM was very different from modern C in many respects. In CRM, struct members implemented the global concept of a byte offset, which could be added to any address value with no type restrictions. That is, all names of all struct members had an independent global meaning (and, therefore, had to be unique). For example, you could declare

struct S {
  int a;
  int b;
};

and the name a would stand for offset 0, while the name b would stand for offset 2 (assuming an int of size 2 and no padding). The language required all members of all structs in the translation unit either to have unique names or to stand for the same offset value. For example, in the same translation unit you could additionally declare

struct X {
  int a;
  int x;
};

and that would be OK, since the name a would consistently stand for offset 0. But this additional declaration

struct Y {
  int b;
  int a;
};

would be formally invalid, since it attempted to "redefine" a as offset 2 and b as offset 0.
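The offset rule can be checked in modern C with the standard offsetof macro. The following is only a sketch: crm_names_compatible and check_declarations are hypothetical helper names, and the offsets are those of today's compiler, where int is usually 4 bytes rather than the 2 assumed above.

```c
#include <assert.h>
#include <stddef.h>  /* offsetof, size_t */

struct S { int a; int b; };
struct X { int a; int x; };
struct Y { int b; int a; };

/* Hypothetical helper: under the CRM rule, two members sharing a name
   were acceptable only if they stood for the same byte offset. */
int crm_names_compatible(size_t first_offset, size_t second_offset)
{
    return first_offset == second_offset;
}

/* Checks the three declarations from the text against that rule. */
void check_declarations(void)
{
    /* S and X agree: a is at offset 0 in both. */
    assert(crm_names_compatible(offsetof(struct S, a), offsetof(struct X, a)));
    /* S and Y disagree: Y moves a behind b, so its offset is no longer 0. */
    assert(!crm_names_compatible(offsetof(struct S, a), offsetof(struct Y, a)));
    assert(!crm_names_compatible(offsetof(struct S, b), offsetof(struct Y, b)));
}
```

Calling check_declarations() passes silently on a conforming compiler, which mirrors why struct X was accepted alongside struct S while struct Y was not.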

And this is where the -> operator comes in. Since every struct member name had its own self-sufficient global meaning, the language supported expressions like these

int i = 5;
i->b = 42;  /* Write 42 into `int` at address 7 */
100->a = 0; /* Write 0 into `int` at address 100 */

The compiler interpreted the first assignment as "take address 5, add offset 2 to it, and assign 42 to the int value at the resulting address". That is, the code above would store 42 in the int at address 7. Note that this use of -> did not care about the type of the expression on the left-hand side. The left-hand side was interpreted as an rvalue numerical address (be it a pointer or an integer).
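Modern C rejects i->b when i is an int, but the "address plus offset" behavior can be approximated by hand. This is a sketch under assumptions: store_int_at, load_int_at and demo_untyped_arrow are hypothetical names, and a local byte buffer stands in for the raw address, since writing to address 7 on a modern system would be undefined behavior.

```c
#include <assert.h>
#include <stddef.h>  /* offsetof, size_t */
#include <string.h>  /* memcpy */

struct S { int a; int b; };

/* Hypothetical helper: store an int at (base + offset), ignoring the
   type of whatever base points to, much like the CRM -> operator. */
void store_int_at(void *base, size_t offset, int value)
{
    memcpy((unsigned char *)base + offset, &value, sizeof value);
}

/* Hypothetical helper: read the int back from (base + offset). */
int load_int_at(const void *base, size_t offset)
{
    int value;
    memcpy(&value, (const unsigned char *)base + offset, sizeof value);
    return value;
}

/* Emulates "raw->b = 42" where raw is an untyped block of bytes. */
int demo_untyped_arrow(void)
{
    unsigned char raw[sizeof(struct S)] = {0};

    assert(offsetof(struct S, b) + sizeof(int) <= sizeof raw);
    store_int_at(raw, offsetof(struct S, b), 42);
    return load_int_at(raw, offsetof(struct S, b));
}
```

Using memcpy rather than a pointer cast keeps the sketch free of alignment and aliasing problems, which the 1975 language did not concern itself with.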

This sort of trickery was not possible with the combination of * and . You could not do

(*i).b = 42;

since *i is already an invalid expression. The * operator, being separate from ., imposes stricter type requirements on its operand. To provide a way to work around this limitation, CRM introduced the -> operator, which is independent of the type of the left-hand operand.

As Keith noted in the comments, this difference between -> and the combination of * and . is what CRM refers to as "relaxation of the requirement" in 7.1.8: "Except for the relaxation of the requirement that E1 be of pointer type, the expression E1->MOS is exactly equivalent to (*E1).MOS".

Later, in K&R C, many features originally described in CRM were significantly reworked. The idea of "struct member as global offset identifier" was completely removed, and the functionality of the -> operator became fully identical to that of the combination of * and .
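The equivalence that holds from K&R C onward is easy to confirm with a current compiler. A minimal sketch (the function name arrow_matches_star_dot is made up for illustration):

```c
#include <assert.h>

struct S { int a; int b; };

/* Returns 1 if p->b and (*p).b behave identically for a valid pointer. */
int arrow_matches_star_dot(void)
{
    struct S obj = {0, 0};
    struct S *p = &obj;
    int after_arrow;

    p->b = 42;             /* write through the arrow operator */
    after_arrow = (*p).b;  /* read through explicit dereference plus dot */
    (*p).b += 1;           /* write through dereference plus dot */

    /* Both spellings designate the same object, so the addresses match. */
    assert(&p->b == &(*p).b);
    return after_arrow == 42 && p->b == 43;
}
```

Both spellings designate the same member of the same object, so the choice between them in modern C is purely notational.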

Why can't . dereference the pointer automatically?

Again, in the CRM version of the language the left operand of the . operator was required to be an lvalue. That was the only requirement imposed on that operand (and that is what made it different from ->, as explained above). Note that CRM did not require the left operand of . to have a struct type. It just required it to be an lvalue, any lvalue. This means that in the CRM version of C you could write code like this

struct S { int a, b; };
struct T { float x, y, z; };

struct T c;
c.b = 55;

In this case the compiler would write 55 into an int value positioned at byte offset 2 in the contiguous memory block known as c, even though type struct T had no field named b. The compiler would not care about the actual type of c at all. All it cared about was that c was an lvalue: some sort of writable memory block.
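A modern compiler rejects c.b outright, but the effect described here can be imitated by writing at the offset of b inside the storage of c. This is a sketch, not the CRM compiler's actual behavior: demo_untyped_dot is a hypothetical name, and the offset is whatever offsetof(struct S, b) is on the current platform (typically 4, not 2).

```c
#include <assert.h>
#include <stddef.h>  /* offsetof */
#include <string.h>  /* memcpy */

struct S { int a, b; };
struct T { float x, y, z; };

/* Imitates the CRM statement "c.b = 55" for a struct T object c:
   the member name b only supplies an offset, the type of c is ignored. */
int demo_untyped_dot(void)
{
    struct T c = {0.0f, 0.0f, 0.0f};
    int value = 55;
    int readback;

    /* Make sure the write stays inside c on this platform. */
    assert(offsetof(struct S, b) + sizeof(int) <= sizeof c);

    memcpy((unsigned char *)&c + offsetof(struct S, b), &value, sizeof value);
    memcpy(&readback, (unsigned char *)&c + offsetof(struct S, b), sizeof readback);
    return readback;
}
```

The bytes of one of the float members now hold the bit pattern of the int 55, which is exactly the kind of type-blind write the text describes.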

Now note that if you did this

struct S *s;
...
s.b = 42;

the code would be considered valid (since s is also an lvalue), and the compiler would simply attempt to write data into the pointer s itself, at byte offset 2. Needless to say, things like this could easily result in a memory overrun, but the language did not concern itself with such matters.
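The overrun is a matter of simple arithmetic: the write covers bytes offset through offset + size - 1 of the object on the left. A small sketch, where write_stays_inside and check_overrun_cases are hypothetical helpers and the 2-byte pointer size is an assumption about a 16-bit machine of that era:

```c
#include <assert.h>
#include <stddef.h>  /* size_t */

/* Returns 1 when writing value_size bytes at byte offset `offset`
   stays inside an object that is object_size bytes long. */
int write_stays_inside(size_t object_size, size_t offset, size_t value_size)
{
    return offset <= object_size && value_size <= object_size - offset;
}

void check_overrun_cases(void)
{
    /* Assumed 16-bit machine: pointer is 2 bytes, b is a 2-byte int at
       offset 2, so "s.b = 42" writes entirely past the end of s. */
    assert(!write_stays_inside(2, 2, 2));
    /* The same write into a 4-byte struct S object stays in bounds. */
    assert(write_stays_inside(4, 2, 2));
}
```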

That is, in that version of the language your proposed idea of overloading the . operator for pointer types would not work: the . operator already had a very specific meaning when used with pointers (with lvalue pointers, or with any lvalues at all). It was very weird functionality, no doubt. But it was there at the time.

Of course, this weird functionality is not a very strong reason against introducing an overloaded . operator for pointers (as you suggested) in the reworked version of C, K&R C. But it was not done. Maybe at that time there was some legacy code written in the CRM version of C that had to be supported.

(The URL for the 1975 C Reference Manual may not be stable. Another copy, possibly with some subtle differences, is here.)
