Correct, portable way to interpret buffer as a struct

Question

The context of my problem is network programming. Say I want to send messages over the network between two programs. For simplicity, let's say messages look like this, and byte order is not a concern. I want to find a correct, portable, and efficient way to define these messages as C structures. I know of four approaches to this: explicit casting, casting through a union, copying, and marshaling.

struct message {
    uint16_t logical_id;
    uint16_t command;
};

Explicit casting:

void send_message(struct message *msg) {
    uint8_t *bytes = (uint8_t *) msg;
    /* call to write/send/sendto here */
}

void receive_message(uint8_t *bytes, size_t len) {
    assert(len >= sizeof(struct message));
    struct message *msg = (struct message *) bytes;
    /* And now use the message */
    if (msg->command == SELF_DESTRUCT) {
        /* ... */
    }
}

My understanding is that send_message does not violate aliasing rules, because a byte/char pointer may alias any type. However, the converse is not true, and so receive_message violates aliasing rules and thus has undefined behavior.
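A minimal sketch of the allowed direction, assuming the same struct message as above. The helper name message_bytes is hypothetical. It reads the object through unsigned char, which the standard names explicitly as a character type; uint8_t is usually a typedef for unsigned char, but that is an assumption about the platform rather than a guarantee.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct message {
    uint16_t logical_id;
    uint16_t command;
};

/* Copies the object representation of *msg into out and returns its size.
 * The object's declared type is struct message and it is read through an
 * lvalue of character type, which the aliasing rules permit. */
size_t message_bytes(const struct message *msg, unsigned char *out, size_t cap)
{
    const unsigned char *src = (const unsigned char *)msg;
    size_t n = sizeof *msg;

    assert(cap >= n);
    for (size_t i = 0; i < n; i++)
        out[i] = src[i];
    return n;
}
```

The reverse (reading an unsigned char array through a struct message lvalue) is the case the question is worried about.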

Casting through a union:

union message_u {
    struct message m;
    uint8_t bytes[sizeof(struct message)];
};

void receive_message_union(uint8_t *bytes, size_t len) {
    assert(len >= sizeof(struct message));
    union message_u *msgu = (union message_u *) bytes;
    /* And now use the message */
    if (msgu->m.command == SELF_DESTRUCT) {
        /* ... */
    }
}

However, this seems to violate the idea that a union only contains one of its members at any given time. Additionally, this seems like it could lead to alignment issues if the source buffer isn't aligned on a word/half-word boundary.
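One way to sidestep the alignment concern is to receive directly into a union object instead of casting an arbitrary buffer pointer to a union pointer. The sketch below assumes a C99 (TC3) or C11 compiler, where reading a union member other than the one last written reinterprets the stored bytes; this is not valid in C++. The names read_fn, receive_message_into_union, and demo_reader are hypothetical, and read_fn stands in for recv()/read().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct message {
    uint16_t logical_id;
    uint16_t command;
};

union message_u {
    struct message m;
    unsigned char bytes[sizeof(struct message)];
};

/* Hypothetical stand-in for recv()/read(): writes up to cap bytes into dst
 * and returns how many bytes were written. */
typedef size_t (*read_fn)(unsigned char *dst, size_t cap);

/* Receives one message directly into a union object. Returns 0 on success,
 * -1 if fewer bytes than a full message arrived. */
int receive_message_into_union(read_fn reader, struct message *out)
{
    union message_u u;   /* aligned for struct message by construction */

    if (reader(u.bytes, sizeof u.bytes) < sizeof u.bytes)
        return -1;

    /* Reading the member that was not last written reinterprets the bytes. */
    *out = u.m;
    return 0;
}

/* Demo reader (illustration only): produces the native representation of
 * a message with logical_id 1 and command 7. */
size_t demo_reader(unsigned char *dst, size_t cap)
{
    const struct message wire = { 1, 7 };
    size_t n = cap < sizeof wire ? cap : sizeof wire;

    memcpy(dst, &wire, n);
    return n;
}
```

Real code would typically use u.m in place rather than copying it out; the copy here only keeps the sketch easy to call.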

Copying:

void receive_message_copy(uint8_t *bytes, size_t len) {
    assert(len >= sizeof(struct message));
    struct message msg;
    memcpy(&msg, bytes, sizeof msg);
    /* And now use the message */
    if (msg.command == SELF_DESTRUCT) {
        /* ... */
    }
}

This seems guaranteed to produce the correct result, but of course I would greatly prefer to not have to copy the data.
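For a struct this small the copy is often cheap: optimizing compilers commonly turn a fixed-size memcpy into a plain load, though that is an optimization and not something the language promises. A minimal sketch with a hypothetical helper, message_load, which also works when the source bytes are not aligned for struct message:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct message {
    uint16_t logical_id;
    uint16_t command;
};

/* Copies one message out of a raw buffer. The source may be unaligned.
 * Returns 0 on success, -1 if the buffer is too short. */
int message_load(struct message *out, const unsigned char *bytes, size_t len)
{
    if (len < sizeof *out)
        return -1;
    memcpy(out, bytes, sizeof *out);
    return 0;
}
```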

Marshaling:

void send_message(struct message *msg) {
    uint8_t bytes[4];
    bytes[0] = msg->logical_id >> 8;      /* Big-endian */
    bytes[1] = msg->logical_id & 0xff;
    bytes[2] = msg->command >> 8;
    bytes[3] = msg->command & 0xff;
    /* call to write/send/sendto here */
}

void receive_message_marshal(uint8_t *bytes, size_t len) {
    /* No longer relying on the size of the struct being meaningful */
    assert(len >= 4);
    struct message msg;
    msg.logical_id = (bytes[0] << 8) | bytes[1];    /* Big-endian */
    msg.command = (bytes[2] << 8) | bytes[3];
    /* And now use the message */
    if (msg.command == SELF_DESTRUCT) {
        /* ... */
    }
}

Still have to copy, but now decoupled from the representation of the struct. But now we need to be explicit about the position and size of each member, and endianness is a much more obvious issue.
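The marshaling approach can be packaged as an encode/decode pair so that the wire layout lives in one place. This is a sketch under the same assumptions as the code above (two 16-bit fields, big-endian on the wire); MESSAGE_WIRE_SIZE, message_encode, and message_decode are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct message {
    uint16_t logical_id;
    uint16_t command;
};

#define MESSAGE_WIRE_SIZE 4u   /* two 16-bit fields, no padding on the wire */

/* Writes msg into out in big-endian order. */
void message_encode(const struct message *msg, uint8_t out[MESSAGE_WIRE_SIZE])
{
    out[0] = (uint8_t)(msg->logical_id >> 8);
    out[1] = (uint8_t)(msg->logical_id & 0xffu);
    out[2] = (uint8_t)(msg->command >> 8);
    out[3] = (uint8_t)(msg->command & 0xffu);
}

/* Reads a big-endian message from in. Returns 0 on success, -1 if short. */
int message_decode(struct message *msg, const uint8_t *in, size_t len)
{
    if (len < MESSAGE_WIRE_SIZE)
        return -1;
    msg->logical_id = (uint16_t)(((unsigned)in[0] << 8) | in[1]);
    msg->command    = (uint16_t)(((unsigned)in[2] << 8) | in[3]);
    return 0;
}
```

Because every access goes through uint8_t array elements and ordinary struct members, neither aliasing nor alignment of the buffer comes into play.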

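Related questions:

What is the strict aliasing rule?

Aliasing array with pointer-to-struct without violating the standard

When is char* safe for strict pointer aliasing?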

http://blog.llvm.org/2011/05/what-every-c-programmer-should-know.html

Real-world example:

I've been looking for examples of networking code to see how this situation is handled elsewhere. The lightweight IP stack (lwIP) has a few similar cases. The udp.c file contains the following code:

/**
 * Process an incoming UDP datagram.
 *
 * Given an incoming UDP datagram (as a chain of pbufs) this function
 * finds a corresponding UDP PCB and hands over the pbuf to the pcbs
 * recv function. If no pcb is found or the datagram is incorrect, the
 * pbuf is freed.
 *
 * @param p pbuf to be demultiplexed to a UDP PCB (p->payload pointing to the UDP header)
 * @param inp network interface on which the datagram was received.
 *
 */
void
udp_input(struct pbuf *p, struct netif *inp)
{
  struct udp_hdr *udphdr;

  /* ... */

  udphdr = (struct udp_hdr *)p->payload;

  /* ... */
}

where struct udp_hdr is a packed representation of a UDP header and p->payload is of type void *. Based on my understanding and the linked answer, I originally concluded that this definitely breaks strict aliasing and thus has undefined behavior. [Edit: it does not.]

Accepted answer

I guess this is what I've been trying to avoid, but I finally went and took a look at the C99 standard myself. Here's what I've found (emphasis added):

§6.3.2.2 void

1 The (nonexistent) value of a void expression (an expression that has type void) shall not be used in any way, and implicit or explicit conversions (except to void) shall not be applied to such an expression. If an expression of any other type is evaluated as a void expression, its value or designator is discarded. (A void expression is evaluated for its side effects.)

§6.3.2.3 Pointers

1 A pointer to void may be converted to or from a pointer to any incomplete or object type. A pointer to any incomplete or object type may be converted to a pointer to void and back again; the result shall compare equal to the original pointer.
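The round-trip guarantee in this paragraph can be illustrated with a short sketch; the function name survives_void_round_trip is hypothetical. Note that the paragraph speaks only about the pointer value. Whether the pointed-to storage may then be accessed through the converted pointer is a separate matter, governed by the §6.5 rules quoted further down.

```c
#include <assert.h>
#include <stdint.h>

struct message {
    uint16_t logical_id;
    uint16_t command;
};

/* Returns 1 if the pointer survives a round trip through void *. */
int survives_void_round_trip(struct message *msg)
{
    void *opaque = msg;              /* implicit conversion to void * */
    struct message *back = opaque;   /* and back to the original type */

    return back == msg;
}
```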

And §3.14

1 object
region of data storage in the execution environment, the contents of which can represent values

§6.5, paragraph 7

An object shall have its stored value accessed only by an lvalue expression that has one of the following types:
— a type compatible with the effective type of the object,
— a qualified version of a type compatible with the effective type of the object,
— a type that is the signed or unsigned type corresponding to the effective type of the object,
— a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object,
— an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or
— a character type.

§6.5, paragraph 6

The effective type of an object for an access to its stored value is the declared type of the object, if any. If a value is stored into an object having no declared type through an lvalue having a type that is not a character type, then the type of the lvalue becomes the effective type of the object for that access and for subsequent accesses that do not modify the stored value. If a value is copied into an object having no declared type using memcpy or memmove, or is copied as an array of character type, then the effective type of the modified object for that access and for subsequent accesses that do not modify the value is the effective type of the object from which the value is copied, if it has one. For all other accesses to an object having no declared type, the effective type of the object is simply the type of the lvalue used for the access.
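The "no declared type" case in this paragraph is what allocated storage looks like. A minimal sketch, assuming the struct message from the question; message_new is a hypothetical helper. Storage from malloc has no declared type, so storing a whole struct message into it makes struct message its effective type for later reads. A buffer declared as an array of unsigned char, by contrast, does have a declared type.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

struct message {
    uint16_t logical_id;
    uint16_t command;
};

/* Allocates a message on the heap. Returns NULL if allocation fails.
 * The caller releases the result with free(). */
struct message *message_new(uint16_t logical_id, uint16_t command)
{
    /* Storage returned by malloc has no declared type. */
    struct message *msg = malloc(sizeof *msg);

    if (msg == NULL)
        return NULL;

    /* Storing through a struct message lvalue gives the storage that
     * effective type for subsequent reads. */
    *msg = (struct message){ logical_id, command };
    return msg;
}
```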

§J.2 Undefined behavior

— An attempt is made to use the value of a void expression, or an implicit or explicit conversion (except to void) is applied to a void expression (6.3.2.2).

Conclusion

It is OK (well-defined) to cast to and from a void *, but not OK to use a value of type void in C99. Therefore the "real world example" is not undefined behavior. Therefore, the explicit casting method can be used with the following modification, as long as alignment, padding, and byte order are taken care of:

void receive_message(void *bytes, size_t len) {
    assert(len >= sizeof(struct message));
    struct message *msg = (struct message *) bytes;
    /* And now use the message */
    if (msg->command == SELF_DESTRUCT) {
        /* ... */
    }
}
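The alignment and padding caveats can be checked in code instead of by inspection. The sketch below assumes a C11 compiler (for _Static_assert and _Alignof) and assumes that converting a pointer to uintptr_t yields its address, which is implementation-defined but common. The helper name message_is_aligned is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct message {
    uint16_t logical_id;
    uint16_t command;
};

/* Layout assumptions checked at compile time. */
_Static_assert(sizeof(struct message) == 4,
               "struct message must have no padding");
_Static_assert(offsetof(struct message, command) == 2,
               "unexpected offset for command");

/* Returns nonzero if p is suitably aligned for struct message. */
int message_is_aligned(const void *p)
{
    return ((uintptr_t)p % _Alignof(struct message)) == 0;
}
```

A receive function could assert message_is_aligned(bytes) before casting; byte order still has to be handled separately, for example by marshaling.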
