Correct, portable way to interpret a buffer as a struct

Question

The context of my problem is in network programming. Say I want to send messages over the network between two programs. For simplicity, let's say messages look like this, and byte-order is not a concern. I want to find a correct, portable, and efficient way to define these messages as C structures. I know of four approaches to this: explicit casting, casting through a union, copying, and marshaling.

#include <assert.h>
#include <stdint.h>
#include <string.h>

struct message {
    uint16_t logical_id;
    uint16_t command;
};
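Sending this struct byte-for-byte only matches a 4-byte wire format if the compiler inserts no padding. A minimal C99 runtime check, assuming that 4-byte layout with `logical_id` first (`WIRE_SIZE` and `message_layout_matches_wire` are hypothetical names):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct message {
    uint16_t logical_id;
    uint16_t command;
};

/* Hypothetical: assumed size of one message on the wire, in bytes. */
#define WIRE_SIZE 4

/* Returns 1 if the in-memory layout matches the assumed wire layout:
   logical_id at offset 0, command at offset 2, no trailing padding. */
static int message_layout_matches_wire(void)
{
    return offsetof(struct message, logical_id) == 0
        && offsetof(struct message, command) == 2
        && sizeof(struct message) == WIRE_SIZE;
}
```

In C11 the same condition can be written as a `_Static_assert`, so a mismatch fails at compile time instead of at run time.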

Explicit casting:

void send_message(struct message *msg) {
    uint8_t *bytes = (uint8_t *) msg;
    /* call to write/send/sendto here */
}

void receive_message(uint8_t *bytes, size_t len) {
    assert(len >= sizeof(struct message));
    struct message *msg = (struct message *) bytes;
    /* And now use the message */
    if (msg->command == SELF_DESTRUCT) {
        /* ... */
    }
}

My understanding is that send_message does not violate aliasing rules, because a byte/char pointer may alias any type. However, the converse is not true, and so receive_message violates aliasing rules and thus has undefined behavior.
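To make the well-defined direction concrete, here is a sketch of the send side that reads the struct's bytes through an `unsigned char` pointer. `message_to_bytes` is a hypothetical helper; the bytes it produces are in host byte order and include any padding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct message {
    uint16_t logical_id;
    uint16_t command;
};

/* Copies the object representation of *msg into out, one byte at a time.
   Reading any object through an unsigned char lvalue is permitted by the
   aliasing rules, so this direction is well-defined. */
static size_t message_to_bytes(const struct message *msg, unsigned char *out)
{
    const unsigned char *p = (const unsigned char *)msg;
    size_t i;
    for (i = 0; i < sizeof *msg; i++)
        out[i] = p[i];
    return sizeof *msg;
}
```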

Casting through a union:

union message_u {
    struct message m;
    uint8_t bytes[sizeof(struct message)];
};

void receive_message_union(uint8_t *bytes, size_t len) {
    assert(len >= sizeof(struct message));
    union message_u *msgu = (union message_u *) bytes;
    /* And now use the message */
    if (msgu->m.command == SELF_DESTRUCT) {
        /* ... */
    }
}

However, this seems to violate the idea that a union only contains one of its members at any given time. Additionally, this seems like it could lead to alignment issues if the source buffer isn't aligned on a word/half-word boundary.
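If the union route is kept, one way to sidestep the alignment concern is to fill a local union object through its byte member and then read the struct member. C99 (as amended by TC3) and C11 describe reading a member other than the one last stored as reinterpreting the stored bytes, although this is still a copy. A sketch; `command_via_union` is a hypothetical name:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct message {
    uint16_t logical_id;
    uint16_t command;
};

union message_u {
    struct message m;
    uint8_t bytes[sizeof(struct message)];
};

/* Fills a local union through its byte member, then reads the struct member.
   The local union is suitably aligned for struct message, which avoids the
   alignment problem of casting an arbitrary buffer pointer. */
static uint16_t command_via_union(const uint8_t *src, size_t len)
{
    union message_u u;
    size_t i;
    assert(len >= sizeof u.bytes);
    for (i = 0; i < sizeof u.bytes; i++)
        u.bytes[i] = src[i];
    return u.m.command;
}
```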

Copying:

void receive_message_copy(uint8_t *bytes, size_t len) {
    assert(len >= sizeof(struct message));
    struct message msg;
    memcpy(&msg, bytes, sizeof msg);
    /* And now use the message */
    if (msg.command == SELF_DESTRUCT) {
        /* ... */
    }
}

This seems guaranteed to produce the correct result, but of course I would greatly prefer to not have to copy the data.
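A sketch of the copying approach applied to a message that starts at an arbitrary, possibly misaligned offset in a receive buffer (`decode_at` is a hypothetical helper). `memcpy` has no alignment requirement and the destination has a declared type, so no aliasing rule is involved:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct message {
    uint16_t logical_id;
    uint16_t command;
};

/* Decodes a message starting at buf + offset into *out.
   Returns 0 on success, -1 if the buffer is too short. */
static int decode_at(const uint8_t *buf, size_t len, size_t offset,
                     struct message *out)
{
    if (len < offset || len - offset < sizeof *out)
        return -1;                      /* not enough bytes */
    memcpy(out, buf + offset, sizeof *out);
    return 0;
}
```

With optimization enabled, GCC and Clang commonly compile a small fixed-size `memcpy` like this into ordinary loads, so the copy is often cheaper than it looks; that is typical compiler behavior, not something the standard guarantees.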

Marshaling:

void send_message_marshal(struct message *msg) {
    uint8_t bytes[4];
    bytes[0] = msg->logical_id >> 8;    /* Big-endian */
    bytes[1] = msg->logical_id & 0xff;
    bytes[2] = msg->command >> 8;
    bytes[3] = msg->command & 0xff;
    /* call to write/send/sendto here */
}

void receive_message_marshal(uint8_t *bytes, size_t len) {
    /* No longer relying on the size of the struct being meaningful */
    assert(len >= 4);
    struct message msg;
    msg.logical_id = (bytes[0] << 8) | bytes[1];    /* Big-endian */
    msg.command = (bytes[2] << 8) | bytes[3];
    /* And now use the message */
    if (msg.command == SELF_DESTRUCT) {
        /* ... */
    }
}

This still copies, but it is now decoupled from the representation of the struct. The trade-off is that the position and size of each member must be spelled out explicitly, and endianness becomes a much more visible issue.
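The marshaling approach can be factored into small byte-order helpers so that each field's position is stated once. A minimal sketch assuming the 4-byte big-endian layout used above; `put_be16`, `get_be16`, `message_encode`, `message_decode` and `MESSAGE_WIRE_SIZE` are hypothetical names:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct message {
    uint16_t logical_id;
    uint16_t command;
};

#define MESSAGE_WIRE_SIZE 4   /* fixed by the protocol, not by sizeof */

/* Write/read a 16-bit value in big-endian (network) order. */
static void put_be16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v >> 8);
    p[1] = (uint8_t)(v & 0xff);
}

static uint16_t get_be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

static void message_encode(const struct message *msg, uint8_t *out)
{
    put_be16(out + 0, msg->logical_id);
    put_be16(out + 2, msg->command);
}

/* Returns 0 on success, -1 if fewer than MESSAGE_WIRE_SIZE bytes. */
static int message_decode(const uint8_t *in, size_t len, struct message *msg)
{
    if (len < MESSAGE_WIRE_SIZE)
        return -1;
    msg->logical_id = get_be16(in + 0);
    msg->command    = get_be16(in + 2);
    return 0;
}
```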

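Related reading:

What is the strict aliasing rule?

Aliasing an array through a pointer to struct violates the standard

When is char* safe for strict pointer aliasing?

http://blog.llvm.org/2011/05/what-every-c-programmer-should-know.html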

I've been looking for examples of networking code to see how this situation is handled elsewhere. The lightweight IP stack lwIP has a few similar cases. The udp.c file contains the following code:

/**
 * Process an incoming UDP datagram.
 *
 * Given an incoming UDP datagram (as a chain of pbufs) this function
 * finds a corresponding UDP PCB and hands over the pbuf to the pcbs
 * recv function. If no pcb is found or the datagram is incorrect, the
 * pbuf is freed.
 *
 * @param p pbuf to be demultiplexed to a UDP PCB (p->payload pointing to the UDP header)
 * @param inp network interface on which the datagram was received.
 *
 */
void
udp_input(struct pbuf *p, struct netif *inp)
{
  struct udp_hdr *udphdr;

  /* ... */

  udphdr = (struct udp_hdr *)p->payload;

  /* ... */
}

where struct udp_hdr is a packed representation of a UDP header and p->payload is of type void *. Going by my understanding and this answer, I concluded that this definitely breaks strict aliasing and thus has undefined behavior. [Edit: that conclusion was wrong; it does not. See the answer below.]
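For comparison, the UDP header can also be read without any pointer cast, by marshaling each field out of the payload bytes. The UDP header is 8 bytes: source port, destination port, length and checksum, each a 16-bit big-endian value. A sketch; `struct udp_fields`, `read_be16` and `parse_udp_header` are hypothetical and not part of lwIP:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical host-side view of a UDP header; not lwIP's struct udp_hdr. */
struct udp_fields {
    uint16_t src_port;
    uint16_t dest_port;
    uint16_t length;
    uint16_t checksum;
};

static uint16_t read_be16(const unsigned char *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* Parses the 8-byte UDP header at payload. Only unsigned char lvalues
   touch the buffer, so neither alignment nor aliasing is a concern.
   Returns 0 on success, -1 if fewer than 8 bytes are available. */
static int parse_udp_header(const void *payload, size_t len,
                            struct udp_fields *out)
{
    const unsigned char *p = payload;   /* void * converts implicitly in C */
    if (len < 8)
        return -1;
    out->src_port  = read_be16(p + 0);
    out->dest_port = read_be16(p + 2);
    out->length    = read_be16(p + 4);
    out->checksum  = read_be16(p + 6);
    return 0;
}
```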

Recommended answer

I guess this is what I've been trying to avoid, but I finally went and took a look at the C99 standard myself. Here's what I've found (emphasis added):
§6.3.2.2 void

1 The (nonexistent) value of a void expression (an expression that has type void) shall not be used in any way, and implicit or explicit conversions (except to void) shall not be applied to such an expression. If an expression of any other type is evaluated as a void expression, its value or designator is discarded. (A void expression is evaluated for its side effects.)

§6.3.2.3 Pointers

1 A pointer to void may be converted to or from a pointer to any incomplete or object type. A pointer to any incomplete or object type may be converted to a pointer to void and back again; the result shall compare equal to the original pointer.
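The conversion rule quoted above can be exercised directly: a pointer to `struct message` converted to `void *` and back compares equal to the original. This only demonstrates the pointer conversion; which lvalue types may then access the object is governed by §6.5, quoted below. `round_trip` is a hypothetical name:

```c
#include <assert.h>
#include <stdint.h>

struct message {
    uint16_t logical_id;
    uint16_t command;
};

/* Converts an object pointer to void * and back, as §6.3.2.3p1 permits;
   the result must compare equal to the original pointer. */
static struct message *round_trip(struct message *msg)
{
    void *v = msg;                /* implicit conversion to void * */
    return (struct message *)v;   /* explicit conversion back */
}
```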

§3.14

1 object
region of data storage in the execution environment, the contents of which can represent values

§6.5

An object shall have its stored value accessed only by an lvalue expression that has one of the following types:
— a type compatible with the effective type of the object,
— a qualified version of a type compatible with the effective type of the object,
— a type that is the signed or unsigned type corresponding to the effective type of the object,
— a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object,
— an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or
— a character type.
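Two entries from this list, illustrated on an `int` object: reading it through its corresponding unsigned type and through a character type. A sketch with hypothetical helper names; the value of the first byte depends on the platform's byte order:

```c
#include <assert.h>

/* "the signed or unsigned type corresponding to the effective type":
   an int object read through an unsigned int lvalue. */
static unsigned int read_as_unsigned(int *obj)
{
    return *(unsigned int *)obj;
}

/* "a character type": any object may be read one byte at a time. */
static unsigned char first_byte(int *obj)
{
    return *(unsigned char *)obj;
}
```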

§6.5

The effective type of an object for an access to its stored value is the declared type of the object, if any. If a value is stored into an object having no declared type through an lvalue having a type that is not a character type, then the type of the lvalue becomes the effective type of the object for that access and for subsequent accesses that do not modify the stored value. If a value is copied into an object having no declared type using memcpy or memmove, or is copied as an array of character type, then the effective type of the modified object for that access and for subsequent accesses that do not modify the value is the effective type of the object from which the value is copied, if it has one. For all other accesses to an object having no declared type, the effective type of the object is simply the type of the lvalue used for the access.
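The `memcpy` sentence of this paragraph can be illustrated with heap storage, which has no declared type. A minimal sketch (`clone_message` is a hypothetical name): after the copy, the bytes have effective type `struct message`, and `malloc` returns storage suitably aligned for any object type, so reading through the returned pointer is well-defined.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct message {
    uint16_t logical_id;
    uint16_t command;
};

/* Allocates untyped storage and copies *src into it with memcpy.
   The copied bytes take on the effective type struct message.
   Returns NULL if allocation fails; the caller must free the result. */
static struct message *clone_message(const struct message *src)
{
    void *storage = malloc(sizeof *src);
    if (storage == NULL)
        return NULL;
    memcpy(storage, src, sizeof *src);
    return storage;
}
```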

§J.2 Undefined behavior

— An attempt is made to use the value of a void expression, or an implicit or explicit conversion (except to void) is applied to a void expression (6.3.2.2).

Conclusion

It is OK (well-defined) to convert to and from a void *; what C99 forbids is using the value of a void expression, which is a different thing. Therefore the "real world example" is not undefined behavior, and the explicit casting method can be used with the following modification, as long as alignment, padding, and byte order are taken care of:

void receive_message(void *bytes, size_t len) {
    assert(len >= sizeof(struct message));
    struct message *msg = (struct message *) bytes;
    /* And now use the message */
    if (msg->command == SELF_DESTRUCT) {
        /* ... */
    }
}
