`u8string_view` into a `char` array without violating strict-aliasing?

Question

  • I have a blob of binary data in memory, represented as a char* (maybe read from a file, or transmitted over the network).
  • I know that it contains a UTF-8-encoded text field of a certain length at a certain offset.

How can I (safely and portably) get a u8string_view to represent the contents of this text field?

The motivation for passing the field to downstream code as a u8string_view is:

  • It very clearly communicates that the text field is UTF-8-encoded, unlike string_view.
  • It avoids the cost (likely free-store allocation + copying) of returning it as u8string.

The naive way to do this would be:

char* data = ...;
size_t field_offset = ...;
size_t field_length = ...;

char8_t* field_ptr = reinterpret_cast<char8_t*>(data + field_offset);
u8string_view field(field_ptr, field_length);

However, if I understand the C++ strict-aliasing rules correctly, this is undefined behavior: it accesses the contents of the char* buffer through the char8_t* pointer returned by reinterpret_cast, and char8_t is not an aliasing type.

Is that true?

Is there a safe way to do it?

Recommended answer

The same problem occasionally comes up in other contexts too, for example when using shared memory.

A trick for creating objects from the bits in "raw" memory, without allocating any memory, is to create a local object by memcpy and then use placement new to create a copy of that local object over the same "raw" memory. Example:

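// Requires <cstddef>, <cstring>, <new> and <string_view>.
char* begin_raw = data + field_offset;
char8_t* begin = nullptr;  // stays null if field_length == 0
for (std::size_t i = 0; i < field_length; i++) {
    char* current = begin_raw + i;
    char8_t local {};
    // Read the byte through char, which is allowed to alias.
    std::memcpy(&local, current, sizeof local);
    // Create a char8_t object holding the same value in the same place.
    char8_t* created = new (current) char8_t(local);
    if (i == 0) begin = created;
}
std::u8string_view field(begin, field_length);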

Before you object that you don't want to copy, notice that the end result does not change the representation of the "raw" memory at all. The compiler can notice this too, and can compile the entire loop down to zero instructions (in my tests, GCC and Clang both achieve this with -O2). All we have done is satisfy the language's object lifetime rules by creating dynamic objects in that memory.
