How to convert std::vector<unsigned char> to vector<char> without copying?


Question


I wasn't able to find an existing question about this, and it's an actual problem I'm facing.

I have a file loading utility that returns a std::vector<unsigned char> containing the whole file contents. However, the processing function requires a contiguous array of char (and that cannot be changed: it's a library function). Since the class that uses the processing function stores a copy of the data anyway, I want to store it as vector<char>. Here's some code that might be a bit more illustrative.

#include <string>
#include <vector>

std::vector<unsigned char> LoadFile(std::string const& path);

class Processor {
    std::vector<char> cache;
    void _dataOperation(std::vector<char> const& data);

public:
    void Process() {
        if (cache.empty())
            // here's the problem!
            cache = LoadFile("file.txt");

        _dataOperation(cache);
    }
};

This code doesn't compile, because (obviously) there's no appropriate conversion. We can be sure, however, that the temporary vector will occupy the same amount of memory (in other words, sizeof(char) == sizeof(unsigned char)).

The naive solution would be to iterate over the contents of the temporary and cast every character. I know that in the normal case, operator=(T&&) would be called.
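For reference, the naive element-by-element conversion described above might look like the following. This is only a sketch: convertByCopy is a hypothetical helper name, not something from the original code, and it does copy every byte.

```cpp
#include <algorithm>
#include <iterator>
#include <vector>

// Hypothetical helper: copies src into a new vector<char>,
// converting each element with an explicit static_cast.
std::vector<char> convertByCopy(std::vector<unsigned char> const& src) {
    std::vector<char> out;
    out.reserve(src.size());  // one allocation up front
    std::transform(src.begin(), src.end(), std::back_inserter(out),
                   [](unsigned char c) { return static_cast<char>(c); });
    return out;
}
```

With such a helper, the assignment in Process() could be written as cache = convertByCopy(LoadFile("file.txt")), at the cost of one extra allocation and copy.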

In my situation it's safe to do a reinterpreting conversion, because I am sure I am going to read ASCII characters only. Any other character would be caught in _dataOperation anyway.

So, my question is: how do I properly and safely convert the temporary vector in a way that involves no copying?

If it isn't possible, I would prefer the safe way of copying over an unsafe way that avoids the copy. I could also change LoadFile to return either vector<char> or vector<unsigned char>.

Answer

In C++11, [basic.lval]p10 says,

If a program attempts to access the stored value of an object through a glvalue of other than one of the following types the behavior is undefined:

  • ...
  • a char or unsigned char type.

(the exact location may be different in other versions of C++, but the meaning is the same.)

That means that you can take a vector<unsigned char> cache and access its contents using the range [reinterpret_cast<char*>(cache.data()), reinterpret_cast<char*>(cache.data()) + cache.size()). (@Kerrek SB mentioned this.)
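As a rough illustration of reading through that range, here is a minimal sketch. The function countChar is a made-up example, not part of the question's code; the only point is that the bytes are read through char pointers without copying the vector.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical example: counts occurrences of `wanted` by reading the
// unsigned char elements through char pointers. No copy of the data is made.
std::ptrdiff_t countChar(std::vector<unsigned char> const& cache, char wanted) {
    char const* first = reinterpret_cast<char const*>(cache.data());
    char const* last = first + cache.size();
    return std::count(first, last, wanted);
}
```

The cast is applied to the element pointer returned by data(), never to the vector object itself.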

If you store a vector<unsigned char> in Processor to match the return type of LoadFile, and _dataOperation() actually takes an array of char (meaning a const char* and a size), then you can cast when you're passing the argument to _dataOperation().
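A sketch of that arrangement, under some assumptions: libraryProcess is a hypothetical stand-in for the real library function (here it just counts newlines), and the constructor taking the loaded bytes is added only so the example is self-contained.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for the library function: takes a char array and a size.
std::size_t libraryProcess(char const* data, std::size_t size) {
    std::size_t lines = 0;
    for (std::size_t i = 0; i < size; ++i)
        if (data[i] == '\n') ++lines;
    return lines;
}

class Processor {
    std::vector<unsigned char> cache;  // same element type as LoadFile's result

public:
    // Takes ownership of the loaded bytes; the vector is moved, not copied.
    explicit Processor(std::vector<unsigned char> bytes)
        : cache(std::move(bytes)) {}

    std::size_t Process() const {
        // Cast only the element pointer, and only at the call boundary.
        return libraryProcess(reinterpret_cast<char const*>(cache.data()),
                              cache.size());
    }
};
```

Because the cache keeps the type that LoadFile returns, assigning the result is an ordinary vector move, and the only cast is on the pointer handed to the library.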

However, if _dataOperation() takes a vector<char> specifically and you store a vector<unsigned char> cache, then you cannot pass it reinterpret_cast<vector<char>&>(cache). (i.e. @André Puel is totally wrong. Do not listen to him.) That violates the aliasing rules, and the compiler will attempt to anger your customers at 2am. (And if this version of your compiler doesn't manage it, the next version will keep trying.)

One option is, as you mentioned, to make LoadFile() a template and have it return (or fill in) a vector of the type you want. Another is to copy the result, for which the concise version is again the reinterpret_cast of the source vector's .data(). [basic.fundamental]p1 mentions that "For character types, all bits of the object representation participate in the value representation.", meaning that you're not going to lose data with that reinterpret_cast. I don't see a firm guarantee that no bit pattern of an unsigned char can cause a trap if reinterpret_cast'ed to char, but I don't know of any modern hardware or compilers that do it.
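Both options can be sketched roughly as follows. The names LoadFileAs and toCharVector are hypothetical, and the loader body is only a guess at what a file loading utility might do (the original LoadFile implementation is not shown).

```cpp
#include <fstream>
#include <string>
#include <vector>

// Option 1 (assumed shape): a templated loader that fills a vector of the
// requested one-byte character type. Returns an empty vector if the file
// cannot be opened.
template <typename CharT>
std::vector<CharT> LoadFileAs(std::string const& path) {
    static_assert(sizeof(CharT) == 1, "CharT must be a one-byte character type");
    std::ifstream in(path, std::ios::binary);
    std::vector<CharT> bytes;
    char c;
    while (in.get(c))
        bytes.push_back(static_cast<CharT>(c));
    return bytes;
}

// Option 2: copy an existing vector<unsigned char> into a vector<char>,
// using the reinterpret_cast'ed data() pointer as the source range.
std::vector<char> toCharVector(std::vector<unsigned char> const& src) {
    char const* first = reinterpret_cast<char const*>(src.data());
    return std::vector<char>(first, first + src.size());
}
```

With the first option, Processor could call LoadFileAs<char>("file.txt") and move the result straight into a vector<char> cache; the second keeps LoadFile unchanged and pays for one copy.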
