What is the most suitable type of vector to keep the bytes of a file?
Question
What is the most suitable type of vector to keep the bytes of a file?
I'm considering using the int type, because the bits "00000000" (1 byte) are interpreted as 0!
The goal is to save this data (bytes) to a file and retrieve it from that file later.
NOTE: The files contain null bytes ("00000000" in bits)!
I'm a bit lost here. Help me! =D Thanks!
UPDATE I:
To read the file I'm using this function:
char* readFileBytes(const char *name){
    std::ifstream fl(name);
    fl.seekg( 0, std::ios::end );
    size_t len = fl.tellg();
    char *ret = new char[len];
    fl.seekg(0, std::ios::beg);
    fl.read(ret, len);
    fl.close();
    return ret;
}
NOTE I: I need to find a way to ensure that the bits "00000000" can be recovered from the file!
NOTE II: Any suggestions for a safe way to save those bits "00000000" to a file?
NOTE III: When using a char array I had problems converting the bits "00000000" to that type.
Code snippet:
int bit8Array[] = {0, 0, 0, 0, 0, 0, 0, 0};
char charByte = (bit8Array[7]     ) |
                (bit8Array[6] << 1) |
                (bit8Array[5] << 2) |
                (bit8Array[4] << 3) |
                (bit8Array[3] << 4) |
                (bit8Array[2] << 5) |
                (bit8Array[1] << 6) |
                (bit8Array[0] << 7);
UPDATE II:
Following @chqrlie's recommendations:
#include <iostream>
#include <fstream>
#include <sstream>
#include <vector>
#include <algorithm>
#include <random>
#include <cstdio>   // printf
#include <cstring>
#include <iterator>

std::vector<unsigned char> readFileBytes(const char* filename)
{
    // Open the file.
    std::ifstream file(filename, std::ios::binary);
    // Stop eating new lines in binary mode!
    file.unsetf(std::ios::skipws);
    // Get its size
    std::streampos fileSize;
    file.seekg(0, std::ios::end);
    fileSize = file.tellg();
    file.seekg(0, std::ios::beg);
    // Reserve capacity.
    std::vector<unsigned char> unsignedCharVec;
    unsignedCharVec.reserve(fileSize);
    // Read the data.
    unsignedCharVec.insert(unsignedCharVec.begin(),
                           std::istream_iterator<unsigned char>(file),
                           std::istream_iterator<unsigned char>());
    return unsignedCharVec;
}

int main(){
    std::vector<unsigned char> unsignedCharVec;
    // txt file contents "xz"
    unsignedCharVec=readFileBytes("xz.txt");
    // Letters -> UTF8/HEX -> bits!
    // x -> 78 -> 0111 1000
    // z -> 7a -> 0111 1010
    for(unsigned char c : unsignedCharVec){
        printf("%c\n", c);
        // Print the bits of c, most significant first.
        for(int o=7; o >= 0; o--){
            printf("%i", ((c >> o) & 1));
        }
        printf("%s", "\n");
    }
    // Prints...
    // x
    // 01111000
    // z
    // 01111010
    return 0;
}
UPDATE III:
This is the code I am using to write to a binary file:
void writeFileBytes(const char* filename, const std::vector<unsigned char>& fileBytes){
    std::ofstream file(filename, std::ios::out|std::ios::binary);
    // Writing zero bytes is a no-op, so an empty vector needs no special case.
    file.write(reinterpret_cast<const char*>(fileBytes.data()),
               static_cast<std::streamsize>(fileBytes.size()));
}
// Example call:
writeFileBytes("xz.bin", fileBytesOutput);
UPDATE IV:
Further reading related to UPDATE III:
c++ - Save a "std::vector<unsigned char>" to a file
CONCLUSION:
The solution to the problem of the "00000000" bits (1 byte) was definitely to change the type that stores the file's bytes to std::vector<unsigned char>, as others advised. std::vector<unsigned char> is a universal type (it exists in all environments) and will accept any octet (unlike the char* in "UPDATE I")!
In addition, changing from an array (char) to a vector (unsigned char) was crucial for success! With a vector I can manipulate my data more safely and completely independently of its content (with the char array I had problems with this).
Thank you very much!
Recommended answer
There are 3 problems in your code:
- You use the char type and return a char *. Yet the return value is not a proper C string, as you neither allocate an extra byte for the '\0' terminator nor null-terminate it.
- If the file may contain null bytes, you should probably use type unsigned char or uint8_t to make it explicit that the array does not contain text.
- You do not return the array size to the caller. The caller has no way to tell how long the array is. You should probably use a std::vector<uint8_t> or std::vector<unsigned char> instead of an array allocated with new.