C++ fread gibberish
Problem description
For some reason my buffer is getting filled with gibberish, and I'm not sure why. I even checked my file with a hex editor to verify that my characters are saved in a 2-byte Unicode format. I'm not sure what's wrong.
[on file open]
fseek(_file_pointer, 0, SEEK_END);
this->_length = ftell(this->_file_pointer) / sizeof(chr);
[Main]
//there is a reason for this, I just
//didn't include the code that tells why
typedef wchar_t chr;
chr *buffer = (chr*)malloc(f->_length*sizeof(chr));
if(buffer == NULL)return;
memset(buffer,0,f->_length*sizeof(chr));
f->Read_Whole_File(buffer);
f->Close();
free(buffer);
[Read_Whole_File]
void Read_Whole_File(chr *buffer)
{
    if(buffer == NULL)
    {
        this->_IsError = true;
        return;
    }
    fseek(this->_file_pointer, 0, SEEK_SET);
    int a = sizeof(buffer[0]); //for debugging purposes
    fread(buffer, a, _length, this->_file_pointer);
}
Assuming your error handling (which you said you've omitted here) is sound, I see two possible causes of the problem:
First, wchar_t is not necessarily 2 bytes; its size is implementation-defined. On Linux, for example, it is most likely 4 bytes.
Second, the file may be UTF-16BE (big-endian) while you are running on a little-endian platform, so the wchar_t values in your buffer have their byte order swapped.
Or, it may be both. Please update your question with some details about your platform and a few bytes from the sample file in hex (if possible).
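To make the two suspected causes concrete, here is a minimal sketch. The helper names (host_is_little_endian, load_native16, code_units_per_wchar) are hypothetical and do not come from the question's code; they only illustrate what reading raw bytes straight into a typed buffer does.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: true if this machine stores the low-order byte first.
bool host_is_little_endian() {
    const std::uint16_t probe = 1;
    unsigned char first = 0;
    std::memcpy(&first, &probe, 1);
    return first == 1;
}

// Hypothetical helper: view two raw file bytes as a native 16-bit value,
// which is effectively what fread into a buffer of 16-bit elements does.
std::uint16_t load_native16(const unsigned char* bytes) {
    assert(bytes != nullptr);
    std::uint16_t value = 0;
    std::memcpy(&value, bytes, sizeof value);
    return value;
}

// Hypothetical helper: how many UTF-16 code units get packed into a single
// wchar_t element when file bytes are read directly into a wchar_t buffer.
// Typically 1 where wchar_t is 2 bytes and 2 where it is 4 bytes.
std::size_t code_units_per_wchar() {
    return sizeof(wchar_t) / sizeof(std::uint16_t);
}
```

On a little-endian host, passing the UTF-16BE bytes 00 41 (the letter 'A') to load_native16 yields 0x4100 rather than 0x0041. If code_units_per_wchar() returns 2, every wchar_t element in the buffer holds two characters from the file fused together.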
In any case, you should not make any assumptions about sizes of standard C or C++ types when dealing with Unicode files.
For example, if you want to read UTF-16BE, use the C99 uint16_t type (or an equivalent type that's guaranteed to be 16 bits wide), and swap the byte order of your input depending on your platform's endianness and the file's endianness. You can detect the file's endianness from the byte order mark, if one is present in the file.
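The approach described above could be sketched roughly as follows. The names Utf16Order and decode_utf16_units are hypothetical, and the fallback order used when no byte order mark is found is an assumption the caller has to supply. Instead of checking the host's endianness and swapping, this sketch builds each value from individual bytes with shifts, which gives the same result on any platform.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical type naming the byte order of the file's 16-bit code units.
enum class Utf16Order { Big, Little };

// Hypothetical helper: decode raw file bytes into 16-bit code units.
// A byte order mark (FE FF or FF FE) decides the order and is dropped;
// without one, `fallback` is used. A trailing odd byte is ignored.
std::vector<std::uint16_t> decode_utf16_units(const unsigned char* bytes,
                                              std::size_t size,
                                              Utf16Order fallback) {
    assert(bytes != nullptr || size == 0);
    Utf16Order order = fallback;
    std::size_t pos = 0;
    if (size >= 2) {
        if (bytes[0] == 0xFE && bytes[1] == 0xFF) {
            order = Utf16Order::Big;
            pos = 2;
        } else if (bytes[0] == 0xFF && bytes[1] == 0xFE) {
            order = Utf16Order::Little;
            pos = 2;
        }
    }
    std::vector<std::uint16_t> units;
    units.reserve((size - pos) / 2);
    for (; pos + 1 < size; pos += 2) {
        const std::uint16_t first = bytes[pos];
        const std::uint16_t second = bytes[pos + 1];
        // Assemble the value explicitly so host endianness never matters.
        units.push_back(order == Utf16Order::Big
            ? static_cast<std::uint16_t>((first << 8) | second)
            : static_cast<std::uint16_t>((second << 8) | first));
    }
    return units;
}
```

To use it, read the file into a buffer of unsigned char (fread with an element size of 1 and the byte length from ftell) and pass that buffer and its length to the function. The file length is then counted in bytes, so sizeof(wchar_t) never enters the calculation.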
Alternatively, use a third-party Unicode library, like ICU. It takes care of all the platform-specific details and will save you a lot of debugging time in a sizable project.