Problem using getline with a Unicode file

Views: 152 | Posted: 2020/11/13 21:12:48 | Tags: c++ unicode getline wstring

Question

UPDATE: Thank you to @Potatoswatter and @Jonathan Leffler for the comments. Rather embarrassingly, I was caught out by the debugger tooltip not showing the value of a wstring correctly. However, it still isn't quite working for me, and I have updated the question below:

If I have a small multibyte file that I want to read into a string, I use the following trick: I call getline with a delimiter of '\0', e.g.

std::string contents_utf8;
std::ifstream inf1("utf8.txt");
getline(inf1, contents_utf8, '\0');

This reads in the entire file, including newlines.
However, if I try to do the same thing with a wide-character file, it doesn't work: my wstring only reads as far as the first line.

std::wstring contents_wide;
std::wifstream inf2(L"ucs2-be.txt");
getline( inf2, contents_wide, wchar_t(0) ); //doesn't work

For example, if my Unicode file contains the characters A and B separated by CRLF, the hex looks like this:

FE FF 00 41 00 0D 00 0A 00 42

Based on the fact that, with a multibyte file, getline with '\0' reads the entire file, I believed that getline( inf2, contents_wide, wchar_t(0) ) should read in the entire Unicode file. However, it doesn't: with the example above, my wide string contains the following two wchar_ts: FF FF

(If I remove the wchar_t(0), it reads in the first line as expected, i.e. FE FF 00 41 00 0D 00.)

Why doesn't wchar_t(0) work as a delimiting wchar_t so that getline stops on 00 00 (or reads to the end of the file, which is what I want)?
Thank you

Recommended answer

Your UCS-2 decoder is misbehaving. The result of getline( inf2, contents_wide ) on FE FF 00 41 00 0D 00 0A 00 42 should be 0041 0000 = L"A". Assuming you're on Windows, the line ending should be properly converted, and the byte-order mark shouldn't appear in the output.

I suggest double-checking your OS documentation with respect to how you set the locale.

EDIT: Did you set the locale?

locale::global( locale( "something if your system supports UCS-2" ) );

or

locale::global( encoding_support::ucs2_bigendian_encoding );

where encoding_support is some library.
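
As one possible stand-in for the unnamed encoding_support library, the sketch below imbues the stream itself with the standard std::codecvt_utf16 facet from <codecvt> rather than changing the global locale. This is an assumption about what would suit the asker, not the answerer's stated method. Note that <codecvt> is deprecated since C++17 (compilers may warn), the function name read_utf16be_file is hypothetical, and the file is opened in binary mode so the facet sees the raw bytes.

```cpp
#include <codecvt>
#include <fstream>
#include <iterator>
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical helper: read a whole big-endian UTF-16/UCS-2 file into a wstring.
std::wstring read_utf16be_file(const std::string& path) {
    std::wifstream in;

    // Install the conversion facet before opening the file.
    // codecvt_utf16 defaults to big-endian; consume_header skips a leading BOM.
    in.imbue(std::locale(
        in.getloc(),
        new std::codecvt_utf16<wchar_t, 0x10ffff, std::consume_header>));

    // Binary mode: no newline translation, so CRLF stays as two characters.
    in.open(path, std::ios::binary);
    if (!in) {
        throw std::runtime_error("cannot open " + path);
    }

    // Read to end of file without relying on any delimiter character.
    return std::wstring(std::istreambuf_iterator<wchar_t>(in),
                        std::istreambuf_iterator<wchar_t>());
}
```

Reading through istreambuf_iterator avoids depending on a delimiter at all. With the facet in place, getline( inf2, contents_wide, wchar_t(0) ) on such a stream should also reach the end of the file, provided the text contains no U+0000 character.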
