Write a string with non-ASCII characters in it - error only if string is a variable?

Problem description

I'm trying to write strings containing non-ASCII characters, such as "maçã" and "pé", to a file.

I'm currently doing something like this:

_setmode(_fileno(stdout), _O_U16TEXT);

// I added the line above to the question recently,
// but it was already in the code; I forgot to write it.
// I also included some header files to be able to do that.
// I can't really remember which; if necessary I'll look them up.

wstring word = L"";
wstring file = L"example_file.txt";
vector<wstring> my_vector;

wofstream my_output(file);

// Read lines until a single "." is entered
while (word != L".")
{
    getline(wcin, word);
    if (word != L".")
        my_vector.push_back(word);
}

for (std::vector<wstring>::iterator j = my_vector.begin(); j != my_vector.end(); j++)
{
    // element pointed to by the iterator, going through the whole vector
    my_output << *j << endl;

    // the same words, stored directly in the code
    my_output << L"maçã pé" << endl;
}
my_output.close();

Now, if I enter "maçã", "pé" and "." as words (only the first two are stored in the vector), the output to the file is rather strange:

  • the words I entered (stored in variables) appear strange: "ma‡Æ" and "p,";
  • the words stored directly in the code appear perfectly normal "maçã pé";

I have tried using wcin >> word instead of getline(wcin, word), and writing to the console instead of a file. The results are the same: strings held in variables are written incorrectly, while strings written directly in the code come out perfectly.

I cannot find a reason for this to happen, so any help will be greatly appreciated.

Edit: I am working in Windows 7, using Visual C++ 2010

Edit 2: added one more line of code, that I had missed. (right in the beginning)

Edit 3: following SigTerm's suggestion, I realised the problem is with the input: neither wcin nor getline stores the strings correctly in the wstring variable word. So the question is: do you know what is causing this, or how to fix it?

Solution

Windows makes encodings confusing because the console typically uses an "OEM" code page, while GUI applications use an "ANSI" code page. Each varies with the localized version of Windows in use. On U.S. Windows, the OEM code page is 437 and the ANSI code page is 1252.

Keeping the above in mind, setting the streams to the locale being used fixes the problem. If working in the console, use the console's code page:

wcin.imbue(std::locale("English_United States.437"));
wcout.imbue(std::locale("English_United States.437"));

But keep in mind that most code pages are single-byte encodings, so each can represent only 256 of the possible Unicode characters:

wstring word;
wcin.imbue(std::locale("English_United States.437"));
wcout.imbue(std::locale("English_United States.437"));
getline(wcin, word);
wcout << word << endl;
wcout << L"maçã pé" << endl;

This prints the following on the console:

maça pé
maça pé

Code page 437 doesn't contain ã.

You can use code page 1252 from the console if you:

  • Issue chcp 1252.
  • Use a TrueType console font like Consolas or Lucida Console.
  • Imbue the streams with English_United States.1252 instead.

Writing to a file has similar issues. If you view the file in Notepad, it uses the ANSI code page to interpret the bytes in the file. So if a console app writes the file using code page 437, Notepad will display it incorrectly. Writing the file in code page 1252 doesn't help either, because the two code pages don't cover the same set of Unicode code points. Possible solutions are to use a different file viewer such as Notepad++, or to write the file in UTF-8, which supports all Unicode characters.
