Handling Non-ASCII Chars in C++

Views: 255 | Published: 2016/10/17 9:47:54 | Tags: c++ string c++11 non-ascii-characters


Problem description

 
I am facing some issues with non-ASCII chars in C++. I have a file containing non-ASCII chars, which I read in C++ via file handling. After reading the file (say 1.txt), I store the data in a string stream and write it to another file (say 2.txt).

Assume 1.txt contains:
ação
In 2.txt I should get the same output, but the non-ASCII chars are printed as their hex values in 2.txt.

Also, I am quite sure that C++ is handling ASCII chars as ASCII only.

Please help me print these chars correctly in 2.txt.

EDIT:

First, pseudo-code for the whole process:
1. A shell script reads one value from the DB and stores it in 11.txt
2. C++ code (a.cpp) reads 11.txt and writes to f.txt
Data present in the DB which is being read: Instalação

File 11.txt contains: InstalaÃ§Ã£o

File f.txt contains: InstalaÃ§Ã£o

Output of a.cpp on screen: Instalação

a.cpp
#include <iterator>
#include <iostream>
#include <algorithm>
#include <sstream>
#include <fstream>
#include <iomanip>
#include <string>

using namespace std;
int main()
{
    ifstream myReadFile;
    ofstream f2;
    myReadFile.open("11.txt");
    f2.open("f2.txt");
    string output;
    if (myReadFile.is_open())
    {
        // Test the extraction itself instead of eof(), so the last
        // word is not processed a second time after the read fails.
        while (myReadFile >> output)
        {
            //cout<<output;

            cout << "\n";

            // Pass the word through a string stream, then echo it to
            // the screen and write it to the output file.
            std::stringstream tempDummyLineItem;
            tempDummyLineItem << output;
            cout << tempDummyLineItem.str();
            f2 << tempDummyLineItem.str();
        }
    }
    myReadFile.close();
    return 0;
}
Locale says this:
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=

Solution

Sounds to me like a UTF-8 issue. Since you didn't tag your question with c++11, here is an excellent article on Unicode and C++ streams.

From your updated code, let me explain what is happening. You create a file stream to read your file. Internally the file stream only recognizes chars, until you tell it otherwise. A char, on most machines, can only hold 8 bits of data, but the characters in your file are using more than 8 bits. To be able to read your file correctly, you NEED to know how it is encoded. The most common encoding is UTF-8, which uses between 1 and 4 chars for each character.
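
To make the "1 to 4 chars per character" point concrete, here is a minimal sketch that counts characters (code points) in a UTF-8 string. The function name countCodePoints is made up for this illustration, and the sketch assumes the input is already valid UTF-8.

```cpp
#include <cstddef>
#include <string>

// Counts Unicode code points in a UTF-8 encoded string.
// Continuation bytes always have the bit pattern 10xxxxxx, so every
// byte that does NOT match that pattern starts a new code point.
// Assumption: the input is valid UTF-8; nothing is validated here.
std::size_t countCodePoints(const std::string& utf8)
{
    std::size_t count = 0;
    for (unsigned char byte : utf8) {
        if ((byte & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}
```

For the word ação from the question, stored as UTF-8, std::string::size() reports 6 chars while countCodePoints reports 4 characters, because ç and ã each take two chars.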

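Once you know your encoding, you can use a wifstream and imbue() it with a locale that matches the file's encoding. A wifstream converts the bytes of the file to wide characters through the codecvt facet of its locale, so without imbue() it falls back to the global locale (the "C" locale by default) and will not decode UTF-16 or UTF-8 on its own.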

Update:
If your file is ISO-8859-1 (from your comment above), try this.
wifstream myReadFile;
myReadFile.imbue(std::locale("en_US.iso88591"));
myReadFile.open("11.txt");
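
Note that std::locale("en_US.iso88591") throws std::runtime_error if that locale is not installed on the machine. If that happens, one locale-independent alternative is to read the file as plain chars and convert by hand, since every ISO-8859-1 byte maps to the Unicode code point with the same value. The following is only a sketch of that idea; latin1ToUtf8 is a made-up helper name, not a standard library function.

```cpp
#include <string>

// Converts ISO-8859-1 (Latin-1) bytes to UTF-8.
// Bytes below 0x80 are plain ASCII and are copied unchanged.
// Bytes 0x80..0xFF become a two-byte UTF-8 sequence:
//   110000xx 10xxxxxx
// where the x bits are the 8 bits of the original byte.
std::string latin1ToUtf8(const std::string& latin1)
{
    std::string utf8;
    utf8.reserve(latin1.size() * 2);  // worst case: every byte doubles
    for (unsigned char byte : latin1) {
        if (byte < 0x80) {
            utf8.push_back(static_cast<char>(byte));
        } else {
            utf8.push_back(static_cast<char>(0xC0 | (byte >> 6)));
            utf8.push_back(static_cast<char>(0x80 | (byte & 0x3F)));
        }
    }
    return utf8;
}
```

Each word read from an ISO-8859-1 file with a plain ifstream could be passed through latin1ToUtf8 before being written to the output file. This only helps if the input really is ISO-8859-1; running it on text that is already UTF-8 produces exactly the kind of doubled characters (Ã§, Ã£) shown in the question.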


                        