Handling Non-Ascii Chars in C++

Problem description

I am facing some issues with non-ASCII chars in C++. I have a file containing non-ASCII chars, which I am reading in C++ via file handling. After reading the file (say 1.txt), I store the data in a string stream and write it into another file (say 2.txt).

Assume 1.txt contains:

ação

In 2.txt I should get the same output, but the non-ASCII chars are printed as their hex values.

Also, I am quite sure that C++ handles ASCII chars as ASCII only.

Please help me print these chars correctly in 2.txt.

EDIT:

First, pseudo-code for the whole process:

1. A shell script reads one value from the DB and stores it in 11.txt.
2. C++ code (a.cpp) reads 11.txt and writes to f.txt.

Data present in the DB that is being read: Instalação

File 11.txt contains: Instalação

File F.txt contains: Instalação

Output of a.cpp on screen: Instalação

a.cpp

#include <iterator>
#include <iostream>
#include <algorithm>
#include <sstream>
#include <fstream>
#include <iomanip>
#include <string>

using namespace std;
int main()
{
    ifstream myReadFile;
    ofstream f2;
    myReadFile.open("11.txt");
    f2.open("f2.txt");
    string output;
    if (myReadFile.is_open())
    {
        // Loop on the read itself: testing eof() before reading lets the
        // body run once more after the last word (e.g. when the file ends
        // with a newline), writing that word twice.
        while (myReadFile >> output)
        {
            //cout<<output;

            cout<<"\n";

            std::stringstream tempDummyLineItem;
            tempDummyLineItem <<output;
            cout<<tempDummyLineItem.str();
            f2<<tempDummyLineItem.str();
        }
    }
    myReadFile.close();
    return 0;
}

Running locale prints:

LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=

Solution

Sounds to me like a UTF-8 issue. Since you didn't tag your question with c++11, here is an excellent article on Unicode and C++ streams.

From your updated code, let me explain what is happening. You create a file stream to read your file. Internally, the file stream deals only in chars until you tell it otherwise. On most machines a char holds only 8 bits of data, but in UTF-8 characters such as ç and ã take more than 8 bits. To read your file correctly, you NEED to know how it is encoded. The most common encoding is UTF-8, which uses between 1 and 4 chars (bytes) for each character.

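Once you know the encoding, you can read the file with wifstream and imbue() it with a locale for that encoding: the wide stream uses the locale's codecvt facet to convert the file's bytes into wchar_t characters.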

Update: If your file is ISO-8859-1 (from your comment above), try this:

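wifstream myReadFile;
// Imbue before open() so the conversion applies from the first byte.
// std::locale throws std::runtime_error if "en_US.iso88591" is not installed.
myReadFile.imbue(std::locale("en_US.iso88591"));
myReadFile.open("11.txt");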
