C++ support for Unicode, UTF-8, encode/decode, ifstream, wstream?


Problem description





Hi,
I have a Unicode text file encoded in UTF-8.

I should store the Unicode strings in my program in, for example,
std::wstring, right? That way I can work on them normally, so that
given std::wstring foo, foo[5] would mean the 5th _character_, and not the 5th
byte of the encoded string.

How do I read text from a UTF-8 file into std::wstring? I need to do
some conversion, right? From UTF-8 to the internal format used by
std::wstring (probably UCS-2 or UCS-4, right?)

Also, how do I save the string back, and how do I manipulate it (like
replacing the 4th character, just str[4]=(wchar_t)'x'?)

Thanks
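
For the "save back" and str[4] parts of the question, here is a minimal sketch. It assumes the text is held as UTF-32 code points in a std::u32string, so that indexing addresses one code point rather than one byte. That storage choice and the helper name encode_utf8 are assumptions made for illustration, not something the thread prescribes.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper: encode UTF-32 code points as a UTF-8 byte string.
// Throws std::invalid_argument for values that are not Unicode scalar values.
std::string encode_utf8(const std::u32string& text) {
    std::string out;
    for (char32_t cp : text) {
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::invalid_argument("not a Unicode scalar value");
        if (cp < 0x80) {                      // 1 byte: 0xxxxxxx
            out.push_back(static_cast<char>(cp));
        } else if (cp < 0x800) {              // 2 bytes: 110xxxxx 10xxxxxx
            out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else if (cp < 0x10000) {            // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
            out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else {                              // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
            out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        }
    }
    return out;
}
```

With the text in a std::u32string named text (a hypothetical variable), replacing the element at index 4 is just text[4] = U'x', and writing encode_utf8(text) to a std::ofstream opened in binary mode saves it as UTF-8. Keep in mind that one code point is not always one user-perceived character, for example when combining marks are involved.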



Solution

Rafał Maj Raf256 wrote:

> Hi,
> I have a Unicode text file encoded in UTF-8.
>
> I should store the Unicode strings in my program in, for example,
> std::wstring, right? That way I can work on them normally, so that
> given std::wstring foo, foo[5] would mean the 5th _character_, and not the 5th
> byte of the encoded string.
>
> How do I read text from a UTF-8 file into std::wstring? I need to do
> some conversion, right? From UTF-8 to the internal format used by
> std::wstring (probably UCS-2 or UCS-4, right?)
>
> Also, how do I save the string back, and how do I manipulate it (like
> replacing the 4th character, just str[4]=(wchar_t)'x'?)



Upon reading the UTF-8 data, convert it internally to UTF-32 for
easier parsing. The conversion process is quite easy to write.
The problem with std::wstring is that it's templatized on
wchar_t, and that primitive is, at least on my machine, only 2 bytes,
and therefore not practical to use with Unicode (unless you actually
wish to use the abnormal UTF-16 variant in such a case).

--
TB @ SWEDEN
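
A minimal sketch of the UTF-8 to UTF-32 conversion described in this post, storing the result in std::u32string instead of std::wstring so that the element size does not depend on the platform's wchar_t. The function name decode_utf8 and the choice to throw on malformed input are assumptions for illustration; the checks for overlong forms, surrogates, and truncated sequences show where most of the effort goes.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: decode a UTF-8 byte string into UTF-32 code points.
// Throws std::runtime_error on malformed input.
std::u32string decode_utf8(const std::string& bytes) {
    std::u32string out;
    std::size_t i = 0;
    while (i < bytes.size()) {
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);
        char32_t cp = 0;
        std::size_t extra = 0;   // continuation bytes expected after the lead byte
        char32_t min_cp = 0;     // smallest code point allowed for this length
        if (lead < 0x80)                { cp = lead;        extra = 0; min_cp = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; min_cp = 0x80; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; min_cp = 0x800; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; min_cp = 0x10000; }
        else throw std::runtime_error("invalid UTF-8 lead byte");

        if (extra > bytes.size() - i - 1)
            throw std::runtime_error("truncated UTF-8 sequence");

        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(bytes[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::runtime_error("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }

        if (cp < min_cp)
            throw std::runtime_error("overlong UTF-8 encoding");
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::runtime_error("code point outside Unicode scalar range");

        out.push_back(cp);
        i += extra + 1;
    }
    return out;
}
```

After decoding, indexing the std::u32string addresses whole code points, so element 5 is the sixth code point regardless of how many bytes each one took in the file.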


TB wrote:

> Upon reading the UTF-8 data, convert it internally to UTF-32 for
> easier parsing.

How? Aren't there ready-to-use functions/classes that do that? In std,
perhaps in Boost?

> The conversion process is quite easy to write. The problem with
> std::wstring is that it's templatized on
> wchar_t, and that primitive is, at least on my machine, only 2 bytes,
> and therefore not practical to use with Unicode (unless you actually
> wish to use the abnormal UTF-16 variant in such a case).

Hm.. so which class is best to store any-language text strings then?


"Rafal Maj Raf256" <us*******************@raf256.com.invalid> wrote in
message news:dq**********@inews.gazeta.pl...

> TB wrote:
>
>> Upon reading the UTF-8 data, convert it internally to UTF-32 for
>> easier parsing.
>
> How? Aren't there ready-to-use functions/classes that do that? In std,
> perhaps in Boost?

You'll find a few codecvt facets (the critters you need) in various places,
but for a complete set of all that you're likely to need -- ready made,
tested, and supported -- see our CoreX library.
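
As one illustration of a codecvt facet in use, the sketch below relies on std::codecvt_utf8 and std::wstring_convert from the C++11 standard library. These arrived after this thread and have been deprecated since C++17, so compilers may warn; this is not the CoreX library mentioned above, and the wrapper names utf8_to_utf32 and utf32_to_utf8 are hypothetical.

```cpp
#include <codecvt>
#include <locale>
#include <string>

// UTF-8 bytes -> UTF-32 code points via the standard codecvt_utf8 facet.
// from_bytes throws std::range_error if the input cannot be converted.
std::u32string utf8_to_utf32(const std::string& bytes) {
    std::wstring_convert<std::codecvt_utf8<char32_t>, char32_t> conv;
    return conv.from_bytes(bytes);
}

// UTF-32 code points -> UTF-8 bytes, the reverse direction.
std::string utf32_to_utf8(const std::u32string& text) {
    std::wstring_convert<std::codecvt_utf8<char32_t>, char32_t> conv;
    return conv.to_bytes(text);
}
```

Reading the file's bytes into a std::string (for example through a std::ifstream opened in binary mode) and passing them to utf8_to_utf32 avoids depending on the width of wchar_t.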

>> The conversion process is quite easy to write.

No it isn't. At least not correctly and robustly.

>> The problem with std::wstring is that it's templatized on
>> wchar_t, and that primitive is, at least on my machine, only 2 bytes,
>> and therefore not practical to use with Unicode (unless you actually
>> wish to use the abnormal UTF-16 variant in such a case).
>
> Hm.. so which class is best to store any-language text strings then?

Depends on your goals. In truth and reality, you can still get away quite
nicely with UCS-2. Effectively, you ignore the exotic characters with
code values above 0xffff that were added more recently. Your input converter
then treats as erroneous any UTF-8 sequence that specifies a code
value that's too big. But if you feel the need to support the complete
Unicode set in its current form, you need to convert UTF-8 to UTF-16
internally, and accept the fact that characters can occupy either one or
two storage elements. Whatever your choice, CoreX has the
conversion tools you need to carry it out.

P.J. Plauger
Dinkumware, Ltd.
http://www.dinkumware.com
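
The two storage policies described in this post can be sketched as follows. fits_ucs2 expresses the UCS-2 rule (one 16-bit element per character, anything above 0xFFFF rejected), and to_utf16 shows how a code point above 0xFFFF becomes two storage elements, a surrogate pair. Both function names are hypothetical and this is not CoreX code.

```cpp
#include <stdexcept>
#include <string>

// UCS-2 policy: accept only code points that fit in a single 16-bit element
// and are not in the surrogate range.
bool fits_ucs2(char32_t cp) {
    return cp <= 0xFFFF && !(cp >= 0xD800 && cp <= 0xDFFF);
}

// UTF-16 policy: every Unicode scalar value is representable, using either
// one or two 16-bit elements.
std::u16string to_utf16(const std::u32string& text) {
    std::u16string out;
    for (char32_t cp : text) {
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::invalid_argument("not a Unicode scalar value");
        if (cp <= 0xFFFF) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            const char32_t v = cp - 0x10000;   // 20 bits, split 10 + 10
            out.push_back(static_cast<char16_t>(0xD800 + (v >> 10)));    // high surrogate
            out.push_back(static_cast<char16_t>(0xDC00 + (v & 0x3FF)));  // low surrogate
        }
    }
    return out;
}
```

With the UTF-16 policy, indexing a std::u16string no longer addresses whole characters once surrogate pairs are present, which is the trade-off the post asks the reader to accept.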

