std::string vs. Unicode UTF-8


Problem description

I understand that it is perfectly possible to store UTF-8 strings
in a std::string, however doing so has some implications.
E.g. you can't count the number of characters with length() or
size(), since both return the number of bytes. Instead one has to
iterate through the string, parse all UTF-8 multibyte sequences and
count each sequence as one character.
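
The iteration described above can be sketched in a few lines. This is a minimal sketch, assuming the input is already well-formed UTF-8 (it does not validate anything); the function name utf8_code_points is illustrative and not part of any library. It relies on the fact that every UTF-8 continuation byte has the bit pattern 10xxxxxx, so counting the bytes that do not match that pattern counts the code points.

```cpp
#include <cstddef>
#include <string>

// Counts Unicode code points in a UTF-8 encoded std::string by counting
// every byte that is NOT a continuation byte (bit pattern 10xxxxxx).
// Assumption: the input is well-formed UTF-8; nothing is validated here.
std::size_t utf8_code_points(const std::string& s) {
    std::size_t count = 0;
    for (unsigned char byte : s) {
        if ((byte & 0xC0) != 0x80) {
            ++count;  // lead byte or plain ASCII byte starts a new code point
        }
    }
    return count;
}
```

Note that this counts code points, which is not always the same as what a user perceives as one character: a base letter followed by a combining accent is two code points.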

To address this problem the GTKmm bindings for the GTK+ toolkit
have implemented their own string class, Glib::ustring
<http://tinyurl.com/bxpu4>, which takes care of UTF-8 in strings.

The question is, wouldn't it be logical to make std::string
Unicode aware in the next STL version? I18N is an important
topic nowadays and I simply see no logical reason to keep
std::string as limited as it is now. Of course there is
also the wchar_t variant, but actually I don't like that.

Wolfgang Draxinger
--

---
[ comp.std.c++ is moderated. To submit articles, try just posting with ]
[ your news-reader. If that fails, use mailto:st*****@ncar.ucar.edu ]
[ --- Please see the FAQ before posting. --- ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html ]

Recommended answers

> The question is, wouldn't it be logical to make std::string
> Unicode aware in the next STL version? I18N is an important
> topic nowadays and I simply see no logical reason to keep
> std::string as limited as it is nowadays. Of course there is
> also the wchar_t variant, but actually I don't like that.

It is much easier to handle Unicode strings with wchar_t internally, and
there is much less confusion about whether the string is ANSI or UTF-8
encoded. So I have started using wchar_t wherever I can and I only use UTF-8
for external communication.

Niels Dybdahl
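
The "wide internally, UTF-8 externally" approach needs a decoding step at the input boundary. Below is a minimal sketch of that step, not the poster's actual code. It decodes into std::u32string (C++11) rather than std::wstring, because the width of wchar_t is platform-dependent (16 bits on Windows, 32 bits on most Unix-like systems); that substitution and the name utf8_to_utf32 are assumptions made for illustration. The sketch rejects truncated sequences and bad continuation bytes, but as a stated simplification it does not reject overlong encodings or surrogate code points.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Decodes a UTF-8 byte string into UTF-32 code points.
// Throws std::invalid_argument on a bad lead byte, a bad continuation
// byte, or a sequence cut off by the end of the input.
// Simplification: overlong encodings and surrogates are not rejected.
std::u32string utf8_to_utf32(const std::string& in) {
    std::u32string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        char32_t cp = 0;
        std::size_t extra = 0;  // number of continuation bytes expected
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (extra > in.size() - i - 1) {
            throw std::invalid_argument("truncated UTF-8 sequence");
        }
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80) {
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            }
            cp = (cp << 6) | (c & 0x3F);  // append the 6 payload bits
        }
        out.push_back(cp);
        i += extra + 1;
    }
    return out;
}
```

Once decoded, size() on the result is the number of code points, and indexing is constant-time per code point.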

Wolfgang Draxinger wrote:
> I understand that it is perfectly possible to store UTF-8 strings
> in a std::string, however doing so can cause some implications.
> E.g. you can't count the amount of characters by length() |
> size(). Instead one has to iterate through the string, parse all
> UTF-8 multibytes and count each multibyte as one character.
>
> To address this problem the GTKmm bindings for the GTK+ toolkit
> have implemented a own string class Glib::ustring
> <http://tinyurl.com/bxpu4> which takes care of UTF-8 in strings.
>
> The question is, wouldn't it be logical to make std::string
> Unicode aware in the next STL version? I18N is an important
> topic nowadays and I simply see no logical reason to keep
> std::string as limited as it is nowadays. Of course there is
> also the wchar_t variant, but actually I don't like that.
>
> Wolfgang Draxinger

UTF-8 is only an encoding, so why do you think strings internal to the
program should be represented as UTF-8? It makes more sense to me to
translate to or from UTF-8 when you input or output strings from your
program. C++ already has the framework in place for that.

john
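
The post does not name the framework it has in mind; presumably it is the locale machinery with its std::codecvt facets, but that is an assumption. To avoid depending on any particular facet, the sketch below shows the output half of the "translate at the boundary" idea by hand: it encodes UTF-32 code points as UTF-8. The names append_utf8 and utf32_to_utf8 are illustrative and do not come from the standard library.

```cpp
#include <stdexcept>
#include <string>

// Appends the UTF-8 encoding of a single code point to `out`.
// Throws std::invalid_argument for surrogates and values above U+10FFFF.
void append_utf8(std::string& out, char32_t cp) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);                          // 1 byte
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));            // 2 bytes
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        if (cp >= 0xD800 && cp <= 0xDFFF) {
            throw std::invalid_argument("surrogate code point");
        }
        out += static_cast<char>(0xE0 | (cp >> 12));           // 3 bytes
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp <= 0x10FFFF) {
        out += static_cast<char>(0xF0 | (cp >> 18));           // 4 bytes
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        throw std::invalid_argument("code point out of range");
    }
}

// Encodes a whole UTF-32 string as UTF-8, e.g. just before writing it out.
std::string utf32_to_utf8(const std::u32string& in) {
    std::string out;
    for (char32_t cp : in) {
        append_utf8(out, cp);
    }
    return out;
}
```

A program following this advice would call something like utf32_to_utf8 only where text leaves the process (files, sockets, terminals) and keep the fixed-width form everywhere else.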

Wolfgang Draxinger wrote:
> I understand that it is perfectly possible to store UTF-8 strings
> in a std::string, however doing so can cause some implications.
> E.g. you can't count the amount of characters by length() |
> size(). Instead one has to iterate through the string, parse all
> UTF-8 multibytes and count each multibyte as one character.

Correct. Also you can't print it or anything else.

> To address this problem the GTKmm bindings for the GTK+ toolkit
> have implemented a own string class Glib::ustring
> <http://tinyurl.com/bxpu4> which takes care of UTF-8 in strings.

Ok.

> The question is, wouldn't it be logical to make std::string
> Unicode aware in the next STL version?

It already is - using e.g. wchar_t.

> I18N is an important
> topic nowadays and I simply see no logical reason to keep
> std::string as limited as it is nowadays.

It is not limited.

> Of course there is
> also the wchar_t variant, but actually I don't like that.

So you'd like to have Unicode support. And you realize you already have
it. But you don't like it. Why?

/Peter

