std::string vs. Unicode UTF-8
Question
I understand that it is perfectly possible to store UTF-8 strings
in a std::string; however, doing so has some implications.
E.g. you can't count the number of characters with length() or
size(). Instead one has to iterate through the string, parse all
UTF-8 multibyte sequences and count each sequence as one character.
To address this problem the GTKmm bindings for the GTK+ toolkit
have implemented their own string class, Glib::ustring
<http://tinyurl.com/bxpu4>, which takes care of UTF-8 in strings.
The question is, wouldn't it be logical to make std::string
Unicode aware in the next STL version? I18N is an important
topic nowadays and I simply see no logical reason to keep
std::string as limited as it is nowadays. Of course there is
also the wchar_t variant, but actually I don't like that.
Wolfgang Draxinger
--
---
[ comp.std.c++ is moderated. To submit articles, try just posting with ]
[ your news-reader. If that fails, use mailto:st*****@ncar.ucar.edu ]
[ --- Please see the FAQ before posting. --- ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html ]
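The counting described in the question can be sketched in a few lines. This is only a minimal illustration, assuming the std::string already holds well-formed UTF-8; the helper name count_code_points is hypothetical and not part of any library. It counts Unicode code points, which is not always the same as what a user perceives as characters (combining sequences are counted per code point).

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: counts Unicode code points in a std::string that is
// assumed to contain well-formed UTF-8. No validation is performed.
std::size_t count_code_points(const std::string& s) {
    std::size_t n = 0;
    for (unsigned char c : s) {
        // Continuation bytes look like 10xxxxxx. Every other byte starts a
        // new code point, so counting the non-continuation bytes is enough.
        if ((c & 0xC0) != 0x80) {
            ++n;
        }
    }
    return n;
}
```

For a string such as "h\xC3\xA9llo" (the word "héllo" in UTF-8), size() reports 6 bytes while this helper reports 5 code points, which is the discrepancy the question is about.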
Recommended answer
> The question is, wouldn't it be logical to make std::string
> Unicode aware in the next STL version? I18N is an important
> topic nowadays and I simply see no logical reason to keep
> std::string as limited as it is nowadays. Of course there is
> also the wchar_t variant, but actually I don't like that.
It is much easier to handle Unicode strings with wchar_t internally, and
there is much less confusion about whether a string is ANSI or UTF-8
encoded. So I have started using wchar_t wherever I can, and I only use UTF-8
for external communication.
Niels Dybdahl
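The "wide internally, UTF-8 only at the boundary" approach can be sketched as a decode step on input. The following is a hand-written illustration, not a standard library facility: the name utf8_to_utf32 is hypothetical, and it uses std::u32string instead of std::wstring because the width of wchar_t is implementation-defined (16 bits on Windows, where a single wchar_t cannot hold every code point). It rejects bad lead bytes, bad continuation bytes and truncated input, but as a simplification it does not reject overlong encodings or surrogate values.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: decodes UTF-8 bytes into UTF-32 code points.
// Throws std::invalid_argument on a bad lead byte, a bad continuation byte,
// or a truncated sequence. Overlong forms and surrogates are NOT rejected.
std::u32string utf8_to_utf32(const std::string& in) {
    std::u32string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        char32_t cp = 0;
        std::size_t extra = 0;  // number of continuation bytes expected
        if (lead < 0x80) {
            cp = lead;
        } else if ((lead & 0xE0) == 0xC0) {
            cp = lead & 0x1F;
            extra = 1;
        } else if ((lead & 0xF0) == 0xE0) {
            cp = lead & 0x0F;
            extra = 2;
        } else if ((lead & 0xF8) == 0xF0) {
            cp = lead & 0x07;
            extra = 3;
        } else {
            throw std::invalid_argument("invalid UTF-8 lead byte");
        }
        if (in.size() - i <= extra) {
            throw std::invalid_argument("truncated UTF-8 sequence");
        }
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80) {
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            }
            // Each continuation byte contributes its low six bits.
            cp = (cp << 6) | (c & 0x3F);
        }
        out.push_back(cp);
        i += extra + 1;
    }
    return out;
}
```

Once decoded, size() on the result is the number of code points and indexing is constant time, which is the convenience this reply is arguing for.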
Wolfgang Draxinger wrote:
> I understand that it is perfectly possible to store UTF-8 strings
> in a std::string; however, doing so has some implications.
> E.g. you can't count the number of characters with length() or
> size(). Instead one has to iterate through the string, parse all
> UTF-8 multibyte sequences and count each sequence as one character.
> To address this problem the GTKmm bindings for the GTK+ toolkit
> have implemented their own string class, Glib::ustring
> <http://tinyurl.com/bxpu4>, which takes care of UTF-8 in strings.
> The question is, wouldn't it be logical to make std::string
> Unicode aware in the next STL version? I18N is an important
> topic nowadays and I simply see no logical reason to keep
> std::string as limited as it is nowadays. Of course there is
> also the wchar_t variant, but actually I don't like that.
> Wolfgang Draxinger
UTF-8 is only an encoding, so why do you think strings internal to the
program should be represented as UTF-8? It makes more sense to me to
translate to or from UTF-8 when you input or output strings from your
program. C++ already has the framework in place for that.
john
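The output half of "translate to or from UTF-8 at the program boundary" can be sketched as below. The framework mentioned in this reply is presumably the standard locale and codecvt facet machinery; the code here does not use it and is only a hand-written stand-in showing what such a conversion does. The function name utf32_to_utf8 is hypothetical, and the internal representation is assumed to be a std::u32string of code points.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper: encodes UTF-32 code points as UTF-8 bytes.
// Throws std::invalid_argument for surrogates and values above U+10FFFF.
std::string utf32_to_utf8(const std::u32string& in) {
    std::string out;
    for (char32_t cp : in) {
        if (cp < 0x80) {
            // One byte: 0xxxxxxx
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            // Two bytes: 110xxxxx 10xxxxxx
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            if (cp >= 0xD800 && cp <= 0xDFFF) {
                throw std::invalid_argument("surrogate is not a valid code point");
            }
            // Three bytes: 1110xxxx 10xxxxxx 10xxxxxx
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp <= 0x10FFFF) {
            // Four bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            throw std::invalid_argument("code point above U+10FFFF");
        }
    }
    return out;
}
```

With a helper like this, the program works on code points internally and produces UTF-8 bytes only when writing to a file, socket or terminal.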
Wolfgang Draxinger wrote:
> I understand that it is perfectly possible to store UTF-8 strings
> in a std::string; however, doing so has some implications.
> E.g. you can't count the number of characters with length() or
> size(). Instead one has to iterate through the string, parse all
> UTF-8 multibyte sequences and count each sequence as one character.

Correct. Also you can't print it or anything else.

> To address this problem the GTKmm bindings for the GTK+ toolkit
> have implemented their own string class, Glib::ustring
> <http://tinyurl.com/bxpu4>, which takes care of UTF-8 in strings.

Ok.

> The question is, wouldn't it be logical to make std::string
> Unicode aware in the next STL version?

It already is - using e.g. wchar_t.

> I18N is an important
> topic nowadays and I simply see no logical reason to keep
> std::string as limited as it is nowadays.

It is not limited.

> Of course there is
> also the wchar_t variant, but actually I don't like that.

So you'd like to have Unicode support. And you realize you already have
it. But you don't like it. Why?

/Peter