std::string vs. Unicode UTF-8

Views: 96, posted: 2019/6/4 22:21:21


Problem description

I understand that it is perfectly possible to store UTF-8 strings
in a std::string; however, doing so has some implications.
E.g. you can't count the number of characters with length() or
size(). Instead one has to iterate through the string, parse all
UTF-8 multibyte sequences and count each sequence as one character.
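
To make the counting step concrete, here is a minimal sketch. It assumes the string holds well-formed UTF-8 and performs no validation; the name utf8_length is made up for illustration and is not part of any library. It counts code points, which is not always the same as what a user perceives as one character (combining marks, for instance, are counted separately).

```cpp
#include <cstddef>
#include <string>

// Counts Unicode code points in a UTF-8 encoded std::string.
// Assumption: the input is well-formed UTF-8; nothing is validated here.
std::size_t utf8_length(const std::string& s) {
    std::size_t count = 0;
    for (unsigned char byte : s) {
        // Continuation bytes have the bit pattern 10xxxxxx. Every other
        // byte starts a new code point, so only those are counted.
        if ((byte & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}
```

For a string such as "gr\xC3\xBC\xC3\x9F" (two ASCII letters followed by two two-byte sequences), size() reports the six bytes while utf8_length reports the four code points.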

To address this problem the GTKmm bindings for the GTK+ toolkit
have implemented their own string class, Glib::ustring
<http://tinyurl.com/bxpu4>, which takes care of UTF-8 in strings.

The question is, wouldn't it be logical to make std::string
Unicode aware in the next STL version? I18N is an important
topic nowadays and I simply see no logical reason to keep
std::string as limited as it is nowadays. Of course there is
also the wchar_t variant, but actually I don't like that.

Wolfgang Draxinger
--

---
[ comp.std.c++ is moderated. To submit articles, try just posting with ]
[ your news-reader. If that fails, use mailto:st*****@ncar.ucar.edu ]
[ --- Please see the FAQ before posting. --- ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html ]

Recommended answers

> The question is, wouldn't it be logical to make std::string
> Unicode aware in the next STL version? I18N is an important
> topic nowadays and I simply see no logical reason to keep
> std::string as limited as it is nowadays. Of course there is
> also the wchar_t variant, but actually I don't like that.

It is much easier to handle Unicode strings with wchar_t internally, and
there is much less confusion about whether the string is ANSI or UTF-8
encoded. So I have started using wchar_t wherever I can, and I only use UTF-8
for external communication.
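
As a rough illustration of decoding at the boundary, the sketch below turns UTF-8 input into a fixed-width internal string. Two assumptions to flag: it uses char32_t and std::u32string rather than wchar_t, because wchar_t is 16 bits wide on Windows and 32 bits on most Unix-like systems, and the helper name decode_utf8 is hypothetical. Malformed or truncated sequences become U+FFFD; overlong forms and surrogates are not rejected, so this is not a full validator.

```cpp
#include <cstddef>
#include <string>

// Decodes UTF-8 into UTF-32 code points (hypothetical helper, for illustration).
// Malformed or truncated sequences are replaced with U+FFFD.
// Overlong encodings and surrogate code points are NOT rejected.
std::u32string decode_utf8(const std::string& in) {
    const char32_t replacement = 0xFFFD;
    std::u32string out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char lead = static_cast<unsigned char>(in[i]);
        char32_t cp = 0;
        std::size_t extra = 0;  // number of continuation bytes expected
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else {
            // Stray continuation byte or invalid lead byte.
            out.push_back(replacement);
            ++i;
            continue;
        }
        ++i;
        bool ok = true;
        for (std::size_t k = 0; k < extra; ++k) {
            if (i >= in.size()) { ok = false; break; }  // truncated input
            unsigned char c = static_cast<unsigned char>(in[i]);
            if ((c & 0xC0) != 0x80) { ok = false; break; }  // not a continuation
            cp = (cp << 6) | (c & 0x3F);
            ++i;
        }
        out.push_back(ok ? cp : replacement);
    }
    return out;
}
```

Once decoded, size() on the result is the number of code points and indexing yields whole code points, which is the convenience being argued for here.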

Niels Dybdahl

Wolfgang Draxinger wrote:

> I understand that it is perfectly possible to store UTF-8 strings
> in a std::string, however doing so can cause some implications.
> E.g. you can't count the amount of characters by length() |
> size(). Instead one has to iterate through the string, parse all
> UTF-8 multibytes and count each multibyte as one character.
>
> To address this problem the GTKmm bindings for the GTK+ toolkit
> have implemented a own string class Glib::ustring
> <http://tinyurl.com/bxpu4> which takes care of UTF-8 in strings.
>
> The question is, wouldn't it be logical to make std::string
> Unicode aware in the next STL version? I18N is an important
> topic nowadays and I simply see no logical reason to keep
> std::string as limited as it is nowadays. Of course there is
> also the wchar_t variant, but actually I don't like that.
>
> Wolfgang Draxinger

UTF-8 is only an encoding; why do you think strings internal to the
program should be represented as UTF-8? It makes more sense to me to
translate to or from UTF-8 when you input or output strings from your
program. C++ already has the framework in place for that.
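
The post does not say which framework is meant, so the following is not that framework. It is only a hand-written sketch of the output half of the idea: keep code points internally and encode to UTF-8 when a string leaves the program. The names append_utf8 and encode_utf8 are hypothetical, and the code assumes every input value is a valid Unicode scalar value (at most 0x10FFFF, no surrogates).

```cpp
#include <string>

// Appends the UTF-8 encoding of one code point to `out`.
// Assumption: cp is a valid Unicode scalar value; it is not checked.
void append_utf8(std::string& out, char32_t cp) {
    if (cp < 0x80) {
        // 1 byte: 0xxxxxxx
        out.push_back(static_cast<char>(cp));
    } else if (cp < 0x800) {
        // 2 bytes: 110xxxxx 10xxxxxx
        out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else if (cp < 0x10000) {
        // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else {
        // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    }
}

// Encodes a whole UTF-32 string as UTF-8, e.g. just before writing it out.
std::string encode_utf8(const std::u32string& in) {
    std::string out;
    for (char32_t cp : in) {
        append_utf8(out, cp);
    }
    return out;
}
```

With this split, only the I/O layer needs to know about UTF-8; the rest of the program works on one element per code point.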

john


Wolfgang Draxinger wrote:

> I understand that it is perfectly possible to store UTF-8 strings
> in a std::string, however doing so can cause some implications.
> E.g. you can't count the amount of characters by length() |
> size(). Instead one has to iterate through the string, parse all
> UTF-8 multibytes and count each multibyte as one character.

Correct. Also you can't print it or anything else.

> To address this problem the GTKmm bindings for the GTK+ toolkit
> have implemented a own string class Glib::ustring
> <http://tinyurl.com/bxpu4> which takes care of UTF-8 in strings.

Ok.

> The question is, wouldn't it be logical to make std::string
> Unicode aware in the next STL version?

It already is - using e.g. wchar_t.

> I18N is an important
> topic nowadays and I simply see no logical reason to keep
> std::string as limited as it is nowadays.

It is not limited.

> Of course there is
> also the wchar_t variant, but actually I don't like that.

So you'd like to have Unicode support. And you realize you already have
it. But you don't like it. Why?

> Wolfgang Draxinger

/Peter

