std::string character encoding

Published: 2021/8/30 19:12:10 · Tags: c++, utf-8, stdstring

Problem description

std::string arrWords[10];
std::vector<std::string> hElemanlar;

......

this->hElemanlar.push_back(std::string(1, this->arrWords[sayKelime][j]).c_str());

......

What I am doing is this: every element of arrWords is a std::string. I take the n-th element of arrWords and push each of its characters into hElemanlar as a separate string.

Assuming arrWords[0] is "test", then:

this->hElemanlar.push_back("t");
this->hElemanlar.push_back("e");
this->hElemanlar.push_back("s");
this->hElemanlar.push_back("t");

My problem is that, although I have no encoding problems with arrWords, some UTF-8 characters are not printed or handled correctly once they are in hElemanlar. How can I fix this?

Recommended answer

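If you know that arrWords[i] contains UTF-8 encoded text, then you probably need to split the strings into complete Unicode characters rather than individual bytes. In UTF-8, every character outside ASCII is encoded as a sequence of two to four bytes, while std::string's operator[] returns a single byte (char), so pushing one char at a time tears those sequences apart.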
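A quick way to see why byte-wise splitting breaks: std::string stores bytes, and size() and operator[] count and index bytes, not characters. The sketch below assumes the input is UTF-8; the function name `utf8_sequence_length_at` is hypothetical, not a standard API. It reads the lead byte at a given index to tell how long the UTF-8 sequence starting there is.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: how many bytes the UTF-8 sequence that starts at
// text[i] occupies, judged from its lead byte alone. Returns 0 when text[i]
// is a continuation byte (10xxxxxx) or one of 0xF8..0xFF. Deliberately simple:
// it does not reject the never-valid lead bytes 0xC0, 0xC1 and 0xF5..0xF7.
std::size_t utf8_sequence_length_at(const std::string& text, std::size_t i)
{
    assert(i < text.size());
    const unsigned char lead = static_cast<unsigned char>(text[i]);
    if (lead < 0x80) return 1;            // 0xxxxxxx: ASCII
    if ((lead & 0xE0) == 0xC0) return 2;  // 110xxxxx
    if ((lead & 0xF0) == 0xE0) return 3;  // 1110xxxx
    if ((lead & 0xF8) == 0xF0) return 4;  // 11110xxx
    return 0;                             // continuation or invalid byte
}
```

For the two-character string "şu" stored as UTF-8 ("\xC5\x9Fu"), size() is 3, and the lengths at indices 0, 1 and 2 are 2, 0 and 1. Index 1 is the middle of "ş", so a one-char string built from it is not valid UTF-8 on its own.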

As an aside, rather than saying:

this->hElemanlar.push_back(std::string(1, this->arrWords[sayKelime][j]).c_str());

(which constructs a temporary std::string, obtains its C-string representation, constructs another temporary string from that, and pushes it onto the vector), say:

this->hElemanlar.push_back(std::string(1, this->arrWords[sayKelime][j]));
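The same step can also be written with emplace_back, which forwards its arguments to the std::string(count, ch) constructor and builds the one-character string directly inside the vector. The sketch below wraps the question's byte-by-byte loop in a hypothetical helper, `split_bytes`. It still splits by byte, so for UTF-8 input it reproduces the original problem.

```cpp
#include <string>
#include <vector>

// Hypothetical helper reproducing the question's loop: one element per byte.
// emplace_back(1, c) forwards to std::string(size_type count, char ch), so no
// temporary std::string and no c_str() round trip are involved.
std::vector<std::string> split_bytes(const std::string& word)
{
    std::vector<std::string> parts;
    parts.reserve(word.size());
    for (char c : word)
        parts.emplace_back(1, c);
    return parts;
}
```

For "test" this yields "t", "e", "s", "t" as in the question. For "\xC5\x9Fu" ("şu") it yields three elements, and the first two ("\xC5" and "\x9F") are lone bytes that are not valid UTF-8, which is why such characters print incorrectly.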

Anyway, the loop body will need to become something like:

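std::string str(1, this->arrWords[sayKelime][j]);
if (static_cast<unsigned char>(str[0]) >= 0xC0)   // lead byte of a multi-byte sequence
{
    // Append the continuation bytes (bit pattern 10xxxxxx) that follow it.
    // The next byte is re-read on every iteration; j ends on the last byte.
    while ((static_cast<unsigned char>(this->arrWords[sayKelime][j + 1]) & 0xC0) == 0x80)
    {
        ++j;
        str.push_back(this->arrWords[sayKelime][j]);
    }
}
this->hElemanlar.push_back(str);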

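Note that the loop above is safe: if j is the index of the last byte in the string, [j+1] returns the null terminator (std::string guarantees this since C++11), which is not a continuation byte and therefore ends the loop. You will, however, need to consider how incrementing j interacts with the rest of your code. After the loop, j indexes the last byte of the multi-byte sequence, so if your outer loop increments j itself, it will then move on to the next character.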
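Moving the corrected loop into a standalone function makes it easier to test in isolation. This is a minimal sketch, assuming UTF-8 input; `split_code_points` and `is_continuation` are hypothetical names. It keeps the answer's rule of extending only from a lead byte of 0xC0 or above and does not validate the input.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// True for UTF-8 continuation bytes, which have the bit pattern 10xxxxxx.
bool is_continuation(unsigned char b)
{
    return (b & 0xC0) == 0x80;
}

// Hypothetical helper: splits UTF-8 text into one std::string per code point.
// A lead byte (0xC0 or above) is kept together with the continuation bytes
// that follow it. Malformed input is not rejected: a stray continuation byte
// becomes its own element, and the number of continuation bytes is not
// checked against the length announced by the lead byte.
std::vector<std::string> split_code_points(const std::string& text)
{
    std::vector<std::string> out;
    for (std::size_t j = 0; j < text.size(); ++j)
    {
        std::string cp(1, text[j]);
        if (static_cast<unsigned char>(text[j]) >= 0xC0)
        {
            // text[text.size()] is '\0' (guaranteed since C++11), which is not
            // a continuation byte, so this loop cannot run past the end.
            while (is_continuation(static_cast<unsigned char>(text[j + 1])))
            {
                ++j;
                cp.push_back(text[j]);
            }
        }
        out.push_back(std::move(cp));
    }
    return out;
}
```

With this helper, "\xC5\x9Fu" ("şu") splits into "\xC5\x9F" and "u", so each element of the vector is a complete UTF-8 sequence that prints correctly.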

You then need to consider whether you want each element of hElemanlar to represent an individual Unicode code point (which is what this code does), or a base character together with all the combining characters that follow it (for example "e" followed by U+0301 COMBINING ACUTE ACCENT, which displays as "é"). In the latter case, you would have to extend the code above to:

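Parse the next code point.
Determine whether it is a combining character.
If it is, push its UTF-8 sequence onto the string.
Repeat (a character can carry more than one combining character).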
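The four steps above could look roughly like the sketch below, which rests on some loud assumptions. `decode_at`, `is_combining` and `split_with_combining` are hypothetical names. Decoding does only minimal validation. `is_combining` checks only the Unicode "Combining ..." blocks rather than the General_Category property, so it misses marks in other blocks (for example in Indic scripts) and ignores other grapheme-cluster rules such as emoji sequences. Production code would more likely use a library such as ICU's character BreakIterator.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Decodes the code point whose UTF-8 sequence starts at text[i] and stores the
// number of bytes consumed in `len`. Returns U+FFFD for a byte that cannot
// start a sequence or for a truncated sequence (minimal validation only).
char32_t decode_at(const std::string& text, std::size_t i, std::size_t& len)
{
    const unsigned char lead = static_cast<unsigned char>(text[i]);
    char32_t cp = 0;
    if (lead < 0x80)                { len = 1; return lead; }
    if ((lead & 0xE0) == 0xC0)      { len = 2; cp = lead & 0x1Fu; }
    else if ((lead & 0xF0) == 0xE0) { len = 3; cp = lead & 0x0Fu; }
    else if ((lead & 0xF8) == 0xF0) { len = 4; cp = lead & 0x07u; }
    else                            { len = 1; return 0xFFFD; }  // not a lead byte
    for (std::size_t k = 1; k < len; ++k)
    {
        // text[text.size()] is '\0', so a truncated sequence stops here.
        const unsigned char b = static_cast<unsigned char>(text[i + k]);
        if ((b & 0xC0) != 0x80) { len = k; return 0xFFFD; }
        cp = (cp << 6) | (b & 0x3Fu);
    }
    return cp;
}

// Hypothetical, simplified test: only the "Combining ..." blocks are checked.
bool is_combining(char32_t cp)
{
    return (cp >= 0x0300 && cp <= 0x036F)    // Combining Diacritical Marks
        || (cp >= 0x1AB0 && cp <= 0x1AFF)    // ... Extended
        || (cp >= 0x1DC0 && cp <= 0x1DFF)    // ... Supplement
        || (cp >= 0x20D0 && cp <= 0x20FF)    // ... for Symbols
        || (cp >= 0xFE20 && cp <= 0xFE2F);   // Combining Half Marks
}

// Splits UTF-8 text into base characters, each followed by its combining marks.
std::vector<std::string> split_with_combining(const std::string& text)
{
    std::vector<std::string> out;
    std::size_t i = 0;
    while (i < text.size())
    {
        std::size_t len = 0;
        decode_at(text, i, len);                 // parse the base code point
        std::string piece = text.substr(i, len);
        i += len;
        while (i < text.size())                  // repeat: there may be several marks
        {
            const char32_t next = decode_at(text, i, len);
            if (!is_combining(next))
                break;
            piece += text.substr(i, len);        // push the mark's UTF-8 sequence
            i += len;
        }
        out.push_back(piece);
    }
    return out;
}
```

For "e" followed by U+0301 (bytes "e\xCC\x81") and then "x", this yields two elements: "e\xCC\x81" (displayed as "é") and "x".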
