C++ Non-ASCII letters


Problem description

How do I loop through the letters of a string when it contains non-ASCII characters? This works on Windows:

for (int i = 0; i < text.length(); i++)
{
    std::cout << text[i];
}

But on Linux, if I do:

std::string text = "á";
std::cout << text.length() << std::endl;

it tells me the string "á" has a length of 2, while on Windows it is only 1. With ASCII letters it works fine.

Solution

In your Windows system's code page, á is a single-byte character, i.e. every char in the string is indeed one character. So you can simply loop over the chars and print them.

On Linux, á is represented as a multibyte UTF-8 character (2 bytes, to be exact): C3 A1. This means that in your string the á actually consists of two chars, and printing those separately (or handling them separately in any other way) yields nonsense. This never happens with ASCII characters, because the UTF-8 representation of every ASCII character fits in a single byte.
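To see those two bytes for yourself, a small helper along these lines can dump a string as hex. This is only a sketch; the name to_hex_bytes is made up for illustration and does not come from any library.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns the bytes of `s` as space-separated
// uppercase hex, e.g. "C3 A1" for a UTF-8 encoded "á".
std::string to_hex_bytes(const std::string& s)
{
    static const char digits[] = "0123456789ABCDEF";
    std::string out;
    for (std::size_t i = 0; i < s.size(); ++i) {
        // Go through unsigned char so bytes >= 0x80 do not turn negative.
        const unsigned char byte = static_cast<unsigned char>(s[i]);
        if (i != 0) {
            out += ' ';
        }
        out += digits[byte >> 4];    // high nibble
        out += digits[byte & 0x0F];  // low nibble
    }
    return out;
}
```

Calling to_hex_bytes(text) on the string from the question should give C3 A1 when the source file is saved as UTF-8. Writing the literal as "\xC3\xA1" produces the same two bytes regardless of how the source file is encoded.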

Unfortunately, UTF-8 is not really supported by the C++ standard library facilities. As long as you only handle whole strings, and neither access individual chars nor assume that the length of the string equals the number of actual characters in it, std::string will most likely do fine.
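If you do need to walk through the letters, one way to do it by hand is to group each lead byte with the continuation bytes that follow it (in UTF-8, continuation bytes always have the bit pattern 10xxxxxx). The sketch below assumes the input is well-formed UTF-8 and does no validation; split_utf8 is a made-up name, not a standard function. Note also that it splits by code point, which is not always what a user perceives as one letter: á can also be written as a plain a followed by a combining accent, and that form comes out as two pieces.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: splits a UTF-8 string into one std::string per
// code point. Assumes `s` is well-formed UTF-8; nothing is validated.
std::vector<std::string> split_utf8(const std::string& s)
{
    std::vector<std::string> out;
    for (std::size_t i = 0; i < s.size(); ++i) {
        const unsigned char byte = static_cast<unsigned char>(s[i]);
        // Continuation bytes look like 10xxxxxx and extend the
        // code point started by the preceding lead byte.
        const bool continuation = (byte & 0xC0) == 0x80;
        if (continuation && !out.empty()) {
            out.back() += s[i];
        } else {
            out.push_back(std::string(1, s[i]));
        }
    }
    return out;
}
```

With this, a loop such as for (const std::string& letter : split_utf8(text)) std::cout << letter << '\n'; prints each letter whole, and split_utf8(text).size() gives the number of code points instead of the number of bytes.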

If you need more UTF-8 support than that, look for a good library that implements what you need.

You might also want to read this for a more detailed discussion on different character sets on different systems and advice regarding string vs. wstring.

Also have a look at this for information on how to handle different character encodings portably.
