How to get the Unicode code point of each character of a UTF-8 string?


Question

With C++11, how can I get the Unicode value of each character of a UTF-8 encoded std::string into a uint32_t?

Something like:

void f(const std::string &utf8_str)
{
    for(???) {
       uint32_t code = ???;

       /* Do my stuff with the code... */
    }
}

Does assuming the host system locale is UTF-8 help? What standard library tools does C++11 offer for the task?

Solution

You can simply convert the string into a UTF-32 encoded one, using the std::codecvt_utf8 conversion facet from <codecvt> and std::wstring_convert from <locale>. Each char32_t of the result then holds one Unicode code point:

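#include <codecvt>
#include <locale>
#include <string>

void foo(std::string const & utf8str)
{
    // Converter between UTF-8 bytes and UTF-32 code units.
    std::wstring_convert<std::codecvt_utf8<char32_t>, char32_t> conv;

    // Throws std::range_error if utf8str is not valid UTF-8.
    std::u32string utf32str = conv.from_bytes(utf8str);

    // Each u is one Unicode code point.
    for (char32_t u : utf32str)  { /* ... */ }
}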
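
To tie the answer back to the uint32_t asked for in the question, here is a minimal sketch that wraps the same conversion in a helper returning the code points as integers. The name code_points and the choice of std::vector as the return type are assumptions made for illustration, not part of the original answer. Note also that std::wstring_convert and <codecvt> are deprecated since C++17, so this suits the C++11 setting of the question and may produce deprecation warnings on newer compilers.

```cpp
#include <codecvt>
#include <cstdint>
#include <locale>
#include <string>
#include <vector>

// Hypothetical helper (not from the original answer): decodes a UTF-8
// string and returns one uint32_t per Unicode code point.
std::vector<std::uint32_t> code_points(std::string const& utf8str)
{
    // Same converter as in the answer: UTF-8 bytes <-> UTF-32 code units.
    std::wstring_convert<std::codecvt_utf8<char32_t>, char32_t> conv;

    // from_bytes throws std::range_error if utf8str is not valid UTF-8.
    std::u32string utf32str = conv.from_bytes(utf8str);

    // char32_t values convert implicitly to uint32_t when copied.
    return std::vector<std::uint32_t>(utf32str.begin(), utf32str.end());
}
```

Calling code_points on a string that holds "h", "é" and "€" encoded as UTF-8 would give three values, one per character, regardless of how many bytes each character occupies. The host locale plays no role here, because the facet is passed to the converter explicitly.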
