Do C++11 regular expressions work with UTF-8 strings?

Question

If I want to use C++11's regular expressions with Unicode strings, will they work with char* holding UTF-8, or do I have to convert the text to a wchar_t* string?

Answer

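You need to test this with your own compiler and system, but in theory it is supported if the system has a UTF-8 locale. Keep in mind that std::regex over std::string examines one char at a time, so the result for multi-byte UTF-8 characters depends on how the implementation's locale classifies the individual bytes. The following test returned true for me with Clang on OS X.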

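#include <locale>
#include <regex>
#include <string>

bool test_unicode()
{
    // Remember the current global locale so it can be restored afterwards.
    std::locale old;
    // Throws std::runtime_error if the system has no en_US.UTF-8 locale.
    std::locale::global(std::locale("en_US.UTF-8"));

    // The regex picks up the global locale at construction time.
    std::regex pattern("[[:alpha:]]+", std::regex_constants::extended);
    bool result = std::regex_match(std::string("abcdéfg"), pattern);

    std::locale::global(old);

    return result;
}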

NOTE: This was compiled from a source file that was UTF-8 encoded.
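
Since the result depends on the system having a UTF-8 locale, it can help to probe for the locale before trusting the test. The following is only a sketch under stated assumptions: the helper name alpha_match_in_locale and its locale_available out-parameter are hypothetical and not part of the original answer. It relies on the std::locale constructor throwing std::runtime_error for an unknown name, and it imbues the locale into the regex itself rather than changing the global locale.

```cpp
#include <locale>
#include <regex>
#include <stdexcept>
#include <string>

// Hypothetical helper (not from the original answer).
// Returns true only if the named locale can be constructed AND
// [[:alpha:]]+ matches the whole input under that locale.
// locale_available reports whether the locale exists on this system.
bool alpha_match_in_locale(const std::string& locale_name,
                           const std::string& input,
                           bool& locale_available)
{
    std::locale loc;
    try {
        // Throws std::runtime_error if the name is not a valid locale here.
        loc = std::locale(locale_name);
        locale_available = true;
    } catch (const std::runtime_error&) {
        locale_available = false;
        return false;
    }

    // Imbue the locale into this regex only; the global locale is untouched.
    // imbue() resets the pattern, so the expression is assigned afterwards.
    std::regex pattern;
    pattern.imbue(loc);
    pattern.assign("[[:alpha:]]+", std::regex_constants::extended);
    return std::regex_match(input, pattern);
}
```

Calling it with "en_US.UTF-8" and the test string reproduces the experiment above without side effects on the rest of the program; whether it returns true still depends on the compiler, standard library, and operating system.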


Just to be safe, I also tried a string with the é written as explicit hex escapes. It worked as well.

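bool test_unicode2()
{
    // Remember the current global locale so it can be restored afterwards.
    std::locale old;
    std::locale::global(std::locale("en_US.UTF-8"));

    std::regex pattern("[[:alpha:]]+", std::regex_constants::extended);
    // "\xC3\xA9" is é in UTF-8. The literal is split before "fg" so that
    // the 'f' is not parsed as part of the \xA9 hex escape.
    bool result = std::regex_match(std::string("abcd\xC3\xA9""fg"), pattern);

    std::locale::global(old);

    return result;
}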
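
One caveat when interpreting these results: std::regex is std::basic_regex<char>, so it walks a std::string one char (one byte of UTF-8) at a time rather than one code point at a time. The sketch below illustrates this with a hypothetical helper, matches_n_elements, which is not part of the original answer. It checks whether a string is covered by exactly n "any character" elements.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical helper (not from the original answer).
// Returns true if the whole string is matched by exactly n
// "any character" elements of the default (ECMAScript) grammar.
bool matches_n_elements(const std::string& s, std::size_t n)
{
    std::regex pattern(".{" + std::to_string(n) + "}");
    return std::regex_match(s, pattern);
}
```

For the UTF-8 encoding of é ("\xC3\xA9"), matches_n_elements returns false for n = 1 and true for n = 2, because the engine sees two separate char values. A character class such as [[:alpha:]]+ can therefore only succeed on such text if the locale classifies each individual byte as alphabetic, which is one reason the outcome varies between platforms.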
