How to use Unicode range in C++ regex

Views: 388 Published: 2020/9/26 22:40:24 Tags: c++ regex


Question

I have to use a Unicode range in a regex in C++. Basically, I need a regex that accepts all valid Unicode characters. I tried a test expression and ran into some issues with it.

std::regex reg("^[\\u0080-\\uDB7Fa-z0-9!#$%&'*+/=?^_`{|}~-]+$");

Is the problem the \\ ?
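It helps to check what the regex constructor actually receives. In a C++ literal, `\\` is an escaped backslash, so `"\\u0080"` holds the six characters `\ u 0 0 8 0`, and the regex engine (ECMAScript grammar by default) decodes the `\u` escape itself. With a single backslash, the C++ compiler would instead encode U+0080 into the literal. The variable names below are illustrative only.

```cpp
#include <string>

// The doubled backslash survives as one literal '\' character, so the regex
// engine, not the C++ compiler, interprets the \u0080 escape.
const std::string narrow_expr = "\\u0080";   // 6 chars: \ u 0 0 8 0
const std::wstring wide_expr = L"\\u0080";   // same 6 chars, as wchar_t
```

The escaping is therefore fine as written. The difficulty lies in which character type the regex operates on.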


Recommended answer
This should work, but you need to use std::wregex and std::wsmatch. You must convert both the source string and the regular expression to wide-character Unicode (UTF-32 on Linux, UTF-16(ish) on Windows) for it to work.
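When the text is already available as a wide string literal, you can skip the conversion step and use std::wregex and std::wsmatch directly. The following is a minimal sketch. `first_run_in_range` is a hypothetical helper name, and the range is the one used in this answer.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns the first run of characters in the range
// U+0080..U+DB7F, or an empty string if there is none.
std::wstring first_run_in_range(const std::wstring& text)
{
    // "\\u0080" puts a literal backslash in the pattern; the ECMAScript
    // grammar used by std::wregex decodes \uXXXX into one wchar_t.
    static const std::wregex re(L"[\\u0080-\\uDB7F]+");
    std::wsmatch m;
    if (std::regex_search(text, m, re))
        return m.str(0);
    return std::wstring();
}
```

For example, `first_run_in_range(L"john.doe@\u795E\u8C15.com")` should yield the two characters 神谕. Both lie in the Basic Multilingual Plane, so the result is the same whether wchar_t is 16 or 32 bits wide.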
This works for me when the source text is UTF-8:
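#include <codecvt>
#include <cwchar>
#include <iostream>
#include <locale>
#include <regex>
#include <string>

// Assumption: the answer used a third-party UTF library for these helpers;
// this version substitutes std::wstring_convert (deprecated since C++17).
#if WCHAR_MAX > 0xFFFF
using utf8_wide = std::codecvt_utf8<wchar_t>;       // UTF-8 <-> UTF-32 (Linux)
#else
using utf8_wide = std::codecvt_utf8_utf16<wchar_t>; // UTF-8 <-> UTF-16 (Windows)
#endif

inline std::wstring from_utf8(const std::string& utf8)
{
    // convert from UTF-8 to UTF-32/UTF-16; throws std::range_error on bad input
    std::wstring_convert<utf8_wide> conv;
    return conv.from_bytes(utf8);
}

inline std::string to_utf8(const std::wstring& ws)
{
    // convert from UTF-32/UTF-16 back to UTF-8
    std::wstring_convert<utf8_wide> conv;
    return conv.to_bytes(ws);
}

int main()
{
    std::string test = "john.doe@神谕.com"; // UTF-8 (source file saved as UTF-8)
    std::string expr = "[\\u0080-\\uDB7F]+"; // plain ASCII, also valid UTF-8

    std::wstring wtest = from_utf8(test);
    std::wstring wexpr = from_utf8(expr);

    std::wregex we(wexpr);
    std::wsmatch wm;
    if (std::regex_search(wtest, wm, we))
    {
        std::cout << to_utf8(wm.str(0)) << '\n';
    }
}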

Output:
神谕
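The same approach applies to the full pattern from the question. Once it is compiled as a std::wregex, the `\u` range is accepted. The following is a sketch that uses a hypothetical helper name, `matches_question_pattern`. Note that the character class contains no `.` and no uppercase letters, so strings containing those characters are rejected.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: the question's pattern as a wide regex. It accepts
// strings made only of U+0080..U+DB7F, a-z, 0-9 and the listed symbols.
// Inside the brackets, the non-leading '^' and the trailing '-' are literals.
bool matches_question_pattern(const std::wstring& s)
{
    static const std::wregex re(L"^[\\u0080-\\uDB7Fa-z0-9!#$%&'*+/=?^_`{|}~-]+$");
    return std::regex_match(s, re);
}
```

With this helper, `L"john"` and `L"\u795E\u8C15"` match, while `L"john.doe"` and `L"John"` do not.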

Note: If you need a UTF conversion library, I used THIS ONE in the example above.
Edit: Alternatively, you could use the functions given in this answer:
Is there a good solution for code points and code units in C++ strings?
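If you want to avoid both a third-party library and the deprecated std::wstring_convert, a hand-written UTF-8 decoder is short. The following is a minimal sketch, not the functions from the linked answer. `utf8_to_wide` is a hypothetical name. The function produces UTF-32 when wchar_t is 32 bits and UTF-16 (with surrogate pairs) when it is 16 bits.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: decodes UTF-8 into std::wstring.
// Minimal validation only: rejects bad lead/continuation bytes and truncated
// sequences, but does not reject overlong forms or encoded surrogates.
std::wstring utf8_to_wide(const std::string& in)
{
    std::wstring out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char b = static_cast<unsigned char>(in[i]);
        char32_t cp;
        std::size_t extra;  // number of continuation bytes that follow
        if (b < 0x80)              { cp = b;        extra = 0; }
        else if ((b >> 5) == 0x6)  { cp = b & 0x1F; extra = 1; }
        else if ((b >> 4) == 0xE)  { cp = b & 0x0F; extra = 2; }
        else if ((b >> 3) == 0x1E) { cp = b & 0x07; extra = 3; }
        else throw std::runtime_error("invalid UTF-8 lead byte");

        if (extra > in.size() - i - 1)
            throw std::runtime_error("truncated UTF-8 sequence");
        for (std::size_t k = 1; k <= extra; ++k) {
            unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::runtime_error("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);  // append 6 payload bits
        }
        i += extra + 1;

        if (sizeof(wchar_t) == 2 && cp > 0xFFFF) {
            // 16-bit wchar_t (Windows): split into a UTF-16 surrogate pair
            cp -= 0x10000;
            out.push_back(static_cast<wchar_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<wchar_t>(0xDC00 + (cp & 0x3FF)));
        } else {
            out.push_back(static_cast<wchar_t>(cp));
        }
    }
    return out;
}
```

Converting the match back to UTF-8 for output is the reverse operation. On 16-bit wchar_t, a regex range such as `\u0080-\uDB7F` sees surrogate pairs as two separate units, which is the "(ish)" in UTF-16(ish) above.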
