Unicode regular expressions in C++
Question
I want to match the word "février", or any other month name, using a regular expression.
Regular expression:
^(JANVIER|FEVRIER|MARS|AVRIL|MAI|JUIN|JUILLET|AOUT|SEPTEMBRE|OCTOBRE|NOVEMBRE|DECEMBRE|Jan|Feb|Mar|Apr|May|Jun|JUN|Jul|Aug|Sep|Oct|Nov|Dec|[jJ]anvier|[Ff]évrier|[mM]ars|[aA]vril|[mM]ai|[jJ]uin|[jJ]uillet|[aA]o[éû]t|aout|[sS]eptembre|[oO]ctobre|[nN]ovembre|[dD][eé]cembre)$
Problem
The problem is that I cannot match words that contain Unicode letters such as à, é, è, etc.
I found on the following website, Unicode, that the Unicode value of é is \u00E9. Can I integrate this value into the regular expression, and how can I use Unicode values in regular expressions?
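#include <iostream>
#include <string>
#include <boost/regex.hpp>

// Prints "found" if the input text contains "février".
void returnValue(const std::string& pattern)
{
    const boost::regex e("février");
    const bool x = boost::regex_search(pattern, e);
    if (x) { std::cout << "found" << std::endl; }
}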
Recommended answer
You can match Unicode text with Boost.Regex. There are two ways to do it.
- Rely on wchar_t, if your platform's wchar_t can hold Unicode characters and your platform's C/C++ runtime correctly handles wide character constants. (This approach has a few pitfalls and is not recommended; read about them at the link below.)
- Use a Unicode-aware regular expression type (boost::u32regex). Boost has to be configured for this by building with Unicode and ICU support.
http://www.boost.org/doc/libs/1_42_0/libs/regex/doc/html/boost_regex/unicode.html
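To make the wchar_t route more concrete, here is a minimal sketch. It uses std::wregex from the standard library rather than Boost so that it compiles without extra dependencies; that substitution, the helper name is_accented_month, and the shortened month list are assumptions made for illustration and do not come from the original post. The accented letters are written as universal character names (\u00E9 for é, \u00FB for û) in wide string literals, which is one way to put the Unicode value from the question into a pattern.

```cpp
#include <regex>
#include <string>

// Hypothetical helper (not from the original post): returns true when
// `word` is exactly one of a few French month names that may carry an
// accent. Only three months are listed to keep the sketch short.
bool is_accented_month(const std::wstring& word)
{
    // \u00E9 is é and \u00FB is û. In a wide string literal each one
    // becomes a single wchar_t, so it can appear on its own or inside
    // a character class such as [e\u00E9].
    static const std::wregex months(
        L"^([Ff]\u00E9vrier|[aA]o[u\u00FB]t|[dD][e\u00E9]cembre)$");

    // regex_match requires the whole word to match the pattern.
    return std::regex_match(word, months);
}
```

Under these assumptions, is_accented_month(L"f\u00E9vrier") should return true and is_accented_month(L"fevrier") should return false. Boost's wide-character type boost::wregex can be used in the same way with boost::regex_match. If the input arrives as a UTF-8 std::string instead of a wide string, it would have to be converted first, which is one reason the answer prefers the ICU-backed route (boost::u32regex, built with boost::make_u32regex and matched with boost::u32regex_match from <boost/regex/icu.hpp>).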