What constitutes one character for regcomp? Which multibyte encoding determines this?


Question

regcomp (from glibc) is a POSIX function for compiling regular expressions.

     int regcomp(regex_t *restrict preg, const char *restrict pattern,
     int cflags);

Some regular-expression constructs depend on the notion of a single character, for example the bracket expression [abc].

If a multibyte encoding is in use and the expression contains a multibyte letter, the result differs depending on whether the pattern is treated as a sequence of bytes or as a sequence of multibyte characters.

Here I illustrate the idea with grep (which need not behave the same way as the C function regcomp in this respect):

$ { echo Г; echo Д; } | egrep '[Д]'
Д
$ { echo Г; echo Д; } | LANG=C egrep '[Д]'
Г
Д
$ 

LANG is the fallback for any of the specific LC_* variables that are not set, so the question is: which of them affects regcomp's idea of the encoding?

$ locale
LANG=ru_RU.utf8
LC_CTYPE="ru_RU.utf8"
LC_NUMERIC="ru_RU.utf8"
LC_TIME="ru_RU.utf8"
LC_COLLATE="ru_RU.utf8"
LC_MONETARY="ru_RU.utf8"
LC_MESSAGES=POSIX
LC_PAPER="ru_RU.utf8"
LC_NAME="ru_RU.utf8"
LC_ADDRESS="ru_RU.utf8"
LC_TELEPHONE="ru_RU.utf8"
LC_MEASUREMENT="ru_RU.utf8"
LC_IDENTIFICATION="ru_RU.utf8"
LC_ALL=
$ 

Answer

As for grep (which need not behave the same way as regcomp), it appears to honor LC_CTYPE for this decision:

$ { echo Г; echo Д; } | LANG=en_US.utf8 egrep '[Д]'
Д
$ { echo Г; echo Д; } | LANG=en_US.utf8 LC_COLLATE=C egrep '[Д]'
Д
$ { echo Г; echo Д; } | LANG=en_US.utf8 LC_CTYPE=C egrep '[Д]'
Г
Д
$ 
