How to make grep [A-Z] independent of locale?
Question
I was doing some everyday grepping and suddenly discovered that something seemingly trivial does not work:
$ echo T | grep [A-Z]
No match.
How come T is not within the A-Z range?
I changed the regex a tiny bit:
$ echo T | grep [A-Y]
Match!
Whoa! How is T within A-Y but not within A-Z?
Apparently this is because my environment is set to the Estonian locale, where Y is at the end of the alphabet but Z is somewhere in the middle: ABCDEFGHIJKLMNOPQRSŠZŽTUVWÕÄÖÜXY
$ echo $LANG
et_EE.UTF-8
This all came as a bit of a shock to me. 99% of the time I grep computer code, not Estonian literature. Have I been using grep the wrong way all this time? What kinds of mistakes have I made because of this in the past?
After trying several things I arrived at the following solution:
$ echo T | LANG=C grep [A-Z]
Is this the recommended way to make grep locale-independent?
Furthermore, would it be safe to define an alias like this:
$ alias grep="LANG=C grep"
PS. I'm also wondering why character ranges like [A-Z] are locale dependent in the first place, while \w seems to be unaffected by locale (although the manual says \w is equivalent to [[:alnum:]], I found that the latter depends on locale while \w does not).
Recommended answer
POSIX regular expressions, which Linux and FreeBSD grep support natively and which some others support on request, have a series of [:xxx:] patterns that honor locales. See the man page for details.
grep '[[:upper:]]'
As the []s are part of the pattern name, you need the outer [] as well, regardless of how strange it looks.
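A small sketch of that nesting follows. The sample input strings are made up for illustration, and LC_ALL=C is set only so that the result does not depend on the locale of the machine running it:

```shell
# A class name such as [:upper:] is only valid inside a bracket expression,
# so the complete pattern is [[:upper:]] (inner brackets: class name,
# outer brackets: the bracket expression itself).
upper_match=$(printf 'T\nt\n' | LC_ALL=C grep '[[:upper:]]')
echo "$upper_match"

# Several classes can share one bracket expression.
mixed_match=$(printf 'abc\n7\nX\n' | LC_ALL=C grep '[[:upper:][:digit:]]')
echo "$mixed_match"
```

The patterns are quoted so that the shell passes them to grep unchanged instead of trying to expand them as filename globs.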
With the advent of these [:xxx:] codes, the classic \w, etc., remain strictly in the C locale. Thus your choice of pattern determines whether grep uses the current locale or not.
[A-Z] should follow the locale, but you may need to set LC_ALL rather than LANG, especially if the system sets LC_ALL to a different value for you.
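The LC_ALL suggestion could look roughly like the sketch below. The wrapper name cgrep is a hypothetical choice for illustration, not something from grep or from the question:

```shell
# LC_ALL overrides LANG and the individual LC_* categories, so setting it
# for a single command forces the C locale for that command only.
match_lc_all=$(echo T | LC_ALL=C grep '[A-Z]')
echo "$match_lc_all"

# Hypothetical wrapper (the name "cgrep" is an assumption): a function
# leaves the plain grep command untouched for locale-aware searches.
cgrep() {
    LC_ALL=C command grep "$@"
}
match_wrapper=$(echo T | cgrep '[A-Z]')
echo "$match_wrapper"
```

A separate function name is used here rather than redefining grep, so that locale-aware matching stays available when it is actually wanted. Whether to shadow grep itself is a judgment call.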