How to make grep [A-Z] independent of locale?

Question

I was doing some everyday grepping and suddenly discovered that something seemingly trivial does not work:

$ echo T | grep [A-Z]

No match.

How come T is not within the A-Z range?

I changed the regex a tiny bit:

$ echo T | grep [A-Y]

Match!

Whoa! How is T within A-Y but not within A-Z?

Apparently this is because my environment is set to the Estonian locale, where Y is at the end of the alphabet but Z is somewhere in the middle: ABCDEFGHIJKLMNOPQRSŠZŽTUVWÕÄÖÜXY

$ echo $LANG
et_EE.UTF-8

This all came as a bit of a shock to me. 99% of the time I grep computer code, not Estonian literature. Have I been using grep the wrong way all along? What kinds of mistakes have I made because of this in the past?

After trying several things I arrived at the following solution:

$ echo T | LANG=C grep [A-Z]

Is this the recommended way to make grep locale-independent?

Furthermore, would it be safe to define an alias like this:

$ alias grep="LANG=C grep"

PS. I'm also wondering why character ranges like [A-Z] are locale-dependent in the first place, while \w seems to be unaffected by locale (the manual says \w is equivalent to [[:alnum:]], but I found that the latter depends on locale while \w does not).

Recommended answer

POSIX regular expressions, which Linux and FreeBSD grep support natively and some others support on request, have a series of [:xxx:] character classes that honor locales. See the man page for details.

   grep '[[:upper:]]' 

Because the []s are part of the class name, you need the outer [] as well, however strange it looks.
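
To make the double brackets concrete, here is a minimal sketch. The sample input lines are made up for illustration; the patterns are quoted so the shell does not try to expand the brackets as a filename glob.

```shell
#!/bin/sh
# Sketch: the outer [ ] is the bracket expression, [:upper:] is the class name.
# The three input lines (T, t, 7) are illustrative only.

# Prints "T"; the lowercase letter and the digit are filtered out.
printf 'T\nt\n7\n' | grep '[[:upper:]]'

# Several classes can share one bracket expression.
# Prints "T" and "7".
printf 'T\nt\n7\n' | grep '[[:upper:][:digit:]]'
```

Writing the class without the outer brackets, as in '[:upper:]', is not the same pattern: it is an ordinary bracket expression listing the characters :, u, p, e and r.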

With the advent of these [:xxx:] classes, the classic \w and similar escapes remain strictly in the C locale. Thus your choice of pattern determines whether grep uses the current locale or not.

[A-Z] should follow the locale, but you may need to set LC_ALL rather than LANG, especially if the system sets LC_ALL to a different value for you.
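
As a rough sketch of that advice: LC_ALL takes precedence over LANG and over the individual LC_* variables, so setting it for a single command forces the C locale even when LANG says otherwise. The wrapper name cgrep below is hypothetical, chosen only for this example; it is not a standard command.

```shell
#!/bin/sh
# Sketch: force the C locale for one grep invocation only.
# The assignment before the command applies to that command alone,
# so the rest of the shell session keeps its own locale.

# Prints "T": in the C locale, [A-Z] is the plain ASCII range.
printf 'T\n' | LC_ALL=C grep '[A-Z]'

# cgrep is a hypothetical helper name used for illustration.
cgrep() {
    LC_ALL=C grep "$@"
}

# Prints "Z" only; lowercase "z" lies outside the ASCII range A-Z.
printf 'Z\nz\n' | cgrep '[A-Z]'
```

A separate helper leaves plain grep untouched for the cases where locale-aware matching is wanted. One thing to keep in mind is that the C locale works on bytes, so non-ASCII text such as UTF-8 input is no longer matched as characters.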
