Implications of LC_ALL=C to speed up grep

Problem description

I just discovered that prefixing my grep commands with LC_ALL=C does wonders for speeding grep up.

But I am wondering about the implications.

Would a pattern that contains UTF-8 characters fail to match? What happens if the file being grepped is encoded in UTF-8?

Recommended answer

You don't necessarily need UTF-8 to run into trouble here. The locale defines the character classes, i.e. it determines which characters count as a space, a letter, or a digit. Consider these two examples:

$ echo -e '\xe4' | LC_ALL=en_US.iso88591 grep '[[:alnum:]]' || echo false
ä
$ echo -e '\xe4' | LC_ALL=C grep '[[:alnum:]]' || echo false
false
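
One caveat when reproducing this: if the requested locale is not installed, the program typically stays in the C locale, and the first command prints false as well. The following is a minimal sketch that checks for an installed ISO-8859-1 locale before comparing. It assumes GNU grep and a working `locale -a`; the helper `byte_e4` and the variable names are made up for illustration.

```shell
#!/bin/sh
# Emit the single byte 0xE4 (octal 344), which is "ä" in ISO-8859-1.
byte_e4() { printf '\344\n'; }

# C locale: only ASCII letters and digits belong to [[:alnum:]].
c_ascii=$(printf 'a\n' | LC_ALL=C grep -c '[[:alnum:]]')
c_high=$(byte_e4 | LC_ALL=C grep -c '[[:alnum:]]')
echo "C locale: 'a' -> $c_ascii matching line(s), byte 0xE4 -> $c_high"

# Look for an installed ISO-8859-1 locale; its spelling varies by system.
# The lookup itself runs under LC_ALL=C so odd bytes in locale names are harmless.
latin1=$(locale -a 2>/dev/null | LC_ALL=C grep -i 'iso-*8859-*1$' | head -n 1)
if [ -n "$latin1" ]; then
    l_high=$(byte_e4 | LC_ALL="$latin1" grep -c '[[:alnum:]]')
    echo "$latin1: byte 0xE4 -> $l_high matching line(s)"
else
    echo "no ISO-8859-1 locale installed; skipping the comparison"
fi
```

Using `grep -c` prints a count instead of the raw byte, so the result does not depend on how the terminal renders 0xE4.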

When matching exact byte sequences against each other, however, the locale makes no difference:

$ echo -e '\xe4' | LC_ALL=en_US.iso88591 grep "$(echo -e '\xe4')" || echo false
ä
$ echo -e '\xe4' | LC_ALL=C grep "$(echo -e '\xe4')" || echo false
ä
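
For purely literal searches like this, grep's `-F` option makes the byte-for-byte intent explicit. Here is a small sketch with a UTF-8 encoded pattern under LC_ALL=C. The sample lines are invented, and it assumes a `printf` that understands octal escapes.

```shell
#!/bin/sh
# "é" in UTF-8 is the byte pair 0xC3 0xA9 (octal 303 251).
needle=$(printf '\303\251')

# Two invented sample lines; only the first contains the needle.
sample() { printf 'caf\303\251\ncafe\n'; }

# -F treats the pattern as a fixed string, so under LC_ALL=C this is a
# plain byte-for-byte comparison.
literal_hits=$(sample | LC_ALL=C grep -c -F "$needle")
echo "lines containing the exact byte sequence: $literal_hits"
```

This only holds as long as the pattern and the file use the same encoding; a Latin-1 encoded pattern would not find UTF-8 encoded text.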

I'm not sure how far grep's Unicode support goes, or how well different code points that represent the same character are matched against each other. However, matching any subset of ASCII, and matching single characters that have no alternate binary representation, should work fine regardless of locale.
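
To make the UTF-8 side of the question concrete, the sketch below probes two things under LC_ALL=C: how many "characters" `.` sees in a UTF-8 encoded "ä", and whether a precomposed pattern finds a decomposed spelling. It assumes a reasonably recent GNU grep (older releases handled non-ASCII bytes in the C locale differently), and the helper and variable names are illustrative only.

```shell
#!/bin/sh
# "ä" encoded in UTF-8 is the two bytes 0xC3 0xA4 (octal 303 244).
a_umlaut() { printf '\303\244\n'; }

# C locale: each byte is one character, so "." covers only half of "ä".
c_one=$(a_umlaut | LC_ALL=C grep -c '^.$')
c_two=$(a_umlaut | LC_ALL=C grep -c '^..$')
echo "C locale: '^.\$' -> $c_one line(s), '^..\$' -> $c_two line(s)"

# Alternate representation: "a" followed by U+0308 (bytes 0xCC 0x88)
# renders like "ä" but is a different byte sequence.
decomposed=$(printf 'a\314\210\n' | LC_ALL=C grep -c -F "$(printf '\303\244')")
echo "precomposed pattern vs decomposed text: $decomposed line(s)"

# Compare with a UTF-8 locale if one is installed (names vary by system).
utf8_locale=$(locale -a 2>/dev/null | LC_ALL=C grep -i 'utf-*8$' | head -n 1)
if [ -n "$utf8_locale" ]; then
    u_one=$(a_umlaut | LC_ALL="$utf8_locale" grep -c '^.$')
    echo "$utf8_locale: '^.\$' -> $u_one line(s)"
else
    echo "no UTF-8 locale installed; skipping the comparison"
fi
```

In practice this means that patterns relying on `.`, bracket expressions, or case-insensitive matching over non-ASCII text are the ones to re-check before switching to LC_ALL=C, while plain ASCII patterns are unaffected.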
