Implications of LC_ALL=C to speed up grep


Question

I just discovered that prefixing my grep commands with LC_ALL=C does wonders for speeding grep up.

But I am wondering about the implications. Would a pattern using UTF-8 not match? What happens if the grepped file is using UTF-8?

Recommended answer

You don't necessarily need UTF-8 to run into trouble here. The locale determines the character classes, i.e. which characters count as a space, a letter or a digit. In ISO-8859-1 the byte 0xE4 is the letter "ä", while in the C locale it belongs to no alphanumeric class. Consider these two examples:

$ echo -e '\xe4' | LC_ALL=en_US.iso88591 grep '[[:alnum:]]' || echo false
ä
$ echo -e '\xe4' | LC_ALL=C grep '[[:alnum:]]' || echo false
false
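
The same effect can be checked without relying on the `en_US.iso88591` locale being installed. The following is only a minimal sketch: the variable names are made up for illustration, and `printf` with octal escapes is used instead of `echo -e` purely for portability. It counts matching lines under the C locale for the byte 0xE4 and for a plain ASCII letter.

```shell
# Byte 0xE4 is "ä" in ISO-8859-1; octal escapes keep printf portable.
# grep -c prints the number of matching lines; "|| true" absorbs the
# non-zero exit status grep returns when nothing matches.
high_byte_hits=$(printf '\344\n' | LC_ALL=C grep -c '[[:alnum:]]' || true)
ascii_hits=$(printf 'a\n' | LC_ALL=C grep -c '[[:alnum:]]' || true)

echo "byte 0xE4 counted as alnum: $high_byte_hits line(s)"
echo "ASCII 'a' counted as alnum: $ascii_hits line(s)"
```

ASCII letters keep their class in the C locale, so patterns that only involve ASCII classes are not affected by the switch.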

When matching exact byte sequences against each other, however, the locale makes no difference:

$ echo -e '\xe4' | LC_ALL=en_US.iso88591 grep "$(echo -e '\xe4')" || echo false
ä
$ echo -e '\xe4' | LC_ALL=C grep "$(echo -e '\xe4')" || echo false
ä

I'm not sure how far grep's Unicode support goes, or how well different code points that represent the same character are matched to each other. Matching any subset of ASCII, or single characters that have no alternate binary representation, should work fine regardless of locale.
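
For the UTF-8 case asked about in the question, a rough sketch along the same lines might look like this. The sample words and variable names are hypothetical. It assumes pattern and data are both UTF-8 encoded, and it contrasts a literal match with a case-insensitive one under the C locale.

```shell
# In UTF-8, "ä" is the byte pair C3 A4 and "Ä" is C3 84
# (octal 303 244 and 303 204).
lower=$(printf '\303\244')
sample=$(printf 'b\303\244r\nB\303\204R\n')

# Literal match: pattern and data carry the same bytes, so the C locale
# finds the first line.
literal_hits=$(printf '%s\n' "$sample" | LC_ALL=C grep -c "$lower" || true)

# Case-insensitive match: the C locale only folds ASCII letters, so the
# upper-case "Ä" on the second line is not matched by a lower-case "ä".
folded_hits=$(printf '%s\n' "$sample" | LC_ALL=C grep -ci "$lower" || true)

echo "literal: $literal_hits, case-insensitive: $folded_hits"
```

Both counts come out as 1 here. Under a UTF-8 locale, `-i` would be expected to pick up the second line as well, so case-insensitive searches for non-ASCII text are one place where LC_ALL=C can silently miss matches.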
