grep/regex can't find accented words
Problem description
I'm trying to build a regex that picks words out of a file when every letter of the word appears in a given word pattern.
My problem is that the regex can't find accented words, even though my text file contains a lot of them.
My command line is:
cat input/words.txt | grep '^[éra]\{1,4\}$' > output/words_era.txt
cat input/words.txt | grep '^[carroça]\{1,7\}$' > output/words_carroca.txt
And the content of the file is:
carroça
éra
éssa
roça
roco
rato
onça
orça
roca
How can I fix it?
Solution
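If your file is encoded in ISO-8859-1 but your system locale is UTF-8, this will not work: the same accented letter is stored as different bytes in the two encodings, so the é in your pattern never matches the é in the file.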
Either convert the file to UTF-8 or change your system locale to ISO-8859-1.
# convert from ISO-8859-1 to the environmental locale before grepping
# output will be in the current locale
$ iconv -f 8859_1 input/words.txt | grep ...

# run grep with an ISO-8859-1 locale
# output will be in ISO-8859-1 encoding
$ cat input/words.txt | env LC_ALL=en_US grep ...
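Before picking either fix, it can help to confirm that the file really is not UTF-8. The sketch below is one way to check and convert, not part of the original answer. It builds a small ISO-8859-1 sample in a temporary directory, uses a UTF-8 to UTF-8 iconv round trip as a validity check, and converts only when that check fails. The `is_utf8` helper, the `workdir` variable, and the sample file names are hypothetical, and the assumption that a non-UTF-8 file is ISO-8859-1 is taken from the answer above.

```shell
#!/bin/sh
# Sketch: check whether a word list is valid UTF-8 and convert it if not.
# Assumption: a file that is not valid UTF-8 is ISO-8859-1.

# Build a small ISO-8859-1 sample: "éra", "roça", "rato"
# (octal \351 is é and \347 is ç in ISO-8859-1).
workdir=$(mktemp -d)
printf '\351ra\nro\347a\nrato\n' > "$workdir/words.txt"

# iconv exits non-zero when the input is not valid in the source encoding,
# so a UTF-8 to UTF-8 round trip works as a validity check.
is_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" > /dev/null 2>&1
}

if is_utf8 "$workdir/words.txt"; then
    before="utf8"
    cp "$workdir/words.txt" "$workdir/words_utf8.txt"
else
    before="not-utf8"
    iconv -f ISO-8859-1 -t UTF-8 "$workdir/words.txt" > "$workdir/words_utf8.txt"
fi

# Re-run the check on the converted copy.
if is_utf8 "$workdir/words_utf8.txt"; then
    after="utf8"
else
    after="not-utf8"
fi

echo "before: $before, after: $after"
```

Once the file is UTF-8, the original grep commands can be run against the converted copy in a UTF-8 locale. `locale charmap` prints the character encoding of the current locale, which is a quick way to see which side of the mismatch the shell is on.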