grep/regex can't find accented word

Published: 2022/1/6 14:07:53 | Tags: regex, unicode, grep, cat, non-ascii-characters


Problem description

I'm trying to write a regex that finds the words in a file made up only of letters from a given pattern.

My problem is that the regex can't find accented words, even though my text file contains a lot of them.

My command line is:

cat input/words.txt | grep -E '^[éra]{1,4}$' > output/words_era.txt
cat input/words.txt | grep -E '^[carroça]{1,7}$' > output/words_carroca.txt

And the content of the file is:

carroça
éra
éssa
roça
roco
rato
onça
orça
roca

How can I fix it?
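One detail in the commands above is independent of encoding: in grep's default basic regular expression syntax, an unescaped `{1,4}` is literal text rather than a repetition count, so the interval only takes effect with `grep -E` or when written as `\{1,4\}`. The sketch below checks this with GNU grep on a throwaway ASCII file; the path `/tmp/bre_ere_demo.txt` is just an example.

```shell
#!/bin/sh
# Sketch, assuming GNU grep: compare basic vs extended syntax for "{1,4}".
# ASCII-only input, so the result does not depend on the locale.
f=/tmp/bre_ere_demo.txt
printf 'era\nra\nerrrra\n' > "$f"

# Basic syntax, unescaped braces: "{1,4}" is matched literally, count 0.
echo "basic, unescaped: $(grep -c '^[era]{1,4}$' "$f")"

# Basic syntax, escaped braces: 1 to 4 letters from [era], count 2.
echo "basic, escaped:   $(grep -c '^[era]\{1,4\}$' "$f")"

# Extended syntax (-E): unescaped braces are an interval, count 2.
echo "extended:         $(grep -E -c '^[era]{1,4}$' "$f")"
```

Only "era" and "ra" satisfy the interval; "errrra" has six letters and is rejected.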

Solution

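If your file is encoded in ISO-8859-1 but your system locale is UTF-8, this will not work: grep decodes the file as UTF-8, and an accented letter stored in ISO-8859-1 is not a valid UTF-8 sequence, so it never matches the accented letter in your pattern.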
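A quick way to check this diagnosis, sketched for a POSIX shell with glibc's `iconv`: ask `iconv` to read the file as UTF-8 (it exits non-zero on invalid input) and compare the raw bytes with `od`. The sample file `/tmp/words_latin1.txt` is a hypothetical stand-in for `input/words.txt`.

```shell
#!/bin/sh
# Sketch: check whether a file is ISO-8859-1 rather than UTF-8.
# Hypothetical sample file storing "éra" and "carroça" as ISO-8859-1
# bytes (\351 = 0xE9 = é, \347 = 0xE7 = ç).
f=/tmp/words_latin1.txt
printf '\351ra\ncarro\347a\n' > "$f"

# iconv exits non-zero when its input is not valid UTF-8.
if iconv -f UTF-8 -t UTF-8 "$f" > /dev/null 2>&1; then
    echo "$f is valid UTF-8"
else
    echo "$f is NOT valid UTF-8 (possibly ISO-8859-1)"
fi

# In the file, é is the single byte e9 (bytes: e9 72 61 0a) ...
head -n 1 "$f" | od -An -tx1
# ... but typed in a UTF-8 terminal or script, é is the two bytes c3 a9,
# so a pattern typed there cannot match the file's e9.
printf 'éra\n' | od -An -tx1
```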

Either convert the file to UTF-8 or change your system locale to ISO-8859-1.

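# convert from ISO-8859-1 to the current locale's encoding before grepping
# output will be in the current locale
$ iconv -f ISO-8859-1 input/words.txt | grep ...

# run grep with an ISO-8859-1 locale
# (assumes the en_US locale, ISO-8859-1 on glibc, is installed; see `locale -a`)
# the accented letters in the pattern must then be ISO-8859-1 bytes as well
# output will be in ISO-8859-1 encoding
$ cat input/words.txt | env LC_ALL=en_US grep ...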
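Both fixes can be tried end to end on a throwaway copy of the word list. This is a sketch, assuming GNU grep, an `iconv` that knows `ISO-8859-1`, and a UTF-8 script and terminal; the `/tmp/demo` paths are hypothetical. For the second fix it swaps in the byte-oriented `C` locale for an ISO-8859-1 locale such as `en_US`, because `C` is always available; either way the accented letters in the pattern have to be converted to ISO-8859-1 too. It uses `grep -E`, since plain `grep` treats `{1,4}` as literal text.

```shell
#!/bin/sh
# Sketch: reproduce the problem and both fixes on a hypothetical
# ISO-8859-1 copy of the word list (\347 = ç, \351 = é in ISO-8859-1).
mkdir -p /tmp/demo
printf 'carro\347a\n\351ra\n\351ssa\nro\347a\nroco\nrato\non\347a\nor\347a\nroca\n' \
    > /tmp/demo/words.txt

# The problem: the UTF-8 "é" in the pattern never matches the byte 0xE9.
echo "matches before conversion: $(grep -E -c '^[éra]{1,4}$' /tmp/demo/words.txt)"

# Fix 1: convert the file to UTF-8, then grep in the current (UTF-8) locale.
iconv -f ISO-8859-1 -t UTF-8 /tmp/demo/words.txt > /tmp/demo/words_utf8.txt
grep -E '^[éra]{1,4}$' /tmp/demo/words_utf8.txt   # prints: éra

# Fix 2 (variant using the C locale instead of en_US): leave the file as
# ISO-8859-1, convert the pattern to ISO-8859-1 as well, and grep in the
# byte-oriented C locale, where every Latin-1 byte is one character.
pat=$(printf '%s' '^[éra]{1,4}$' | iconv -f UTF-8 -t ISO-8859-1)
# grep's output is ISO-8859-1; convert it back to UTF-8 only for display.
LC_ALL=C grep -E "$pat" /tmp/demo/words.txt | iconv -f ISO-8859-1 -t UTF-8
```

After Fix 1 the data is UTF-8, so run that grep in a UTF-8 locale: in the `C` locale the two-byte UTF-8 `ç` counts as two characters, and `carroça` (8 bytes) would no longer fit `^[carroça]{1,7}$`.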
