Remove Unicode characters from text files - sed, other Bash/shell methods

Published: 2021/12/5 23:17:35    Tags: bash, unicode, sed, text-files, spaces

How do I remove Unicode characters from a bunch of text files in the terminal?

I tried this, but it didn't work:

sed 'g/u'U+200E'//' -i *.txt

I need to remove these Unicode characters from the text files:

U+0091 - C1 control character that shows up as a weird "control" space
U+0092 - another C1 control character, same kind of weird "control" space
U+00A0 - non-breaking space
U+200E - left-to-right mark
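In a UTF-8 file each of these characters is stored as a short byte sequence: U+0091 is C2 91, U+0092 is C2 92, U+00A0 is C2 A0 and U+200E is E2 80 8E. A minimal sketch for checking which files actually contain any of them before changing anything, assuming the files are UTF-8 and a grep that supports -F and -l (GNU and BSD grep both do). The function name find_marked_files is hypothetical:

```shell
# Hypothetical helper: print the names of the given files that contain
# U+0091, U+0092, U+00A0 or U+200E. Each character is passed to grep as
# its UTF-8 bytes, written as octal escapes that POSIX printf understands.
find_marked_files() {
  grep -l -F \
    -e "$(printf '\302\221')" \
    -e "$(printf '\302\222')" \
    -e "$(printf '\302\240')" \
    -e "$(printf '\342\200\216')" \
    -- "$@"
}

# Usage: find_marked_files *.txt
```

Because -F matches fixed byte strings and -l prints only file names, the check behaves the same whether or not the shell is running in a UTF-8 locale.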

If you want to remove only particular characters and you have Python, you can:

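CHARS=$(python3 -c 'import sys; sys.stdout.buffer.write("\u0091\u0092\u00a0\u200e".encode("utf-8"))')
# Run this in a UTF-8 locale so sed treats each multibyte character inside [...] as one character
sed 's/['"$CHARS"']//g' < /tmp/utf8_input.txt > /tmp/ascii_output.txt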
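Without Python, the same bytes can be produced with printf octal escapes. Here is a minimal sketch that edits a whole set of files in place, assuming UTF-8 input and any POSIX sed. The function name strip_unicode is hypothetical. Unlike the bracket expression above, each character gets its own literal s///g. In a non-UTF-8 locale a bracket expression would be split into single bytes and could damage other characters such as é, while a literal multibyte sequence still matches only itself:

```shell
# Hypothetical helper: delete U+0091, U+0092, U+00A0 and U+200E from the
# given UTF-8 files, rewriting each file in place.
strip_unicode() {
  # UTF-8 byte sequences as octal escapes for POSIX printf.
  pu1=$(printf '\302\221')      # U+0091
  pu2=$(printf '\302\222')      # U+0092
  nbsp=$(printf '\302\240')     # U+00A0 no-break space
  lrm=$(printf '\342\200\216')  # U+200E left-to-right mark
  for f in "$@"; do
    tmp=$(mktemp) || return 1
    # One literal s///g per character, no bracket expression, so the
    # result does not depend on the locale sed runs in.
    if sed -e "s/$pu1//g" -e "s/$pu2//g" -e "s/$nbsp//g" -e "s/$lrm//g" "$f" > "$tmp"; then
      cat "$tmp" > "$f"   # overwrite the original, keeping its permissions
    fi
    rm -f "$tmp"
  done
}
```

Usage: strip_unicode *.txt. Deleting U+00A0 can glue two words together. If it separates words in your files, changing that expression to s/$nbsp/ /g replaces it with an ordinary space instead.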
