Remove Unicode characters from text files - sed, other bash/shell methods

Views: 171 Published: 2016/8/2 13:13:33 Tags: bash unicode sed text-files spaces


How do I remove Unicode characters from a bunch of text files in the terminal? I tried this, but it didn't work:

sed 'g/\u'U+200E'//' -i *.txt

I need to remove these Unicode characters from the text files:

U+0091 - sort of weird "control" space (a C1 control character)
U+0092 - same sort of weird "control" space (also a C1 control character)
U+00A0 - non-breaking space
U+200E - left-to-right mark

Solution

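If you only want to remove particular characters and you have Python available, you can build the character set with Python and hand it to sed:

# Assumes python3 (the answer as posted used the Python 2 print statement) and a UTF-8 locale for sed.
CHARS=$(python3 -c 'import sys; sys.stdout.buffer.write("\u0091\u0092\u00a0\u200e".encode("utf8"))')
sed 's/['"$CHARS"']//g' < /tmp/utf8_input.txt > /tmp/ascii_output.txt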
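Since the question asks about a whole bunch of files, and sed's treatment of multibyte characters in a bracket expression can depend on the locale, doing the whole job in Python is another option. The following is only a minimal sketch, assuming Python 3 and UTF-8 encoded files; the names `strip_unwanted` and `clean_file` are made up for illustration and are not part of the answer above.

```python
import sys
from pathlib import Path

# Code points listed in the question: two C1 controls,
# the no-break space, and the left-to-right mark.
UNWANTED = "\u0091\u0092\u00a0\u200e"

# str.translate deletes every character whose code point maps to None.
DELETE_TABLE = {ord(ch): None for ch in UNWANTED}


def strip_unwanted(text: str) -> str:
    """Return text with every character in UNWANTED removed."""
    return text.translate(DELETE_TABLE)


def clean_file(path: Path) -> None:
    """Rewrite one file in place, assuming it is UTF-8 encoded."""
    # newline="" keeps the file's existing line endings untouched.
    with open(path, encoding="utf-8", newline="") as handle:
        original = handle.read()
    cleaned = strip_unwanted(original)
    if cleaned != original:
        with open(path, "w", encoding="utf-8", newline="") as handle:
            handle.write(cleaned)


if __name__ == "__main__":
    # Each command-line argument is treated as a file to clean.
    for name in sys.argv[1:]:
        clean_file(Path(name))
```

Saved under a name of your choosing, say `strip_chars.py`, it could be run as `python3 strip_chars.py *.txt`. It overwrites the files, so try it on copies first. Files that contain none of the listed characters are left unwritten.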
