sed: matching unicode blocks with

Views: 278 Published: 2020/7/12 18:47:10 Tags: unicode utf-8 sed unicode-escapes


Question

I am desperately trying to replace certain unicode characters (graphemes) in a file using sed. However, I keep failing for some of them, namely the ones from these unicode blocks:

\p{InHigh_Surrogates}: U+D800–U+DB7F
\p{InHigh_Private_Use_Surrogates}: U+DB80–U+DBFF
\p{InLow_Surrogates}: U+DC00–U+DFFF

I tried (in a sed script file loaded via the -f switch):

s/\p{InHigh_Surrogates}/###/  --> no effect at all
s/\\p\{InHigh_Surrogates\}/###_D-NON-UTF8_###/ -> error message 'Invalid content of \{\}'

Anybody got a suggestion? Also, I am not necessarily set on using the blocks, but I also failed when trying to define a character range of the form \xd800-\xdfff.

Thanks, Thomas

Recommended answer

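Try using the -r flag for sed. With extended regular expressions, \{ and \} are literal braces rather than the interval delimiters of basic syntax, which is what triggered the 'Invalid content of \{\}' error. Note that the pattern matches the literal text \p{InHigh_Surrogates}, as the output shows: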

$ sed -r 's/\\p\{InHigh_Surrogates\}/###/g' file
###: U+D800–U+DB7F
\p{InHigh_Private_Use_Surrogates}: U+DB80–U+DBFF
\p{InLow_Surrogates}: U+DC00–U+DFFF
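
The difference between the two syntaxes can be checked with a small sketch. It assumes GNU sed; the sample line and variable names are made up for illustration and are not part of the original answer. Without -r, an unescaped { is already a literal brace, so the same literal text can be matched either way:

```shell
# Hypothetical sample line containing the literal block name.
line='\p{InHigh_Surrogates}: U+D800'

# Extended syntax (-r): \{ and \} are literal braces.
ere_result=$(printf '%s\n' "$line" | sed -r 's/\\p\{InHigh_Surrogates\}/###/g')

# Basic syntax (no -r): unescaped { and } are literal braces,
# while \{ and \} would start a repetition count.
bre_result=$(printf '%s\n' "$line" | sed 's/\\p{InHigh_Surrogates}/###/g')

echo "$ere_result"   # ###: U+D800
echo "$bre_result"   # ###: U+D800
```

Both commands only replace the written-out name of the block; neither one matches characters that belong to the block.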

From man sed:

-r, --regexp-extended

use extended regular expressions in the script.
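
For the \xd800-\xdfff range mentioned in the question, one possible approach is to match at the byte level instead. This is only a sketch under several assumptions that the thread does not confirm: GNU sed (for the \xHH escapes), a byte-oriented C locale, and a file in which surrogate code points are stored as three-byte sequences of the form ED A0..BF 80..BF. The function name and test input are hypothetical.

```shell
# Assumption: GNU sed and LC_ALL=C, so the pattern is matched byte by byte.
# Hypothetical helper: replace any 3-byte encoded surrogate (U+D800..U+DFFF) with ###.
replace_surrogate_bytes() {
  LC_ALL=C sed 's/\xED[\xA0-\xBF][\x80-\xBF]/###/g'
}

# Hypothetical input: "a", the bytes ED A0 80 (U+D800 encoded this way), "b".
bytes_result=$(printf 'a\355\240\200b\n' | replace_surrogate_bytes)
echo "$bytes_result"
```

If the file is actually UTF-16 or uses some other encoding, the byte pattern would need to change accordingly.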
