grep for emojis in Linux


Question

I am trying to grep across a list of tokens that includes several non-ASCII characters. I want to match only emojis; other characters such as ð or ñ are fine. The Unicode range for emojis appears to be U+1F600-U+1F1FF, but when I search for it using grep, this happens:

grep -P "[\x1F6-\x1F1]" contact_names.tokens
grep: range out of order in character class


https://unicode.org/emoji/charts/full-emoji-list.html#1f3f4_e0067_e0062_e0077_e006c_e0073_e007f

Recommended answer

You need to specify the code points with their full value (not 1F6 but 1F600) and wrap them in curly braces. Without braces, \x takes at most two hex digits, so \x1F6 is read as the character \x1F followed by a literal 6, and the class ends up containing the range 6-\x1F, which is out of order. In addition, the first value of a range must be smaller than the last value. So the regex should be "[\x{1F1FF}-\x{1F600}]".
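As a rough check of the braces syntax, the sketch below runs both the original and the corrected character class against a few made-up tokens. The sample tokens and variable names are illustrative only, and it assumes GNU grep built with PCRE support plus a UTF-8 locale named C.UTF-8 (code points above 0xFF in \x{...} need a UTF-8 locale).

```shell
# Assumption: a UTF-8 locale called C.UTF-8 exists on this system.
export LC_ALL=C.UTF-8

# Hypothetical sample tokens standing in for contact_names.tokens.
tokens=$(printf '%s\n' 'señor' 'smile😀' 'ðæ' 'plain')

# Original class: \x1F6 is parsed as \x1F followed by "6", so grep
# rejects the pattern and exits with an error status.
if printf '%s\n' "$tokens" | grep -P '[\x1F6-\x1F1]' >/dev/null 2>&1; then
  broken_status=0
else
  broken_status=$?
fi
echo "status of the original pattern: $broken_status"

# Corrected class: full code points in braces, lower bound first.
matches=$(printf '%s\n' "$tokens" | grep -P '[\x{1F1FF}-\x{1F600}]')
printf '%s\n' "$matches"
```

Only the token containing U+1F600 falls inside this range; ñ, ð and æ are far below U+1F1FF and are left alone.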

The Unicode range for emojis is, however, more complex than you assumed. The page you referred to does not sort characters by code point, and emojis are spread across many blocks. If you want to cover almost all emojis:

grep -P "[\x{1f300}-\x{1f5ff}\x{1f900}-\x{1f9ff}\x{1f600}-\x{1f64f}\x{1f680}-\x{1f6ff}\x{2600}-\x{26ff}\x{2700}-\x{27bf}\x{1f1e6}-\x{1f1ff}\x{1f191}-\x{1f251}\x{1f004}\x{1f0cf}\x{1f170}-\x{1f171}\x{1f17e}-\x{1f17f}\x{1f18e}\x{3030}\x{2b50}\x{2b55}\x{2934}-\x{2935}\x{2b05}-\x{2b07}\x{2b1b}-\x{2b1c}\x{3297}\x{3299}\x{303d}\x{00a9}\x{00ae}\x{2122}\x{23f3}\x{24c2}\x{23e9}-\x{23ef}\x{25b6}\x{23f8}-\x{23fa}]"  contact_names.tokens

(The ranges are borrowed from Suhail Gupta's answer to a similar question.)
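Since the goal is to tell emoji tokens apart from tokens that only contain letters such as ð or ñ, it may help to keep the class in a variable and reuse it with -c (count) and -v (invert). The following is a minimal sketch under the same assumptions as before (GNU grep with PCRE, a C.UTF-8 locale); the variable names and sample tokens are made up, and the class is shortened to a few of the ranges from the command above.

```shell
# Assumption: a UTF-8 locale called C.UTF-8 exists on this system.
export LC_ALL=C.UTF-8

# A subset of the ranges from the full command, shortened for readability.
emoji_class='[\x{1f300}-\x{1f5ff}\x{1f600}-\x{1f64f}\x{1f680}-\x{1f6ff}\x{2600}-\x{26ff}]'

# Hypothetical sample tokens standing in for contact_names.tokens.
sample=$(printf '%s\n' 'ñandú' 'rocket🚀' 'ð' 'earth🌍')

# Count the tokens that contain at least one character from the class.
count=$(printf '%s\n' "$sample" | grep -cP "$emoji_class")
echo "tokens with emoji: $count"

# Keep only the tokens that contain none of those characters.
without_emoji=$(printf '%s\n' "$sample" | grep -vP "$emoji_class")
printf '%s\n' "$without_emoji"
```

To run this against real data, the shortened class can be replaced with the full one and the printf pipeline with the token file itself.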

If you need to allow or disallow specific emoji blocks, see the sequence data on unicode.org. The list of emoji on Wikipedia also shows the characters in ordered tables, but it might not list the latest ones.
