How can I identify an emoji in Scala?
Question
I am processing tweets from the Twitter API, and many of them contain emojis. I'm trying to keep track of the most used emojis, but I'm having trouble actually identifying them.
I'm using https://github.com/iamcal/emoji-data to identify emojis.
I can't figure out how to tell whether a string contains an emoji. I have tried using a regex built from the emoji-data 'unified' field, and I have tried simply checking whether the string contains that field. I'm really not sure how to check for emojis. Any help would be appreciated.
val pattern = new Regex("(${a.unified})")
(pattern findAllIn text).mkString(",")
This is what I have tried using regex. It doesn't find any emojis. I have also tried adding a \u before the unified field from emoji-data, but that doesn't help.
Recommended answer
You can use the following regex to find emoji characters (and any other characters outside the Unicode Basic Multilingual Plane):
[^\u0000-\uFFFF]
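As a rough sketch of how that pattern could be used for finding and tallying rather than removing, the snippet below wraps it in a small helper. The object name EmojiFinder and its methods are made up for illustration and are not part of any library. The backslashes are doubled so that the regex engine, rather than the Scala compiler, interprets the escapes.

```scala
import scala.util.matching.Regex

// Hypothetical helper, not from any library.
object EmojiFinder {
  // Matches one code point above U+FFFF, i.e. outside the Basic Multilingual Plane.
  val nonBmp: Regex = new Regex("[^\\u0000-\\uFFFF]")

  // Every non-BMP character in the text, in order of appearance.
  def findAll(text: String): List[String] =
    nonBmp.findAllIn(text).toList

  // How many times each non-BMP character occurs in the text.
  def counts(text: String): Map[String, Int] =
    findAll(text).groupBy(identity).map { case (emoji, hits) => emoji -> hits.size }
}
```

Note that this matches by code point range only: emojis that live inside the Basic Multilingual Plane (such as U+2764, the heavy black heart) are missed, and multi-code-point sequences are reported as separate pieces.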
For example, we use the following code to filter out emojis from strings:
"some string".replaceAll("[^\u0000-\uFFFF]", "");
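If the goal is to match specific entries from emoji-data, one likely reason the attempt in the question finds nothing is that the 'unified' value is hex text (for instance 1F600), not the emoji character itself, and the string literal there also lacks the s interpolator. A minimal sketch follows, assuming the 'unified' field holds hexadecimal code points separated by hyphens; the object name UnifiedField and its methods are hypothetical.

```scala
import scala.util.matching.Regex

// Hypothetical helper, not part of emoji-data.
object UnifiedField {
  // Assumption: `unified` looks like "1F600" or "261D-FE0F",
  // i.e. hex code points joined by hyphens.
  def toEmoji(unified: String): String =
    unified
      .split("-")
      .map(hex => new String(Character.toChars(Integer.parseInt(hex, 16))))
      .mkString

  // True if the decoded emoji appears anywhere in the text.
  def contains(text: String, unified: String): Boolean =
    text.contains(toEmoji(unified))

  // Number of non-overlapping occurrences of the decoded emoji in the text.
  def count(text: String, unified: String): Int =
    new Regex(Regex.quote(toEmoji(unified))).findAllIn(text).size
}
```

With something like this, UnifiedField.count(tweetText, a.unified) could feed a running tally per emoji, where tweetText and a stand for the tweet body and the emoji-data record from the question.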
Hope that helps.