String Containing Exact Substring from Substring List


Question

Scala beginner here. I'm trying to find all tweets whose text contains at least one keyword from a given list of keywords.

A tweet is defined as:

case class Tweet(user: String, text: String, retweets: Int)

For example: Tweet("user1", "apple apple", 3)

wordInTweet should return true if at least one keyword from the list keywords can be found in the tweet's text.

I tried the following implementation:

def wordInTweet(tweet: Tweet, keywords: List[String]): Boolean = {
    keywords.exists(tweet.text.equals(_))
}

But it also returns true if a tweet's text is music and a given keyword's text is musica.

I'm struggling to find a way to return true ONLY if the tweet contains exactly the keyword's text as a whole word.

How can I achieve this?

Thanks.

Recommended answer

First, it helps to treat the keywords as a Set, because sets provide a very efficient membership test (contains).

keywords: Set[String]

Then we need to test every word in the tweet, as opposed to the complete text. This means we need to split the text into words, the same step that appears in the ubiquitous "wordCount" example.

val wordsInTweet = tweet.text.split("\\W")
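
To make the splitting step concrete, here is a small sketch of what the split produces. The sample text is made up for illustration, and the "\\W+" variant is a suggested alternative rather than part of the original answer: it splits on runs of non-word characters, which avoids empty strings where two delimiters are adjacent.

```scala
// Illustrative only: the sample text is invented for this sketch.
val sample = "I love music, not musica!"

// Splitting on a single non-word character leaves an empty string
// where two delimiters are adjacent (here the comma and the space).
val single = sample.split("\\W").toList

// Splitting on runs of non-word characters avoids those empty entries.
val runs = sample.split("\\W+").toList

println(single) // List(I, love, music, , not, musica)
println(runs)   // List(I, love, music, not, musica)
```

The empty entries are harmless for the keyword lookup unless the keyword set itself contains an empty string.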

Next, we put things together:

def wordInTweet(tweet: Tweet, keywords: Set[String]): Boolean = {
   val wordsInTweet = tweet.text.split("\\W")
   wordsInTweet.exists(word => keywords.contains(word))
}
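
The following self-contained sketch exercises the approach above on the cases from the question. The sample tweets, the "\\W+" split, and the wordInTweetIgnoreCase helper are assumptions added for illustration and are not part of the original answer.

```scala
case class Tweet(user: String, text: String, retweets: Int)

// Whole-word match: true only if some word of the tweet equals a keyword.
def wordInTweet(tweet: Tweet, keywords: Set[String]): Boolean =
  tweet.text.split("\\W+").exists(keywords.contains)

// Hypothetical variant (not in the original answer): ignore case by
// lower-casing both the tweet's words and the keywords.
def wordInTweetIgnoreCase(tweet: Tweet, keywords: Set[String]): Boolean = {
  val lowered = keywords.map(_.toLowerCase)
  tweet.text.toLowerCase.split("\\W+").exists(lowered.contains)
}

val keywords = Set("music", "apple")

// "apple" appears as a whole word.
val exact = wordInTweet(Tweet("user1", "apple apple", 3), keywords)
// "musica" only contains "music" as a substring, so it does not match.
val partial = wordInTweet(Tweet("user2", "me gusta la musica", 0), keywords)
// "Music" differs from "music" only by case.
val upper = wordInTweetIgnoreCase(Tweet("user3", "Music rocks", 1), keywords)

println(s"$exact $partial $upper") // true false true
```

Note that the case-sensitive wordInTweet would return false for "Music rocks", so whether to normalize case depends on how the keywords are meant to be matched.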
