Filtering content based on words
Question
For a project I'm working on, I display tweets I receive from the Twitter Streaming API. Before displaying a tweet, I need to check each word against a list of blacklisted words.
Currently, I have all the blacklisted words in a MongoDB collection.
The obvious approach is to explode the tweet into individual words and then, for each word, check whether the blacklist collection contains it.
However, this would mean roughly 20 database calls for every tweet I show.
Is there a better way to go about this?
Recommended answer
I'd fetch all the blacklisted words from the database, store them inside a variable as a string (separated with |) and use preg_match() to see if there are any in the tweet.
$blacklist = 'blacklisted|words';
if (preg_match('/\b(' . $blacklist . ')\b/i', $tweet))
{
// Don't show
}
else
{
// Show the tweet
}
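One thing the snippet above leaves open is how the `|`-separated string gets built from the stored words. The following is a minimal sketch of that step, not a definitive implementation. It assumes the words have already been loaded once from the MongoDB collection into a plain PHP array; the function names `buildBlacklistPattern()` and `isBlacklisted()` are hypothetical and used only for illustration. Each word is passed through `preg_quote()` so that characters such as `.` or `?` in a blacklisted word are matched literally instead of being read as regex syntax.

```php
<?php
// Hypothetical helper (not from the original answer): builds a single
// regex from a list of blacklisted words.
// Returns null for an empty list, because an empty alternation would
// match every tweet.
function buildBlacklistPattern(array $words): ?string
{
    // Trim the words and drop empty entries.
    $words = array_filter(array_map('trim', $words), 'strlen');
    if (!$words) {
        return null;
    }

    // Escape regex metacharacters and the '/' delimiter in each word.
    $escaped = array_map(fn($w) => preg_quote($w, '/'), $words);

    // i = case-insensitive, u = treat pattern and subject as UTF-8.
    return '/\b(?:' . implode('|', $escaped) . ')\b/iu';
}

// Hypothetical helper: true when the tweet contains a blacklisted word.
// preg_match() returns 1 on a match, 0 on no match, and false on error
// (for example invalid UTF-8), so only an actual match counts here.
function isBlacklisted(string $tweet, ?string $pattern): bool
{
    return $pattern !== null && preg_match($pattern, $tweet) === 1;
}

// Sample data standing in for the words fetched from the database.
$pattern = buildBlacklistPattern(['spam', 'bad word']);

var_dump(isBlacklisted('This is SPAM, honestly', $pattern)); // bool(true)
var_dump(isBlacklisted('Spammers are everywhere', $pattern)); // bool(false)
```

Building the pattern once and reusing it for every incoming tweet keeps the database out of the per-tweet path. Note that `\b` only behaves as expected when the blacklisted words begin and end with word characters.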