Detecting spammers with MySQL


Question

I see an ever-increasing number of users signing up on my site just to send duplicate spam messages to other users. I've added some server-side code to detect duplicate messages with the following MySQL query:

SELECT COUNT(content) AS msgs_sent
FROM messages
WHERE sender_id = '.$sender_id.'
GROUP BY content
HAVING COUNT(content) > 10

The query works well, but now they're getting around it by changing a few characters in their messages. Is there a way to detect this with MySQL, or do I need to look at each grouping returned from MySQL and then use PHP to determine the percentage of similarity?

Any thoughts or suggestions?

Recommended answer

Full-text matching

You could implement something similar to the MATCH example from the MySQL documentation:

mysql> SELECT id, body, MATCH (title,body) AGAINST
    -> ('Security implications of running MySQL as root') AS score
    -> FROM articles WHERE MATCH (title,body) AGAINST
    -> ('Security implications of running MySQL as root');
+----+-------------------------------------+-----------------+
| id | body                                | score           |
+----+-------------------------------------+-----------------+
|  4 | 1. Never run mysqld as root. 2. ... | 1.5219271183014 |
|  6 | When configured properly, MySQL ... | 1.3114095926285 |
+----+-------------------------------------+-----------------+
2 rows in set (0.00 sec)

So, for your example:

-- A column alias cannot be referenced in WHERE, so filter on score with HAVING.
SELECT id, MATCH (content) AGAINST ('your string') AS score
FROM messages
WHERE MATCH (content) AGAINST ('your string')
HAVING score > 1;

Note that to use these functions, your content column needs to have a FULLTEXT index.
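The FULLTEXT index mentioned above could be created along these lines. This is a minimal sketch that assumes the table is named messages and the column content, as in the question; the index name ft_content is a hypothetical choice, and whether FULLTEXT is available depends on your storage engine and MySQL version.

```sql
-- Assumption: table `messages` with a text column `content`, as in the question.
-- `ft_content` is a hypothetical index name; pick any name that suits your schema.
ALTER TABLE messages
  ADD FULLTEXT INDEX ft_content (content);
```

Once the index exists, MATCH (content) AGAINST (...) can be used against that column.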

What is score in this example?

It is a relevance value, computed through the process described below:

Every correct word in the collection and in the query is weighted according to its significance in the collection or query. Consequently, a word that is present in many documents has a lower weight (and may even have a zero weight), because it has lower semantic value in this particular collection. Conversely, if the word is rare, it receives a higher weight. The weights of the words are combined to compute the relevance of the row.

(Quoted from the MySQL documentation page.)
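To tie the relevance score back to the duplicate check in the question, one option is to count how many of a sender's stored messages score above a threshold against the message being sent. The query below is only a sketch: it assumes the messages table and the sender_id and content columns from the question, and the sender id 123, the message text, and the threshold of 1 are placeholders. Relevance values are not normalized, so the threshold would need tuning against real data.

```sql
-- Assumptions: `messages`, `sender_id`, and `content` come from the question;
-- 123, the message text, and the threshold 1 are placeholder values.
SELECT COUNT(*) AS similar_msgs
FROM messages
WHERE sender_id = 123
  AND MATCH (content) AGAINST ('text of the message being sent') > 1;
```

The application could then treat the sender as a likely spammer when similar_msgs exceeds a limit, such as the 10 used in the original duplicate query.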
