PHP swear word filter

Published: 2020/5/27 2:02:58 | Tags: php, wordpress, preg-replace, preg-match

Question

I'm working on a WordPress plugin that replaces the bad words from the comments with random new ones from a list.

I now have 2 arrays: one containing the bad words and another containing the good words.

$bad = array("bad", "words", "here");
$good = array("good", "words", "here");

Since I'm a beginner, I got stuck at some point.

In order to replace the bad words, I've been using $newstring = str_replace($bad, $good, $string);.

My first problem is that I want to turn off case sensitivity, so that I don't have to list every variant like "bad", "Bad", "BAD", "bAd", "BAd", etc. But I need the new word to keep the format of the original word: for example, if I write "Bad", it should be replaced with "Words", but if I type "bad", it should be replaced with "words", etc.

My first thought was to use str_ireplace, but it loses track of whether the original word had a capital letter.

The second problem is that I don't know how to deal with the users that type like this: "b a d", "w o r d s", etc. I need an idea.
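Neither the question nor the answer below shows code for the "b a d" case. A minimal sketch of one idea, assuming plain ASCII bad words: build a pattern per word that tolerates whitespace between its letters. The helper name spaced_word_pattern is hypothetical.

```php
<?php
// Hypothetical helper: builds a case-insensitive regex that matches a word
// even when its letters are separated by whitespace ("bad", "b a d", "B  A  D").
function spaced_word_pattern(string $word): string
{
    // Split into single bytes (assumes ASCII words) and escape each one.
    $letters = array_map(function ($ch) {
        return preg_quote($ch, '/');
    }, str_split($word));

    // Allow any amount of whitespace between consecutive letters.
    return '/\b' . implode('\s*', $letters) . '\b/i';
}

$pattern = spaced_word_pattern('bad');
echo $pattern, "\n"; // /\bb\s*a\s*d\b/i

foreach (array('bad', 'b a d', 'B  A  D', 'badge') as $text) {
    echo $text, ' => ', preg_match($pattern, $text) ? 'match' : 'no match', "\n";
}
```

The trailing \b keeps "badge" from matching, but this only handles whitespace; dots, dashes and look-alike characters would each need their own handling.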

In order to make it select a random word, I think I can use $new = $good[rand(0, count($good)-1)]; then $newstring = str_replace($bad, $new, $string);. If you have a better idea, I'm here to listen.
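A minimal sketch of that random-pick idea, using PHP's built-in array_rand(), which returns a random key, instead of rand(0, count($good)-1). The word lists are illustrative. Because $new is a single string, every bad word in the sentence gets the same replacement on that call.

```php
<?php
$bad  = array("bad", "words");
$good = array("good", "nice", "kind");

// array_rand() returns a random key, so this also works for non-sequential keys.
$new = $good[array_rand($good)];

// One replacement string for all search terms: both "bad" and "words"
// become the same randomly chosen word.
$newstring = str_replace($bad, $new, "I see bad words coming!");
echo $newstring, "\n";
```

To get a different random word per occurrence, the choice has to happen once per match, which is what the callback approach in the answer does.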

Here is the general shape of my script:

function noswear($string)
{
    if ($string)
    {
        $bad = array("bad", "words");
        $good = array("good", "words");
        $newstring = str_replace($bad, $good, $string);
        return $newstring;
    }
    return $string; // nothing to filter (empty input)
}

echo noswear("I see bad words coming!");

Thanks in advance for your help!

Answer

Precursor

There are (as has been pointed out in the comments numerous times) gaping holes for you, and/or your code, to fall into when implementing such a feature, to name but a few:

People will add characters to fool the filter
People will become creative (e.g. innuendo)
People will use passive aggression and sarcasm
People will use sentences/phrases not just words

You'd do better to implement a moderation/flagging system where people can flag offensive comments which can then be edited/removed by mods, users, etc.

On that understanding, let us proceed...

Given that you:

Have a forbidden word list $bad_words
Have a replacement word list $good_words
Want to replace bad words regardless of case
Want to replace bad words with random good words
Have a correctly escaped bad word list: see http://php.net/preg_quote

You can very easily use PHP's preg_replace_callback function:

$input_string = 'This Could be interesting but should it be? Perhaps this \'would\' work; or couldn\'t it?';

$bad_words  = array('could', 'would', 'should');
$good_words = array('might', 'will');

function replace_words($matches){
    global $good_words;
    return $matches[1].$good_words[rand(0, count($good_words)-1)].$matches[3];
}

echo preg_replace_callback('/(^|\b|\s)('.implode('|', $bad_words).')(\b|\s|$)/i', 'replace_words', $input_string);

Okay, so what this does is build a single regex pattern from all of the bad words (joined with |) and pass it to preg_replace_callback, which calls replace_words once for every match. The pattern has the format:

/(START OR WORD_BOUNDARY OR WHITE_SPACE)(BAD_WORD)(WORD_BOUNDARY OR WHITE_SPACE OR END)/i

The i modifier makes it case insensitive so both bad and Bad would match.
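A small check of the pattern described above, assuming the example word list from the answer: it prints the pattern string that gets compiled and shows that /i matches any casing, while capture group 2 still holds the word exactly as typed.

```php
<?php
$bad_words = array('could', 'would', 'should');

// Same construction as in the answer.
$pattern = '/(^|\b|\s)(' . implode('|', $bad_words) . ')(\b|\s|$)/i';
echo $pattern, "\n"; // /(^|\b|\s)(could|would|should)(\b|\s|$)/i

foreach (array('could', 'Could', 'COULD') as $text) {
    preg_match($pattern, $text, $m);
    // $m[2] keeps the original casing of the matched word.
    echo $text, ' => matched "', $m[2], "\"\n";
}
```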

The function replace_words then takes the matched word and its boundaries (either empty or a white space character) and replaces it with the same boundaries around a random good word.

global $good_words; <-- Makes the $good_words variable accessible from within the function
$matches[1] <-- The word boundary before the matched word
$matches[3] <-- The word boundary after  the matched word
$good_words[rand(0, count($good_words)-1)] <-- Selects a random good word from $good_words
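The answer does not cover the question's first requirement, keeping the original word's casing. One possible approach, sketched under the assumption that only three styles matter (ALL CAPS, Capitalized, everything else as lowercase): since $matches[2] holds the bad word as typed, its style can be copied onto the replacement. match_case and clean_string_keep_case are hypothetical names.

```php
<?php
// Hypothetical helper: copy the casing style of the matched bad word
// onto the replacement word.
function match_case(string $original, string $replacement): string
{
    if ($original === strtoupper($original)) {
        return strtoupper($replacement);              // "BAD" -> "WORDS"
    }
    $first = $original[0];
    if ($first === strtoupper($first) && $first !== strtolower($first)) {
        return ucfirst(strtolower($replacement));     // "Bad", "BAd" -> "Words"
    }
    return strtolower($replacement);                  // "bad", "bAd" -> "words"
}

// Same pattern and callback idea as the answer, with the bad words escaped
// and the replacement passed through match_case().
function clean_string_keep_case(string $input, array $bad_words, array $good_words): string
{
    $escaped = array_map(function ($word) {
        return preg_quote($word, '/');
    }, $bad_words);

    return preg_replace_callback(
        '/(^|\b|\s)(' . implode('|', $escaped) . ')(\b|\s|$)/i',
        function ($matches) use ($good_words) {
            $new = $good_words[array_rand($good_words)];
            // $matches[2] is the bad word exactly as the user typed it.
            return $matches[1] . match_case($matches[2], $new) . $matches[3];
        },
        $input
    );
}

echo clean_string_keep_case('Bad bad BAD', array('bad'), array('words')), "\n";
// Words words WORDS
```

Mixed-case input such as "bAd" has no obvious "format" to copy, so this sketch simply falls back to lowercase.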

Anonymous function

You can also pass an anonymous function directly to preg_replace_callback, which avoids the global variable:

echo preg_replace_callback(
        '/(^|\b|\s)('.implode('|', $bad_words).')(\b|\s|$)/i',
        function ($matches) use ($good_words){
            return $matches[1].$good_words[rand(0, count($good_words)-1)].$matches[3];
        },
        $input_string
    );

Function wrapper

If you're going to use it multiple times, you may also write it as a self-contained function. In that case you'll most likely want to pass the good/bad words into the function when calling it (or hard-code them in there permanently), but that depends on how you derive them...

function clean_string($input_string, $bad_words, $good_words){
    return preg_replace_callback(
        '/(^|\b|\s)('.implode('|', $bad_words).')(\b|\s|$)/i',
        function ($matches) use ($good_words){
            return $matches[1].$good_words[rand(0, count($good_words)-1)].$matches[3];
        },
        $input_string
    );
}

echo clean_string($input_string, $bad_words, $good_words);

Output

Running the above functions consecutively with the input and word lists shown in the first example:

This will be interesting but might it be? Perhaps this 'will' work; or couldn't it?
This might be interesting but might it be? Perhaps this 'might' work; or couldn't it?
This might be interesting but will it be? Perhaps this 'will' work; or couldn't it?

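Of course the replacement words are chosen randomly, so refreshing the page gives a different result, but this shows what does and doesn't get replaced, with one caveat: ' is not a word character, so \b also matches between "could" and "n't". In practice this pattern turns "couldn't" into "mightn't" or "willn't" as well.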
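The snippet below runs the answer's pattern and one possible stricter variant over the example sentence. The stricter pattern is an assumption, not from the answer: a lookbehind (?<!\w) and a lookahead (?!\w|'\w) skip a bad word that is directly followed by an apostrophe and a letter, as in couldn't, while a quoted 'would' is still replaced. A fixed replacement X keeps the output deterministic.

```php
<?php
$bad_words   = array('could', 'would', 'should');
$alternation = implode('|', $bad_words);

// The answer's pattern: \b also matches between "could" and "'".
$answer_pattern = '/(^|\b|\s)(' . $alternation . ')(\b|\s|$)/i';

// Hypothetical stricter variant: no word character directly before the match,
// and neither a word character nor apostrophe-plus-letter directly after it.
$strict_pattern = '/(?<!\w)(' . $alternation . ')(?!\w|\'\w)/i';

$text = "Perhaps this 'would' work; or couldn't it?";

echo preg_replace($answer_pattern, '$1X$3', $text), "\n";
// Perhaps this 'X' work; or Xn't it?
echo preg_replace($strict_pattern, 'X', $text), "\n";
// Perhaps this 'X' work; or couldn't it?
```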

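To get the correctly escaped bad word list assumed above, run each word through preg_quote before building the pattern. Passing the pattern delimiter '/' as the second argument also escapes any slash inside a word:

foreach($bad_words as $key=>$word){
    $bad_words[$key] = preg_quote($word, '/');
}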

Word boundaries `\b`

In this code I've used \b, \s, and ^ or $ as word boundaries, and there is a good reason for this. While white space, start of string, and end of string are all natural word boundaries, \b will not match in all cases, for example:

\b\$h1t\b <---Will not match

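This is because \b is a zero-width assertion that matches only at a transition between a word character (\w, i.e. [A-Za-z0-9_]) and a non-word character or the start/end of the string. Characters like $ don't count as word characters, so there is no boundary between a preceding space and $.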
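A quick way to see this, using a sample sentence assumed for illustration: preg_match returns 0 for the \b version and 1 for the whitespace/anchor version used in the answer.

```php
<?php
$text = 'You are a $h1t filter';

// Between the space and "$" both characters are non-word characters,
// so \b cannot match there.
var_dump(preg_match('/\b\$h1t\b/', $text));        // int(0)

// Using whitespace or the string anchors as the boundary works.
var_dump(preg_match('/(^|\s)\$h1t(\s|$)/', $text)); // int(1)
```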

Depending on the size of your word list there are a couple of potential hiccups. From a system design perspective it's generally bad form to have huge regexes for a couple of reasons:

It can be difficult to maintain

It's hard to tell what it does
It's hard to spot mistakes

It can be memory intensive if the list is too large
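If the list does grow large, one alternative to a single giant alternation (a different technique from the answer's: tokenize and look up) is to split the text into word and non-word runs and check each word against a hash set built with array_flip. clean_with_lookup is a hypothetical name. This sketch treats a "word" as a run of \w characters, so entries containing symbols such as $h1t cannot be matched this way, and replacements are inserted exactly as listed in $good_words.

```php
<?php
// Hypothetical tokenize-and-lookup filter (not the author's method).
function clean_with_lookup(string $input, array $bad_words, array $good_words): string
{
    // Keys are lowercase bad words: O(1), case-insensitive lookups.
    $bad_set = array_flip(array_map('strtolower', $bad_words));

    // PREG_SPLIT_DELIM_CAPTURE keeps the separators so the text can be rebuilt.
    $parts = preg_split('/(\W+)/', $input, -1, PREG_SPLIT_DELIM_CAPTURE);

    foreach ($parts as $i => $part) {
        if (isset($bad_set[strtolower($part)])) {
            $parts[$i] = $good_words[array_rand($good_words)];
        }
    }
    return implode('', $parts);
}

echo clean_with_lookup('Could it be? It could.', array('could'), array('might')), "\n";
// might it be? It might.
```

Because the apostrophe splits "couldn't" into "couldn", "'" and "t", contractions are left alone with this approach.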

Given that the regex pattern is generated by PHP from the word list (you maintain the list, not the regex itself), the first reason is negated. The second should be negated as well: if your word list is large, with a dozen permutations of each bad word, then I suggest you stop and rethink your approach (read: use a flagging/moderation system).

To clarify, I don't see a problem with having a small word list to filter out specific expletives, as it serves a purpose: to stop users from having an outburst at one another. The problem comes when you try to filter out too much, including permutations. Stick to filtering common swear words, and if that doesn't work then (for the last time) implement a flagging/moderation system.
