PHP: Using special characters in bad word obfuscator
Problem description
I'm using this bad word detector/obfuscator in php (to be Adsense compliant). It shows the first letter of the bad word, and replaces the remaining letters with this character: ▪
It works fine, except when I'm using words that contain special characters in Spanish, for example: ñ, á, ó, etc.
This is my current code:
<?php
function badwords_full($string, &$bad_references) {
static $bad_counter;
static $bad_list;
static $bad_list_q;
if(!isset($bad_counter)) {
$bad_counter = 0;
$bad_list = badwords_list();
$bad_list_q = array_map('preg_quote', $bad_list);
}
return preg_replace_callback('~('.implode('|', $bad_list_q).')~',
function($matches) use (&$bad_counter, &$bad_references) {
$bad_counter++;
$bad_references[$bad_counter] = $matches[0];
return substr($matches[0], 0, 1).str_repeat('▪', strlen($matches[0]) - 1);
}, $string);
}
function badwords_list() {
# spanish
$es = array(
"gallina",
"ñoño"
);
# english
$en = array(
"chicken",
"horse"
);
# join all languages
$list = array_merge($es, $en);
// sort longest words first so longer alternatives are tried first
usort($list, function($a, $b) {
return strlen($b) <=> strlen($a);
});
return $list;
}
$bad = []; //holder for bad words
Test 1:
echo badwords_full('Hello, you are a chicken!', $bad);
Result 1:
Hello, you are a c▪▪▪▪▪▪! (works fine)
Test 2:
echo badwords_full('Hola en español eres un ñoño!', $bad);
Result 2:
Hola en español eres un �▪▪▪▪▪!
Any ideas on how to solve this issue? Thanks!
Recommended answer
You are splitting a multibyte character in half. Use mb_substr in place of substr.
return mb_substr($matches[0], 0, 1).str_repeat('▪', strlen($matches[0]) - 1);
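To see why the first character comes out as �, here is a minimal sketch comparing the byte-oriented and multibyte-aware functions. It assumes the source file and the string are UTF-8 encoded; the variable names are only for illustration.

```php
<?php
// Assumption: this file is saved as UTF-8, so 'ñ' occupies two bytes.
$word = 'ñoño';

$byteLength = strlen($word);              // counts bytes
$charLength = mb_strlen($word, 'UTF-8');  // counts characters

$firstByte = substr($word, 0, 1);             // only the first byte of 'ñ'
$firstChar = mb_substr($word, 0, 1, 'UTF-8'); // the whole character 'ñ'

// A lone lead byte is not valid UTF-8, which is why it renders as �.
$firstByteIsValid = mb_check_encoding($firstByte, 'UTF-8');

var_dump($byteLength, $charLength, $firstByteIsValid, $firstChar);
```

The mb_* functions need the mbstring extension to be enabled.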
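You probably also want to use mb_strlen in place of strlen. strlen counts bytes, not characters, so ñoño (4 characters, 6 bytes in UTF-8) would still get five ▪ instead of three.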
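Putting both suggestions together, the replacement could look something like the sketch below. The helper names mask_word and obfuscate are hypothetical and not part of the original code, the reference counter is left out for brevity, and UTF-8 input is assumed. The u pattern modifier is an extra suggestion, not part of the answer above; it makes PCRE treat the pattern and subject as UTF-8.

```php
<?php
// Hypothetical helper: keeps the first character of a UTF-8 word
// and replaces every remaining character with the mask.
function mask_word(string $word, string $mask = '▪'): string
{
    $length = mb_strlen($word, 'UTF-8'); // characters, not bytes
    if ($length === 0) {
        return '';
    }
    return mb_substr($word, 0, 1, 'UTF-8') . str_repeat($mask, $length - 1);
}

// Hypothetical wrapper around preg_replace_callback, assuming UTF-8 text.
function obfuscate(string $text, array $badWords): string
{
    $quoted = array_map(function ($w) {
        return preg_quote($w, '~'); // also escape the delimiter
    }, $badWords);
    $pattern = '~(' . implode('|', $quoted) . ')~u';

    $result = preg_replace_callback($pattern, function ($m) {
        return mask_word($m[0]);
    }, $text);

    // preg_replace_callback returns null on error (e.g. invalid UTF-8 input).
    return $result ?? $text;
}

echo obfuscate('Hola en español eres un ñoño!', ['gallina', 'ñoño', 'chicken', 'horse']), "\n";
// Hola en español eres un ñ▪▪▪!
```

Passing 'UTF-8' explicitly avoids depending on the configured internal encoding.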