How to replace/remove 4(+)-byte characters from a UTF-8 string in PHP?
Question
It seems that MySQL's default UTF-8 charset does not support characters longer than 3 bytes.
So, in PHP, how can I get rid of all 4(-and-more)-byte characters in a string, or replace them with some other character?
Recommended answer
NOTE: you should not just strip these characters; replace them with the replacement character U+FFFD to avoid Unicode attacks, mostly XSS:
http://unicode.org/reports/tr36/#Deletion_of_Noncharacters
preg_replace('/[\x{10000}-\x{10FFFF}]/u', "\xEF\xBF\xBD", $value);
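The one-liner above returns the new string rather than modifying $value, and preg_replace() returns null when the input is not valid UTF-8. A minimal sketch of how it might be wrapped, assuming PHP 7 or later; the function name replaceFourByteChars and the optional $replacement parameter are hypothetical and not part of the original answer:

```php
<?php
// Hypothetical helper (name and signature are illustrative assumptions).
function replaceFourByteChars(string $value, string $replacement = "\xEF\xBF\xBD"): string
{
    // With the /u modifier the pattern works on code points. U+10000..U+10FFFF
    // are exactly the code points that UTF-8 encodes in 4 bytes.
    $result = preg_replace('/[\x{10000}-\x{10FFFF}]/u', $replacement, $value);

    // preg_replace() returns null on failure, e.g. malformed UTF-8 input.
    if ($result === null) {
        throw new InvalidArgumentException(
            'preg_replace failed with error code ' . preg_last_error()
        );
    }

    return $result;
}

// Usage: the emoji U+1F600 is replaced by U+FFFD, 3-byte characters are kept.
$clean = replaceFourByteChars("a\u{1F600}b 中文");
```

The default replacement "\xEF\xBF\xBD" is the UTF-8 encoding of U+FFFD, as in the answer. Passing a different string as the second argument swaps in another character.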