How to replace/remove 4(+)-byte characters from a UTF-8 string in PHP?

Problem description

It seems that MySQL does not support characters of more than 3 bytes in its default UTF-8 charset.

So, in PHP, how can I get rid of all 4(-and-more)-byte characters in a string and replace them with some other character?

Recommended answer

NOTE: you should not just strip these characters, but replace them with the replacement character U+FFFD to avoid Unicode attacks, mostly XSS:

http://unicode.org/reports/tr36/#Deletion_of_Noncharacters

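// preg_replace() returns the new string and leaves $value unchanged, so assign the result.
$value = preg_replace('/[\x{10000}-\x{10FFFF}]/u', "\xEF\xBF\xBD", $value);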
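
The one-liner above can be wrapped in a small helper. This is only a minimal sketch, assuming PHP 7.0 or later (for scalar type declarations and the "\u{...}" escape). The function name replaceFourByteChars and its $replacement parameter are hypothetical and not part of the original answer. Standard UTF-8 never uses more than 4 bytes per character, so the range U+10000 to U+10FFFF covers every 4-byte character. The sketch also checks the return value, because with the /u modifier preg_replace() returns null when the input is not valid UTF-8.

```php
<?php
// Hypothetical helper for illustration; the name is not from the original answer.
function replaceFourByteChars(string $value, string $replacement = "\u{FFFD}"): string
{
    // U+10000..U+10FFFF are the code points that UTF-8 encodes with 4 bytes.
    $result = preg_replace('/[\x{10000}-\x{10FFFF}]/u', $replacement, $value);

    // With the /u modifier, preg_replace() returns null if the subject
    // is not valid UTF-8. This sketch surfaces that as an exception.
    if ($result === null) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }

    return $result;
}

// Usage: the emoji (U+1F600, 4 bytes in UTF-8) becomes U+FFFD.
echo replaceFourByteChars("I \u{1F600} PHP"), PHP_EOL;
```

Characters of up to 3 bytes, such as "é" or "€", pass through unchanged. Passing a different second argument, for example "?", swaps in another replacement character.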
