Remove garbage characters in UTF-8
Question
I store all my data in MySQL as UTF-8. Before inserting data into the database, I need to clean unwanted characters out of the strings. The strings are UTF-8 encoded. I know how to use regular expressions and string replacement, but I do not know how to handle Arabic characters.
Sample string that needs to be cleaned : "████ .. الــقــوانين الجديـــدة في قســـم الـعنايـ";
Thanks
Recommended answer
OK. As @Jonathan Leffler already said, if you can specify the Unicode character ranges of the characters that need to be removed, you can use a regular expression to replace those characters with an empty string.
A Unicode character is specified as \x{FFFF} in an expression (in PHP). In addition, you have to set the u modifier to make PHP treat the pattern as UTF-8.
So in the end, you have something like this:
preg_replace('/[\x{FFFF}-\x{FFFF}]+/u','',$string);
where
/.../u are the delimiters plus the modifier
[...]+ is a character class plus a quantifier, which means: match one or more of the characters listed inside the brackets
\x{FFFF}-\x{FFFF} is a Unicode character range (obviously you have to provide the right code points for the characters).
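As a concrete instance of the pattern above, here is a minimal sketch. It assumes the unwanted characters in the sample are the Arabic tatweel (U+0640, the stretching character) and the Unicode Block Elements range (U+2580 to U+259F, which contains the full block U+2588). Those code points are a guess based on the sample string, not something the answer specifies, and the function name removeGarbage is hypothetical.

```php
<?php
// Hypothetical helper: removes characters assumed to be "garbage".
// Assumptions: U+0640 (ARABIC TATWEEL) and U+2580-U+259F (Block Elements).
function removeGarbage(string $text): string
{
    // The u modifier makes PCRE treat both pattern and subject as UTF-8.
    $cleaned = preg_replace('/[\x{0640}\x{2580}-\x{259F}]+/u', '', $text);

    // preg_replace() returns null on failure, e.g. when $text is not valid
    // UTF-8; in that case this sketch simply returns the input unchanged.
    return $cleaned ?? $text;
}

// Qaf, Seen, three tatweels, Meem: the tatweels are stripped.
echo removeGarbage("\u{0642}\u{0633}\u{0640}\u{0640}\u{0640}\u{0645}"), "\n";
```

Adjust the code points in the character class to whatever you actually consider garbage in your data.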
You can also negate the character class with a ^, so that you specify the ranges you want to keep instead:
preg_replace('/[^\x{FFFF}-\x{FFFF}]+/u','',$string);
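The negated form can be sketched the same way. The example below assumes you want to keep only the basic Arabic letters (U+0621 to U+063A and U+0641 to U+064A, which skips the tatweel at U+0640), ASCII digits, and whitespace. That whitelist is an illustrative assumption; it also drops Arabic diacritics and all Latin text, so widen it to match your data. The function name keepArabicLetters is hypothetical.

```php
<?php
// Hypothetical helper: keeps only an assumed whitelist of characters.
// Assumed whitelist: Arabic letters U+0621-U+063A and U+0641-U+064A,
// ASCII digits 0-9, and whitespace. Everything else is removed.
function keepArabicLetters(string $text): string
{
    // ^ at the start of the class negates it: match anything NOT listed.
    $pattern = '/[^\x{0621}-\x{063A}\x{0641}-\x{064A}0-9\s]+/u';
    $cleaned = preg_replace($pattern, '', $text);

    // Fall back to the original string if the input is not valid UTF-8.
    return $cleaned ?? $text;
}

// Full block, space, Qaf, Seen, tatweel, Meem, two dots.
echo keepArabicLetters("\u{2588} \u{0642}\u{0633}\u{0640}\u{0645}.."), "\n";
```

A whitelist like this tends to be safer than a blacklist when the set of possible garbage characters is not known in advance, at the cost of having to list everything legitimate.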