Remove garbage characters in utf

Question

I am using utf8 to store all my data in MySQL. Before data is inserted into the database I need to clean unwanted characters out of the strings. The strings are in utf8 format. I know how to use regex and string replace, but I do not know how to work with Arabic characters.

Sample string that needs to be cleaned : "████ .. الــقــوانين الجديـــدة في قســـم الـعنايـ";

Thanks

Recommended answer

Ok. As @Jonathan Leffler already said, if you can specify the unicode character ranges for the characters that need to be replaced, you can use a regular expression to replace the characters with an empty string.

A Unicode character is specified as \x{FFFF} in an expression (in PHP), where FFFF stands for the character's hexadecimal code point. In addition, you have to set the u modifier to make PHP treat the pattern and the subject string as UTF-8.

So in the end, you have something like this:

preg_replace('/[\x{FFFF}-\x{FFFF}]+/u','',$string);

where

  • /.../u are the delimiters plus the modifier
  • [...]+ is a character class plus quantifier, which means match any of the characters inside the class one or more times
  • \x{FFFF}-\x{FFFF} is a unicode character range (obviously you have to provide the right codepoints/numbers of the characters).
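
As a concrete sketch of the pattern above, assume the unwanted characters are the block elements (U+2580 to U+259F) and the Arabic tatweel/kashida (U+0640) that appear in the sample string. The function name strip_garbage and the chosen ranges are assumptions for illustration, not something the question specifies; adjust them to your own data. The "\u{...}" string escapes need PHP 7 or later.

```php
<?php
// Hypothetical helper: the name and the ranges are assumptions for illustration.
function strip_garbage(string $text): string
{
    // U+2580-U+259F: Block Elements (includes the full block U+2588).
    // U+0640: Arabic tatweel (kashida), used only to stretch words.
    $clean = preg_replace('/[\x{2580}-\x{259F}\x{0640}]+/u', '', $text);

    // preg_replace() returns null on failure, e.g. when $text is not valid UTF-8.
    if ($clean === null) {
        throw new RuntimeException('preg_replace failed, error code ' . preg_last_error());
    }
    return $clean;
}

// Two full blocks, " .. ", then alef, lam, two tatweels, qaf.
$sample = "\u{2588}\u{2588} .. \u{0627}\u{0644}\u{0640}\u{0640}\u{0642}";
echo strip_garbage($sample), PHP_EOL;
```

Listing the unwanted characters explicitly like this is the safer choice when you only know a few offenders and want everything else left untouched.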

You can also negate the character class with a ^, so that you specify the range you want to keep instead:

preg_replace('/[^\x{FFFF}-\x{FFFF}]+/u','',$string);
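
A sketch of the negated form, assuming you want to keep only characters from the Arabic block (U+0600 to U+06FF) plus whitespace. Because the tatweel (U+0640) sits inside that block, the range is split in two here so that it is dropped as well. The function name keep_arabic and the ranges are assumptions for illustration; a real whitelist would likely also need digits, punctuation, or further Arabic blocks.

```php
<?php
// Hypothetical helper: the name and the kept ranges are assumptions for illustration.
function keep_arabic(string $text): string
{
    // Keep U+0600-U+063F and U+0641-U+06FF (Arabic block minus tatweel U+0640)
    // and whitespace; everything else matches the negated class and is removed.
    $clean = preg_replace('/[^\x{0600}-\x{063F}\x{0641}-\x{06FF}\s]+/u', '', $text);

    // preg_replace() returns null on failure, e.g. when $text is not valid UTF-8.
    if ($clean === null) {
        throw new RuntimeException('preg_replace failed, error code ' . preg_last_error());
    }
    return $clean;
}

// Full block, space, alef, tatweel, lam, then a Latin letter.
$sample = "\u{2588} \u{0627}\u{0640}\u{0644}x";
echo keep_arabic($sample), PHP_EOL;
```

The whitelist approach removes anything you did not anticipate, which is convenient for unknown garbage but will also strip legitimate characters that are missing from the class.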


More information:

  • Regular expressions
  • Regular expressions in PHP
  • Unicode Charts
