Code to remove unicode characters


Problem description


Hi all,

I want to remove Unicode characters like ð from my data.

I used this expression:

preg_replace('/[^(\x20-\x7F)]*/','', $row['xmlFeed'])

but it didn't remove all the Unicode characters. Please help me remove all the Unicode characters from my data.

Solution


Looking at a similar question, I think what you want (most closely resembling what you've written) is:

preg_replace('/[^\x{20}-\x{7F}]/u', '', $row['xmlFeed'])



However, if you truly want to keep all non-Unicode (that is, plain ASCII) characters, the control characters with values 0x00-0x1F are technically valid as well, so you might want /[^\x{00}-\x{7F}]/u.
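As a minimal sketch, the two patterns above could be wrapped in small helpers like the ones below. The function names are hypothetical and not part of the original answer. One caveat: with the /u modifier, preg_replace() returns null when the subject is not valid UTF-8, so this sketch falls back to a byte-wise pattern in that case. That fallback is a design choice for illustration, not something the answer prescribes.

```php
<?php
// Hypothetical helper: keep only characters in the range U+0020..U+007F.
function strip_non_ascii(string $text): string
{
    // /u makes PCRE treat pattern and subject as UTF-8, so a multi-byte
    // character is removed as a single unit.
    $result = preg_replace('/[^\x{20}-\x{7F}]/u', '', $text);

    // With /u, null is returned when $text is not valid UTF-8.
    // Assumed fallback: strip byte-wise instead.
    if ($result === null) {
        $result = preg_replace('/[^\x20-\x7F]/', '', $text);
    }
    return $result;
}

// Hypothetical helper: same idea, but ASCII control characters
// (tab, newline, and so on) are kept as well.
function strip_non_ascii_keep_controls(string $text): string
{
    $result = preg_replace('/[^\x{00}-\x{7F}]/u', '', $text);
    if ($result === null) {
        $result = preg_replace('/[^\x00-\x7F]/', '', $text);
    }
    return $result;
}

// "\u{...}" escapes (PHP 7+) avoid depending on the source file encoding.
var_dump(strip_non_ascii("caf\u{00E9} \u{00B0}C"));          // é and ° removed
var_dump(strip_non_ascii_keep_controls("a\tb\u{00F0}"));     // tab kept, ð removed
```

In the question's terms, the call would be strip_non_ascii($row['xmlFeed']).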

Also, some tips on regex in general. Don't use parentheses inside [] unless you mean to include or exclude the parenthesis characters themselves; they lose their grouping meaning inside a character class. (For example, /[(a-z)]+/ matches all lowercase English letters plus ( and ), so it would match the entire string "a(b)c".)

In this case you also don't need the *, because you want to replace every single-character occurrence and, from what I could find, PHP does global matching by default, so it will already match all of them. If it didn't, you'd just get the first run of them anyway. Either way the * doesn't help. Since the pattern has no other constraints, it can also be much slower on some regex implementations, because a pattern that can match the empty string has every empty match replaced as well. For example, replacing /a*/ with "c" in the string "aaaaabba" can result in "cbcbc", because the empty string between the two b's counts as a valid match.
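The points about parentheses and the * quantifier can be checked with a short script. This is only an illustrative sketch and the variable names are made up for the example. The output of the /a*/ replacement is left unstated on purpose, since how empty matches are handled differs between regex engines and versions.

```php
<?php
// Parentheses inside [] are literal characters, not a group:
// the class [(a-z)] accepts "(", ")" and the letters a to z.
$matchesWhole = preg_match('/^[(a-z)]+$/', 'a(b)c');
var_dump($matchesWhole); // int(1): the whole string matches

// preg_replace() replaces every match by default (its $limit defaults to -1),
// so a single-character class without * already removes all occurrences.
$cleaned = preg_replace('/[^\x{20}-\x{7F}]/u', '', "na\u{00EF}ve \u{00B0}");
var_dump($cleaned); // string(5) "nave "

// a+ needs at least one "a", so only the two runs of "a" are replaced.
$withPlus = preg_replace('/a+/', 'c', 'aaaaabba');
var_dump($withPlus); // string(4) "cbbc"

// a* can also match the empty string, so extra replacements are inserted
// at positions where no "a" is present.
$withStar = preg_replace('/a*/', 'c', 'aaaaabba');
var_dump($withStar);
```

Comparing $withPlus and $withStar shows the extra "c" characters that come purely from empty matches.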

