Replacing non-UTF8 characters
Problem description
In PHP, I need to replace all non-UTF8 characters in a string, not with some equivalent (as the iconv function does with //TRANSLIT) but with a chosen character (such as "_" or "*").
Typically, I want the user to be able to see the positions where the invalid characters were found.
I didn't find any function that does this, so I was going to:
- use iconv with //IGNORE
- do a diff on the two strings and insert the wanted character where the non-UTF8 characters were
Do you see a better way to do this? Are there functions in PHP that can be combined to get this behavior?
Thanks for your help.
Recommended answer
Here are two preg_replace calls that get you close to what you want:
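//reject most ASCII control characters, overlong 2-byte sequences, malformed sequences
//and all 4-byte sequences (code points U+10000 and above), and replace them with ?
$some_string = preg_replace(
    '/[\x00-\x08\x10\x0B\x0C\x0E-\x19\x7F]'.          // ASCII control characters
    '|[\x00-\x7F][\x80-\xBF]+'.                       // continuation bytes after an ASCII byte (that byte is consumed too)
    '|([\xC0\xC1]|[\xF0-\xFF])[\x80-\xBF]*'.          // overlong 2-byte leads, 4-byte leads and invalid lead bytes
    '|[\xC2-\xDF]((?![\x80-\xBF])|[\x80-\xBF]{2,})'.  // 2-byte lead with zero or too many continuation bytes
    '|[\xE0-\xEF](([\x80-\xBF](?![\x80-\xBF]))|(?![\x80-\xBF]{2})|[\x80-\xBF]{3,})/S', // 3-byte lead with a wrong number of continuation bytes
    '?', $some_string );
//reject overlong 3-byte sequences and UTF-16 surrogates and replace them with ?
$some_string = preg_replace(
    '/\xE0[\x80-\x9F][\x80-\xBF]'.    // overlong 3-byte sequences
    '|\xED[\xA0-\xBF][\x80-\xBF]/S',  // UTF-16 surrogates (U+D800 to U+DFFF)
    '?', $some_string );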
Note that you can change the replacement (currently '?') to anything else by changing the second argument of preg_replace, i.e. the '?' in preg_replace('blablabla', '?', $some_string).
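Since the question asks for a configurable replacement character and for the positions of the invalid bytes, here is a minimal sketch of one way to get both. It is not the original answer's method: it swaps in a different technique, matching either one well-formed UTF-8 sequence (byte ranges as in RFC 3629) or a single leftover byte, and treats every leftover byte as invalid. The names UTF8_OR_BYTE, replace_invalid_utf8 and invalid_utf8_offsets are hypothetical, chosen only for this illustration.

```php
<?php
// Sketch only: UTF8_OR_BYTE, replace_invalid_utf8() and invalid_utf8_offsets()
// are hypothetical names, not part of PHP or of the original answer.

// Group 1: one well-formed UTF-8 sequence (byte ranges from RFC 3629).
// Group 2: a single byte at a position where no well-formed sequence starts.
// No /u modifier, so the pattern is applied byte by byte.
const UTF8_OR_BYTE = '/(
      [\x00-\x7F]
    | [\xC2-\xDF][\x80-\xBF]
    | \xE0[\xA0-\xBF][\x80-\xBF]
    | [\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}
    | \xED[\x80-\x9F][\x80-\xBF]
    | \xF0[\x90-\xBF][\x80-\xBF]{2}
    | [\xF1-\xF3][\x80-\xBF]{3}
    | \xF4[\x80-\x8F][\x80-\xBF]{2}
    ) | (.)/xs';

// Replace every invalid byte with $replacement; valid sequences are kept.
function replace_invalid_utf8(string $input, string $replacement = '_'): string
{
    return preg_replace_callback(
        UTF8_OR_BYTE,
        function (array $m) use ($replacement): string {
            // Group 2 is only non-empty when the single-byte fallback matched.
            return (isset($m[2]) && $m[2] !== '') ? $replacement : $m[1];
        },
        $input
    );
}

// Return the byte offsets (0-based) of every invalid byte in $input.
function invalid_utf8_offsets(string $input): array
{
    preg_match_all(UTF8_OR_BYTE, $input, $matches, PREG_SET_ORDER | PREG_OFFSET_CAPTURE);
    $offsets = [];
    foreach ($matches as $m) {
        // With PREG_OFFSET_CAPTURE each group is a [text, offset] pair;
        // an unmatched group, if present, carries offset -1.
        if (isset($m[2]) && $m[2][1] >= 0) {
            $offsets[] = $m[2][1];
        }
    }
    return $offsets;
}

// "a", "é", "b", a stray 0xFF, "c", then the overlong pair 0xC0 0x80.
$dirty = "a\xC3\xA9b\xFFc\xC0\x80";
echo replace_invalid_utf8($dirty, '*'), "\n";           // aéb*c**
echo implode(',', invalid_utf8_offsets($dirty)), "\n";  // 4,6,7
```

Each invalid byte gets its own replacement character, so the markers line up with the byte offsets reported by the second function. Unlike the two patterns above, this sketch keeps well-formed 4-byte sequences and ASCII control characters untouched; whether that is desirable depends on the application. It invokes the callback once per character, so it is likely too slow for very large inputs.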
Original article: http://magp.ie/2011/01/06/remove-non-utf8-characters-from-string-with-php/