Replacing non-UTF8 characters


Question

In PHP, I need to replace all non-UTF8 characters in a string. However, I don't want to replace them with some equivalent (as the iconv function does with //TRANSLIT) but with a chosen character (such as "_" or "*").

Typically, I want the user to be able to see the positions where the invalid characters were found.

I didn't find any function that does this, so I was going to:

  • use iconv with //IGNORE
  • do a diff on the two strings and insert the wanted character where the non-UTF8 ones were

Do you see a better way to do this? Are there functions in PHP that can be combined to get this behavior?

Thanks for your help.

Recommended answer

Here are two preg_replace calls to help you achieve something close to what you want:

//reject ASCII control characters, overlong 2-byte sequences, and all characters at or above U+10000 (4-byte sequences); replace with ?
$some_string = preg_replace('/[\x00-\x08\x10\x0B\x0C\x0E-\x19\x7F]'.
 '|[\x00-\x7F][\x80-\xBF]+'.
 '|([\xC0\xC1]|[\xF0-\xFF])[\x80-\xBF]*'.
 '|[\xC2-\xDF]((?![\x80-\xBF])|[\x80-\xBF]{2,})'.
 '|[\xE0-\xEF](([\x80-\xBF](?![\x80-\xBF]))|(?![\x80-\xBF]{2})|[\x80-\xBF]{3,})/S',
 '?', $some_string );

//reject overly long 3 byte sequences and UTF-16 surrogates and replace with ?
$some_string = preg_replace('/\xE0[\x80-\x9F][\x80-\xBF]'.
 '|\xED[\xA0-\xBF][\x80-\xBF]/S','?', $some_string );
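
The patterns above do more than check well-formedness: the first one also replaces ASCII control characters and every 4-byte sequence, and an invalid lead byte collapses together with its trailing continuation bytes into a single '?'. If each invalid byte should be marked individually so that positions stay visible, one possible alternative is sketched below. It is a different technique from the linked article: it matches either one well-formed UTF-8 sequence or a single byte, and replaces only the latter. The function name replace_invalid_utf8 and its parameters are hypothetical, and the sketch assumes the input is treated as a raw byte string (no /u modifier).

```php
<?php

/**
 * Replace every byte that is not part of a well-formed UTF-8 sequence.
 * Hypothetical helper, not taken from the linked article.
 */
function replace_invalid_utf8(string $input, string $replacement = '_'): string
{
    // Group 1: exactly one well-formed UTF-8 sequence (no overlong forms,
    // no UTF-16 surrogates, nothing above U+10FFFF).
    // Fallback "." (with the s flag): any single byte that group 1 rejected.
    $pattern = '/
        (
            [\x00-\x7F]
          | [\xC2-\xDF][\x80-\xBF]
          | \xE0[\xA0-\xBF][\x80-\xBF]
          | [\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}
          | \xED[\x80-\x9F][\x80-\xBF]
          | \xF0[\x90-\xBF][\x80-\xBF]{2}
          | [\xF1-\xF3][\x80-\xBF]{3}
          | \xF4[\x80-\x8F][\x80-\xBF]{2}
        )
      | .
    /xs';

    $result = preg_replace_callback(
        $pattern,
        static function (array $m) use ($replacement): string {
            // Keep well-formed sequences; swap any other byte for the marker.
            return (isset($m[1]) && $m[1] !== '') ? $m[1] : $replacement;
        },
        $input
    );

    if ($result === null) {
        throw new RuntimeException('preg_replace_callback failed');
    }
    return $result;
}
```

For example, replace_invalid_utf8("abc\xFFdef\xC3(", '*') gives "abc*def*(": the lone \xFF and the \xC3 that is not followed by a continuation byte are each replaced, while the surrounding ASCII is kept. Because the callback runs once per character, this is slower than a plain preg_replace on long strings.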

Note that you can change the replacement (currently '?') to anything else by changing the second argument, i.e. the '?' in preg_replace('blablabla', '?', $some_string).
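
The replacement character can also be made a parameter without regular expressions, using the mbstring extension's substitute character. This is a minimal sketch, not part of the original answer. It assumes mbstring is loaded and PHP 7.2 or later (for mb_ord). The function name replace_invalid_utf8_mb is hypothetical. How many substitute characters are emitted for one malformed multi-byte sequence may differ between PHP versions, so it is less suited to exact position reporting.

```php
<?php

/**
 * Re-encode UTF-8 to UTF-8 so that mbstring swaps invalid bytes for a
 * chosen substitute character. Hypothetical helper; requires ext-mbstring.
 */
function replace_invalid_utf8_mb(string $input, string $replacement = '?'): string
{
    // mbstring expects a code point; only the first character of
    // $replacement is used.
    $codePoint = mb_ord($replacement, 'UTF-8');
    if ($codePoint === false) {
        throw new InvalidArgumentException('Replacement must be valid UTF-8');
    }

    // Remember the current setting so global state is restored afterwards.
    $previous = mb_substitute_character();
    mb_substitute_character($codePoint);
    try {
        return mb_convert_encoding($input, 'UTF-8', 'UTF-8');
    } finally {
        mb_substitute_character($previous);
    }
}
```

A call such as replace_invalid_utf8_mb($some_string, '_') leaves well-formed text untouched and marks invalid bytes with "_". Restoring the previous substitute character in the finally block keeps the change from leaking into other code that relies on mbstring's default.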

Original article: http://magp.ie/2011/01/06/remove-non-utf8-characters-from-string-with-php/
