How to detect whether utf8_decode or utf8_encode has to be applied to a string?
Problem description
I have a feed aggregated from third-party sites, and sometimes I have to apply utf8_decode and other times utf8_encode to get readable output.
If the same function is applied twice by mistake, or the wrong one is used, the output becomes even more garbled. That is what I want to fix.
How can I detect which of the two, if either, has to be applied to a given string?
UPDATE
Actually, the content is returned as UTF-8, but some parts inside it are not.
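To make the double-application problem concrete, here is a minimal sketch. It assumes the mbstring extension is available and uses mb_convert_encoding() from ISO-8859-1 to UTF-8 as a stand-in for utf8_encode(), which performs the same conversion; the variable names are only illustrative.

```php
<?php
// Assumption: mbstring is loaded. mb_convert_encoding(..., 'UTF-8', 'ISO-8859-1')
// does the same ISO-8859-1 -> UTF-8 conversion that utf8_encode() performs.
$latin1 = "\xE9";  // "é" as a single ISO-8859-1 byte

// Converting once gives the correct two-byte UTF-8 sequence.
$once = mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');

// Converting again treats each UTF-8 byte as its own Latin-1 character.
$twice = mb_convert_encoding($once, 'UTF-8', 'ISO-8859-1');

echo bin2hex($once), "\n";   // c3a9
echo bin2hex($twice), "\n";  // c383c2a9, which renders as "Ã©"
```

The second conversion is what produces the familiar "Ã©" style of garbage, so the conversion should only run on input that is not already UTF-8.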
I can't say I can rely on mb_detect_encoding(). I had some freaky false positives with it a while back.
The most universal check I have found, one that has worked well in every case I tried, is:
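if (preg_match('!!u', $string))
{
    // this is valid UTF-8: the u modifier makes PCRE validate the
    // subject, and the empty pattern then matches any valid string
}
else
{
    // definitely not UTF-8: preg_match() returned false because
    // the subject contains an invalid byte sequence
}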
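Building on that check, a conversion helper could look like the sketch below. The function name ensure_utf8 and its $fallback parameter are hypothetical, and the code assumes that any input failing the check is ISO-8859-1, which is the encoding utf8_encode() expects; adjust the fallback if the feed uses something else. It also assumes the mbstring extension is available.

```php
<?php
// Hypothetical helper: return $string as UTF-8, converting only when needed.
// Assumption: input that is not valid UTF-8 is encoded as $fallback.
function ensure_utf8(string $string, string $fallback = 'ISO-8859-1'): string
{
    // With the u modifier, preg_match() returns false for invalid UTF-8
    // and 1 for any valid UTF-8 subject, including the empty string.
    if (preg_match('!!u', $string)) {
        return $string;  // already UTF-8, so converting again would corrupt it
    }

    // Not valid UTF-8: convert once from the assumed source encoding.
    return mb_convert_encoding($string, 'UTF-8', $fallback);
}

$fromLatin1 = ensure_utf8("caf\xE9");      // Latin-1 input, gets converted
$alreadyOk  = ensure_utf8("caf\xC3\xA9");  // UTF-8 input, returned unchanged
```

Because valid input is returned untouched, calling the helper twice on the same string is harmless. It validates the string as a whole, so a string that mixes valid UTF-8 with stray Latin-1 bytes fails the check and is converted in full, which double-encodes the parts that were already valid; such input would have to be split and checked piece by piece. Short Latin-1 strings that happen to form valid UTF-8 byte sequences will also be taken for UTF-8.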