How do I detect whether I have to apply UTF-8 decode or encode to a string?
Question
I have a feed taken from third-party sites, and sometimes I have to apply utf8_decode and other times utf8_encode to get the desired visible output.
If the same function is applied twice by mistake, or the wrong one is used, the output gets even uglier. This is what I want to fix.
How can I detect which of the two, if either, has to be applied to a given string?
Actually, the content is returned as UTF-8, but some parts inside it are not.
Recommended answer
I can't say I can rely on mb_detect_encoding(). I had some freaky false positives a while back.
The most universal way I found to work well in every case was:
if (preg_match('!!u', $string))
{
    // This is UTF-8: the empty pattern matched under the u modifier
}
else
{
    // Definitely not UTF-8: preg_match() fails on an invalid UTF-8 subject
}
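To avoid the double-conversion problem described in the question, the check above can be wrapped in a helper that converts only when the string is not already valid UTF-8. The following is a minimal sketch, not part of the original answer: the function name `ensure_utf8` is hypothetical, and it assumes that any non-UTF-8 input is ISO-8859-1 (the encoding utf8_encode expects) and that the mbstring extension is available.

```php
<?php
// Hypothetical helper (not from the original answer).
// Assumption: strings that are not valid UTF-8 are ISO-8859-1.
function ensure_utf8(string $string, string $fallbackEncoding = 'ISO-8859-1'): string
{
    // With the u modifier, preg_match() returns 1 for a valid UTF-8 subject
    // and false for an invalid one, so valid input is returned untouched.
    if (preg_match('!!u', $string) === 1) {
        return $string;
    }

    // Convert from the assumed source encoding to UTF-8 (requires mbstring).
    return mb_convert_encoding($string, 'UTF-8', $fallbackEncoding);
}
```

Because valid UTF-8 passes through unchanged, calling the helper twice on the same value should be harmless. For a feed that mixes encodings, it would probably need to be applied to each field or fragment separately rather than to the whole document at once.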