"Â " character showing up instead of "&nbsp;"
Question
I found this thread which describes my issue pretty well and this answer describes my issue exactly.
The non-breaking space character is byte 0xA0 in ISO-8859-1; when encoded to UTF-8 it'd be 0xC2,0xA0, which, if you (incorrectly) view it as ISO-8859-1, comes out as "Â ". That includes a trailing nbsp...
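The byte arithmetic in that quote can be checked directly. The following is a minimal sketch, assuming the mbstring extension is available (the variable names are illustrative only): it takes the UTF-8 encoding of the non-breaking space, deliberately misreads it as ISO-8859-1, and prints the resulting bytes.

```php
<?php
// U+00A0 (non-breaking space) encoded as UTF-8 is the two bytes 0xC2 0xA0.
$nbspUtf8 = "\xC2\xA0";

// Misread those two bytes as ISO-8859-1 and re-encode the result as UTF-8.
// Each byte becomes its own character: 0xC2 -> U+00C2 (Â), 0xA0 -> U+00A0 (nbsp).
$misread = mb_convert_encoding($nbspUtf8, 'UTF-8', 'ISO-8859-1');

echo bin2hex($nbspUtf8), "\n"; // c2a0
echo bin2hex($misread), "\n";  // c382c2a0
```

The second line of output is "Â" (0xC3 0x82) followed by a real non-breaking space (0xC2 0xA0), which is exactly the "Â " seen on the page.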
However, I have managed to track my issue down to a function I use to wrap image tags in divs.
function img_format($str)
{
    $doc = new DOMDocument();
    @$doc->loadHTML($str); // <-- Bonus points for the explanation of the @

    // $tags object
    $tags = $doc->getElementsByTagName('img');

    foreach ($tags as $tag) {
        $div = $doc->createElement('div');
        $div->setAttribute('class', 'inner-copy');
        $tag->parentNode->insertBefore($div, $tag);
        $div->appendChild($tag);
        $tag->setAttribute('class', 'inner-img');
    }

    $str = $doc->saveHTML();
    return $str;
}
Quite simply, how can I fix this issue within this function?
I understand that using
<meta http-equiv="Content-Type" content="text/html;charset=utf-8" />
will fix this issue, but there is obviously something I'm overlooking within the function itself.
I have tried:
$doc->validateOnParse = true;
To no avail. (I don't quite know what that does anyway)
Recommended answer
Found it!
@$doc->loadHTML(mb_convert_encoding($str, 'HTML-ENTITIES', 'UTF-8'));
This answer explains the issue and gives the workaround above:
DOMDocument::loadHTML will treat your string as being in ISO-8859-1 unless you tell it otherwise. This results in UTF-8 strings being interpreted incorrectly.
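For completeness, here is one way the whole function might look once the parser is told the input is UTF-8. This is only a sketch under stated assumptions, not the method from the answer: the name img_format_utf8 is hypothetical, and instead of mb_convert_encoding it prepends the meta tag mentioned in the question, so the charset is declared before any content is parsed. It also replaces the @ operator with libxml_use_internal_errors(), since libxml emits warnings for markup it does not consider valid.

```php
<?php
// Hypothetical variant of img_format(); the name and the meta-prefix approach
// are illustrative assumptions, not part of the original answer.
function img_format_utf8($str)
{
    $doc = new DOMDocument();

    // Collect parser warnings internally instead of silencing them with @.
    $previous = libxml_use_internal_errors(true);

    // Declare the charset up front so the parser reads the bytes as UTF-8
    // rather than falling back to ISO-8859-1.
    $doc->loadHTML(
        '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">' . $str
    );

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Copy the live node list to an array before moving nodes around.
    foreach (iterator_to_array($doc->getElementsByTagName('img')) as $img) {
        $div = $doc->createElement('div');
        $div->setAttribute('class', 'inner-copy');
        $img->parentNode->insertBefore($div, $img);
        $div->appendChild($img);
        $img->setAttribute('class', 'inner-img');
    }

    return $doc->saveHTML();
}

// Input containing a UTF-8 non-breaking space between "a" and "b".
$html = img_format_utf8("<p>a\xC2\xA0b <img src=\"x.png\"></p>");
```

Note that saveHTML() serializes the whole document, so the injected meta tag appears in the returned markup, just as the doctype and html/body wrappers already do with the original function. The mb_convert_encoding(..., 'HTML-ENTITIES', 'UTF-8') workaround from the answer still applies where it is available; as far as I know, that particular encoding name has been deprecated since PHP 8.2, which is the main reason to consider an alternative like this one.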