PHP DomDocument - why is en dash "–" converted to "â€“"?

Problem description

I am using DOMDocument to extract some paragraphs.

Here is what the initial .htm file I am importing looks like:

<html>
    <head>
        <title>Toxins</title>
    </head>

    <body>
        <p class=8reference><span>1.</span><span>Sivonen, K.; Jones, G. Cyanobacterial Toxins. In <i>Toxic Cyanobacteria in Water. A Guide to Their Public Health Consequences, Monitoring and Management</i>; Chorus, I., Bartram, J., Eds.; E. and F.N. Spon: London, UK, 1999; pp. 41–111.</span></p>
    </body>
</html>

What I am doing:

$dom_input = new \DOMDocument("1.0","UTF-8");
$dom_input->encoding = "UTF-8";
$dom_input->formatOutput = true;
$dom_input->loadHTMLFile($manuscript->getUploadRootDir().$manuscript->getFileName());

$paragraphs = $dom_input->getElementsByTagName('p');

foreach ($paragraphs as $paragraph) {
    if($paragraph->getAttribute('class') == "8reference") {
        var_dump($paragraph->nodeValue);
    }
}

The dash in "pp. 41–111" is converted to:

pp. 41â€“111

Any idea why this happens, and how I can fix it so that I get proper UTF-8 Unicode values?

Thanks.

Recommended answer

It looks to me like the data is correct; you're just displaying it incorrectly.
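
One way to test the claim that the data itself is fine is to look at the raw bytes rather than the rendered text. The following is a minimal sketch, assuming the mbstring extension is enabled; $nodeValue is a hypothetical stand-in for $paragraph->nodeValue from the question, and looksLikeUtf8EnDash() is a hypothetical helper name. If the bytes e2 80 93 are present, the en dash is correctly UTF-8 encoded and the problem lies in how the output is displayed.

```php
<?php
// Diagnostic sketch: inspect raw bytes instead of the rendered text.
// Assumes mbstring is enabled. $nodeValue is a hypothetical stand-in for
// $paragraph->nodeValue; looksLikeUtf8EnDash() is a hypothetical helper.

function looksLikeUtf8EnDash(string $text): bool
{
    // U+2013 EN DASH encoded as UTF-8 is the byte sequence E2 80 93.
    return mb_check_encoding($text, 'UTF-8')
        && strpos($text, "\xE2\x80\x93") !== false;
}

$nodeValue = "pp. 41\u{2013}111";

echo bin2hex($nodeValue), PHP_EOL;         // 70702e203431e28093313131
var_dump(looksLikeUtf8EnDash($nodeValue)); // bool(true)
```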

Are you outputting in UTF-8?

An "â" followed by other stray characters is the classic result of displaying UTF-8-encoded data as if it were in some encoding other than UTF-8.
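
To see where those particular characters come from, the following sketch decodes the UTF-8 bytes of the en dash (E2 80 93) as Windows-1252, where those three bytes happen to be "â", "€" and "“". It assumes mbstring is enabled and that the viewer is falling back to Windows-1252 specifically; misreadAsWindows1252() is a hypothetical helper.

```php
<?php
// Sketch: reproduce the mojibake by reading UTF-8 bytes as Windows-1252.
// Assumes mbstring is enabled; misreadAsWindows1252() is a hypothetical helper.

function misreadAsWindows1252(string $utf8): string
{
    // Interpret each input byte as a Windows-1252 character and re-encode
    // the result as UTF-8 so it can be printed.
    return mb_convert_encoding($utf8, 'UTF-8', 'Windows-1252');
}

echo misreadAsWindows1252("pp. 41\u{2013}111"), PHP_EOL; // pp. 41â€“111
```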

For example, if you're outputting to a web browser, try setting the character set with a meta tag:

<meta http-equiv="Content-Type" content="text/html;charset=UTF-8">
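
If the page is generated by PHP, the same declaration can also be sent as an HTTP header with header(), which must run before any output. A minimal sketch, where renderUtf8Page() is a hypothetical helper and the sample string stands in for the extracted paragraph text:

```php
<?php
// Sketch: declare UTF-8 in the HTTP header and in the page, then echo the text.
// renderUtf8Page() is a hypothetical helper; the sample string is a stand-in
// for $paragraph->nodeValue.

function renderUtf8Page(string $text): string
{
    // Escape HTML special characters; the en dash itself is left untouched.
    $safe = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');

    return "<!DOCTYPE html>\n"
        . "<html><head>"
        . '<meta http-equiv="Content-Type" content="text/html;charset=UTF-8">'
        . "<title>Toxins</title></head>\n"
        . "<body><p>{$safe}</p></body></html>\n";
}

// header() must be called before any output has been sent.
header('Content-Type: text/html; charset=UTF-8');
echo renderUtf8Page("pp. 41\u{2013}111");
```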

If you need to output in something other than UTF-8 you'll need to convert into the alternative character set first.
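
As a sketch of that conversion, assuming mbstring is enabled and using Windows-1252 purely as an example target charset (toWindows1252() is a hypothetical helper):

```php
<?php
// Sketch: convert UTF-8 text to Windows-1252 before emitting a Windows-1252 page.
// Assumes mbstring is enabled; Windows-1252 is only an example target, and
// toWindows1252() is a hypothetical helper.

function toWindows1252(string $utf8): string
{
    // U+2013 EN DASH exists in Windows-1252 as the single byte 0x96.
    return mb_convert_encoding($utf8, 'Windows-1252', 'UTF-8');
}

$converted = toWindows1252("pp. 41\u{2013}111");
// The dash is now one byte: bin2hex($converted) is "70702e20343196313131".

// Declare the charset that the bytes are actually in.
header('Content-Type: text/html; charset=Windows-1252');
echo $converted;
```

Characters with no Windows-1252 equivalent are replaced by mbstring's substitute character, which is "?" by default.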
