DomDocument and html entities

Views: 87 | Published: 2016/11/19 13:50:18 | Tags: php, character-encoding, domdocument


Question

I'm trying to parse some HTML that includes HTML entities, like &#215; (×):

$str = '<a href="http://example.com/"> A &#215; B</a>';

$dom = new DOMDocument;
$dom->substituteEntities = false;
$dom->loadHTML($str);

$link = $dom->getElementsByTagName('a')->item(0);
$fullname = $link->nodeValue;
$href = $link->getAttribute('href');

echo "
fullname: $fullname \n
href: $href\n";

But DomDocument turns the text into A Ã— B.

Is there some way to keep it from treating the & as the start of an HTML entity and make it just leave the text alone? I tried setting substituteEntities to false, but it doesn't do anything.

Recommended answer

From the documentation:

The DOM extension uses UTF-8 encoding.
Use utf8_encode() and utf8_decode() to work with texts in ISO-8859-1 encoding or Iconv for other encodings.
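To make the documentation's point concrete, here is a minimal sketch that inspects the raw bytes DOMDocument hands back for the string from the question. It assumes the dom and iconv extensions are available. It uses iconv() rather than utf8_decode(), because utf8_encode() and utf8_decode() are deprecated as of PHP 8.2. The variable names $utf8 and $latin1 are illustrative only.

```php
<?php
// Sketch: show that DOMDocument returns the decoded entity as UTF-8 bytes.
$str = '<a href="http://example.com/"> A &#215; B</a>';

$dom = new DOMDocument;
$dom->loadHTML($str);

$link = $dom->getElementsByTagName('a')->item(0);

// nodeValue is UTF-8; trim() drops the leading space inside the <a> tag.
$utf8 = trim($link->nodeValue);

// × is U+00D7, which UTF-8 encodes as the two bytes C3 97.
echo bin2hex($utf8), "\n";    // 4120c3972042

// Convert to ISO-8859-1 for a Latin-1 page; × becomes the single byte D7.
$latin1 = iconv('UTF-8', 'ISO-8859-1', $utf8);
echo bin2hex($latin1), "\n";  // 4120d72042
```

A page served with a Latin-1 style charset shows the two UTF-8 bytes C3 97 as two separate characters, which is where the Ã in the question's output comes from.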

Assuming you're using Latin-1, try:

<?php
header('Content-type:text/html;charset=iso-8859-1');

// DOMDocument expects UTF-8 input, so convert the Latin-1 source first
$str = utf8_encode('<a href="http://example.com/"> A × B</a>');

$dom = new DOMDocument;
$dom->substituteEntities = false;
$dom->loadHTML($str);

$link = $dom->getElementsByTagName('a')->item(0);
// nodeValue comes back as UTF-8; convert it to Latin-1 for output
$fullname = utf8_decode($link->nodeValue);
$href = $link->getAttribute('href');

echo "
fullname: $fullname \n
href: $href\n";
?>
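If the goal is to get an entity back in the output rather than a raw × character, one option is to re-encode the extracted text with htmlentities(), telling it the input is UTF-8. This is an assumption about what the asker wants and is not part of the original answer. A minimal sketch, with $escaped as an illustrative name:

```php
<?php
// Sketch: turn the UTF-8 text from DOMDocument back into HTML entities.
$str = '<a href="http://example.com/"> A &#215; B</a>';

$dom = new DOMDocument;
$dom->loadHTML($str);

$link = $dom->getElementsByTagName('a')->item(0);

// nodeValue is UTF-8; trim() drops the leading space inside the <a> tag.
$fullname = trim($link->nodeValue);

// Re-encode characters that have named entities, reading the input as UTF-8.
$escaped = htmlentities($fullname, ENT_QUOTES, 'UTF-8');

echo $escaped, "\n"; // A &times; B
```

The result is plain ASCII, so it renders the same whatever charset the page declares. Note that it yields the named entity &times; rather than the numeric &#215; from the input.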
