How to parse CDATA HTML-content of XML using SimpleXML?
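Problem description
I am trying to display XML content in a table. Everything works, but the tag below contains some content that I don't want to display: I only want the image, not the text "November 2012 calendar from 5.10 The Test".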
<content:encoded><![CDATA[<p>November 2012 calendar from 5.10 The Test</p>
<p><a class="shutterset_" href='http://trance-gemini.com/wordpress/wp-content/gallery/calendars/laura-bertram-trance-gemini-145-1080.jpg' title='<br>November 2012 calendar from 5.10 The Test<br> <a href="</a></p>]]>
</content:encoded>
I want to display the image, but not the text "November 2012 calendar from 5.10 The Test". This is my code:
<?php
// load SimpleXML
$item = new SimpleXMLElement('test1.xml', null, true);
echo <<<EOF
<table border="1px">
<tr cl>
</tr>
EOF;
foreach ($item->channel->item as $boo) // loop through the feed items
{
echo <<<EOF
<tr>
<td rowspan="3">{$boo->children('content', true)->encoded}</td>
<td>{$boo->title}</td>
</tr>
<tr>
<td>{$boo->description}</td>
</tr>
<tr>
<td>{$boo->comments}</td>
</tr>
EOF;
}
echo '</table>';
?>
I answered this once before, but I can no longer find that answer.
If you take a look at the string (simplified/beautified):
<content:encoded><![CDATA[
<p>Lorem Ipsom</p>
<p>
<a href='laura-bertram-trance-gemini-145-1080.jpg'
title='<br>November 2012 calendar from 5.10 The Test<br> <a href="</a>
</p>]]>
</content:encoded>
You can see that you have HTML encoded inside the node value of the <content:encoded> element. So first you need to obtain the HTML value, which you already do:
$html = $boo->children('content', true)->encoded;
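To make that step concrete, here is a minimal, self-contained sketch. The inline feed is a hypothetical stand-in for test1.xml, and the namespace URI is an assumption (the one commonly used for the content: prefix in RSS). It shows that casting the element to a string yields the raw HTML from the CDATA section:

```php
<?php
// Hypothetical minimal feed, standing in for test1.xml.
$xml = <<<'XML'
<rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <item>
      <title>Example</title>
      <content:encoded><![CDATA[<p>Some text</p><p><a href='image.jpg'>link</a></p>]]></content:encoded>
    </item>
  </channel>
</rss>
XML;

$feed = new SimpleXMLElement($xml);
$boo  = $feed->channel->item[0];

// children('content', true) selects child elements by namespace prefix.
// Casting to string returns the text of the CDATA section, i.e. the raw HTML.
$html = (string) $boo->children('content', true)->encoded;

echo $html, "\n";
```

The explicit (string) cast is optional when the value is only echoed, but it avoids passing a SimpleXMLElement object around where a plain string is expected.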
Then you need to parse the HTML inside $html. PHP offers several libraries for parsing HTML.
If you decide to use the more or less recommended DOMDocument for the job, you only need to get the attribute value of a certain element. Alternatively, you can do the same with its sister library SimpleXML, which you already use (so this is the more recommended route; see the next section as well).
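As a rough sketch of the DOMDocument-only route (not necessarily the exact approach the answer had in mind), the attribute can be read with getElementsByTagName() and getAttribute(). The HTML string here is a hypothetical, cleaned-up version of the CDATA content:

```php
<?php
// Hypothetical HTML chunk, shaped like the CDATA content in the feed.
$htmlString = "<p>Lorem Ipsum</p>"
    . "<p><a href='laura-bertram-trance-gemini-145-1080.jpg' title='Calendar'>link</a></p>";

$doc = new DOMDocument();

// Real-world feed HTML is often malformed; keep parser warnings internal.
libxml_use_internal_errors(true);
$doc->loadHTML($htmlString);
libxml_clear_errors();

// Take the first <a> element in the document and read its href attribute.
$links = $doc->getElementsByTagName('a');
$href  = $links->length > 0 ? $links->item(0)->getAttribute('href') : null;

echo $href, "\n";
```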
In the context of your question, here is a tip:
You're using SimpleXML. DOMDocument is a sister library, meaning you can interchange between the two, so you don't need to learn a whole new library.
For example, you can use only the HTML parsing feature of DOMDocument, and then import the result into SimpleXML. This is useful because SimpleXML does not support HTML parsing.
That works via simplexml_import_dom().
A simplified step-by-step example:
// get the HTML string out of the feed:
$htmlString = $boo->children('content', true)->encoded;
// create DOMDocument for HTML parsing:
$htmlParser = new DOMDocument();
// load the HTML:
$htmlParser->loadHTML($htmlString);
// import it into simplexml:
$html = simplexml_import_dom($htmlParser);
Now you can use $html as a new SimpleXMLElement that represents the HTML document. As your HTML chunks did not have any <body> tags, they are put inside a <body> tag, according to the HTML specification. This allows you, for example, to access the href attribute of the first <a> inside the second <p> element in your example:
// access the element you're looking for:
$href = $html->body->p[1]->a['href'];
Here is the full example from above:
// get the HTML string out of the feed:
$htmlString = $boo->children('content', true)->encoded;
// create DOMDocument for HTML parsing:
$htmlParser = new DOMDocument();
// your HTML gives parser warnings, keep them internal:
libxml_use_internal_errors(true);
// load the HTML:
$htmlParser->loadHTML($htmlString);
// import it into simplexml:
$html = simplexml_import_dom($htmlParser);
// access the element you're looking for:
$href = $html->body->p[1]->a['href'];
// output it
echo $href, "\n";
And this is what it outputs:
laura-bertram-trance-gemini-145-1080.jpg
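To get from that href to the original goal (showing only the image in the table cell), one possible sketch follows. The helper name firstLinkHref() and the sample URL are hypothetical, and it assumes the first link in the HTML chunk points at the image file. It uses an XPath query instead of positional access, so it does not depend on the link being in the second paragraph:

```php
<?php
/**
 * Hypothetical helper: return the href of the first <a> in an HTML chunk,
 * or null if there is none.
 */
function firstLinkHref(string $htmlString): ?string
{
    // Nothing to parse in an empty chunk.
    if (trim($htmlString) === '') {
        return null;
    }

    $parser   = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // keep parser warnings internal
    $parser->loadHTML($htmlString);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);        // restore the earlier setting

    // Hand the parsed document over to SimpleXML and query it with XPath.
    $html  = simplexml_import_dom($parser);
    $links = $html->xpath('//a[@href]');

    return $links ? (string) $links[0]['href'] : null;
}

// Hypothetical content of one <content:encoded> element.
$htmlString = "<p>November 2012 calendar</p>"
    . "<p><a href='http://example.com/calendar.jpg'>x</a></p>";

$href = firstLinkHref($htmlString);

// Build a cell body that contains only the image, escaped for attribute use.
$cell = $href === null
    ? ''
    : '<img src="' . htmlspecialchars($href, ENT_QUOTES) . '" alt="">';

echo $cell, "\n";
```

Inside the foreach loop from the question, $cell could then be printed in place of the raw content:encoded value.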