How to parse CDATA HTML-content of XML using SimpleXML?

Views: 161 · Posted: 2018/6/13 10:58:54 · Tags: php, html, xml, rss, simplexml


Question

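I am trying to display XML content in tables, and everything works perfectly. However, there is some content in the tag that I don't want to display: I want only the image, not "November 2012 calendar from 5.10 The Test". This is what it looks like in the XML:

```xml
<content:encoded><![CDATA[<p>November 2012 calendar from 5.10 The Test</p>
<p><a class="shutterset_" href='http://trance-gemini.com/wordpress/wp-content/gallery/calendars/laura-bertram-trance-gemini-145-1080.jpg' title='&lt;br&gt;November 2012 calendar from 5.10 The Test&gt; &lt;a href=&quot;</a></p>]]>
</content:encoded>
```

I want to display the image, but not "November 2012 calendar from 5.10 The Test".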

```php
<?php
// load SimpleXML (third argument true: the first argument is a file path)
$item = new SimpleXMLElement('test1.xml', 0, true);

echo <<<EOF
<table border="1">
<tr>

</tr>
EOF;
foreach ($item->channel->item as $boo) // loop through the feed items
{
    echo <<<EOF

<tr>
<td rowspan="3">{$boo->children('content', true)->encoded}</td>
<td>{$boo->title}</td>
</tr>

<tr>
<td>{$boo->description}</td>
</tr>

<tr>
<td>{$boo->comments}</td>
</tr>
EOF;
}
echo '</table>';
?>
```

Answer

I once answered this, but I can't find that answer any longer. If you take a look at the string (simplified/beautified):

```xml
<content:encoded><![CDATA[
<p>Lorem Ipsum</p>
<p>
<a href='laura-bertram-trance-gemini-145-1080.jpg'
   title='&lt;br&gt;November 2012 calendar from 5.10 The Test&lt;br&amp;gt; &lt;a href=&quot;</a>
</p>]]>
</content:encoded>
```

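You can see that you have HTML encoded inside the node value of the `<content:encoded>` element. So first you need to obtain the HTML value, which you already do: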

```php
$html = $boo->children('content', true)->encoded;
```
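A minimal, self-contained sketch of this first step, assuming the feed declares the usual RSS content module namespace (http://purl.org/rss/1.0/modules/content/) under the prefix content; the sample item is made up for illustration. The (string) cast turns the SimpleXMLElement into the plain CDATA text:

```php
<?php
// Hypothetical one-item feed with a content:encoded element holding HTML in CDATA.
$rss = <<<'XML'
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <item>
      <title>Calendar</title>
      <content:encoded><![CDATA[<p>November 2012 calendar</p><p><a href='calendar.jpg'>img</a></p>]]></content:encoded>
    </item>
  </channel>
</rss>
XML;

$feed = new SimpleXMLElement($rss);

foreach ($feed->channel->item as $item) {
    // children('content', true) selects child elements by the "content" prefix;
    // casting to string yields the CDATA text as an ordinary PHP string.
    $htmlString = (string) $item->children('content', true)->encoded;
    echo $htmlString, "\n";
}
```

At this point $htmlString is still just text; SimpleXML does not look inside it, which is why a separate HTML parsing step is needed.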

Then you need to parse the HTML inside $html. Which libraries can do HTML parsing in PHP is outlined in:

How to parse and process HTML/XML with PHP?

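If you decide to use the more or less recommended DOMDocument for the job, you only need to get the attribute value of a certain element:

PHP DOMDocument getting Attribute of Tag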
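A minimal sketch of the DOMDocument route, using a simplified, hypothetical HTML fragment in place of the real feed content: find the first `<a>` element and read its href attribute.

```php
<?php
// Simplified stand-in for the HTML found inside the CDATA section.
$htmlString = "<p>Lorem Ipsum</p><p><a href='laura-bertram-trance-gemini-145-1080.jpg'>image</a></p>";

$doc = new DOMDocument();
// Collect libxml's HTML parser warnings internally instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($htmlString);
libxml_clear_errors();

// item(0) is the first <a> in document order, or null if there is none.
$link = $doc->getElementsByTagName('a')->item(0);
$href = $link !== null ? $link->getAttribute('href') : '';

echo $href, "\n"; // laura-bertram-trance-gemini-145-1080.jpg
```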

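Or for its sister library SimpleXML, which you already use (so this is more recommended; see the next section as well):

How to get an attribute with SimpleXML?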
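The SimpleXML equivalent, again only a sketch on a made-up fragment: array-style access reads a single attribute, while attributes() lets you iterate over all of them.

```php
<?php
$xml = new SimpleXMLElement("<p><a class='shutterset_' href='calendar.jpg'>image</a></p>");

// Array-style access returns the attribute as a SimpleXMLElement; cast it to string.
$href = (string) $xml->a['href'];

// attributes() yields name => value pairs for every attribute of the element.
$all = [];
foreach ($xml->a->attributes() as $name => $value) {
    $all[$name] = (string) $value;
}

echo $href, "\n"; // calendar.jpg
print_r($all);
```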

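In the context of your question, here is a tip:

You're using SimpleXML. DOMDocument is a sister library, meaning you can interchange between the two, so you don't need to learn a whole new library.

For example, you can use only the HTML parsing feature of DOMDocument and then import the result into SimpleXML. This is useful because SimpleXML does not support HTML parsing.

That works via simplexml_import_dom().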
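A small sketch of that interchange, assuming the dom and simplexml extensions are enabled (they are in most PHP builds). dom_import_simplexml() and simplexml_import_dom() wrap the same underlying node, so a change made through one API is visible through the other:

```php
<?php
// Start in SimpleXML.
$sx = new SimpleXMLElement('<item><title>Calendar</title></item>');

// dom_import_simplexml() returns a DOMElement for the same node.
$dom = dom_import_simplexml($sx);
$dom->setAttribute('id', 'nov-2012'); // modify it through the DOM API

// simplexml_import_dom() goes back the other way.
$back = simplexml_import_dom($dom);

echo (string) $sx['id'], "\n";     // nov-2012: the change shows up in the original object
echo (string) $back->title, "\n";  // Calendar
```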

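A simplified step-by-step example:

```php
// get the HTML string out of the feed (cast the SimpleXMLElement to a plain string):
$htmlString = (string) $boo->children('content', true)->encoded;

// create DOMDocument for HTML parsing:
$htmlParser = new DOMDocument();

// load the HTML:
$htmlParser->loadHTML($htmlString);

// import it into simplexml:
$html = simplexml_import_dom($htmlParser);
```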
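Now you can use $html as a new SimpleXMLElement that represents the HTML document. As your HTML chunks did not have any `<body>` tag, the parser puts them inside an implied `<body>` tag, as the HTML specification describes. This allows you, for example, to access the href attribute of the first `<a>` inside the second `<p>` element in your example:

```php
// access the element you're looking for:
$href = $html->body->p[1]->a['href'];
```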
Here is the full view of the above (Online Demo):
```php
// get the HTML string out of the feed:
$htmlString = (string) $boo->children('content', true)->encoded;

// create DOMDocument for HTML parsing:
$htmlParser = new DOMDocument();

// your HTML gives parser warnings, keep them internal:
libxml_use_internal_errors(true);

// load the HTML:
$htmlParser->loadHTML($htmlString);

// import it into simplexml:
$html = simplexml_import_dom($htmlParser);

// access the element you're looking for:
$href = $html->body->p[1]->a['href'];

// output it
echo $href, "\n";
```
And what it outputs:

```
laura-bertram-trance-gemini-145-1080.jpg
```
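Applied to the original goal (show the image, drop the text), one possible sketch follows. Two choices here are assumptions rather than part of the answer above: it searches with the XPath expression //a/@href instead of relying on the fixed position p[1], and it treats a link as an image when its target ends in .jpg, .jpeg, .png or .gif. The feed and the helper name firstImageHref are hypothetical.

```php
<?php
// Hypothetical one-item feed modelled on the question's content:encoded value.
$rss = <<<'XML'
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <item>
      <title>November 2012</title>
      <content:encoded><![CDATA[<p>November 2012 calendar from 5.10 The Test</p><p><a class="shutterset_" href='http://example.com/calendar.jpg' title='calendar'></a></p>]]></content:encoded>
    </item>
  </channel>
</rss>
XML;

// Hypothetical helper: returns the first link target that looks like an image, or null.
function firstImageHref(string $htmlString): ?string
{
    if (trim($htmlString) === '') {
        return null; // loadHTML() rejects empty input
    }
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);
    $doc->loadHTML($htmlString);
    libxml_clear_errors();

    $html = simplexml_import_dom($doc);
    foreach ($html->xpath('//a/@href') ?: [] as $attr) {
        $href = (string) $attr;
        if (preg_match('/\.(jpe?g|png|gif)$/i', $href)) {
            return $href;
        }
    }
    return null;
}

$feed = new SimpleXMLElement($rss);
$rows = '';
foreach ($feed->channel->item as $item) {
    $src = firstImageHref((string) $item->children('content', true)->encoded);
    // Only the image is rendered; the paragraph text from the CDATA is dropped.
    $cell = $src !== null ? '<img src="' . htmlspecialchars($src) . '">' : '';
    $rows .= "<tr><td>{$cell}</td><td>" . htmlspecialchars((string) $item->title) . "</td></tr>\n";
}

echo "<table border=\"1\">\n{$rows}</table>\n";
```

Searching by XPath keeps working if the feed adds or removes a paragraph before the link, at the cost of relying on the file-extension heuristic to recognise images.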

查看全文

如何使用SimpleXML解析XML的XML的HTML内容？ [英] How to parse CDATA HTML-content of XML using SimpleXML?

问题描述

相关文章

PHP最新文章

热门教程

热门工具

登录关闭

如何使用SimpleXML解析XML的XML的HTML内容？ [英] How to parse CDATA HTML-content of XML using SimpleXML?

问题描述

相关文章

PHP最新文章

热门教程

热门工具

登录 关闭

登录关闭