How to parse CDATA HTML-content of XML using SimpleXML?


Question

I am trying to display XML content in a table. Everything works, but the <content:encoded> tag contains content that I don't want to display. I want only the image, not the text "November 2012 calendar from 5.10 The Test". This is how it appears in the XML:

 <content:encoded><![CDATA[<p>November 2012 calendar from 5.10 The Test</p>
    <p><a class="shutterset_" href='http://trance-gemini.com/wordpress/wp-content/gallery/calendars/laura-bertram-trance-gemini-145-1080.jpg' title='&lt;br&gt;November 2012 calendar from 5.10 The Test&lt;br&gt; &lt;a href=&quot;</a></p>]]>
</content:encoded> 

I want to display the image, but not the "November 2012 calendar from 5.10 The Test" paragraph. This is my code:

<?php
// load SimpleXML
$item = new SimpleXMLElement('test1.xml', null, true);

echo <<<EOF
<table border="1px">
        <tr cl>

        </tr>       
EOF;
foreach ($item->channel->item as $boo) // loop through the feed items
{
        echo <<<EOF

         <tr>
            <td rowspan="3">{$boo->children('content', true)->encoded}</td>
            <td>{$boo->title}</td>   
        </tr>

        <tr>
           <td>{$boo->description}</td>
        </tr>

        <tr>
           <td>{$boo->comments}</td>
        </tr>
EOF;
}
echo '</table>';
?>

Answer

I answered a similar question once, but I can't find that answer any longer.

If you take a look at the string (simplified/beautified):

<content:encoded><![CDATA[
    <p>Lorem Ipsom</p>
    <p>
      <a href='laura-bertram-trance-gemini-145-1080.jpg' 
         title='&lt;br&gt;November 2012 calendar from 5.10 The Test&lt;br&gt; &lt;a href=&quot;</a>
    </p>]]>
</content:encoded> 

You can see that you have HTML encoded inside the node-value of the <content:encoded> element. So first you need to obtain the HTML value, which you already do:

$html = $boo->children('content', true)->encoded;
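
As a minimal sketch of that first step, the following reads the CDATA content as a plain string. The feed below is a hypothetical stand-in for test1.xml, and the namespace URI is an assumption (it is the one RSS feeds commonly bind to the content prefix):

```php
<?php
// Hypothetical minimal feed standing in for test1.xml.
$xml = <<<'XML'
<rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <item>
      <title>Example</title>
      <content:encoded><![CDATA[<p>Some text</p><p><a href='image.jpg'>link</a></p>]]></content:encoded>
    </item>
  </channel>
</rss>
XML;

$feed = new SimpleXMLElement($xml);
$boo  = $feed->channel->item[0];

// children('content', true) selects child elements by namespace prefix.
// Casting to string yields the CDATA text, i.e. the raw HTML.
$html = (string) $boo->children('content', true)->encoded;

echo $html, "\n";
```

The explicit (string) cast is optional when the value is only echoed, but it makes clear that what follows works on a string rather than on a SimpleXMLElement.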

Then you need to parse the HTML inside $html. PHP offers several libraries that can do HTML parsing.

If you decide to use the more or less recommended DOMDocument for the job, you only need to get the attribute value of a certain element.

Or you can do the same with its sister library SimpleXML, which you already use (so this is the more recommended route; see the next section as well).
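
For comparison, here is a small sketch of reading an attribute with each library. The fragment is hypothetical and deliberately well-formed XML, so that both libraries can load it directly without an HTML parser:

```php
<?php
// Hypothetical, well-formed fragment so that both parsers accept it as XML.
$fragment = "<p><a href='image.jpg' title='A title'>link</a></p>";

// DOMDocument: locate the element, then call getAttribute().
$doc = new DOMDocument();
$doc->loadXML($fragment);
$hrefDom = $doc->getElementsByTagName('a')->item(0)->getAttribute('href');

// SimpleXML: attributes are read with array-style access; cast to get a string.
$p = simplexml_load_string($fragment);
$hrefSimple = (string) $p->a['href'];

echo $hrefDom, "\n", $hrefSimple, "\n";
```

Both variables end up holding the same attribute value; the difference is only in how the element and its attribute are addressed.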


In the context of your question, here is a tip:

You're using SimpleXML. DOMDocument is a sister library, meaning you can switch between the two, so you don't need to learn an entirely new library.

For example, you can use only the HTML parsing feature of DOMDocument and then import the result into SimpleXML. This is useful because SimpleXML does not support HTML parsing.

That works via simplexml_import_dom().

A simplified step-by-step example:

// get the HTML string out of the feed:
$htmlString = $boo->children('content', true)->encoded;

// create DOMDocument for HTML parsing:
$htmlParser = new DOMDocument();

// load the HTML:
$htmlParser->loadHTML($htmlString);

// import it into simplexml:
$html = simplexml_import_dom($htmlParser);

Now you can use $html as a new SimpleXMLElement that represents the HTML document. Because your HTML chunks do not have a <body> tag, the parser places them inside one, as the HTML specification implies. This allows you, for example, to access the href attribute of the first <a> inside the second <p> element in your example:

// access the element you're looking for:
$href = $html->body->p[1]->a['href'];
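
If the position of the paragraph is not fixed, an XPath query is one alternative to the positional access above. This is only a sketch under that assumption; the HTML fragment is hypothetical:

```php
<?php
// Hypothetical HTML fragment standing in for the decoded feed content.
$htmlString = "<p>Some text</p><p><a href='image.jpg'>link</a></p>";

$doc = new DOMDocument();
libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$doc->loadHTML($htmlString);
libxml_clear_errors();

$html = simplexml_import_dom($doc);

// xpath() returns an array of SimpleXMLElement objects, one per matching
// href attribute; strval() turns each of them into its attribute value.
$hrefs = array_map('strval', $html->xpath('//a/@href'));

print_r($hrefs);
```

The query collects the href of every link in the fragment, regardless of which paragraph contains it.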

Here is the full example from above:

// get the HTML string out of the feed:
$htmlString = $boo->children('content', true)->encoded;

// create DOMDocument for HTML parsing:
$htmlParser = new DOMDocument();

// your HTML gives parser warnings, keep them internal:
libxml_use_internal_errors(true);

// load the HTML:
$htmlParser->loadHTML($htmlString);

// import it into simplexml:
$html = simplexml_import_dom($htmlParser);

// access the element you're looking for:
$href = $html->body->p[1]->a['href'];

// output it
echo $href, "\n";

And what it outputs:

laura-bertram-trance-gemini-145-1080.jpg
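
To tie this back to the table in the question, the steps can be wrapped in a small helper and used inside the loop. This is a sketch, not the only way to do it: the function name firstLinkHref, the sample feed, and the choice to render the link target as an <img> are all assumptions made for illustration.

```php
<?php
/**
 * Hypothetical helper: returns the href of the first <a> in an HTML
 * fragment, or null when the fragment contains no link.
 */
function firstLinkHref(string $htmlString): ?string
{
    if (trim($htmlString) === '') {
        return null; // loadHTML() rejects empty input
    }

    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // keep parser warnings internal
    $doc->loadHTML($htmlString);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $html = simplexml_import_dom($doc);
    if (!$html) {
        return null;
    }

    $links = $html->xpath('//a[@href]');

    return $links ? (string) $links[0]['href'] : null;
}

// Hypothetical feed standing in for test1.xml.
$xml = <<<'XML'
<rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <item>
      <title>November 2012 calendar</title>
      <content:encoded><![CDATA[<p>November 2012 calendar from 5.10 The Test</p>
<p><a href='laura-bertram-trance-gemini-145-1080.jpg' title='calendar'>link</a></p>]]></content:encoded>
    </item>
  </channel>
</rss>
XML;

$feed = new SimpleXMLElement($xml);
$rows = '';
foreach ($feed->channel->item as $boo) {
    $href  = firstLinkHref((string) $boo->children('content', true)->encoded);
    $img   = $href !== null ? '<img src="' . htmlspecialchars($href) . '" alt="">' : '';
    $title = htmlspecialchars((string) $boo->title);
    $rows .= "<tr><td>{$img}</td><td>{$title}</td></tr>\n";
}

echo "<table border=\"1\">\n{$rows}</table>\n";
```

Only the image lands in the first cell; the introductory paragraph from the CDATA block is never echoed, because the raw HTML is parsed rather than printed.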
