Trying to Parse Only the Images from an RSS Feed

Question

First, I am a PHP newbie. I have looked at the question and solution here. For my needs, however, the parsing does not go deep enough into the various articles.

A small sampling of my rss feed reads like this:

    <channel>
      <atom:link href="http://mywebsite.com/rss" rel="self" type="application/rss+xml" />
      <title>My Web Site</title>
      <description>My Feed</description>
      <link>http://mywebsite.com/</link>

      <image>
        <url>http://mywebsite.com/views/images/banner.jpg</url>
        <title>My Title</title>
        <link>http://mywebsite.com/</link>
        <description>Visit My Site</description>
      </image>

      <item>
        <title>Article One</title>
        <guid isPermaLink="true">http://mywebsite.com/details/e8c5106</guid>
        <link>http://mywebsite.com/geturl/e8c5106</link>
        <comments>http://mywebsite.com/details/e8c5106#comments</comments>
        <pubDate>Wed, 09 Jan 2013 02:59:45 -0500</pubDate>
        <category>Category 1</category>
        <description>
          <![CDATA[<div>
          <img src="http://mywebsite.com/myimages/1521197-main.jpg" width="120" border="0"  />
          <ul><li>Poster: someone's name;</li>
          <li>PostDate: Tue, 08 Jan 2013 21:49:35 -0500</li>
          <li>Rating: 5</li>
          <li>Summary:Lorem ipsum dolor </li></ul></div><div style="clear:both;">]]>
        </description>
      </item>
      <item>..

The image links that I want to parse out are the ones deep inside each item > description.

The code in my php file reads:

    <?php
    $xml = simplexml_load_file('http://mywebsite.com/rss?t=2040&dl=1&i=1&r=ceddfb43483437b1ed08ab8a72cbc3d5');
    $imgs = $xml->xpath('/item/description/img');
    foreach($imgs as $image) {
        echo $image->src;
    }
    ?>

Can someone please help me figure out how to configure the php code above?

Also a very newbie question... once I get the resulting image urls, how can I display the images in a row on my html?

Thanks a lot!!!!

Hernando

Answer

The <img> tags inside that RSS feed are not actually elements of the XML document, contrary to the syntax highlighting on this site - they are just text inside the <description> element which happen to contain the characters < and >.
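A quick way to see this for yourself is to load a trimmed-down copy of the sample feed with SimpleXML and compare what XPath finds against the raw text of the description. This is a minimal sketch: the <rss version="2.0"> root element and the shortened item are assumptions, since the question only shows part of the feed.

```php
<?php
// Trimmed-down copy of the sample feed (assumed <rss> root; the question shows only part of it).
$rss = <<<'XML'
<rss version="2.0">
  <channel>
    <item>
      <title>Article One</title>
      <description><![CDATA[<div><img src="http://mywebsite.com/myimages/1521197-main.jpg" width="120" border="0" /></div>]]></description>
    </item>
  </channel>
</rss>
XML;

$xml = simplexml_load_string($rss);

// XPath finds no <img> element under <description>...
$imgElements = $xml->xpath('//item/description/img');
echo count($imgElements), PHP_EOL; // prints 0

// ...because the markup is plain character data inside <description>.
$descriptionText = (string) $xml->channel->item->description;
echo $descriptionText, PHP_EOL;

// The description node itself has no child elements at all.
echo $xml->channel->item->description->count(), PHP_EOL; // prints 0
```

Even with a correct path such as //item/description/img, the query comes back empty; the image markup only becomes visible once the description text is parsed a second time.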

The string <![CDATA[ tells the XML parser that everything from there until it encounters ]]> is to be treated as a raw string, regardless of what it contains. This is useful for embedding HTML inside XML, since the HTML tags wouldn't necessarily be valid XML. It is equivalent to escaping the whole HTML (e.g. with htmlspecialchars) so that the <img> tags would look like &lt;img&gt;. (I went into more technical details in another answer.)
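The equivalence can be checked directly: a CDATA section and an htmlspecialchars-escaped copy of the same markup parse to identical text, and in neither case does the parser create an img element. A small sketch, using a shortened img tag taken from the sample feed:

```php
<?php
$html = '<img src="http://mywebsite.com/myimages/1521197-main.jpg" width="120" />';

// Version 1: the markup wrapped in a CDATA section.
$withCdata = simplexml_load_string('<description><![CDATA[' . $html . ']]></description>');

// Version 2: the same markup escaped with htmlspecialchars, so "<" becomes "&lt;" and so on.
$escaped = simplexml_load_string('<description>' . htmlspecialchars($html) . '</description>');

// Both parse to exactly the same text content...
var_dump((string) $withCdata === (string) $escaped); // bool(true)

// ...and neither <description> has any child elements.
var_dump($withCdata->count()); // int(0)
var_dump($escaped->count());   // int(0)
```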

So to extract the images from the RSS requires two steps: first, get the text of each <description>, and second, find all the <img> tags in that text.

<?php
$xml = simplexml_load_file('http://mywebsite.com/rss?t=2040&dl=1&i=1&r=ceddfb43483437b1ed08ab8a72cbc3d5');

// Step 1: get the text of each <description>
$descriptions = $xml->xpath('//item/description');
foreach ( $descriptions as $description_node ) {
    // The description may not be valid XML, so use a more forgiving HTML parser mode.
    // Collect libxml's warnings about sloppy markup (e.g. unclosed <div>s) instead of printing them.
    libxml_use_internal_errors(true);
    $description_dom = new DOMDocument();
    $description_dom->loadHTML( (string)$description_node );
    libxml_clear_errors();

    // Switch back to SimpleXML for readability
    $description_sxml = simplexml_import_dom( $description_dom );

    // Step 2: find all images, and extract their 'src' attribute
    $imgs = $description_sxml->xpath('//img');
    foreach($imgs as $image) {
        echo (string)$image['src'], PHP_EOL; // one URL per line
    }
}
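Because the feed URL above is specific to the asker's site, here is a self-contained sketch that runs the same two steps against an inline copy of part of the sample feed, and also touches on the follow-up question about showing the images in a row. The function names extractDescriptionImages and renderImageRow, the image-row class, the inline style and the second feed item are all hypothetical. Unlike the code above, step 2 here uses DOMDocument::getElementsByTagName instead of switching back to SimpleXML and running XPath; both approaches find the same img elements.

```php
<?php
// Hypothetical helper: return the src of every <img> inside each
// <item><description> of an RSS document given as a string.
function extractDescriptionImages(string $rssXml): array
{
    // Collect libxml parse errors instead of printing PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $srcs = [];

    // Step 1: parse the feed as XML and take the text of each <description>.
    $xml = simplexml_load_string($rssXml);
    if ($xml !== false) {
        $descriptions = $xml->xpath('//item/description') ?: [];
        foreach ($descriptions as $descriptionNode) {
            $html = trim((string) $descriptionNode);
            if ($html === '') {
                continue; // DOMDocument::loadHTML() rejects empty input
            }

            // Step 2: parse that text as (possibly sloppy) HTML and find the images.
            $dom = new DOMDocument();
            $dom->loadHTML($html);
            foreach ($dom->getElementsByTagName('img') as $img) {
                $srcs[] = $img->getAttribute('src');
            }
        }
    }

    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $srcs;
}

// Hypothetical helper for the follow-up question: output the images as one row.
// <img> is an inline element, so the images already sit side by side;
// white-space: nowrap only stops the row from wrapping onto a second line.
function renderImageRow(array $srcs): string
{
    $html = '<div class="image-row" style="white-space: nowrap;">';
    foreach ($srcs as $src) {
        $html .= '<img src="' . htmlspecialchars($src, ENT_QUOTES) . '" width="120" alt="">';
    }
    return $html . '</div>';
}

// Trimmed-down copy of the sample feed (assumed <rss> root). The second item is
// hypothetical and shows that descriptions without images are simply skipped.
$rss = <<<'XML'
<rss version="2.0">
  <channel>
    <item>
      <title>Article One</title>
      <description><![CDATA[<div>
      <img src="http://mywebsite.com/myimages/1521197-main.jpg" width="120" border="0"  />
      <ul><li>Rating: 5</li></ul></div><div style="clear:both;">]]></description>
    </item>
    <item>
      <title>Article Two</title>
      <description><![CDATA[<p>No picture here</p>]]></description>
    </item>
  </channel>
</rss>
XML;

$srcs = extractDescriptionImages($rss);
echo renderImageRow($srcs), PHP_EOL;
```

Escaping each URL with htmlspecialchars before putting it in the src attribute matters here because the URLs come from a remote feed. If wrapping onto several lines is acceptable, the white-space style can simply be dropped.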
