Parsing Media RSS using XMLReader
Problem description
<rss version="2.0"
     xmlns:media="http://search.yahoo.com/mrss/">
  <channel>
    <title>Title of RSS feed</title>
    <link>http://www.google.com</link>
    <description>Details about the feed</description>
    <pubDate>Mon, 24 Nov 08 21:44:21 -0500</pubDate>
    <language>en</language>
    <item>
      <title>Article 1</title>
      <description><![CDATA[How to use StackOverflow.com]]></description>
      <link>http://youtube.com/?v=y6_-cLWwEU0</link>
      <media:player url="http://youtube.com/?v=y6_-cLWwEU0" />
      <media:thumbnail url="http://img.youtube.com/vi/y6_-cLWwEU0/default.jpg"
                       width="120" height="90" />
      <media:title>Jared on StackOverflow</media:title>
      <media:category label="Tags">tag1,tag2</media:category>
      <media:credit>Jared</media:credit>
      <enclosure url="http://youtube.com/v/y6_-cLWwEU0.swf"
                 length="233"
                 type="application/x-shockwave-flash"/>
    </item>
  </channel>
</rss>
I decided to use XMLReader to parse my large XML files. I am having trouble getting the data inside each item, especially the thumbnail.
Here's my code:
//////////////////////////////
$itemList = array();
$i=0;
$xmlReader = new XMLReader();
$xmlReader->open('XMLFILE');
while($xmlReader->read()) {
    if($xmlReader->nodeType == XMLReader::ELEMENT) {
        if($xmlReader->localName == 'title') {
            $xmlReader->read();
            $itemList[$i]['title'] = $xmlReader->value;
        }
        if($xmlReader->localName == 'description') {
            // move to its textnode / child
            $xmlReader->read();
            $itemList[$i]['description'] = $xmlReader->value;
        }
        if($xmlReader->localName == 'media:thumbnail') {
            // move to its textnode / child
            $xmlReader->read();
            $itemList[$i]['media:thumbnail'] = $xmlReader->value;
            $i++;
        }
    }
}
////////////////
Is it advisable to use DOMXPath, given that I am parsing a huge XML file? I would really appreciate your advice.
xtian,
If memory usage is a concern, I would recommend staying away from DOM/XPath, as it requires the whole file to be read into memory first. XMLReader only reads in a chunk at a time (probably 8K, as that seems to be the standard PHP chunk size).
I have rewritten what you originally posted, and it captures the following elements contained within an <item> element:
title
description
media:thumbnail
media:title
The thing you have to remember is that XMLReader::localName returns the element name minus any namespace prefix (e.g. the localName of media:thumbnail is thumbnail), so a comparison such as localName == 'media:thumbnail' never matches. You will want to be careful of this, as the media:title value could overwrite the title value.
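A minimal sketch of that difference, assuming a trimmed-down copy of the feed from the question is passed as an inline string (the `$xml` fragment and the `$seen` array are made up for illustration and are not part of the rewritten code). It records `name`, `localName`, `prefix` and `namespaceURI` for every element so the collision between `title` and `media:title` is visible:

```php
<?php
// Hypothetical inline fragment, trimmed from the sample feed in the question.
$xml = <<<XML
<rss version="2.0" xmlns:media="http://search.yahoo.com/mrss/">
  <channel>
    <item>
      <title>Article 1</title>
      <media:thumbnail url="http://img.youtube.com/vi/y6_-cLWwEU0/default.jpg" width="120" height="90" />
      <media:title>Jared on StackOverflow</media:title>
    </item>
  </channel>
</rss>
XML;

$reader = new XMLReader();
$reader->XML($xml);

$seen = array();
while ($reader->read()) {
    // Only opening tags are of interest here.
    if ($reader->nodeType !== XMLReader::ELEMENT) {
        continue;
    }
    $seen[] = array(
        'name'      => $reader->name,         // qualified name, prefix included
        'localName' => $reader->localName,    // name with the prefix stripped
        'prefix'    => $reader->prefix,       // namespace prefix, if any
        'namespace' => $reader->namespaceURI, // URI the prefix is bound to
    );
}
$reader->close();

foreach ($seen as $row) {
    printf(
        "%-16s localName=%-10s prefix=%-6s ns=%s\n",
        $row['name'],
        $row['localName'],
        $row['prefix'],
        $row['namespace']
    );
}
```

Both `title` and `media:title` report the same `localName`, so code that keys on `localName` alone would probably want to check `namespaceURI` (or `name`) as well.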
Here is what I re-wrote:
<?php
define ('XMLFILE', dirname(__FILE__) . '/Rss.xml');
echo "<pre>";

$items = array ();
$i = 0;

$xmlReader = new XMLReader();
$xmlReader->open (XMLFILE, null, LIBXML_NOBLANKS);

// True only while the reader is inside an <item> element
$isParserActive = false;
// Elements whose value is a single text (or CDATA) child
$simpleNodeTypes = array ("title", "description", "media:title");

while ($xmlReader->read ())
{
    $nodeType = $xmlReader->nodeType;

    // Only deal with Beginning/Ending Tags
    if ($nodeType != XMLReader::ELEMENT && $nodeType != XMLReader::END_ELEMENT)
    {
        continue;
    }
    else if ($xmlReader->name == "item")
    {
        // Closing </item>: move on to the next slot in $items
        if (($nodeType == XMLReader::END_ELEMENT) && $isParserActive)
        {
            $i++;
        }
        $isParserActive = ($nodeType != XMLReader::END_ELEMENT);
    }

    // Ignore everything outside an <item>, and all closing tags
    if (!$isParserActive || $nodeType == XMLReader::END_ELEMENT)
    {
        continue;
    }

    // Use name (prefix included) so media:title does not overwrite title
    $name = $xmlReader->name;

    if (in_array ($name, $simpleNodeTypes))
    {
        // Skip to the text node
        $xmlReader->read ();
        $items[$i][$name] = $xmlReader->value;
    }
    else if ($name == "media:thumbnail")
    {
        // The thumbnail is an empty element; its data lives in attributes
        $items[$i]['media:thumbnail'] = array (
            "url" => $xmlReader->getAttribute("url"),
            "width" => $xmlReader->getAttribute("width"),
            "height" => $xmlReader->getAttribute("height")
        );
    }
}

var_dump ($items);
echo "</pre>";
?>
If you have any questions on how this works, I would be more than happy to answer them for you.
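On the DOM/XPath question, one possible middle ground (not part of the answer above, and offered only as a sketch) is to keep XMLReader for streaming and expand a single `<item>` at a time into a small tree, so that only one item is held in memory at once. The helper name `readMediaItems`, the `$sample` feed and the choice of SimpleXML for the per-item lookup are all assumptions made for illustration; for a real file, `open()` would replace the `XML()` call.

```php
<?php
// Hypothetical helper: streams with XMLReader and expands one <item>
// at a time into a SimpleXML tree. Not taken from the original answer.
function readMediaItems($xml)
{
    $mediaNs = 'http://search.yahoo.com/mrss/'; // namespace URI used by the sample feed
    $reader = new XMLReader();
    $reader->XML($xml); // for a large file, $reader->open($path) would be used instead
    $doc = new DOMDocument();
    $items = array();

    // Advance to the first <item> element.
    while ($reader->read() && $reader->name !== 'item') {
        // keep reading
    }

    while ($reader->name === 'item') {
        // expand() builds a DOM subtree for the current item only.
        $node = simplexml_import_dom($doc->importNode($reader->expand(), true));
        $media = $node->children($mediaNs);         // media:* children
        $thumb = $media->thumbnail->attributes();   // un-namespaced attributes

        $items[] = array(
            'title'           => (string) $node->title,
            'description'     => (string) $node->description,
            'media:title'     => (string) $media->title,
            'media:thumbnail' => array(
                'url'    => (string) $thumb['url'],
                'width'  => (string) $thumb['width'],
                'height' => (string) $thumb['height'],
            ),
        );

        // Skip this item's subtree and move to the next <item>, if any.
        $reader->next('item');
    }

    $reader->close();
    return $items;
}

// Made-up two-item feed in the same shape as the sample in the question.
$sample = <<<XML
<rss version="2.0" xmlns:media="http://search.yahoo.com/mrss/">
  <channel>
    <title>Title of RSS feed</title>
    <item>
      <title>Article 1</title>
      <description><![CDATA[How to use StackOverflow.com]]></description>
      <media:thumbnail url="http://img.youtube.com/vi/y6_-cLWwEU0/default.jpg" width="120" height="90" />
      <media:title>Jared on StackOverflow</media:title>
    </item>
    <item>
      <title>Article 2</title>
      <description>Second entry</description>
      <media:thumbnail url="http://img.youtube.com/vi/y6_-cLWwEU0/default.jpg" width="120" height="90" />
      <media:title>Second media title</media:title>
    </item>
  </channel>
</rss>
XML;

$items = readMediaItems($sample);
var_dump($items);
```

Because each `<item>` is small, the expanded tree stays small too. Whether this is worth the extra copying compared with the pure XMLReader loop depends on how complex the per-item lookups are.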