XMLReader & simpleXML Combo, with Conditions
Problem description
I am using a combination of XMLReader and SimpleXML to parse the posts in a WordPress export file. I realize this is a little out of the norm, but it's more of a backup project, so that we can easily pull up one of these articles if we need it in the future. The WP site that they were on needs to come down.
The issue I am having is that some of the nodes in the XML file are empty or contain useless values (i.e. not full posts). I need to add some string length conditions, but I'm not sure how to check for each one.
<?php
$path_to_xml_file = 'compress.zlib://wordpress.2011.xml.gz';
$reader = new XMLReader();
$reader->open($path_to_xml_file);
while ($reader->read())
{
    if ($reader->nodeType == XMLReader::ELEMENT && $reader->name == 'item')
    {
        $doc = new DOMDocument('1.0', 'UTF-8');
        $xml = simplexml_import_dom($doc->importNode($reader->expand(), true));
        //echo $xml->title; //or whatever
        // Take care of the articles
        $newcontent = $xml->children('http://purl.org/rss/1.0/modules/content/');
        $contentString = $newcontent->encoded;
        $titleString = $xml->title;
        echo '
        <div class="article-container" id="article-' . $xml->title . '">
        <a href="#top" class="top-link">Back to the Top</a>
        <h2>' . $xml->title . '</h2>
        <div class="articles">' . $newcontent->encoded . '</div>
        </div>';
    }
}
?>
I was able to check this successfully with just SimpleXML, but it was too much of a memory hog all by itself. This was my SimpleXML code:
<?php
$url = 'wordpress.2011.xml.gz';
$xml = new SimpleXMLElement("compress.zlib://$url", NULL, TRUE);
foreach ($xml->item as $item) :
    $newcontent = $item->children('http://purl.org/rss/1.0/modules/content/');
?>
<?php
    $contentString = $newcontent->encoded;
    $titleString = $item->title;
    if ((strlen($contentString) < 13) || (strlen($titleString) < 5)) {
        echo '';
    } else {
        echo '
        <div class="article-container" id="article-' . $item->title . '">
        <a href="#top" class="top-link">Back to the Top</a>
        <h2>' . $item->title . '</h2>
        <div class="articles">' . $newcontent->encoded . '</div>
        </div>';
    }
?>
<?php endforeach; ?>
UPDATE
With Francis' help, it is working now. Here is the code:
<?php
$path_to_xml_file = 'compress.zlib://wordpress.2011.xml.gz';
$reader = new XMLReader();
$reader->open($path_to_xml_file);
$contentNS = 'http://purl.org/rss/1.0/modules/content/';
while ($reader->read()) {
    if ($reader->nodeType == XMLReader::ELEMENT and $reader->name == 'item') {
        $doc = new DOMDocument('1.0', 'UTF-8');
        $xml = simplexml_import_dom($doc->importNode($reader->expand(), true));
        $titleString = (string) $xml->title;
        $contentString = (string) $xml->children($contentNS)->encoded;
        if (strlen($contentString) > 12 and strlen($titleString) > 4) {
            // Be careful with your output escaping!
            // This below looks like it might be wrong:
            // - $titleString for an ID (use slug)
            // - $titleString not escaped
            // - $contentString should be escaped? not sure here.
            // Have you considered using XMLWriter()?
            echo '
            <div class="article-container" id="article-' . $titleString . '">
            <a href="#top" class="top-link">Back to the Top</a>
            <h2>' . $titleString . '</h2>
            <div class="articles">' . $contentString . '</div>
            </div>';
        } else {
            echo '';
        }
        $reader->next(); // skip the subtrees, go to next item sibling
        // we already expand()ed this so we don't need to walk it.
    }
}
?>
Solution
When you say $contentString = $newcontent->encoded, the type of $contentString is not string but SimpleXMLElement. Thus strlen() is returning something nonsensical.
You need to explicitly cast SimpleXMLElements to string to get the text value of the element:
$contentString = (string) $newcontent->encoded;
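To make the cast concrete, here is a minimal sketch that runs against a small inline document instead of the real export. The sample XML and its values are made up for illustration; only the content namespace URI and the length thresholds come from the question.

```php
<?php
// Hypothetical sample data standing in for one <item> of the export file.
$xmlString = <<<XML
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <item>
      <title>Hello World</title>
      <content:encoded><![CDATA[<p>A full post body.</p>]]></content:encoded>
    </item>
  </channel>
</rss>
XML;

$contentNS = 'http://purl.org/rss/1.0/modules/content/';
$item = simplexml_load_string($xmlString)->channel->item;

// Property access on a SimpleXMLElement yields another SimpleXMLElement.
$node = $item->children($contentNS)->encoded;

// The explicit cast extracts the element's text value as a plain string.
$contentString = (string) $node;
$titleString = (string) $item->title;

var_dump($node instanceof SimpleXMLElement);
var_dump(is_string($contentString));

// With real strings, the length conditions from the question apply directly.
if (strlen($contentString) > 12 && strlen($titleString) > 4) {
    echo $titleString, "\n";
}
```

Casting once, right where the value is read, keeps every later check and concatenation working on ordinary strings.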
As an aside, you can simplify your DOM expansion and conversion to SimpleXMLElement by using the optional argument to XMLReader::expand():
$sxe = simplexml_import_dom($reader->expand(new DOMDocument('1.0','UTF-8')));
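As a rough, self-contained illustration of that one-liner, the sketch below reads a tiny inline document (made up for this example) and collects each item's title. Passing a DOMDocument to expand() makes the expanded node belong to that document, so the separate importNode() step is not needed.

```php
<?php
// Hypothetical inline document; the real input would be the export file.
$xmlString = '<channel>'
    . '<item><title>First post</title></item>'
    . '<item><title>Second post</title></item>'
    . '</channel>';

$reader = new XMLReader();
$reader->XML($xmlString);

$titles = [];
while ($reader->read()) {
    if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'item') {
        // expand() attaches the copied subtree to the DOMDocument passed in,
        // which is what simplexml_import_dom() needs.
        $sxe = simplexml_import_dom($reader->expand(new DOMDocument('1.0', 'UTF-8')));
        $titles[] = (string) $sxe->title;
    }
}
$reader->close();

print_r($titles);
```

This sketch lets read() walk into each item's children rather than calling next(). That is slightly more work per item, but it behaves the same whether or not whitespace separates adjacent item elements.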
EDIT: Here is a complete example of your first code block, rewritten to do what you want (I think). As you can see, all I did was take the body of the inner loop from your second code example and put it in the inner loop of your first code example.
$reader = new XMLReader();
$reader->open($path_to_xml_file);
$contentNS = 'http://purl.org/rss/1.0/modules/content/';
while ($reader->read()) {
    if ($reader->nodeType == XMLReader::ELEMENT and $reader->name == 'item') {
        $xml = simplexml_import_dom($reader->expand(new DOMDocument('1.0', 'UTF-8')));
        $titleString = (string) $xml->title;
        $contentString = (string) $xml->children($contentNS)->encoded;
        if (strlen($contentString) > 12 and strlen($titleString) > 4) {
            // Be careful with your output escaping!
            // This below looks like it might be wrong:
            // - $titleString for an ID (use slug)
            // - $titleString not escaped
            // - $contentString should be escaped? not sure here.
            // Have you considered using XMLWriter()?
            echo '
            <div class="article-container" id="article-' . $titleString . '">
            <a href="#top" class="top-link">Back to the Top</a>
            <h2>' . $titleString . '</h2>
            <div class="articles">' . $contentString . '</div>
            </div>';
        }
        $reader->next(); // skip the subtrees, go to next item sibling
        // we already expand()ed this so we don't need to walk it.
    }
}
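The comments in the example above warn about output escaping and suggest a slug for the id attribute. One possible way to handle both is sketched below. The helpers slugify() and renderArticle() are hypothetical names introduced here, not part of any library, and the sketch assumes PHP 7 or later and that the post body is trusted HTML from your own export (so it is emitted unescaped).

```php
<?php
// Hypothetical helper: reduce a title to lowercase letters, digits and
// hyphens so it is safe to use inside an id attribute.
function slugify(string $title): string
{
    $slug = strtolower(trim($title));
    $slug = preg_replace('/[^a-z0-9]+/', '-', $slug);
    return trim($slug, '-');
}

// Hypothetical helper: build the article markup with an escaped title.
function renderArticle(string $titleString, string $contentString): string
{
    // Escape the title for use as HTML text.
    $safeTitle = htmlspecialchars($titleString, ENT_QUOTES, 'UTF-8');

    // Assumption: $contentString is already HTML (WordPress post content),
    // so it is inserted as-is. Escape or sanitize it if it is not trusted.
    return '<div class="article-container" id="article-' . slugify($titleString) . '">'
        . '<a href="#top" class="top-link">Back to the Top</a>'
        . '<h2>' . $safeTitle . '</h2>'
        . '<div class="articles">' . $contentString . '</div>'
        . '</div>';
}

$html = renderArticle('Tom & Jerry: "Best of" 2011', '<p>Post body as exported.</p>');
echo $html, "\n";
```

Inside the loop, the echo statement could then be replaced with echo renderArticle($titleString, $contentString);. Note that two posts with the same title would get the same id, so a real page may need a counter or the post ID appended.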