XMLReader & simpleXML Combo, with Conditions


Problem description

I am using a combination of XMLReader and simpleXML to parse the posts in a WordPress export file. I realize this is a little out of the norm, but it's more of a backup project, so that we can easily pull up one of these articles if we need it in the future. The WP site that they were on needs to come down.

The issue I am having is that some of the nodes in the XML file are empty or contain useless values (i.e. not full posts). I need to add some string length conditions, but I'm not sure how to check for each one.

<?php

$path_to_xml_file = 'compress.zlib://wordpress.2011.xml.gz';

$reader = new XMLReader();
$reader->open($path_to_xml_file);
while ($reader->read()) {
    if ($reader->nodeType == XMLReader::ELEMENT && $reader->name == 'item') {
        $doc = new DOMDocument('1.0', 'UTF-8');
        $xml = simplexml_import_dom($doc->importNode($reader->expand(), true));
        //echo $xml->title; //or whatever

        // Take care of the articles
        $newcontent = $xml->children('http://purl.org/rss/1.0/modules/content/');
        $contentString = $newcontent->encoded;
        $titleString = $xml->title;

        echo '
    <div class="article-container" id="article-' .  $xml->title . '">
    <a href="#top" class="top-link">Back to the Top</a>
        <h2>' .  $xml->title . '</h2>
        <div class="articles">' . $newcontent->encoded . '</div>
    </div>';
    }
}

?>

I was able to check this successfully with just simpleXML, but it was too much of a memory hog all by itself. This was my simplexml code:

<?php

$url = 'wordpress.2011.xml.gz';
$xml = new SimpleXMLElement("compress.zlib://$url", NULL, TRUE);

foreach ($xml->item as $item) :
    $newcontent = $item->children('http://purl.org/rss/1.0/modules/content/');
    $contentString = $newcontent->encoded;
    $titleString = $item->title;

    if ((strlen($contentString) < 13) || (strlen($titleString) < 5)) {
        echo '';
    } else {
        echo '
    <div class="article-container" id="article-' .  $item->title . '">
    <a href="#top" class="top-link">Back to the Top</a>
        <h2>' .  $item->title . '</h2>
        <div class="articles">' . $newcontent->encoded . '</div>
    </div>';
    }
endforeach;

?>

UPDATE

With Francis' help, it is working now. Here is the code:

<?php 

$path_to_xml_file = 'compress.zlib://wordpress.2011.xml.gz';

$reader = new XMLReader();
$reader->open($path_to_xml_file);
$contentNS = 'http://purl.org/rss/1.0/modules/content/';
while($reader->read()) {
    if($reader->nodeType == XMLReader::ELEMENT and $reader->name == 'item') {
        $doc = new DOMDocument('1.0','UTF-8');
        $xml = simplexml_import_dom($doc->importNode($reader->expand(), true));
        $titleString = (string) $xml->title;
        $contentString = (string) $xml->children($contentNS)->encoded;
        if (strlen($contentString) > 12 and strlen($titleString) > 4)  {
            // Be careful with your output escaping!
            // This below looks like it might be wrong:
            // - $titleString for an ID (use slug)
            // - $titleString not escaped
            // - $contentString should be escaped? not sure here.
            // Have you considered using XMLWriter()?
            echo '
<div class="article-container" id="article-' .  $titleString . '">
    <a href="#top" class="top-link">Back to the Top</a>
    <h2>' .  $titleString . '</h2>
    <div class="articles">' . $contentString . '</div>
</div>';
        } else {

        echo'';

        }

        $reader->next(); //skip the subtrees, go to next item sibling
        // we already expand()ed this so we don't need to walk it.
    }
}

?>

Solution

When you say $contentString = $newcontent->encoded, the type of $contentString is not string but SimpleXMLElement. Thus strlen() is returning something nonsensical.

You need to explicitly cast SimpleXMLElements to string to get the text value of the element:

$contentString = (string) $newcontent->encoded;
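
To make the type difference concrete, here is a minimal sketch. The inline `<item>` document is made-up sample data, not taken from the real export file; only the namespace URI comes from the question.

```php
<?php
// Hypothetical sample item; the real data would come from the WordPress export.
$item = new SimpleXMLElement(
    '<item xmlns:content="http://purl.org/rss/1.0/modules/content/">'
    . '<title>Hello world</title>'
    . '<content:encoded><![CDATA[<p>Body text</p>]]></content:encoded>'
    . '</item>'
);

$content = $item->children('http://purl.org/rss/1.0/modules/content/');

$raw  = $content->encoded;           // still a SimpleXMLElement object
$text = (string) $content->encoded;  // plain PHP string with the element's text

var_dump($raw instanceof SimpleXMLElement); // bool(true)
var_dump(is_string($text));                 // bool(true)
var_dump(strlen($text));                    // byte length of the text content
```

Casting once and reusing the resulting string for both the length check and the output keeps the two in agreement.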

As an aside, you can simplify your DOM expansion and conversion to SimpleXMLElement by using the optional argument to XMLReader::expand():

$sxe = simplexml_import_dom($reader->expand(new DOMDocument('1.0','UTF-8')));
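
The one-liner above can be tried in isolation with a small sketch like the following. The inline XML string and the `$titles` array are assumptions made for illustration. The loop here calls `next()` instead of `read()` right after handling an item, so that adjacent `<item>` elements with no whitespace between them are not skipped.

```php
<?php
// Hypothetical inline document standing in for the real export file.
$xmlString = '<rss><channel>'
    . '<item><title>First post</title></item>'
    . '<item><title>Second post</title></item>'
    . '</channel></rss>';

$reader = new XMLReader();
$reader->XML($xmlString);

$titles = [];
$more = $reader->read();
while ($more) {
    if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'item') {
        // expand() copies the current subtree into the given DOMDocument,
        // so no separate importNode() call is needed.
        $sxe = simplexml_import_dom($reader->expand(new DOMDocument('1.0', 'UTF-8')));
        $titles[] = (string) $sxe->title;
        // Skip the subtree we just expanded and land on the following node.
        $more = $reader->next();
    } else {
        $more = $reader->read();
    }
}
$reader->close();

print_r($titles);
```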

EDIT: Here is a complete example of your first code block, rewritten to do what you want (I think?). As you can see, all I did was take the body of the loop from your second code example and put it inside the loop of your first code example.

$reader = new XMLReader();
$reader->open($path_to_xml_file);
$contentNS = 'http://purl.org/rss/1.0/modules/content/';
while($reader->read()) {
    if($reader->nodeType == XMLReader::ELEMENT and $reader->name == 'item') {
        $xml = simplexml_import_dom($reader->expand(new DOMDocument('1.0', 'UTF-8')));
        $titleString = (string) $xml->title;
        $contentString = (string) $xml->children($contentNS)->encoded;
        if (strlen($contentString) > 12 and strlen($titleString) > 4)  {
            // Be careful with your output escaping!
            // This below looks like it might be wrong:
            // - $titleString for an ID (use slug)
            // - $titleString not escaped
            // - $contentString should be escaped? not sure here.
            // Have you considered using XMLWriter()?
            echo '
<div class="article-container" id="article-' .  $titleString . '">
    <a href="#top" class="top-link">Back to the Top</a>
    <h2>' .  $titleString . '</h2>
    <div class="articles">' . $contentString . '</div>
</div>';
        }
        $reader->next(); //skip the subtrees, go to next item sibling
        // we already expand()ed this so we don't need to walk it.
    }
}
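
The escaping concerns raised in the code comments could be handled roughly as sketched below. `make_slug()` and `render_article()` are hypothetical helpers invented for this illustration, not part of any library, and the sketch assumes the post body is trusted HTML that should be output unchanged. If that assumption does not hold, the body would need sanitizing as well.

```php
<?php
// Hypothetical helper: reduce a title to characters that are safe in an id attribute.
function make_slug(string $title): string
{
    $slug = strtolower(trim($title));
    // Collapse every run of characters other than a-z and 0-9 into one hyphen.
    $slug = preg_replace('/[^a-z0-9]+/', '-', $slug);
    return trim($slug, '-');
}

// Hypothetical helper: build the article markup with the title escaped.
function render_article(string $title, string $contentHtml): string
{
    $safeTitle = htmlspecialchars($title, ENT_QUOTES, 'UTF-8');
    return '<div class="article-container" id="article-' . make_slug($title) . '">' . "\n"
        . '    <a href="#top" class="top-link">Back to the Top</a>' . "\n"
        . '    <h2>' . $safeTitle . '</h2>' . "\n"
        // Assumption: $contentHtml is trusted markup and is output as-is.
        . '    <div class="articles">' . $contentHtml . '</div>' . "\n"
        . '</div>';
}

echo render_article('Tips & "Tricks" for <b>XML</b>', '<p>Post body</p>');
```

Two titles that differ only in punctuation would produce the same slug, so a real page may also want to append a unique value such as the post ID.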
