Use DOM and XPath to remove a node from a sitemap file

Question

I am trying to develop a function that removes certain URL nodes from my sitemap file. Here is what I have so far.

$xpath = new DOMXpath($DOMfile);
$elements = $xpath->query("/urlset/url/loc[contains(.,'$pageUrl')]");
echo count($elements);
foreach($elements as $element){
    //this is where I want to delete the URL
    echo $element;
    echo "here".$element->nodeValue;
}

This outputs "111111". I don't understand why I can't echo a string inside the foreach loop when the $elements count is 1.
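
A likely explanation, shown as a small self-contained sketch: DOMXPath::query() returns a DOMNodeList object, whose size is in its length property, and a DOMElement cannot be converted to a string, so echo $element fails. The element's text is available through nodeValue. The inline sample sitemap and the search string below are hypothetical stand-ins for $DOMfile and $pageUrl.

```php
<?php
// Hypothetical sitemap without a namespace, just for illustration.
$xml = <<<XML
<urlset>
  <url><loc>http://example.com/a</loc></url>
  <url><loc>http://example.com/b</loc></url>
</urlset>
XML;

$dom = new DOMDocument();
$dom->loadXML($xml);
$xpath = new DOMXPath($dom);

$pageUrl = 'example.com/b'; // hypothetical search string
$elements = $xpath->query("/urlset/url/loc[contains(., '$pageUrl')]");

// A DOMNodeList reports its size through the length property.
echo $elements->length, "\n";

foreach ($elements as $element) {
    // echo $element; would fail: a DOMElement cannot be converted to a string.
    // Print the element's text content instead.
    echo "here: ", $element->nodeValue, "\n";
}
```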

So far, I have done this:

$urls = $dom->getElementsByTagName( "url" );
foreach( $urls as $url ){
    $locs = $url->getElementsByTagName( "loc" );
    $loc = $locs->item(0)->nodeValue;
    echo $loc;
    if($loc == $fullPageUrl){
        $removeUrl = $dom->removeChild($url);
    }
}

This would work fine if my sitemap weren't so big. Right now it times out, so I'm hoping XPath queries will be faster.
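
If you keep the getElementsByTagName() loop instead, two details probably matter, sketched here with a hypothetical three-entry sitemap. The list returned by getElementsByTagName() is live, so removing nodes while iterating over it can skip entries. And $dom->removeChild($url) throws a DOMException (Not Found Error), because each <url> is a child of <urlset>, not of the document itself. Collecting the matches first and then removing each one from its parentNode avoids both problems.

```php
<?php
// Hypothetical sitemap (no namespace) with a duplicated entry to delete.
$xml = <<<XML
<urlset>
  <url><loc>http://example.com/a</loc></url>
  <url><loc>http://example.com/b</loc></url>
  <url><loc>http://example.com/b</loc></url>
</urlset>
XML;

$dom = new DOMDocument();
$dom->loadXML($xml);
$fullPageUrl = 'http://example.com/b'; // hypothetical URL to delete

// getElementsByTagName() returns a live list, so collect the matches first.
$toRemove = [];
foreach ($dom->getElementsByTagName('url') as $url) {
    $loc = $url->getElementsByTagName('loc')->item(0);
    if ($loc !== null && $loc->nodeValue === $fullPageUrl) {
        $toRemove[] = $url;
    }
}

// Remove each <url> from its own parent (<urlset>), not from the document.
foreach ($toRemove as $url) {
    $url->parentNode->removeChild($url);
}

echo $dom->saveXML();
```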

After Gordon's comment, I tried:

$xpath = new DOMXpath($DOMfile);
$query = sprintf('/urlset/url[./loc = "%d"]', $pageUrl);
foreach($xpath->query($query) as $element) {
    //this is where I want to delete the URL
    echo $element;
    echo "here".$element->nodeValue;
}

It doesn't return anything.
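
Independently of any namespace issue, the %d in that sprintf() call is enough on its own to make the query match nothing: %d formats its argument as an integer, so a URL string becomes 0, while %s inserts it as text. A minimal check, using a hypothetical URL:

```php
<?php
$pageUrl = 'http://example.com/b'; // hypothetical URL

// %d formats its argument as an integer, so a URL string becomes 0.
$wrong = sprintf('/urlset/url[./loc = "%d"]', $pageUrl);
// %s keeps the argument as a string.
$right = sprintf('/urlset/url[./loc = "%s"]', $pageUrl);

echo $wrong, "\n"; // /urlset/url[./loc = "0"]
echo $right, "\n"; // /urlset/url[./loc = "http://example.com/b"]
```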

I went a step further and tried it on codepad, using the approach from the other post mentioned, like this:

<?php
error_reporting(-1);
$xml = <<<XML
<?xml version="1.0" encoding="UTF-8" ?>
<url>
<loc>professional_services</loc>
<loc>5professional_services</loc>
<loc>6professional_services</loc>
</url>
XML;
$id = '5professional_services';
$dom = new DOMDocument;
$dom->loadXML($xml);
$xpath = new DOMXPath($dom);
$query = sprintf('/url/[loc = $id]');
foreach($xpath->query($query) as $record) {
    $record->parentNode->removeChild($record);
}
echo $dom->saveXml();

I'm getting "Warning: DOMXPath::query(): Invalid expression" on the foreach line. Thanks for the other comment about urlset; I'll be sure to include the double slashes in my code. I tried that, and it returned nothing.
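
For reference, the warning comes from the XPath itself. In /url/[loc = $id] the predicate follows a bare slash instead of a step such as loc, and because the PHP string is single-quoted, $id is not interpolated (the value would also need quotes to be an XPath string literal). A corrected sketch of the same codepad test, assuming the intent is to remove the <loc> element whose text equals $id:

```php
<?php
error_reporting(-1);
$xml = <<<XML
<?xml version="1.0" encoding="UTF-8" ?>
<url>
<loc>professional_services</loc>
<loc>5professional_services</loc>
<loc>6professional_services</loc>
</url>
XML;
$id = '5professional_services';

$dom = new DOMDocument();
$dom->loadXML($xml);
$xpath = new DOMXPath($dom);

// A predicate [...] must follow a step such as loc, and the value must be
// a quoted XPath string literal; sprintf() with %s inserts $id into it.
$query = sprintf('/url/loc[. = "%s"]', $id);

foreach ($xpath->query($query) as $record) {
    $record->parentNode->removeChild($record);
}
echo $dom->saveXML();
```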

Answer

The XML of a sitemap should look like this:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc></loc>
...
</url>
<url>
<loc></loc>
...
</url>
...
</urlset>

Since it has a namespace, the query is a little more complicated than in my previous answer:

$xpath = new DOMXpath($DOMfile);
// Here register your namespace with a shortcut
$xpath->registerNamespace('sm', "http://www.sitemaps.org/schemas/sitemap/0.9");
// this request should work
$elements = $xpath->query('/sm:urlset/sm:url[sm:loc = "'.$pageUrl.'"]');

foreach($elements as $element){
    // This is a hint from the manual comments
    $element->parentNode->removeChild($element);
}
echo $DOMfile->saveXML();
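
A self-contained, runnable version of this approach, wrapped in a helper function whose name, removeSitemapUrl(), is hypothetical. The two-entry sitemap is made up; the query shape and namespace URI are the ones above. It also shows why the earlier unprefixed queries returned nothing: without a registered prefix, /urlset/url matches no elements in a namespaced document.

```php
<?php
// Hypothetical helper (not part of PHP's DOM API): removes every <url> whose
// <loc> equals $pageUrl and returns how many were removed.
// Assumes $pageUrl contains no double quote, since it is embedded in "...".
function removeSitemapUrl(DOMDocument $dom, string $pageUrl): int
{
    $xpath = new DOMXPath($dom);
    // The prefix 'sm' is arbitrary; it only has to map to the sitemap namespace URI.
    $xpath->registerNamespace('sm', 'http://www.sitemaps.org/schemas/sitemap/0.9');
    $elements = $xpath->query(sprintf('/sm:urlset/sm:url[sm:loc = "%s"]', $pageUrl));
    $removed = $elements->length;
    foreach ($elements as $element) {
        $element->parentNode->removeChild($element);
    }
    return $removed;
}

$xml = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://example.com/a</loc></url>
  <url><loc>http://example.com/b</loc></url>
</urlset>
XML;

$dom = new DOMDocument();
$dom->loadXML($xml);

// Unprefixed names in an XPath 1.0 query mean "no namespace", so this finds nothing.
$plain = new DOMXPath($dom);
echo $plain->query('/urlset/url')->length, "\n"; // 0

echo removeSitemapUrl($dom, 'http://example.com/b'), "\n"; // 1
echo $dom->saveXML();
```

To apply it to the actual file, loading it with $dom->load($path) and writing it back with $dom->save($path) after the call should be enough, where $path stands for your sitemap's location.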

I wrote this from memory before going to bed. If it doesn't work, I'll test it tomorrow morning. (Yes, I know that may earn me some downvotes.)

If you don't have a namespace (you should, but it isn't mandatory, sigh):

$elements = $xpath->query('/urlset/url[loc = "'.$pageUrl.'"]');
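
If the same code has to handle sitemaps both with and without the namespace, one alternative (a different technique from the registerNamespace() approach above) is to match on local-name(), which ignores namespaces. A sketch, where urlQuery() and the sample documents are hypothetical:

```php
<?php
// Hypothetical helper: builds a query that matches <urlset>/<url> by local
// name only, so it works whether or not the sitemap namespace is declared.
// Assumes $pageUrl contains no double quote.
function urlQuery(string $pageUrl): string
{
    return sprintf(
        '/*[local-name()="urlset"]/*[local-name()="url"][*[local-name()="loc"] = "%s"]',
        $pageUrl
    );
}

$withNs = '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    . '<url><loc>http://example.com/a</loc></url></urlset>';
$withoutNs = '<urlset><url><loc>http://example.com/a</loc></url></urlset>';

$counts = [];
foreach ([$withNs, $withoutNs] as $xml) {
    $dom = new DOMDocument();
    $dom->loadXML($xml);
    $xpath = new DOMXPath($dom);
    $counts[] = $xpath->query(urlQuery('http://example.com/a'))->length;
}
echo implode(', ', $counts), "\n"; // 1, 1
```

The trade-off is a longer, harder-to-read query that would also match same-named elements from any other namespace, so registering the sitemap namespace remains the more precise option when you know the file declares it.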

Here is a concrete example showing that it works: http://codepad.org/vuGl1MAc
