Get XPath from search results of a specific regex pattern in a bunch of XML files


Question

I have many XML files, and I have to search these files for a string (more precisely, a not-too-complicated regex).

For each match I want to get the XPath of the node that contains the string, e.g.:

pattern = /home|house/

files: file1.xml, file2.xml, etc.

Result:

"home" in file1.xml, xpath: //root/cars/car[2]
"house" in file2.xml, xpath: //root[1]/elemA[2][@attribute1='first']

How can I achieve this? I can use PHP, Python, JavaScript or a Vim plugin (because I have already worked with those).

Answer

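In PHP: glob the XML files, select all element nodes with an XPath query, run preg_match_all on each node's text and, if it matches, get the node's XPath with getNodePath() (a DOM method, so the SimpleXML node is converted with dom_import_simplexml() first) and output it: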

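$pattern = '/home|house|guide/iu';

foreach (glob('data/*.xml') as $file)
{
    $xml = simplexml_load_file($file);
    if ($xml === false) continue; // skip files that are not well-formed XML

    // '//*' selects every element in the document
    foreach ($xml->xpath('//*') as $node)
    {
        // (string) $node is the element's own text, not its descendants' text
        if (!preg_match_all($pattern, $node, $matches)) continue;

        // getNodePath() is a DOMNode method, hence dom_import_simplexml()
        printf(
            "\"%s\" in %s, xpath: %s\n", implode('", "', $matches[0]),
            basename($file), dom_import_simplexml($node)->getNodePath()
        );
    }
}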

Result (example):

"Guide" in iana-charsets-2013-03-05.xml, xpath: /*/*[7]/*[158]/*[4]
"Guide" in iana-charsets-2013-03-05.xml, xpath: /*/*[7]/*[224]/*[2]
"Guide" in iana-charsets-2013-03-05.xml, xpath: /*/*[7]/*[224]/*[4]
"guide" in rdf-dmoz.xml, xpath: /*/*[4]/d:Description
"guide" in rdf-dmoz.xml, xpath: /*/*[5]/d:Description
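The paths above look like /*/*[7]/*[158] because getNodePath() (libxml2 underneath) cannot name an element that lives in a default, unprefixed namespace, so it writes `*` and numbers the element among all of its element siblings; prefixed elements keep their prefix, as d:Description shows. The sketch below illustrates this on a made-up document (both namespace URIs are placeholders) and checks that each generated path can be evaluated again with DOMXPath; only the `d` prefix has to be registered with registerNamespace() for that.

```php
<?php
// Illustrative input (namespace URIs are made up): the root and its children
// live in a default namespace; one element uses an explicit "d:" prefix.
$xml = <<<XML
<registry xmlns="http://example.org/ns" xmlns:d="http://example.org/d">
  <record><name>Guide</name></record>
  <record><name>other</name><d:Description>guide</d:Description></record>
</registry>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);
// Needed only so the "d:" step in a generated path can be evaluated again.
$xpath->registerNamespace('d', 'http://example.org/d');

$paths = [];
$roundTripOk = true;
foreach ($xpath->query('//*') as $el) {
    $path = $el->getNodePath();
    $paths[] = $path;
    // Re-run the generated path and check that it selects exactly this node.
    $hit = $xpath->query($path);
    $roundTripOk = $roundTripOk && $hit->length === 1 && $hit->item(0)->isSameNode($el);
}

echo implode("\n", $paths), "\n";
```

So a `*` path is still a usable XPath expression, it just identifies elements by position rather than by name.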

Nice question, by the way.
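The loop in the answer only tests element text, while the expected output in the question also mentions attributes. If hits inside attribute values matter too, one possible variant is sketched below, assuming DOMDocument/DOMXPath instead of SimpleXML: the hypothetical helper findRegexInXml() takes an XML string (so it can be tried without files), queries all text and attribute nodes, and reports the containing element's path for text hits and the attribute's own path (ending in /@name) for attribute hits.

```php
<?php
// Hypothetical helper (not part of the original answer): searches one XML
// string and returns each regex hit with the XPath of the node it was found in.
function findRegexInXml(string $xml, string $pattern): array
{
    libxml_use_internal_errors(true); // collect parse errors instead of printing warnings
    $doc = new DOMDocument();
    if (!$doc->loadXML($xml)) {
        return []; // not well-formed XML: nothing to report
    }

    $xpath = new DOMXPath($doc);
    $results = [];
    // All text nodes and all attribute nodes, in document order.
    foreach ($xpath->query('//text() | //@*') as $node) {
        if (!preg_match_all($pattern, $node->nodeValue, $matches)) {
            continue;
        }
        // For text, report the element that contains it; for an attribute,
        // getNodePath() already ends in "/@name".
        $target = $node instanceof DOMText ? $node->parentNode : $node;
        $results[] = ['matches' => $matches[0], 'xpath' => $target->getNodePath()];
    }
    return $results;
}
```

To use it with files, call findRegexInXml(file_get_contents($file), $pattern) inside the same glob() loop and print each hit with the same printf() format as the answer.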
