How can I get the content of a subfield with a batch script?


Problem description

I have the following XML:

<datafield tag="007G">
    <subfield code="c">GBV</subfield>
    <subfield code="0">688845614</subfield>
  </datafield>

and I am trying to extract the content of <subfield code="0">, which is 688845614.

This is my code:

@echo off
for /F "tokens=2 delims=>/<" %%i in ('findstr "007G" curlread.txt') do echo %%i
pause

But as output I only get <datafield tag="007G">.

There could be many <datafield tag="007G"> elements in the XML document, and I need to get <subfield code="0"> from every one of them.

Recommended answer

It's always better to parse structured markup language as hierarchical data, rather than as flat text to scrape.

To return the data from only the first <subfield code="0"> node, replace your findstr command as follows:

powershell "([xml](gc curlread.txt)).selectSingleNode('//subfield[@code=0]/text()').data"

If you have multiple <subfield code="0"> nodes and want the data from all of them, use:

powershell "([xml](gc curlread.txt)).selectNodes('//subfield[@code=0]/text()') | %%{ $_.data }"
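As a rough sketch of how that command could slot into the loop from the question: the findstr call is swapped for the PowerShell call, and the for /F options change to "delims=" so each returned value arrives whole. This assumes curlread.txt holds well-formed XML with a single root element; the layout simply mirrors the original script.

```bat
@echo off
rem Sketch only: the loop from the question, with findstr replaced by the PowerShell call.
rem "delims=" turns off token splitting, so each line PowerShell prints is passed through unchanged.
for /F "delims=" %%i in (
    'powershell "([xml](gc curlread.txt)).selectNodes('//subfield[@code=0]/text()') | %%{ $_.data }"'
) do echo %%i
pause
```

The doubled %% is only needed inside a batch file; when typing the powershell command directly at a cmd prompt, a single % is used instead.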

XPath for the win. You can also specify only <subfield code="0"> nodes that are children of <datafield tag="007G"> by modifying the XPath selector like this:

//datafield[@tag=\"007G\"]/subfield[@code=0]/text()

Important: Quotation marks in the XPath must be backslash escaped.
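Put together, the narrowed selector might be used like this. This is a minimal sketch that combines the selectNodes command with the datafield-specific XPath, again assuming curlread.txt is well-formed XML with a single root element:

```bat
rem Sketch only: select code="0" subfields that are direct children of a 007G datafield.
rem The inner quotation marks around 007G are backslash escaped because the whole command is wrapped in double quotes.
powershell "([xml](gc curlread.txt)).selectNodes('//datafield[@tag=\"007G\"]/subfield[@code=0]/text()') | %%{ $_.data }"
```

The unquoted 0 in @code=0 needs no escaping because XPath compares it as a number, which is why only the tag value has to be quoted.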

Edit: Given the XML you pasted in your comment below:

<datafield tag="007G">
    <subfield code="c">GBV</subfield>
    <subfield code="0">688845614</subfield>
</datafield>
<datafield tag="008G">
    <subfield code="c">GBV</subfield>
    <subfield code="0">68614</subfield>
</datafield>

... be advised that this is not well-formed XML. Well-formed XML has a single root element. Before your data can be parsed, you'll have to enclose it in a root tag.

Here's an example of how to do that:

@echo off & setlocal

set "xml=curlread.xml"
rem // Note that quotation marks in the XPath must be backslash escaped
set "xpath=//datafield[@tag=\"007G\"]/subfield[@code=0]/text()"

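rem // Out-String joins the lines of the file into one string, so -f wraps the whole file and not only the first line
for /f "delims=" %%I in (
    'powershell "([xml]('<r>{0}</r>' -f (gc %xml% | out-string))).selectNodes('%xpath%') | %%{$_.data}"'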
) do (
    set "subfield=%%I"

    setlocal enabledelayedexpansion
    echo something useful with !subfield!
    endlocal
)
pause
goto :EOF
