How can I get the content of the subfield with batch script?
Problem description
I have the following XML:
<datafield tag="007G">
<subfield code="c">GBV</subfield>
<subfield code="0">688845614</subfield>
</datafield>
and I'm trying to extract the content of <subfield code="0">, i.e. 688845614.
This is my code:
@echo off
for /F "tokens=2 delims=>/<" %%i in ('findstr "007G" curlread.txt') do echo %%i
pause
But as output I only get <datafield tag="007G">.
There could be many <datafield tag="007G"> elements in the XML document, and I need to get the <subfield code="0"> value from every one of them.
Recommended answer
It's always better to parse structured markup language as hierarchical data, rather than as flat text to scrape.
To return the data from only the first <subfield code="0"> node, replace your findstr command as follows:
powershell "([xml](gc curlread.txt)).selectSingleNode('//subfield[@code=0]/text()').data"
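If there may be multiple <subfield code="0"> nodes and you want the data from all of them, use the following instead. It is written for a batch file, where % must be doubled to %%; at an interactive prompt, use a single %: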
powershell "([xml](gc curlread.txt)).selectNodes('//subfield[@code=0]/text()') | %%{ $_.data }"
XPath for the win. You can also select only the <subfield code="0"> nodes that are children of <datafield tag="007G"> by modifying the XPath selector like this:
//datafield[@tag=\"007G\"]/subfield[@code=0]/text()
Important: because the whole PowerShell command is enclosed in double quotes, any quotation marks inside the XPath must be backslash-escaped (\").
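To check the XPath logic outside cmd and PowerShell, here is a minimal sketch in Python using the standard library's xml.etree.ElementTree, which supports only a limited XPath subset. Assumptions: the sample adds a hypothetical <r> root and a second 008G field to the question's XML, the helper name subfield_values is made up for illustration, and the attribute values in the predicates are quoted ([@code='0']) because ElementTree expects quoted values, whereas the full XPath 1.0 engine behind .NET's SelectNodes also accepts the unquoted @code=0 form used above.

```python
import xml.etree.ElementTree as ET

# Sample built from the question's XML; the <r> root and the 008G field are illustrative additions.
SAMPLE = """<r>
<datafield tag="007G">
<subfield code="c">GBV</subfield>
<subfield code="0">688845614</subfield>
</datafield>
<datafield tag="008G">
<subfield code="c">GBV</subfield>
<subfield code="0">68614</subfield>
</datafield>
</r>"""


def subfield_values(xml_text, tag=None, code="0"):
    """Return the text of every <subfield code=...>, optionally only under <datafield tag=...>."""
    root = ET.fromstring(xml_text)
    if tag is None:
        # Equivalent of //subfield[@code=0]
        path = f".//subfield[@code='{code}']"
    else:
        # Equivalent of //datafield[@tag="007G"]/subfield[@code=0]
        path = f".//datafield[@tag='{tag}']/subfield[@code='{code}']"
    return [node.text for node in root.findall(path)]


print(subfield_values(SAMPLE))          # ['688845614', '68614']
print(subfield_values(SAMPLE, "007G"))  # ['688845614']
```

Without the datafield step every code="0" value comes back; with it, only the values under 007G remain, which is the same distinction the two PowerShell selectors draw.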
Edit: Given the XML you pasted in your comment below:
<datafield tag="007G">
<subfield code="c">GBV</subfield>
<subfield code="0">688845614</subfield>
</datafield>
<datafield tag="008G">
<subfield code="c">GBV</subfield>
<subfield code="0">68614</subfield>
</datafield>
... be advised that this is not well-formed XML. A well-formed XML document has a single root element. Before your data can be parsed, you'll have to enclose it in a root tag.
Here's an example of how to do that:
@echo off & setlocal
set "xml=curlread.xml"
rem // Note that quotation marks in the XPath must be backslash escaped
set "xpath=//datafield[@tag=\"007G\"]/subfield[@code=0]/text()"
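rem // Join the lines first: -f treats an array (which gc returns) as separate
rem // format arguments, so {0} alone would receive only the first line
for /f "delims=" %%I in (
'powershell "([xml]('<r>{0}</r>' -f ((gc %xml%) -join ' '))).selectNodes('%xpath%') | %%{$_.data}"'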
) do (
set "subfield=%%I"
setlocal enabledelayedexpansion
echo something useful with !subfield!
endlocal
)
pause
goto :EOF
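The same wrap-then-select idea can be sketched in Python with xml.etree.ElementTree, again only as an illustration outside cmd: the fragment from the comment fails to parse on its own and parses once it is enclosed in a root element. The <r> root name and the helper names is_well_formed and values_in_fragment are hypothetical.

```python
import xml.etree.ElementTree as ET

# The XML from the comment: two top-level <datafield> elements with no common root.
FRAGMENT = """<datafield tag="007G">
<subfield code="c">GBV</subfield>
<subfield code="0">688845614</subfield>
</datafield>
<datafield tag="008G">
<subfield code="c">GBV</subfield>
<subfield code="0">68614</subfield>
</datafield>"""


def is_well_formed(text):
    """True if the text parses as a single XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


def values_in_fragment(fragment, tag="007G", code="0"):
    """Wrap the fragment in a hypothetical <r> root (like '<r>{0}</r>'), then select."""
    root = ET.fromstring(f"<r>{fragment}</r>")
    path = f".//datafield[@tag='{tag}']/subfield[@code='{code}']"
    return [node.text for node in root.findall(path)]


print(is_well_formed(FRAGMENT))              # False: more than one top-level element
print(is_well_formed(f"<r>{FRAGMENT}</r>"))  # True
print(values_in_fragment(FRAGMENT))          # ['688845614']
```

One caveat that applies to the batch version as well: if the real file starts with an XML declaration (<?xml ...?>), wrapping it this way breaks parsing, because the declaration is only allowed at the very start of a document.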