How to extract all instances of a specific XML tag attribute using Windows batch

Question

I have an XML file and I need to extract

testname

from

<con:testSuite name="testname" 

in the XML file.

I am not quite sure how to approach this, or whether it is even possible in batch.

Here is what I have thought of so far:

1) Use FINDSTR and store every line that has

<con:testSuite name=

in a variable or a temporary file, like this:

FINDSTR /C:"<con:testSuite name=" file.xml > tests.txt

2) Somehow use that file or variable to extract the strings

Note that there might be more than one instance of the matching string in the same line.

I am a novice at batch and any help is appreciated.

Recommended answer

Parsing XML with batch is very painful. Batch is not a good text processor to begin with. However, with some effort you can usually extract the data you want from a given XML file. Be aware, though, that the input file could easily be rearranged into an equivalent valid XML form that would break your parser.

With that disclaimer out of the way...

Here is a native batch solution:

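@echo off
rem Keep delayed expansion off by default so that exclamation marks in the input survive
setlocal disableDelayedExpansion
set input="test.xml"
set output="names.txt"

rem Start with a clean output file
if exist %output% del %output%
rem Collect every line that contains the tag, then pull the names out of each one
for /f "delims=" %%A in ('findstr /n /c:"<con:testSuite name=" %input%') do (
  set "ln=%%A"
  setlocal enableDelayedExpansion
  call :parseLine
  endlocal
)
type %output%
exit /b

rem Subroutine: append each name found in ln to the output file, one per pass
:parseLine
rem Drop everything through the next tag prefix; the remainder starts with the equals sign
set "ln2=!ln:*<con:testSuite name=!"
rem An unchanged string means there is no further match on this line
if "!ln2!"=="!ln!" exit /b
rem The name is the second token when the remainder is split on double quotes
for /f tokens^=2^ delims^=^" %%B in ("!ln2!") do (
  setlocal disableDelayedExpansion
  >>%output% echo(%%B
  endlocal
)
set "ln=!ln2!"
goto :parseLine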

The FINDSTR /N option is there only to guarantee that no line begins with a ; so that we don't have to worry about the pesky default FOR /F "EOL" option.
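
To see the EOL behavior in isolation, here is a minimal sketch. The file name eol_demo.txt and its two lines are made up for the demonstration and are not part of the original answer. The first loop is expected to skip the line that starts with ; while the second loop, thanks to the line-number prefix added by /N, should show both lines.

```batch
@echo off
rem Hypothetical scratch file; eol_demo.txt is an assumed name
>eol_demo.txt echo(;first line
>>eol_demo.txt echo(second line

echo Without /N:
for /f "delims=" %%A in ('findstr "line" eol_demo.txt') do echo [%%A]

echo With /N:
for /f "delims=" %%A in ('findstr /n "line" eol_demo.txt') do echo [%%A]

rem Clean up the scratch file
del eol_demo.txt
```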

The toggling of delayed expansion on and off is there to protect any ! characters that may be in the input file. If you know that ! never appears in the input, then you can simply put setlocal enableDelayedExpansion at the top and remove all other setlocal and endlocal commands.
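
The effect can be sketched as follows. The literal string stands in for a line read from the XML file, and the variable names safe and unsafe are assumptions chosen for the illustration. The value captured while delayed expansion is off should keep its ! character, while the one captured after it is switched on should lose it.

```batch
@echo off
rem Sketch only: the literal string stands in for a line read from the XML file
setlocal disableDelayedExpansion
for /f "delims=" %%A in ("name with ! inside") do (
  set "safe=%%A"
  setlocal enableDelayedExpansion
  set "unsafe=%%A"
  echo safe:   !safe!
  echo unsafe: !unsafe!
  endlocal
)
```

This is the same order of operations the solution uses: the FOR variable is stored first, and delayed expansion is enabled only afterwards.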

The last FOR /F uses special escape sequences to enable the specification of a double quote as a DELIM character.
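
As a small illustration of that syntax, the sketch below splits a made-up fragment on double quotes. The variable name sample and its contents are assumptions, and the fragment must not contain ! because delayed expansion is on throughout. With this input the second token, and therefore the echoed text, should be lp_login.

```batch
@echo off
setlocal enableDelayedExpansion
rem Hypothetical fragment; everything before the first quote becomes token 1
set "sample=name="lp_login" id="42""
rem No surrounding quotes on the options, so each space, equals sign and quote is escaped
for /f tokens^=2^ delims^=^" %%B in ("!sample!") do echo %%B
```

In the solution the text before the first quote is just the = left over from the substring replacement, which is why the name is token 2 rather than token 1.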

Answer to an additional question in the comments

You cannot simply put the additional constraint in the existing FINDSTR command, because it returns the entire line that has a match. Remember, you said yourself that "there might be more than one instance of the matching string in the same line". The first name might start with the correct prefix, and the second name on the same line might not. You only want to keep the ones that start with the correct prefix.

One solution is to simply change the >>%output% echo(%%B line as follows:

echo(%%B|findstr "^lp_" >>%output%

The FINDSTR uses the regular expression meta-character ^ to specify that the string must start with lp_. The quotes have already been removed at this point, so we don't have to worry about them.
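
A quick way to try the filter on its own is sketched below. The three names are invented for the demonstration. Only the names that begin with lp_ should be echoed, so other_lp_test is expected to be dropped even though it contains lp_.

```batch
@echo off
rem Hypothetical names; only the ones that begin with lp_ should pass the filter
for %%N in (lp_login other_lp_test lp_logout) do (
  echo(%%N|findstr "^lp_"
)
```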

However, you may run into a situation in the future where you must include " in your search string. Plus, it might be marginally faster to include the lp_ screen in the initial FINDSTR so that :parseLine is not called unnecessarily.

FINDSTR requires that double quotes in the search string be escaped with a backslash. But the Windows CMD processor also has its own rules for escaping. Special characters like > need to be either quoted or escaped. The original code used quotes, but you want to include a quote in the string, and that creates unbalanced quotes in your command. Windows batch generally likes quotes in pairs. At least one of the quotes must be escaped for CMD as ^". If the quote needs to be escaped for both CMD and FINDSTR, then it looks like \^".

But any special characters within the string that are no longer functionally quoted from a CMD perspective must be escaped using ^ as well.

Here is one solution that escapes all special characters. It looks awful and is very confusing.

for /f "delims=" %%A in ('findstr /n /c:^"^<con:testSuite^ name^=\^"lp_^" %input%') do (

Here is another solution that looks much better, but it is still confusing to keep track of what is escaped for CMD and what is escaped for FINDSTR.

for /f "delims=" %%A in ('findstr /n /c:"<con:testSuite name=\"lp_^" %input%') do (

One way to keep things a bit simpler is to convert the search into a regular expression. A single double quote can be searched for using [\"\"]. It is a character class expression that matches either a quote or a quote (silly, I know), but it keeps the quotes paired so that CMD is happy. Now you don't have to worry about escaping any characters for CMD, and you can concentrate on the regex search string.

for /f "delims=" %%A in ('findstr /nr /c:"<con:testSuite name=[\"\"]lp_" %input%') do (
