DOS Batch: dealing with double quotes from XML files


Problem description

I have written the code below to read XML files (file_1.xml and file_2.xml), extract the string between the tags, and write it to a TXT file. The issue is that some strings include double quotation marks, and the program then treats these characters as part of the command syntax rather than as part of the strings.

Content of file_1.xml:

<AAA>C086002-T1111</AAA>
<AAA>C086002-T1222 </AAA>
<AAA>C086002-TR333 "</AAA>
<AAA>C086002-T5444  </AAA>

Content of file_2.xml:

<AAA>C086002-T5555 </AAA>
<AAA>C086002-T1666</AAA>
<AAA>C086002-T1777 "</AAA>
<AAA>C086002-T1888          "</AAA>

My code:

@echo off

setlocal enabledelayedexpansion

for /f "delims=;" %%f in ('dir /b D:\depart\*.xml') do (

    for /f "usebackq delims=;" %%z in ("D:\depart\%%f") do (

        (for /f "delims=<AAA></AAA> tokens=2" %%a in ('echo "%%z" ^| Findstr /r "<AAA>"') do (

            set code=%%a
            set code=!code:""=!
            set code=!code: =!
            echo !code!

        )) >> result.txt
    )
)

This is what I get in result.txt:

C086002-T1111
C086002-T1222
C086002-T5444
C086002-T5555
C086002-T1666

In fact, 3 of the 8 lines are missing. The missing lines either contain a double quotation mark or follow a line that contains one.

How can I handle these characters so that they are treated as part of the strings?

Recommended answer

Please note: parsing XML with batch is risky business because XML generally ignores white space. Any script you write could probably be broken simply by reformatting the XML into another equivalent, valid form. That being said...

I haven't traced the problem far enough to fully explain the behavior you observed, but the unbalanced quote is causing a problem with this line:

(for /f "delims=<AAA></AAA> tokens=2" %%a in ('echo "%%z" ^| Findstr /r "<AAA>"') do (

You can eliminate that problem, and get your code to more or less work, by stripping out any quotes beforehand.

@echo off

setlocal enabledelayedexpansion
del result.txt
for /f "delims=;" %%f in ('dir /b D:\depart\*.xml') do (
  for /f "usebackq delims=;" %%z in ("D:\depart\%%f") do (
    set code=%%z
    set code=!code:"=!
    set code=!code: =!
    (for /f "delims=<AAA></AAA> tokens=2" %%a in ('echo "!code!" ^| Findstr /r "<AAA>"') do (
      echo %%a
    )) >> result.txt
  )
)

But you have a potentially major problem. DELIMS does not specify a string; it specifies a list of characters. So your DELIMS=<AAA></AAA> is equivalent to DELIMS=<>/A. If an element value ever has an A or a / in it, your code will fail.
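
As a rough illustration of that point, the sketch below runs a made-up value containing an A through the same DELIMS option. The value C086002-TA999 is hypothetical and does not come from the question's files. Because every character in the option is treated as a separate delimiter, the value should come back split at the embedded A instead of as a single token.

```batch
@echo off
rem Sketch only: the value C086002-TA999 is a hypothetical example.
rem "delims=<AAA></AAA>" is read as the character set < > / A,
rem so the A inside the value acts as a delimiter too.
for /f "delims=<AAA></AAA> tokens=1,2" %%a in ("<AAA>C086002-TA999</AAA>") do (
  echo first token:  %%a
  echo second token: %%b
)
```

Save it as a .cmd file and run it from a command prompt. Seeing the value land in two tokens is the failure mode described above.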

But there is a much better way:

First off, you can use FINDSTR to collect all of the <AAA>----</AAA> lines from all files in one pass, without any loop:

findstr /r "<AAA>.*</AAA>" "D:\depart\*.xml"

Each matching line is output as the file path, followed by a colon, followed by the matching line, as in:

D:\depart\file_1.xml:<AAA>C086002-T1111</AAA>

The file path can never contain < or >, so you can use the following to iterate over the result and capture the appropriate token:

for /f "delims=<> tokens=3" %%A in ( ...
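
To see which token holds what, here is a minimal sketch that tokenizes a single literal line in the FINDSTR output format shown above, rather than running FINDSTR itself. It asks for tokens 1 through 4 only to show the neighbors of token 3. The loop variables %%A through %%D are just the ones FOR /F assigns for that range.

```batch
@echo off
rem Sketch only: splits one literal FINDSTR-style line on < and >.
rem Token 3 is the element value; the other tokens are shown for context.
for /f "delims=<> tokens=1-4" %%A in ("D:\depart\file_1.xml:<AAA>C086002-T1111</AAA>") do (
  echo token 1, path plus colon: %%A
  echo token 2, opening tag name: %%B
  echo token 3, element value: %%C
  echo token 4, closing tag name: %%D
)
```

This is also why the approach depends on each element sitting on a single line: a reformatted XML file would shift the token positions.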

Finally, you can put parentheses around the entire loop and redirect just once. I'm assuming you want each run to create a new file, so I use > instead of >>.

@echo off
setlocal enabledelayedexpansion
>result.txt (
  for /f "delims=<> tokens=3" %%A in (
    'findstr /r "<AAA>.*</AAA>" "D:\depart\*.xml"'
  ) do (
    set code=%%A
    set code=!code:"=!
    set code=!code: =!
    echo(!code!
  )
)

If you only need to trim leading or trailing spaces and quotes, the solution is even simpler. It does require odd syntax to specify a quote as a DELIMS character. Note that there are two spaces between the last ^ and %%B. The first, escaped space is taken as a DELIMS character. The second, unescaped space terminates the FOR /F options string.

@echo off
>result.txt (
  for /f "delims=<> tokens=3" %%A in (
    'findstr /r "<AAA>.*</AAA>" "D:\depart\*.xml"'
  ) do for /f delims^=^"^  %%B in ("%%A") do echo(%%B
)

Update in response to comment

I'm assuming your data value will never contain a colon.

If you want to append the source file name to each line of output, you simply need to alter the first FOR /F to capture the first token (the source file) as well as the third token (the data value). The first token contains the full path plus a trailing colon. The second FOR /F appends the file to the source data string, using the ~nx modifier to get just the name and extension (no drive or path), and a colon is added to the DELIMS option so that the trailing colon is trimmed off.

@echo off
>result.txt (
  for /f "delims=<> tokens=1,3" %%A in (
    'findstr /r "<AAA>.*</AAA>" "D:\depart\*.xml"'
  ) do for /f delims^=:^"^  %%C in ("%%B;%%~nxA") do echo %%C
)
