DOS Batch: dealing with double quotes from XML files
Question
I have written the code below to read XML files (file_1.xml and file_2.xml), extract the string between the tags, and write it to a TXT file. The problem is that some strings contain double quotation marks, and the program then treats these characters as command syntax rather than as part of the strings...
Content of file_1.xml:
<AAA>C086002-T1111</AAA>
<AAA>C086002-T1222 </AAA>
<AAA>C086002-TR333 "</AAA>
<AAA>C086002-T5444 </AAA>
Content of file_2.xml:
<AAA>C086002-T5555 </AAA>
<AAA>C086002-T1666</AAA>
<AAA>C086002-T1777 "</AAA>
<AAA>C086002-T1888 "</AAA>
My code:
@echo off
setlocal enabledelayedexpansion
for /f "delims=;" %%f in ('dir /b D:\depart\*.xml') do (
for /f "usebackq delims=;" %%z in ("D:\depart\%%f") do (
(for /f "delims=<AAA></AAA> tokens=2" %%a in ('echo "%%z" ^| Findstr /r "<AAA>"') do (
set code=%%a
set code=!code:""=!
set code=!code: =!
echo !code!
)) >> result.txt
)
)
I get this in result.txt:
C086002-T1111
C086002-T1222
C086002-T5444
C086002-T5555
C086002-T1666
In fact, 3 out of the 8 lines are missing. These lines include double quotation marks or follow lines that include double quotation marks...
How can I handle these characters so that they are treated as part of the strings?
Recommended answer
Please note - parsing XML with batch is a risky business because XML generally ignores white space. Any script you write could probably be broken by simply reformatting the XML into another equivalent valid form. That being said...
I haven't traced the problem through to fully explain your observed behavior, but the unbalanced quote is causing a problem with this line:
(for /f "delims=<AAA></AAA> tokens=2" %%a in ('echo "%%z" ^| Findstr /r "<AAA>"') do (
You can eliminate that problem, and get your code to sort of work, by removing all quotes beforehand.
@echo off
setlocal enabledelayedexpansion
del result.txt 2>nul
for /f "delims=;" %%f in ('dir /b D:\depart\*.xml') do (
for /f "usebackq delims=;" %%z in ("D:\depart\%%f") do (
set code=%%z
set code=!code:"=!
set code=!code: =!
(for /f "delims=<AAA></AAA> tokens=2" %%a in ('echo "!code!" ^| Findstr /r "<AAA>"') do (
echo %%a
)) >> result.txt
)
)
But you have a potential major problem. DELIMS does not specify a string; it specifies a list of characters. So your DELIMS=<AAA></AAA> is equivalent to DELIMS=<>/A. If your element value ever has an A or / in it, then your code will fail.
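To make the character-list behavior concrete, here is a rough Python model of how FOR /F splits a line. It is an illustration only, not cmd.exe itself. It assumes each DELIMS character is a separate delimiter and that runs of delimiters collapse into one split. `for_f_fields` is a hypothetical helper name, and the value `C086002-TA333` is a made-up example that contains an A.

```python
def for_f_fields(line, delims):
    """Hypothetical model of FOR /F tokenizing (not cmd.exe itself).

    Each character in `delims` is its own delimiter; runs of delimiters
    collapse, so empty tokens never appear.
    """
    delim_set = set(delims)  # DELIMS is a set of characters, not a substring
    fields, current = [], []
    for ch in line:
        if ch in delim_set:
            if current:
                fields.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        fields.append("".join(current))
    return fields


# "<AAA></AAA>" reduces to just four distinct characters
print(sorted(set("<AAA></AAA>")))  # ['/', '<', '>', 'A']

# The question's code echoes "%%z", so token 2 (index 1) is normally the value...
print(for_f_fields('"<AAA>C086002-T1111</AAA>"', "<AAA></AAA>")[1])  # C086002-T1111

# ...but an A inside the value is also a delimiter, so the value is cut short
print(for_f_fields('"<AAA>C086002-TA333</AAA>"', "<AAA></AAA>")[1])  # C086002-T
```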
There is a better way.
First off, you can use FINDSTR to collect all your <AAA>----</AAA> lines from all files in one pass, without any loop:
findstr /r "<AAA>.*</AAA>" "D:\depart\*.xml"
Each matching line will be output as the file path, followed by a colon, followed by the matching line, as in:
D:\depart\file_1.xml:<AAA>C086002-T1111</AAA>
The file path can never contain < or >, so you can use the following to iterate the result, capturing the appropriate token (the third one, which is the element value):
for /f "delims=<> tokens=3" %%A in ( ...
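The following rough Python model shows why token 3 is the element value. It treats `<` and `>` as the only delimiters and collapses runs of them, which is an assumption about FOR /F rather than a copy of cmd.exe. Because A and / are no longer delimiters, a value such as the hypothetical `A/1` now survives intact. `for_f_fields` is a hypothetical helper.

```python
def for_f_fields(line, delims):
    """Hypothetical model of FOR /F tokenizing: any character in `delims`
    splits the line, and runs of delimiters count as one split."""
    fields, current = [], ""
    for ch in line:
        if ch in delims:
            if current:
                fields.append(current)
            current = ""
        else:
            current += ch
    if current:
        fields.append(current)
    return fields


# One line of FINDSTR output: file path, colon, then the matching line
line = r'D:\depart\file_1.xml:<AAA>C086002-TR333 "</AAA>'
fields = for_f_fields(line, "<>")
print(fields[0])        # D:\depart\file_1.xml:
print(repr(fields[2]))  # 'C086002-TR333 "'  (tokens=3 is index 2)

# A value containing A or / is no longer split (hypothetical value)
print(for_f_fields(r"D:\depart\file_1.xml:<AAA>A/1</AAA>", "<>")[2])  # A/1
```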
Finally, you can put parentheses around the entire loop and redirect just once. I'm assuming you want each run to create a new file, so I use > instead of >>.
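@echo off
setlocal enabledelayedexpansion
rem Collect every AAA element line from all XML files in one FINDSTR pass,
rem keep token 3 (the element value), then strip quotes and spaces from it.
>result.txt (
for /f "delims=<> tokens=3" %%A in (
'findstr /r "<AAA>.*</AAA>" "D:\depart\*.xml"'
) do (
set code=%%A
set code=!code:"=!
set code=!code: =!
echo(!code!
)
)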
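As a sanity check of the logic (not of cmd.exe itself), here is a Python model of the whole pipeline applied to the sample data from the question. It assumes the two files contain exactly the lines shown, processed in that order. `findstr_lines` and `extract_codes` are hypothetical helpers that mimic the FINDSTR call and the FOR /F loop, including the two `set` substitutions.

```python
import re

# Sample data copied from the question; real files may contain more markup.
FILES = {
    r"D:\depart\file_1.xml": [
        "<AAA>C086002-T1111</AAA>",
        "<AAA>C086002-T1222 </AAA>",
        '<AAA>C086002-TR333 "</AAA>',
        "<AAA>C086002-T5444 </AAA>",
    ],
    r"D:\depart\file_2.xml": [
        "<AAA>C086002-T5555 </AAA>",
        "<AAA>C086002-T1666</AAA>",
        '<AAA>C086002-T1777 "</AAA>',
        '<AAA>C086002-T1888 "</AAA>',
    ],
}


def findstr_lines(files):
    """Model of the FINDSTR call: yield "path:line" for every matching line."""
    pattern = re.compile(r"<AAA>.*</AAA>")
    for path, lines in files.items():
        for line in lines:
            if pattern.search(line):
                yield f"{path}:{line}"


def extract_codes(files):
    """Model of the FOR /F loop: token 3 with delims <>, then strip quotes/spaces."""
    codes = []
    for out in findstr_lines(files):
        # delims=<> tokens=3: split on runs of < or >, ignore empty pieces
        fields = [f for f in re.split(r"[<>]+", out) if f]
        code = fields[2]
        # set code=!code:"=!  followed by  set code=!code: =!
        code = code.replace('"', "").replace(" ", "")
        codes.append(code)
    return codes


for code in extract_codes(FILES):
    print(code)
```

All eight values come through, including the three that contain a double quote.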
Assuming that you only need to trim leading or trailing spaces/quotes, the solution is even simpler. It does require odd syntax to specify a quote as a DELIM character. Note that there are two spaces between the last ^ and %%B. The first (escaped) space is taken as a DELIM character; the unescaped space terminates the FOR /F options string.
@echo off
>result.txt (
for /f "delims=<> tokens=3" %%A in (
'findstr /r "<AAA>.*</AAA>" "D:\depart\*.xml"'
) do for /f delims^=^"^  %%B in ("%%A") do echo(%%B
)
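A small Python model of the inner FOR /F helps show what "trim leading or trailing" means here. With a quote and a space as the delimiters and the default tokens=1, leading delimiters are skipped and the token ends at the next quote or space. The last sample (`' "X Y" '`) is hypothetical and shows why the trick only suits values without internal spaces. `first_token` is a hypothetical helper, not a batch feature.

```python
def first_token(value, delims='" '):
    """Model of the inner FOR /F: delims are a double quote and a space, tokens=1."""
    token = ""
    for ch in value:
        if ch in delims:
            if token:
                break  # a delimiter after the token ends it
        else:
            token += ch  # leading delimiters were skipped above
    return token


samples = ["C086002-T1111", "C086002-T1222 ", 'C086002-TR333 "', ' "X Y" ']
for raw in samples:
    print(repr(first_token(raw)))
```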
Update in response to a comment
I'm assuming your data value will never contain a colon.
If you want to append the source file name to each line of output, you simply need to alter the first FOR /F to capture the first token (the source file) as well as the third token (the data value). The file token will contain the full path as well as a trailing colon. The second FOR /F appends the file name to the data string (separated by a semicolon), using the ~nx modifier to get just the name and extension (no drive or path), and a colon is added to the DELIMS option so the trailing colon is trimmed off.
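@echo off
rem Token 1 of each FINDSTR line is the path plus a colon; token 3 is the value.
rem Both inner tokens are echoed, so a trailing space or quote in the value
rem cannot split off and drop the appended file name.
>result.txt (
for /f "delims=<> tokens=1,3" %%A in (
'findstr /r "<AAA>.*</AAA>" "D:\depart\*.xml"'
) do for /f tokens^=1^,2^ delims^=:^"^  %%C in ("%%B;%%~nxA") do echo %%C%%D
)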
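For reference, here is a Python sketch of the output line the update aims for: the trimmed value, a semicolon, then just the file name and extension. `tag_with_file` is a hypothetical helper. `ntpath.basename` stands in for the `~nx` modifier, and `str.strip` stands in for the quote/space DELIMS trick, assuming only leading or trailing quotes and spaces need removing.

```python
import ntpath


def tag_with_file(path_field, value):
    """Hypothetical helper: build the intended 'value;name.ext' output line.

    path_field is token 1 of a FINDSTR line: the full path plus a trailing colon.
    """
    # Like %%~nxA (name + extension only), then drop the trailing colon
    name = ntpath.basename(path_field).rstrip(":")
    # Trim leading/trailing quotes and spaces, as the DELIMS trick does
    code = value.strip('" ')
    return f"{code};{name}"


print(tag_with_file(r"D:\depart\file_1.xml:", 'C086002-TR333 "'))  # C086002-TR333;file_1.xml
```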