Preserve formatting when updating an XML file with Groovy
Problem description
I have a large number of XML files that contain URLs. I'm writing a Groovy utility to find each URL and replace it with an updated version.
Given example.xml:
<?xml version="1.0" encoding="UTF-8"?>
<page>
    <content>
        <section>
            <link>
                <url>/some/old/url</url>
            </link>
            <link>
                <url>/some/old/url</url>
            </link>
        </section>
        <section>
            <link>
                <url>
                    /a/different/old/url?with=specialChars&amp;escaped=true
                </url>
            </link>
        </section>
    </content>
</page>
Once the script has run, example.xml should contain:
<?xml version="1.0" encoding="UTF-8"?>
<page>
    <content>
        <section>
            <link>
                <url>/a/new/and/improved/url</url>
            </link>
            <link>
                <url>/a/new/and/improved/url</url>
            </link>
        </section>
        <section>
            <link>
                <url>
                    /a/different/new/and/improved/url?with=specialChars&amp;stillEscaped=true
                </url>
            </link>
        </section>
    </content>
</page>
This is easy to do with Groovy's excellent XML support, except that I want to change the URLs and nothing else about the file.
By that I mean:
- whitespace must not change (files might contain spaces, tabs, or both)
- comments must be preserved
- Windows vs. Unix-style line separators must be preserved
- the XML declaration at the top must not be added or removed
- attributes in tags must retain their order
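Taken together, these requirements mean the rewritten file must be identical to the original everywhere outside the <url> elements. One way to check a converted file against that, sketched here with a hypothetical helper named onlyUrlsChanged (not part of any library), is to blank out every <url> element in both versions and compare the remaining text as plain strings. Like the question's regex, it assumes <url> carries no attributes.

```groovy
// Hypothetical helper: true when `before` and `after` are identical outside <url>...</url>.
// Whitespace, comments, line separators, the XML declaration and attribute order
// all take part in the comparison, so any change to them is reported.
// Assumption: <url> elements carry no attributes.
boolean onlyUrlsChanged(String before, String after) {
    // (?s) lets a single match span several lines
    def mask = { String s -> s.replaceAll(/(?s)<url>.*?<\/url>/, '<url/>') }
    mask(before) == mask(after)
}
```

Calling it with the original text and the rewritten text of each file flags any accidental change to indentation, comments or line endings.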
So far, after trying many combinations of XmlParser, DOMBuilder, XmlNodePrinter, XmlUtil.serialize(), and so on, I've landed on reading each file line by line and applying an ugly hybrid of the XML utilities and regular expressions.
Reading and writing each file:
files.each { File file ->
    def text = file.getText('UTF-8')   // read once; the files declare UTF-8
    def lineEnding = text.contains('\r\n') ? '\r\n' : '\n'
    def newLineAtEof = text.endsWith(lineEnding)
    def lines = text.readLines()
    file.withWriter('UTF-8') { w ->
        lines.eachWithIndex { line, index ->
            line = update(line)
            w.write(line)
            // restore the original separator between lines, and at EOF only if one was there
            if (index < lines.size() - 1) w.write(lineEnding)
            else if (newLineAtEof) w.write(lineEnding)
        }
    }
}
Searching for and updating URLs within a line:
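// Used as update(line); urlTagRegexp and translate() are defined elsewhere in the script
def matcher = (line =~ urlTagRegexp) // matches a <url> element and its contents
matcher.each { groups ->
    def urlNode = new XmlParser().parseText(line)
    def url = urlNode.text()
    def newUrl = translate(url)
    if (newUrl) {
        urlNode.value = newUrl
        def replacement = nodeToString(urlNode)
        // quoteReplacement stops '$' or '\' in a URL from being read as a group reference
        line = matcher.replaceAll(java.util.regex.Matcher.quoteReplacement(replacement))
    }
}

def nodeToString(node) {
    def writer = new StringWriter()
    writer.withPrintWriter { printWriter ->
        def printer = new XmlNodePrinter(printWriter)
        printer.preserveWhitespace = true
        printer.print(node)
    }
    // the printer adds line breaks; strip them so the element fits back on one line
    writer.toString().replaceAll(/[\r\n]/, '')
}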
This mostly works, except it can't handle a tag split over multiple lines, and messing with newlines when writing the files back out is cumbersome.
I'm new to Groovy, but I feel like there must be a groovier way of doing this.
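Both limitations described above come from splitting the file into lines. As a point of comparison (this is not the approach taken in the answer), the question's own regex idea can be applied to the whole file text at once: a <url> element spanning several lines is then a single match, and because the text is never split, line separators are written back exactly as they were. This is a minimal sketch assuming <url> elements have no attributes, child elements or CDATA sections; updateUrls, escapeXml and unescapeXml are hypothetical names.

```groovy
import java.util.regex.Matcher
import java.util.regex.Pattern

// Minimal unescaping/escaping for XML text content. Only the predefined
// entities are handled; numeric references such as &#38; are left alone.
String unescapeXml(String s) {
    // &amp; is decoded last so that "&amp;lt;" correctly becomes "&lt;"
    s.replace('&lt;', '<').replace('&gt;', '>').replace('&quot;', '"').replace('&apos;', "'").replace('&amp;', '&')
}

String escapeXml(String s) {
    s.replace('&', '&amp;').replace('<', '&lt;').replace('>', '&gt;')
}

// Rewrites only the text inside <url> elements of the whole document text.
// Everything else (whitespace, comments, \r\n or \n separators, the XML
// declaration, attribute order) is copied through unchanged.
// `rewrite` receives the unescaped URL without surrounding whitespace and
// returns the new URL, or null to keep the old one.
String updateUrls(String xml, Closure<String> rewrite) {
    // (?s) lets .*? cross line breaks, so a <url> split over several lines
    // is still one match. Assumption: <url> has no attributes, children or CDATA.
    Pattern urlElement = ~/(?s)(<url>)(\s*)(.*?)(\s*)(<\/url>)/
    Matcher m = urlElement.matcher(xml)
    StringBuffer out = new StringBuffer()
    while (m.find()) {
        String newUrl = rewrite.call(unescapeXml(m.group(3)))
        String body = newUrl == null ? m.group(3) : escapeXml(newUrl)
        // groups 2 and 4 hold the whitespace around the URL; keep it as is
        String replacement = m.group(1) + m.group(2) + body + m.group(4) + m.group(5)
        m.appendReplacement(out, Matcher.quoteReplacement(replacement))
    }
    m.appendTail(out)
    out.toString()
}
```

To rewrite a file in place, something like `file.write(updateUrls(file.getText('UTF-8'), this.&translate), 'UTF-8')` should work, assuming the question's translate() returns null for URLs that should stay unchanged.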
Answer
I just created a gist at https://gist.github.com/akhikhl/8070808 to demonstrate how such a transformation can be done with Groovy and JDOM2.
Important notes:
- Groovy can use any Java library. If something cannot be done with the Groovy JDK, it can be done with another library.
- The jaxen library (which implements XPath) has to be included explicitly (via @Grab or via Maven/Gradle), since it is an optional dependency of JDOM2.
- The order of the @Grab/@GrabExclude directives works around jaxen's quirky dependency on JDOM 1.0.
- XPathFactory.compile also supports variable binding and filters (see the online Javadoc).
- XPathExpression (returned by compile) has two main methods, evaluate and evaluateFirst: evaluate always returns a list of all XML nodes matching the expression, while evaluateFirst returns only the first matching node.
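The notes above can be illustrated with a short script. This is not the code from the gist; it is a minimal sketch assuming JDOM2 2.0.6 and jaxen 1.1.6 from Maven Central, a shortened copy of the question's XML, and a hypothetical translate() that only rewrites the /old/ path segment. It shows evaluate, evaluateFirst and variable binding, and replaces only the trimmed URL inside each text node so the whitespace around it is kept.

```groovy
@Grab('org.jdom:jdom2:2.0.6')
@Grab('jaxen:jaxen:1.1.6')   // XPath engine: optional in JDOM2, so grab it explicitly
@GrabExclude('jdom:jdom')    // keep JDOM 1.0 off the classpath (may be a no-op for this jaxen version)
import org.jdom2.Document
import org.jdom2.Element
import org.jdom2.filter.Filters
import org.jdom2.input.SAXBuilder
import org.jdom2.xpath.XPathExpression
import org.jdom2.xpath.XPathFactory

// Hypothetical stand-in for the question's translate()
String translate(String url) {
    url.replace('/old/', '/new/and/improved/')
}

// Shortened version of the question's example.xml
String xml = '''<page>
  <link><url>/some/old/url</url></link>
  <link><url>/some/old/url</url></link>
  <link><url>
    /a/different/old/url
  </url></link>
</page>'''

Document doc = new SAXBuilder().build(new StringReader(xml))
XPathFactory xpf = XPathFactory.instance()

// Filters.element() makes the expression return Element objects
XPathExpression<Element> allUrls = xpf.compile('//url', Filters.element())

// evaluate: a list of every matching node
List<Element> urls = allUrls.evaluate(doc)

// evaluateFirst: only the first matching node (null if there is none)
Element firstUrl = allUrls.evaluateFirst(doc)

// Variable binding: $target is looked up in the map instead of being
// concatenated into the expression string
XPathExpression<Element> byValue = xpf.compile(
        '//url[normalize-space(.) = $target]', Filters.element(), [target: '/some/old/url'])
List<Element> oldUrls = byValue.evaluate(doc)

// Rewrite only the non-blank part of each text node, keeping the whitespace around it
urls.each { Element url ->
    String text = url.text
    String core = text.trim()
    if (core) {
        url.setText(text.replace(core, translate(core)))
    }
}
```

Binding values through the map rather than building the XPath string by hand avoids quoting problems when a URL contains an apostrophe.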
Update
The following code:
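import org.jdom2.output.Format
import org.jdom2.output.LineSeparator
import org.jdom2.output.XMLOutputter

// doc is the org.jdom2.Document built earlier in the script (see the gist)
def format = Format.getRawFormat()            // keep whitespace exactly as parsed
format.setLineSeparator(LineSeparator.NONE)   // do not convert line separators
new XMLOutputter(format).output(doc, System.out)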
solves the problem of preserving whitespace and line separators. getRawFormat constructs a Format object that preserves whitespace, and LineSeparator.NONE tells the Format object not to convert line separators.
The gist mentioned above contains this new code as well.
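One detail worth checking when writing the result back to disk: XML parsers are required to normalize \r\n to \n while reading (XML 1.0, section 2.11), so by the time JDOM2 holds the document, Windows separators are already gone, and with LineSeparator.NONE they would then be written back as \n. The sketch below is a variation under stated assumptions, not the gist's code: it reuses the question's line-ending detection to choose LineSeparator.DOS for files that used \r\n, keeps the answer's raw format otherwise, and writes UTF-8 as declared in the question's files. saveRaw is a hypothetical helper, and whether LineSeparator.DOS restores every original separator should be confirmed on a sample file.

```groovy
@Grab('org.jdom:jdom2:2.0.6')
import org.jdom2.Document
import org.jdom2.input.SAXBuilder
import org.jdom2.output.Format
import org.jdom2.output.LineSeparator
import org.jdom2.output.XMLOutputter

// Hypothetical helper: writes doc back to file using the answer's raw format.
// originalText is the file content read before parsing; it is only used to
// detect which line separator the file used (same test as in the question).
void saveRaw(Document doc, File file, String originalText) {
    Format format = Format.getRawFormat()   // whitespace text is written as stored
    // The parser has already turned \r\n into \n. Assumption to verify:
    // LineSeparator.DOS writes those \n back as \r\n; NONE leaves them alone.
    format.setLineSeparator(originalText.contains('\r\n') ? LineSeparator.DOS : LineSeparator.NONE)
    format.setEncoding('UTF-8')             // assumption: files are UTF-8, as declared
    file.withWriter('UTF-8') { Writer w ->
        new XMLOutputter(format).output(doc, w)
    }
}

// Demo on a temporary copy of a small Unix-style file
File file = File.createTempFile('example', '.xml')
file.deleteOnExit()
file.write('<?xml version="1.0" encoding="UTF-8"?>\n<page>\n  <!-- keep me -->\n  <url>/some/old/url</url>\n</page>\n', 'UTF-8')

String original = file.getText('UTF-8')
Document doc = new SAXBuilder().build(file)
doc.rootElement.getChild('url').setText('/a/new/and/improved/url')
saveRaw(doc, file, original)
String result = file.getText('UTF-8')
```

Comparing result with the original text (for example with the mask-and-compare check from the question section) is a cheap way to see whether anything besides the URL changed, such as the line break after the XML declaration.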