Preserve formatting when updating xml file with groovy


Problem description

I have a large number of XML files that contain URLs. I'm writing a Groovy utility to find each URL and replace it with an updated version.

Given example.xml:

<?xml version="1.0" encoding="UTF-8"?>
<page>
    <content>
        <section>
            <link>
                <url>/some/old/url</url>
            </link>
            <link>
                <url>/some/old/url</url>
            </link>
        </section>
        <section>
            <link>
                <url>
                    /a/different/old/url?with=specialChars&amp;escaped=true
                </url>
            </link>
        </section>
    </content>
</page>

Once the script has run, example.xml should contain:

<?xml version="1.0" encoding="UTF-8"?>
<page>
    <content>
        <section>
            <link>
                <url>/a/new/and/improved/url</url>
            </link>
            <link>
                <url>/a/new/and/improved/url</url>
            </link>
        </section>
        <section>
            <link>
                <url>
                    /a/different/new/and/improved/url?with=specialChars&amp;stillEscaped=true
                </url>
            </link>
        </section>
    </content>
</page>

This is easy to do using Groovy's excellent XML support, except that I want to change the URLs and nothing else about the file.

By that I mean:

  • whitespace must not change (files might contain spaces, tabs, or both)
  • comments must be preserved
  • Windows vs. Unix-style line separators must be preserved
  • the XML declaration at the top must not be added or removed
  • attributes in tags must retain their order

So far, after trying many combinations of XmlParser, DOMBuilder, XmlNodePrinter, XmlUtil.serialize(), and so on, I've landed on reading each file line by line and applying an ugly hybrid of the XML utilities and regular expressions.

Reading and writing each file:

files.each { File file ->
    // Read the text once to detect the line separator and a trailing newline
    def text = file.text
    def lineEnding = text.contains('\r\n') ? '\r\n' : '\n'
    def newLineAtEof = text.endsWith(lineEnding)
    def lines = file.readLines()
    file.withWriter { w ->
        lines.eachWithIndex { line, index ->
            line = update(line)
            w.write(line)
            // Separator between lines; after the last line only if the file had one
            if (index < lines.size() - 1) w.write(lineEnding)
            else if (newLineAtEof) w.write(lineEnding)
        }
    }
}

Searching for and updating URLs within a line:

def matcher = (line =~ urlTagRegexp) //matches a <url> element and its contents
matcher.each { groups ->
    def urlNode = new XmlParser().parseText(line)
    def url = urlNode.text()
    def newUrl = translate(url)
    if (newUrl) {
        urlNode.value = newUrl
        def replacement = nodeToString(urlNode)
        line = matcher.replaceAll(replacement)
    }
}

def nodeToString(node) {
    def writer = new StringWriter()
    writer.withPrintWriter { printWriter ->
        def printer = new XmlNodePrinter(printWriter)
        printer.preserveWhitespace = true
        printer.print(node)
    }
    writer.toString().replaceAll(/[\r\n]/, '')
}

This mostly works, except it can't handle a tag split over multiple lines, and messing with newlines when writing the files back out is cumbersome.
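
The multi-line limitation comes from matching one line at a time. A minimal sketch of one way around it, assuming the <url> tags carry no attributes and that a plain text substitution is acceptable: match against the whole file text, so the regex can span line breaks and everything outside the match (whitespace, comments, line separators, the declaration) is left untouched. The translations map and the updateUrls closure are hypothetical stand-ins for translate and update above.

```groovy
// Hypothetical mapping from old to new URLs. Keys are the raw text as it
// appears in the file, so special characters stay escaped (e.g. &amp;).
def translations = ['/some/old/url': '/a/new/and/improved/url']

// (?s) lets '.' match line breaks, so a <url> element may span several lines.
// Group 1: opening tag plus leading whitespace
// Group 2: the URL itself
// Group 3: trailing whitespace plus closing tag
// Caveat: this also matches <url> elements inside comments or CDATA sections.
def urlTag = /(?s)(<url>\s*)(.*?)(\s*<\/url>)/

def updateUrls = { String text ->
    text.replaceAll(urlTag) { all, open, url, close ->
        def newUrl = translations[url]
        // Leave the match exactly as it was when there is no translation
        newUrl ? open + newUrl + close : all
    }
}

// A tag split over several lines, with Windows line separators and tabs
def sample = "<link>\r\n\t<url>\r\n\t\t/some/old/url\r\n\t</url>\r\n</link>"
println updateUrls(sample)
```

For a file this could be applied as file.setText(updateUrls(file.getText('UTF-8')), 'UTF-8'), where the UTF-8 encoding is an assumption taken from the example's declaration. Nothing is split into lines, so no separator has to be restored afterwards.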

I'm new to Groovy, but I feel like there must be a groovier way of doing this.

Solution

I just created a gist at https://gist.github.com/akhikhl/8070808 to demonstrate how such a transformation can be done with Groovy and JDOM2.

Important notes:

  1. Groovy technically allows using any Java library. If something cannot be done with the Groovy JDK, it can be done with another library.
  2. The jaxen library (which implements XPath) should be included explicitly (via @Grab or via maven/gradle), since it's an optional dependency of JDOM2.
  3. The sequence of @Grab/@GrabExclude instructions fixes the quirky dependence of jaxen on JDOM-1.0.
  4. XPathFactory.compile also supports variable binding and filters (see the online javadoc).
  5. XPathExpression (which is returned by compile) supports two major functions: evaluate and evaluateFirst. evaluate always returns a list of all XML nodes satisfying the specified predicate, while evaluateFirst returns just the first matching XML node.
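
The notes above can be sketched roughly as follows. This is not the content of the gist, only an illustration under assumptions: the library versions, the sample XML and the translations map are all hypothetical, and the @GrabExclude line simply mirrors note 3.

```groovy
// Versions are assumptions; any recent JDOM2 and jaxen release should do.
@Grab('org.jdom:jdom2:2.0.6')
@Grab('jaxen:jaxen:1.1.6')
@GrabExclude('jdom:jdom')
import org.jdom2.Element
import org.jdom2.filter.Filters
import org.jdom2.input.SAXBuilder
import org.jdom2.xpath.XPathFactory

// Hypothetical sample input and URL mapping
def xml = '''<page>
    <!-- links -->
    <link><url>/some/old/url</url></link>
    <link><url>/other/url</url></link>
</page>'''
def translations = ['/some/old/url': '/a/new/and/improved/url']

def doc = new SAXBuilder().build(new StringReader(xml))

// Filters.element() restricts the result type to Element
def urlXPath = XPathFactory.instance().compile('//url', Filters.element())

// evaluate returns every match, evaluateFirst only the first one
List<Element> allUrls = urlXPath.evaluate(doc)
Element firstUrl = urlXPath.evaluateFirst(doc)

// Replace the text of each <url> that has a translation
allUrls.each { Element url ->
    def newUrl = translations[url.getTextTrim()]
    if (newUrl) url.setText(newUrl)
}

println firstUrl.getText()
```

Only the text of the matching elements is changed here. Comments and whitespace-only text stay in the document tree as separate nodes.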

Update

The following code:

new XMLOutputter().with {
  format = Format.getRawFormat()
  format.setLineSeparator(LineSeparator.NONE)
  output(doc, System.out)
}

solves the problem of preserving whitespace and line separators. getRawFormat constructs a format object that preserves whitespace. LineSeparator.NONE tells the format object not to convert line separators.
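
A small round trip can make the effect of the raw format visible. This is a sketch under assumptions (hypothetical input string, an assumed JDOM2 version) that serialises only the root element into a string instead of writing to System.out:

```groovy
// Version is an assumption
@Grab('org.jdom:jdom2:2.0.6')
import org.jdom2.input.SAXBuilder
import org.jdom2.output.Format
import org.jdom2.output.LineSeparator
import org.jdom2.output.XMLOutputter

// Hypothetical input mixing a tab, spaces and a comment
def xml = '<page>\n\t<!-- keep me -->\n    <url>/some/old/url</url>\n</page>'
def doc = new SAXBuilder().build(new StringReader(xml))

// Change one URL and nothing else
doc.rootElement.getChild('url').setText('/a/new/and/improved/url')

def outputter = new XMLOutputter()
// Raw format keeps whitespace as parsed; NONE disables line separator conversion
outputter.format = Format.getRawFormat().setLineSeparator(LineSeparator.NONE)

// Serialise the root element only, so the XML declaration plays no part here
def result = outputter.outputString(doc.rootElement)
println result
```

The tab before the comment and the four spaces before the url element are text nodes in the parsed document, which is why a format that does not trim or re-indent writes them back unchanged.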

The gist mentioned above contains this new code as well.
