Remove empty tags from XML using Java


Problem description

I'm adding some functionality to a servlet. When it receives the InputStream (which is basically a PDF document parsed into an XML format), I store that data in a String object and then try to delete all the empty tags, but I haven't had any good results so far:

This is the data the servlet is receiving:

    <form1>
        <GenInfo>
            <Section1>
                <EmployeeDet>
                    <Title>999990000</Title>
                    <Firstname>MIKE</Firstname>
                    <Surname>SPENCER</Surname>
                    <CoName/>
                    <EmpAdd>
                        <Address><Add1/><Add2/><Town/><County/><Pcode/></Address>
                    </EmpAdd>
                    <PosHeld>DEVELOPER</PosHeld>
                    <Email/>
                    <ConNo/>
                    <Nationality/>
                    <PPSNo/>
                    <EmpNo/>
                </EmployeeDet>
            </Section1>
        </GenInfo>
    </form1>

The final result should look like this:

    <form1>
        <GenInfo>
            <Section1>
                <EmployeeDet>
                    <Title>999990000</Title>
                    <Firstname>MIKE</Firstname>
                    <Surname>SPENCER</Surname>
                    <PosHeld>DEVELOPER</PosHeld>
                </EmployeeDet>
            </Section1>
        </GenInfo>
    </form1>

My apologies if this is a repeated question. I did some research on similar posts, but none of them gave me the correct approach, which is why I am asking in a separate post.

Thanks in advance.

Recommended answer

Here's a regex way of doing what you want. There are probably some edge cases that I'm not thinking of, which is always a risk when you use regex on XML. For instance, these patterns only match plain element names, so an empty element with attributes, a namespace prefix, or a space before the slash (such as <CoName />) is left untouched. A DOM parser is probably the best way to do this.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RemoveEmptyTags {
    public static void main(String[] args) throws Exception {
        // Each pattern is applied once, in this order. The order matters here:
        // removing <Add1/> etc. leaves <Address></Address> for the second pattern,
        // and removing that leaves an empty <EmpAdd> pair for the third.
        String[] patterns = new String[] {
            // Removes empty elements that look like <ElementName/>
            "\\s*<\\w+/>",
            // Removes empty elements that look like <ElementName></ElementName>
            "\\s*<\\w+></\\w+>",
            // Removes empty elements that look like
            // <ElementName>
            // </ElementName>
            "\\s*<\\w+>\n*\\s*</\\w+>"
        };

        String xml = "    <form1>\n" +
                        "        <GenInfo>\n" +
                        "            <Section1>\n" +
                        "                <EmployeeDet>\n" +
                        "                    <Title>999990000</Title>\n" +
                        "                    <Firstname>MIKE</Firstname>\n" +
                        "                    <Surname>SPENCER</Surname>\n" +
                        "                    <CoName/>\n" +
                        "                    <EmpAdd>\n" +
                        "                        <Address><Add1/><Add2/><Town/><County/><Pcode/></Address>\n" +
                        "                    </EmpAdd>\n" +
                        "                    <PosHeld>DEVELOPER</PosHeld>\n" +
                        "                    <Email/>\n" +
                        "                    <ConNo/>\n" +
                        "                    <Nationality/>\n" +
                        "                    <PPSNo/>\n" +
                        "                    <EmpNo/>\n" +
                        "                </EmployeeDet>\n" +
                        "            </Section1>\n" +
                        "        </GenInfo>\n" +
                        "    </form1>";

        // Strip every match of each pattern, including the whitespace before it
        for (String pattern : patterns) {
            Matcher matcher = Pattern.compile(pattern).matcher(xml);
            xml = matcher.replaceAll("");
        }

        System.out.println(xml);
    }
}

Result:

    <form1>
        <GenInfo>
            <Section1>
                <EmployeeDet>
                    <Title>999990000</Title>
                    <Firstname>MIKE</Firstname>
                    <Surname>SPENCER</Surname>
                    <PosHeld>DEVELOPER</PosHeld>
                </EmployeeDet>
            </Section1>
        </GenInfo>
    </form1>
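
Since a DOM parser is probably the better route, here is a minimal sketch of what that could look like using only the JDK's built-in XML APIs. The class name EmptyElementPruner and its two methods are hypothetical names made up for illustration. The rule for "empty" is also an assumption: an element is removed when it has no attributes and, after its own children are pruned, contains nothing but whitespace. Comments and CDATA are treated as content.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper class; the name and the "empty" rule are assumptions.
class EmptyElementPruner {

    // Prunes empty descendants of 'node' and reports whether 'node' itself is now empty.
    static boolean prune(Node node) {
        boolean hasContent = false;
        Node child = node.getFirstChild();
        while (child != null) {
            Node next = child.getNextSibling(); // capture before a possible removal
            short type = child.getNodeType();
            if (type == Node.ELEMENT_NODE) {
                if (prune(child)) {
                    node.removeChild(child);
                } else {
                    hasContent = true;
                }
            } else if (type == Node.TEXT_NODE && child.getNodeValue().trim().isEmpty()) {
                node.removeChild(child); // drop indentation-only text
            } else {
                hasContent = true; // real text, CDATA, comments etc. count as content
            }
            child = next;
        }
        return !hasContent && !node.hasAttributes();
    }

    static String removeEmptyElements(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // The XML arrives from outside the application, so enable secure processing
            factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            prune(doc.getDocumentElement()); // the root is kept even if it ends up empty

            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            transformer.setOutputProperty(OutputKeys.INDENT, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not process XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<EmployeeDet><Surname>SPENCER</Surname><CoName/>"
                + "<EmpAdd>\n  <Address><Add1/><Town/></Address>\n</EmpAdd></EmployeeDet>";
        System.out.println(removeEmptyElements(xml));
    }
}
```

Because the pruning is recursive, nesting depth and pattern order stop mattering, and attributes or namespace prefixes no longer confuse it. The exact indentation of the output depends on the JDK's Transformer implementation. DocumentBuilder.parse also accepts an InputStream, so in the servlet the stream could probably be parsed directly instead of going through a String first.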
