How to normalize XML on reverse domain name sorting and custom filtering

Problem description

I've been working on a Geo application. Over time, the product's XML has grown a bit messy. The problem arises when synchronizing changes across multiple environments, such as Dev, Test, etc. I'm trying to find a way to normalize the content so that editing and merging are less cumbersome and development is more productive. I know it sounds crazy, and there's a lot of background, but let me skip the history and jump to the actual issue.

Here's the issue:

  1. Apply multiple sort orders:

    • Sort on the reversed domain name. For example, d.c.b.a should be read as a.b.c.d, and map.google.com as com.google.map, for sorting.
    • When the domain contains a non-alphanumeric character such as *, ?, [ or ], that node should come after the specific one, because its scope is wider.
    • Sort on port and path as the secondary keys.
    • Apply a similar sort order to the tags under the <tgt> element, if present.
  2. Eliminate the <scheme> and <port> tags when their values are generic, such as http / https for the scheme tag and 80 or 443 for the port tag; otherwise retain them. Also remove them if there is no value, like <scheme/>.
  3. Preserve all other tags and values as-is.
  4. Trivial things, like indenting with 2 space characters and keeping the actual data without unwanted boilerplate.

Here's a bit of the problematic XML:

XML

<?xml version='1.0' encoding='UTF-8' ?>
<?tapia chrome-version='2.0' ?>
<mapGeo>
  <a>blah</a>
  <b>blah</b>
  <maps>
    <mapIndividual>
      <src>
        <scheme>https</scheme>
        <domain>photos.yahoo.com</domain>
        <path>somepath</path>
        <query>blah</query>
      </src>
      <loc>C:\var\tmp</loc>
      <x>blah</x>
      <y>blah</y>
    </mapIndividual>
    <mapIndividual>
      <src>
        <scheme>tcp</scheme>
        <domain>map.google.com</domain>
        <port>80</port>
        <path>/value</path>
        <query>blah</query>
      </src>
      <tgt>
        <scheme>https</scheme>
        <domain>map.google.com</domain>
        <port>443</port>
        <path>/value</path>
        <query>blah</query>
      </tgt>
      <x>blah</x>
      <y>blah</y>
    </mapIndividual>
    <mapIndividual>
      <src>
        <scheme>http</scheme>
        <domain>*.c.b.a</domain>
        <path>somepath</path>
        <port>8085</port>
        <query>blah</query>
      </src>
      <tgt>
        <domain>r.q.p</domain>
        <path>somepath</path>
        <query>blah</query>
      </tgt>
      <x>blah</x>
      <y>blah</y>
    </mapIndividual>
    <mapIndividual>
      <src>
        <scheme>http</scheme>
        <domain>d.c.b.a</domain>
        <path>somepath</path>
        <port>8085</port>
        <query>blah</query>
      </src>
      <tgt>
        <domain>r.q.p</domain>
        <path>somepath</path>
        <query>blah</query>
      </tgt>
      <x>blah</x>
      <y>blah</y>
    </mapIndividual>
  </maps>
</mapGeo>

I was able to apply basic sorting on the values as they are, but couldn't figure out a way to generate the reversed domain name. I came across XSL extensions, but haven't tried them yet. Here's the beginning of the solution I was working on, which is very basic.

XSL

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="node()">
    <xsl:copy>
      <xsl:apply-templates select="node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="maps">
    <xsl:copy>
      <xsl:apply-templates select="*">
        <xsl:sort select="src/domain" />
        <xsl:sort select="src/port" />
      </xsl:apply-templates>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>

Expected Output

<?xml version='1.0' encoding='UTF-8' ?>
<?tapia chrome-version='2.0' ?>
<mapGeo>
  <a>blah</a>
  <b>blah</b>
  <maps>
    <mapIndividual>
      <src>
        <domain>d.c.b.a</domain>
        <path>somepath</path>
        <port>8085</port>
        <query>blah</query>
      </src>
      <tgt>
        <domain>r.q.p</domain>
        <path>somepath</path>
        <query>blah</query>
      </tgt>
      <x>blah</x>
      <y>blah</y>
    </mapIndividual>
    <mapIndividual>
      <src>
        <domain>*.c.b.a</domain>
        <path>path1</path>
        <port>8085</port>
        <query>blah</query>
      </src>
      <tgt>
        <domain>r.q.p</domain>
        <path>path2</path>
        <query>blah</query>
      </tgt>
      <x>blah</x>
      <y>blah</y>
    </mapIndividual>
    <mapIndividual>
      <src>
        <scheme>tcp</scheme>
        <domain>map.google.com</domain>
        <path>/value</path>
        <query>blah</query>
      </src>
      <tgt>
        <domain>map.google.com</domain>
        <path>/value</path>
        <query>blah</query>
      </tgt>
      <x>blah</x>
      <y>blah</y>
    </mapIndividual>
    <mapIndividual>
      <src>
        <domain>photos.yahoo.com</domain>
        <path>somepath</path>
        <query>blah</query>
      </src>
      <loc>C:\var\tmp</loc>
      <x>blah</x>
      <y>blah</y>
    </mapIndividual>
  </maps>
</mapGeo>

Note: I'd prefer XSLT 1.0, as that's what is supported in the current environment. XSLT 2.0 would be a plus.

Update: I figured out a way to support XSLT 2.0 and XSLT 3.0, so please ignore my previous note about XSLT 1.0.

Thank you in advance!

Cheers,

Solution

This XSLT 1.0 stylesheet (without extensions)

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
    <xsl:output indent="yes" />
    <xsl:strip-space elements="*"/>
    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>

    <xsl:template match="maps">
        <xsl:copy>
            <xsl:apply-templates select="*">
                <xsl:sort 
                    select="translate(src/domain,translate(src/domain,'.',''),'')" 
                    order="descending"/>
                <xsl:sort 
                    select="
                      substring-after(
                        substring-after(
                          substring-after(translate(src/domain,'*','~'),'.'),'.'),'.')"/>
                <xsl:sort 
                    select="
                        substring-after(
                            substring-after(translate(src/domain,'*','~'),'.'),'.')"/>
                <xsl:sort 
                    select="substring-after(translate(src/domain,'*','~'),'.')"/>
                <xsl:sort select="translate(src/domain,'*','~')" />
                <xsl:sort select="src/port" />
            </xsl:apply-templates>
        </xsl:copy>
    </xsl:template>
</xsl:stylesheet>

Output

<?xml version="1.0" encoding="UTF-8"?>
<?tapia chrome-version='2.0' ?>
<mapGeo>
   <a>blah</a>
   <b>blah</b>
   <maps>
      <mapIndividual>
         <src>
            <scheme>http</scheme>
            <domain>d.c.b.a</domain>
            <path>somepath</path>
            <port>8085</port>
            <query>blah</query>
         </src>
         <tgt>
            <domain>r.q.p</domain>
            <path>somepath</path>
            <query>blah</query>
         </tgt>
         <x>blah</x>
         <y>blah</y>
      </mapIndividual>
      <mapIndividual>
         <src>
            <scheme>http</scheme>
            <domain>*.c.b.a</domain>
            <path>somepath</path>
            <port>8085</port>
            <query>blah</query>
         </src>
         <tgt>
            <domain>r.q.p</domain>
            <path>somepath</path>
            <query>blah</query>
         </tgt>
         <x>blah</x>
         <y>blah</y>
      </mapIndividual>
      <mapIndividual>
         <src>
            <scheme>tcp</scheme>
            <domain>map.google.com</domain>
            <port>80</port>
            <path>/value</path>
            <query>blah</query>
         </src>
         <tgt>
            <scheme>https</scheme>
            <domain>map.google.com</domain>
            <port>443</port>
            <path>/value</path>
            <query>blah</query>
         </tgt>
         <x>blah</x>
         <y>blah</y>
      </mapIndividual>
      <mapIndividual>
         <src>
            <scheme>https</scheme>
            <domain>photos.yahoo.com</domain>
            <path>somepath</path>
            <query>blah</query>
         </src>
         <loc>C:\var\tmp</loc>
         <x>blah</x>
         <y>blah</y>
      </mapIndividual>
   </maps>
</mapGeo>
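
The output above still contains the generic <scheme> and <port> values that point 2 of the question asks to drop. As a rough sketch only (not part of the original answer), two empty templates next to the identity template might cover that in XSLT 1.0. It assumes that "generic" means exactly http / https for scheme and 80 / 443 for port, and that whitespace-only elements count as having no value:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
    <xsl:output indent="yes"/>
    <xsl:strip-space elements="*"/>

    <!-- Identity template: copies everything that no other template matches. -->
    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>

    <!-- Assumption: http and https are the generic schemes.
         An empty template produces no output, so the element is dropped. -->
    <xsl:template match="scheme[normalize-space() = 'http'
                             or normalize-space() = 'https'
                             or normalize-space() = '']"/>

    <!-- Assumption: 80 and 443 are the generic ports, whatever the scheme. -->
    <xsl:template match="port[normalize-space() = '80'
                           or normalize-space() = '443'
                           or normalize-space() = '']"/>
</xsl:stylesheet>
```

Because a pattern with a predicate has a higher default priority than the identity template, the two empty templates win for the matching elements, while a value such as tcp or 8085 still falls through to the identity copy. The same two templates could be pasted into the sorting stylesheet above unchanged.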

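Do note: this relies on the fact that . (dot) precedes and ~ (tilde) follows letters in alphabetical order (at least for a US locale), which is why * is translated to ~ before sorting. The first sort key removes every character except the dots, so domains with more labels come first. It also might not scale well...

I agree with Martin Honnen's comment: this would be better solved in XSLT 2.0.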
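
For illustration, here is a minimal XSLT 2.0 sketch of what that could look like. It is an assumption-laden example, not Martin Honnen's or the asker's actual solution: the function name my:reverse-domain and the namespace urn:example:local are hypothetical, each <src> is assumed to hold at most one <domain>, and the wildcard characters *, ?, [ and ] are mapped to ~ so that they sort after letters under the codepoint collation.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                xmlns:my="urn:example:local"
                exclude-result-prefixes="xs my"
                version="2.0">
    <xsl:output indent="yes"/>
    <xsl:strip-space elements="*"/>

    <!-- Hypothetical helper: turns 'map.google.com' into 'com.google.map'.
         Wildcard characters become '~' first, so '*.c.b.a' yields 'a.b.c.~'. -->
    <xsl:function name="my:reverse-domain" as="xs:string">
        <xsl:param name="domain" as="xs:string?"/>
        <xsl:sequence
            select="string-join(
                      reverse(tokenize(translate($domain, '*?[]', '~~~~'), '\.')),
                      '.')"/>
    </xsl:function>

    <!-- Identity template: copies everything that no other template matches. -->
    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>

    <xsl:template match="maps">
        <xsl:copy>
            <xsl:apply-templates select="*">
                <!-- Primary key: reversed domain, compared by Unicode codepoint
                     so the result does not depend on the processor's locale. -->
                <xsl:sort select="my:reverse-domain(src/domain)"
                          collation="http://www.w3.org/2005/xpath-functions/collation/codepoint"/>
                <!-- Secondary keys: port (numeric), then path. -->
                <xsl:sort select="src/port" data-type="number"/>
                <xsl:sort select="src/path"
                          collation="http://www.w3.org/2005/xpath-functions/collation/codepoint"/>
            </xsl:apply-templates>
        </xsl:copy>
    </xsl:template>
</xsl:stylesheet>
```

With the sample input, the primary keys would be a.b.c.d, a.b.c.~, com.google.map and com.yahoo.photos, which matches the order in the expected output. Unlike the XSLT 1.0 version, this does not hard-code the number of labels in a domain. One caveat of joining with a dot: a hyphen sorts before a dot by codepoint, so labels containing hyphens may need a different separator.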
