XSLT 2.0 tokenize and group


Question

I have a text file with the following data:

<t>Heros
Firstname Sean
Lastname Connery
DOB 25-08-1930

Films
Dr.No 1962
Goldfinger 1964
Thunerball 1965

Award
name Academy
time 1

Award
name BAFTA
time 2

Award
name Gloden Globes
time 3</t>

The expected output should look like this:

<Jamesfilms>
    <heros>
        <firstName>Sean</firstName>
        <lastName>Connery</lastName>
        <DOB>25-08-1930</DOB>
    </heros>
    <films>
        <Dr.No>1962</Dr.No>
        <Goldfinger>1964</Goldfinger>
        <Thunerball>1965</Thunerball>
    </films>
    <award>
        <name>Academy</name>
        <times>1</times>
    </award>
    <award>
        <name>BAFTA</name>
        <times>2</times>
    </award>
    <award>
        <name>Gloden Globes</name>
        <times>3</times>
    </award>
</Jamesfilms>

The text file contains space-separated key-value pairs. How can I split the keys from the values and generate XML nodes?

I tried Daniel Haley's answer and am now trying to resolve the exception below:

Error at xsl:for-each on line 10 of transformer.xslt:
  XTDE1170: Invalid relative URI: Illegal character in path at index 5: 

Java class:

    final String TXT_PATH = "E:/tmp/test/input.txt";
    final String XSLT_PATH = "E:/tmp/test/txtToXml.xslt";
    final String XML_PATH = "E:/tmp/test/test_xml_result.xml";

    TransformerFactory tFactory = new net.sf.saxon.TransformerFactoryImpl();
    Transformer transformer = tFactory.newTransformer(new StreamSource(new File(XSLT_PATH)));
    transformer.transform(new StreamSource(new File(TXT_PATH)),new StreamResult(new File(XML_PATH)));

And the modified XSLT:

<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:xs="http://www.w3.org/2001/XMLSchema" exclude-result-prefixes="xs">
  <xsl:output indent="yes"/>
  <xsl:strip-space elements="*"/>

  <xsl:param name="input-encoding" as="xs:string" select="'iso-8859-1'"/>

  <xsl:variable name="initData" as="node()">
    <Jamesfilms>
      <xsl:for-each select="tokenize(unparsed-text(., $input-encoding),'\r?\n\r?\n')">
        <xsl:variable name="tokens" select="tokenize(.,'\r?\n')"/>
        <xsl:choose>
          <xsl:when test="$tokens[1] castable as xs:QName">
            <xsl:element name="{$tokens[1]}">
              <xsl:for-each select="$tokens[position() > 1]">
                <xsl:variable name="tokens2" select="tokenize(.,'\s')"/>
                <xsl:choose>
                  <xsl:when test="$tokens2[1] castable as xs:QName">
                    <xsl:element name="{$tokens2[1]}">
                      <xsl:value-of select="$tokens2[position()>1]" separator=" "/>
                    </xsl:element>                      
                  </xsl:when>
                  <xsl:otherwise>
                    <xsl:message terminate="yes">Invalid element name: <xsl:value-of select="$tokens2[1]"/></xsl:message>
                  </xsl:otherwise>
                </xsl:choose>
              </xsl:for-each>
            </xsl:element>            
          </xsl:when>
          <xsl:otherwise>
            <xsl:message terminate="yes">Invalid element name: <xsl:value-of select="$tokens[1]"/></xsl:message>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:for-each>
    </Jamesfilms>
  </xsl:variable>

  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="/">
    <xsl:apply-templates select="$initData"/>    
  </xsl:template>

  <!--Add additional templates to do further transforming of the initial data ($initData).-->

</xsl:stylesheet>

Recommended answer

You shouldn't need to group; you could just tokenize (and tokenize and tokenize...).

Here's an example. It doesn't do anything with the case of the element names. You can either handle those changes while building $initData, or add additional templates to handle them.

Also, the element names have to be valid QNames. Right now the stylesheet terminates processing with a message when a name is invalid, but you can change how that's handled.
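If you want to find out ahead of time which keys in the text file would trip that check, a rough pre-check on the Java side might look like the sketch below. It is only an approximation of `castable as xs:QName`: it relies on the JDK's DOM implementation rejecting invalid XML names in `createElement`, and the class and method names (`ElementNameCheckSketch`, `isUsableElementName`) are made up for illustration.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;

// Hypothetical helper, not part of the original answer.
final class ElementNameCheckSketch {

    /**
     * Returns true if the DOM implementation accepts the candidate as an
     * element name. This approximates, but is not identical to, the
     * stylesheet's "castable as xs:QName" test.
     */
    static boolean isUsableElementName(String candidate) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .newDocument();
            // createElement throws DOMException for names that are not valid XML names.
            doc.createElement(candidate);
            return true;
        } catch (DOMException e) {
            return false;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("No usable DOM parser available", e);
        }
    }

    public static void main(String[] args) {
        for (String name : new String[] {"Firstname", "Dr.No", "25-08-1930", "Gloden Globes"}) {
            System.out.println(name + " -> " + isUsableElementName(name));
        }
    }
}
```

With the sample data, the keys in the first column (Firstname, Dr.No, name, ...) pass, while a value such as 25-08-1930 would not, which is why only the first token of each line is used as an element name.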

This should at least get you started...

XSLT 2.0

<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:xs="http://www.w3.org/2001/XMLSchema" exclude-result-prefixes="xs">
  <xsl:output indent="yes"/>
  <xsl:strip-space elements="*"/>

  <xsl:param name="input-encoding" as="xs:string" select="'iso-8859-1'"/>
  <xsl:param name="input-uri" as="xs:string" select="'so.txt'"/>

  <xsl:variable name="initData" as="node()">
    <Jamesfilms>
      <xsl:for-each select="tokenize(unparsed-text($input-uri, $input-encoding),'\r?\n\r?\n')">
        <xsl:variable name="tokens" select="tokenize(.,'\r?\n')"/>
        <xsl:choose>
          <xsl:when test="$tokens[1] castable as xs:QName">
            <xsl:element name="{$tokens[1]}">
              <xsl:for-each select="$tokens[position() > 1]">
                <xsl:variable name="tokens2" select="tokenize(.,'\s')"/>
                <xsl:choose>
                  <xsl:when test="$tokens2[1] castable as xs:QName">
                    <xsl:element name="{$tokens2[1]}">
                      <xsl:value-of select="$tokens2[position()>1]" separator=" "/>
                    </xsl:element>                      
                  </xsl:when>
                  <xsl:otherwise>
                    <xsl:message terminate="yes">Invalid element name: <xsl:value-of select="$tokens2[1]"/></xsl:message>
                  </xsl:otherwise>
                </xsl:choose>
              </xsl:for-each>
            </xsl:element>            
          </xsl:when>
          <xsl:otherwise>
            <xsl:message terminate="yes">Invalid element name: <xsl:value-of select="$tokens[1]"/></xsl:message>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:for-each>
    </Jamesfilms>
  </xsl:variable>

  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="/">
    <xsl:apply-templates select="$initData"/>    
  </xsl:template>

  <!--Add additional templates to do further transforming of the initial data ($initData).-->

</xsl:stylesheet>
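
For readers who find the nested tokenize() calls hard to follow, the same three-level split can be sketched in plain Java. This is not the answer's method, just an illustration of the logic under simplifying assumptions: the class and method names (`TokenizeSketch`, `toXml`) are hypothetical, element names are not validated, and the XML is assembled as a string rather than through an XML API.

```java
// Hypothetical illustration of the tokenize-three-times idea; not a replacement for the stylesheet.
final class TokenizeSketch {

    /** Converts blank-line-separated blocks of "key value" lines into XML text. */
    static String toXml(String text) {
        StringBuilder out = new StringBuilder("<Jamesfilms>\n");
        // First split: blank lines separate the records (same regex as the stylesheet).
        for (String block : text.trim().split("\\r?\\n\\r?\\n")) {
            // Second split: one entry per line; the first line names the record.
            String[] lines = block.split("\\r?\\n");
            String record = lines[0].trim();
            out.append("  <").append(record).append(">\n");
            for (int i = 1; i < lines.length; i++) {
                // Third split: first whitespace-separated token is the key, the rest is the value.
                String[] parts = lines[i].trim().split("\\s+", 2);
                String value = parts.length > 1 ? parts[1] : "";
                out.append("    <").append(parts[0]).append('>')
                   .append(escape(value))
                   .append("</").append(parts[0]).append(">\n");
            }
            out.append("  </").append(record).append(">\n");
        }
        return out.append("</Jamesfilms>").toString();
    }

    /** Minimal escaping for element content. */
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;");
    }

    public static void main(String[] args) {
        String sample = "Heros\nFirstname Sean\nLastname Connery\n\nAward\nname Gloden Globes\ntime 3";
        System.out.println(toXml(sample));
    }
}
```

The XSLT version is still preferable when further templates need to post-process the result, since $initData is a real node tree rather than a string.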


Edit

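You're passing the text file in as the input of the transform. That's why you had to add the <t> element. In your modified stylesheet, unparsed-text(., $input-encoding) then treats the string value of that document (the file's text) as the URI to read, which is what raises XTDE1170.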

Since you don't actually have an XML input, you can pass the stylesheet itself in as the input. Nothing in it will get processed, because the template that matches the root (/) only applies templates to the $initData variable.

You also need to set the input-uri parameter with transformer.setParameter("input-uri", TXT_PATH);. If your path is absolute, be sure to add the file:/// prefix.
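
Rather than prepending file:/// by hand, the URI can also be derived from the local path. The snippet below is a minimal sketch using only the JDK; the class and method names (`InputUriSketch`, `toFileUri`) are placeholders for illustration.

```java
import java.nio.file.Paths;

// Hypothetical helper for building the value passed as the input-uri parameter.
final class InputUriSketch {

    /** Returns an absolute file: URI string for a local path. */
    static String toFileUri(String path) {
        // toUri() adds the file: scheme and percent-encodes characters such as spaces.
        return Paths.get(path).toAbsolutePath().toUri().toString();
    }

    public static void main(String[] args) {
        System.out.println(toFileUri("input.txt"));
    }
}
```

The result could then be passed along as transformer.setParameter("input-uri", InputUriSketch.toFileUri(somePath));, where somePath is whatever local path you already have.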

Example...

Text file

Heros
Firstname Sean
Lastname Connery
DOB 25-08-1930

Films
Dr.No 1962
Goldfinger 1964
Thunerball 1965

Award
name Academy
time 1

Award
name BAFTA
time 2

Award
name Gloden Globes
time 3

Java (you'll need to change paths/filenames)

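import java.io.File;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final String TXT_PATH = "file:///C:/tmp/input.txt";
final String XSLT_PATH = "C:/tmp/txt2xml.xsl";
final String XML_PATH = "C:/tmp/test_xml_result.xml";

TransformerFactory tFactory = new net.sf.saxon.TransformerFactoryImpl();
Transformer transformer = tFactory.newTransformer(new StreamSource(new File(XSLT_PATH)));
// The text file is read through the input-uri parameter, not as the transform's source.
transformer.setParameter("input-uri", TXT_PATH);
// The stylesheet itself serves as the (ignored) XML source document.
transformer.transform(new StreamSource(new File(XSLT_PATH)), new StreamResult(new File(XML_PATH)));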

XSLT 2.0

Same as above.

Output

<Jamesfilms>
   <Heros>
      <Firstname>Sean</Firstname>
      <Lastname>Connery</Lastname>
      <DOB>25-08-1930</DOB>
   </Heros>
   <Films>
      <Dr.No>1962</Dr.No>
      <Goldfinger>1964</Goldfinger>
      <Thunerball>1965</Thunerball>
   </Films>
   <Award>
      <name>Academy</name>
      <time>1</time>
   </Award>
   <Award>
      <name>BAFTA</name>
      <time>2</time>
   </Award>
   <Award>
      <name>Gloden Globes</name>
      <time>3</time>
   </Award>
</Jamesfilms>

However, since you're using Saxon, you could use the s9api and specify an initial template. This is how I would do it, instead of passing the stylesheet as the input to the transform.

Example...

Java

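import java.io.File;
import javax.xml.transform.stream.StreamSource;
import net.sf.saxon.s9api.Processor;
import net.sf.saxon.s9api.QName;
import net.sf.saxon.s9api.Serializer;
import net.sf.saxon.s9api.XdmAtomicValue;
import net.sf.saxon.s9api.XsltCompiler;
import net.sf.saxon.s9api.XsltExecutable;
import net.sf.saxon.s9api.XsltTransformer;

final String TXT_PATH = "file:///C:/tmp/input.txt";
final String XSLT_PATH = "C:/tmp/txt2xml.xsl";
final String XML_PATH = "C:/tmp/test_xml_result.xml";

Processor processor = new Processor(false);
Serializer serializer = processor.newSerializer();
serializer.setOutputFile(new File(XML_PATH));
XsltCompiler compiler = processor.newXsltCompiler();
XsltExecutable executable = compiler.compile(new StreamSource(new File(XSLT_PATH)));
XsltTransformer transformer = executable.load();
// Start at the named template "root" so no source document is needed.
transformer.setInitialTemplate(new QName("root"));
transformer.setParameter(new QName("input-uri"), new XdmAtomicValue(TXT_PATH));
transformer.setDestination(serializer);
transformer.transform();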

XSLT 2.0

<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:xs="http://www.w3.org/2001/XMLSchema" exclude-result-prefixes="xs">
  <xsl:output indent="yes"/>
  <xsl:strip-space elements="*"/>

  <xsl:param name="input-encoding" as="xs:string" select="'iso-8859-1'"/>
  <xsl:param name="input-uri" as="xs:string"/>

  <xsl:variable name="initData" as="node()">
    <Jamesfilms>
      <xsl:for-each select="tokenize(unparsed-text($input-uri, $input-encoding),'\r?\n\r?\n')">
        <xsl:variable name="tokens" select="tokenize(.,'\r?\n')"/>
        <xsl:choose>
          <xsl:when test="$tokens[1] castable as xs:QName">
            <xsl:element name="{replace($tokens[1],'\s','')}">
              <xsl:for-each select="$tokens[position() > 1]">
                <xsl:variable name="tokens2" select="tokenize(.,'\s')"/>
                <xsl:choose>
                  <xsl:when test="$tokens2[1] castable as xs:QName">
                    <xsl:element name="{$tokens2[1]}">
                      <xsl:value-of select="$tokens2[position()>1]" separator=" "/>
                    </xsl:element>                      
                  </xsl:when>
                  <xsl:otherwise>
                    <xsl:message terminate="yes">Invalid element name: <xsl:value-of select="$tokens2[1]"/></xsl:message>
                  </xsl:otherwise>
                </xsl:choose>
              </xsl:for-each>
            </xsl:element>            
          </xsl:when>
          <xsl:otherwise>
            <xsl:message terminate="yes">Invalid element name: <xsl:value-of select="$tokens[1]"/></xsl:message>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:for-each>
    </Jamesfilms>
  </xsl:variable>

  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="/" name="root">
    <xsl:apply-templates select="$initData"/>    
  </xsl:template>

  <!--Add additional templates to do further transforming of the initial data ($initData).-->

</xsl:stylesheet>

Input and output would be the same.
