XPath to parse eCFR XML using attributes and nodes


Problem description

This question has been significantly edited to make things clearer.

I am attempting to pull data out of the electronic Code of Federal Regulations XML feed (http://www.gpo.gov/fdsys/bulkdata/CFR/2015/title-15/CFR-2015-title15-vol2.xml) and am having trouble.

Specifically, I'd like to grab data that is matched by a combination of node and attribute. The following snippet of XML shows some of the text I'd like to grab. I would like to obtain the data for each FP node whose SOURCE attribute is FP-2, and also for each FP node whose SOURCE attribute is FP-1.

<APPENDIX>
              <EAR>Pt. 774, Supp. 1</EAR>
              <HD SOURCE="HED">Supplement No. 1 to Part 774—The Commerce Control List</HD>
              <HD SOURCE="HD1">Category 0—Nuclear Materials, Facilities, and Equipment [and Miscellaneous Items]</HD>
              <HD SOURCE="HD1">A. "End Items," "Equipment," "Accessories," "Attachments," "Parts," "Components," and "Systems"</HD>
              <FP SOURCE="FP-2">
                <E T="02">0A002Power generating or propulsion equipment "specially designed" for use with space, marine or mobile "nuclear reactors". (These items are "subject to the ITAR." See 22 CFR parts 120 through 130.)</E>
              </FP>
              
              <FP SOURCE="FP-2">
                <E T="02">0A018Items on the Wassenaar Munitions List (see List of Items Controlled).</E>
              </FP>
              <FP SOURCE="FP-1">
                <E T="04">License Requirements</E>
              </FP>
              <FP SOURCE="FP-1">
                <E T="03">Reason for Control:</E> NS, AT, UN</FP>
              <GPOTABLE CDEF="s50,r50" COLS="2" OPTS="L2">
                <BOXHD>
                  <CHED H="1">Control(s)</CHED>
                  <CHED H="1">Country Chart (See Supp. No. 1 to part 738)</CHED>
                </BOXHD>
                <ROW>
                  <ENT I="01">NS applies to entire entry</ENT>
                  <ENT>NS Column 1.</ENT>
                </ROW>
                <ROW>
                  <ENT I="01">AT applies to entire entry</ENT>
                  <ENT>AT Column 1.</ENT>
                </ROW>
                <ROW>
                  <ENT I="01">UN applies to entire entry</ENT>
                  <ENT>See § 746.1(b) for UN controls.</ENT>
                </ROW>
              </GPOTABLE>
              <FP SOURCE="FP-1">
                <E T="05">List Based License Exceptions (See Part 740 for a description of all license exceptions)</E>
              </FP>
              <FP SOURCE="FP-1">
                <E T="03">LVS:</E> $3,000 for 0A018.b</FP>
              <FP SOURCE="FP-1">$1,500 for 0A018.c and .d</FP>
              <FP SOURCE="FP-1">
                <E T="03">GBS:</E> N/A</FP>
              <FP SOURCE="FP-1">
                <E T="03">CIV:</E> N/A</FP>
              <FP SOURCE="FP-1">
                <E T="04">List of Items Controlled</E>
              </FP>
              <FP SOURCE="FP-1">
                <E T="03">Related Controls:</E> (1) See also 0A979, 0A988, and 22 CFR 121.1 Categories I(a), III(b-d), and X(a). (2) See ECCN 0A617.y.1 and .y.2 for items formerly controlled by ECCN 0A018.a. (3) See ECCN 1A613.c for military helmets providing less than NIJ Type IV protection and ECCN 1A613.y.1 for conventional military steel helmets that, immediately prior to July 1, 2014, were classified under 0A018.d and 0A988. (4) See 22 CFR 121.1 Category X(a)(5) and (a)(6) for controls on other military helmets.</FP>
              <FP SOURCE="FP-1">
                <E T="03">Related Definitions:</E> N/A</FP>
              <FP>
                <E T="03">Items:</E> a. [Reserved]</FP>
              <P>b. "Specially designed" components and parts for ammunition, except cartridge cases, powder bags, bullets, jackets, cores, shells, projectiles, boosters, fuses and components, primers, and other detonating devices and ammunition belting and linking machines (all of which are "subject to the ITAR." (See 22 CFR parts 120 through 130);</P>
              <NOTE>
                <HD SOURCE="HED">
                  <E T="03">Note:</E>
                </HD>
                <P>
                  <E T="03">0A018.b does not apply to "components" "specially designed" for blank or dummy ammunition as follows:</E>
                </P>
                <P>
                  <E T="03">a. Ammunition crimped without a projectile (blank star);</E>
                </P>
              </NOTE>
</APPENDIX>

To complicate matters, I'm trying to pull this data into FileMaker, but for this edited question I'll stick to simple XSL.

The following XSL grabs all of the FP nodes without differentiation.

<?xml version='1.0' encoding='UTF-8'?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
<xsl:for-each select="//FP">
<xsl:value-of select="."/>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>

Changing the match to <xsl:template match="FP[@SOURCE='FP-1']"> lets me make the necessary match based on the attribute, but I'm still not clear on how to capture the data I need. Thoughts?

Recommended answer

A few things:

  1. Your XSLT is not actually in a valid XSLT format.
  2. In XPath, a reference to an attribute (here, SOURCE) must be prefixed with @.
  3. Finally, there are many FP-1 and FP-2 nodes, but your setup selects only the first instances.
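
Points 2 and 3 can be checked quickly outside of XSLT. The following is only a minimal sketch, assuming Python's standard-library xml.etree.ElementTree (which supports a limited XPath subset) and a trimmed-down stand-in for the APPENDIX snippet from the question; the variable names are illustrative and not part of the original answer.

```python
import xml.etree.ElementTree as ET

# Trimmed-down stand-in for the APPENDIX snippet in the question (assumption).
SAMPLE = """
<APPENDIX>
  <FP SOURCE="FP-2"><E T="02">0A002Power generating or propulsion equipment</E></FP>
  <FP SOURCE="FP-2"><E T="02">0A018Items on the Wassenaar Munitions List</E></FP>
  <FP SOURCE="FP-1"><E T="04">License Requirements</E></FP>
  <FP SOURCE="FP-1"><E T="03">Reason for Control:</E> NS, AT, UN</FP>
  <FP><E T="03">Items:</E> a. [Reserved]</FP>
</APPENDIX>
"""

root = ET.fromstring(SAMPLE)

# "@" marks SOURCE as an attribute; findall() returns every match,
# not only the first one.
fp2 = root.findall(".//FP[@SOURCE='FP-2']")
fp1 = root.findall(".//FP[@SOURCE='FP-1']")

# Without "@", SOURCE is read as a child element name, so nothing matches.
no_at = root.findall(".//FP[SOURCE='FP-2']")

print(len(fp2), len(fp1), len(no_at))
```

The FP node without a SOURCE attribute is skipped by both attribute predicates, which is the differentiation the question asks for.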

Consider the following XSLT:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output version="1.0" encoding="UTF-8"/>

<xsl:template match="/">
  <FMPXMLRESULT xmlns="http://www.filemaker.com/fmpxmlresult">

    <METADATA>
      <FIELD NAME="ECCNFP_2" TYPE="TEXT"/>
      <FIELD NAME="ECCNFP_1" TYPE="TEXT"/>
    </METADATA>

    <RESULTSET>

      <!-- One ROW per E child (T="02") of every FP whose SOURCE is FP-2;
           substring(., 1, 5) keeps the first five characters, e.g. 0A002. -->
      <xsl:for-each select="//FP[@SOURCE = 'FP-2']/E[@T='02']">
        <ROW>
          <COL>
            <DATA><xsl:value-of select="substring(.,1,5)"/></DATA>
          </COL>
        </ROW>
      </xsl:for-each>

      <!-- Same pattern for FP-1. In the sample above the FP-1 nodes only have
           E children with T="03", "04" or "05", so this loop matches nothing there. -->
      <xsl:for-each select="//FP[@SOURCE = 'FP-1']/E[@T='02']">
        <ROW>
          <COL>
            <DATA><xsl:value-of select="substring(.,1,5)"/></DATA>
          </COL>
        </ROW>
      </xsl:for-each>

    </RESULTSET>
  </FMPXMLRESULT>

</xsl:template>
</xsl:stylesheet>

Which would output:

<?xml version='1.0' encoding='UTF-8'?>
<FMPXMLRESULT xmlns="http://www.filemaker.com/fmpxmlresult">
  <METADATA>
    <FIELD NAME="ECCNFP_2" TYPE="TEXT"/>
    <FIELD NAME="ECCNFP_1" TYPE="TEXT"/>
  </METADATA>
  <RESULTSET>
    <ROW>
      <COL>
        <DATA>0A002</DATA>
      </COL>
    </ROW>
    <ROW>
      <COL>
        <DATA>0A018</DATA>
      </COL>
    </ROW>
  </RESULTSET>
</FMPXMLRESULT>

And partial output for the full XML at the web link:

<?xml version='1.0' encoding='UTF-8'?>
<FMPXMLRESULT xmlns="http://www.filemaker.com/fmpxmlresult">
  <METADATA>
    <FIELD NAME="ECCNFP_2" TYPE="TEXT"/>
    <FIELD NAME="ECCNFP_1" TYPE="TEXT"/>
  </METADATA>
  <RESULTSET>
    <ROW>
      <COL>
        <DATA>2A000</DATA>
      </COL>
    </ROW>
    <ROW>
      <COL>
        <DATA>0A002</DATA>
      </COL>
    </ROW>
    <ROW>
      <COL>
        <DATA>0A018</DATA>
      </COL>
    </ROW>
    <ROW>
      <COL>
        <DATA>0A521</DATA>
      </COL>
    </ROW>
    <ROW>
      <COL>
        <DATA>0A604</DATA>
      </COL>
    </ROW>
    <ROW>
      <COL>
        <DATA>0A606</DATA>
      </COL>
    </ROW>
    ...

In fact, if you point your XSLT processor at the GPO link, all of the FP1s and FP2s are output. I just did so with Python: close to 3,000 lines!
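
If the full text of each FP node is wanted rather than only the five-character prefix, one possible approach is sketched below. It is an assumption-laden illustration, not the answerer's actual script: it uses only Python's standard-library xml.etree.ElementTree instead of an XSLT processor, the helper name fp_text is hypothetical, and the sample is a shortened copy of nodes from the question.

```python
import xml.etree.ElementTree as ET

def fp_text(root, source):
    """Return the whitespace-normalised text of every FP with the given SOURCE.

    fp_text is a hypothetical helper written for this sketch.
    """
    rows = []
    for fp in root.iterfind(f".//FP[@SOURCE='{source}']"):
        # itertext() walks the FP node and its children (such as E),
        # so both the label and the trailing text are kept.
        rows.append(" ".join("".join(fp.itertext()).split()))
    return rows

# Shortened copy of a few nodes from the question's snippet (assumption).
SAMPLE = """<APPENDIX>
  <FP SOURCE="FP-2">
    <E T="02">0A018Items on the Wassenaar Munitions List (see List of Items Controlled).</E>
  </FP>
  <FP SOURCE="FP-1">
    <E T="03">Reason for Control:</E> NS, AT, UN</FP>
  <FP SOURCE="FP-1">$1,500 for 0A018.c and .d</FP>
</APPENDIX>"""

root = ET.fromstring(SAMPLE)
eccns = [text[:5] for text in fp_text(root, "FP-2")]  # same idea as substring(., 1, 5)
controls = fp_text(root, "FP-1")
print(eccns)
print(controls)
```

To try this on the real data, the sample string could be swapped for ET.parse(...).getroot() on a locally saved copy of the GPO file; the local file name is up to the reader.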
