Converting Hierarchical DOM-Tree XML Files: Can XSLT convert to a "flattened" XML file without losing data?


Question


I am working with a batch of XML files that come every month. They all follow the same DOM tree structure and they do not come with schema files. Here is a sample:

<?xml version="1.0" encoding="ISO-8859-1" ?>
<REPORT><REPORT-DTL><REPORT-ID>PCRP60R1-C</REPORT-ID><REPORT-DATE>2020-10-01</REPORT-DATE><REPORT-NAME>OUTSIDE USE REPORT (PATIENTS WITH SIGNED CONSENT)</REPORT-NAME><REPORT-PERIOD-START>2020-09-01</REPORT-PERIOD-START><REPORT-PERIOD-END>2020-09-30</REPORT-PERIOD-END></REPORT-DTL>
  <GROUP><GROUP-DTL><GROUP-ID>DoctorAGroup1234</GROUP-ID><GROUP-TYPE>HOSP</GROUP-TYPE><GROUP-NAME>COUNTY HOSP</GROUP-NAME></GROUP-DTL>
    <PROVIDER><PROVIDER-DTL><PROVIDER-NUMBER>DoctorAID1234</PROVIDER-NUMBER><PROVIDER-LAST-NAME>DoctorALastname</PROVIDER-LAST-NAME><PROVIDER-FIRST-NAME>DoctorAFirstname</PROVIDER-FIRST-NAME><PROVIDER-MIDDLE-NAME>DoctorAMiddleName</PROVIDER-MIDDLE-NAME></PROVIDER-DTL>
      <PATIENT><PATIENT-DTL><PATIENT-HEALTH-NUMBER>PatientANumber1234</PATIENT-HEALTH-NUMBER><PATIENT-LAST-NAME>PatientALastname</PATIENT-LAST-NAME><PATIENT-FIRST-NAME>PatientAFirstname</PATIENT-FIRST-NAME><PATIENT-BIRTHDATE>1941-02-11</PATIENT-BIRTHDATE><PATIENT-SEX>M</PATIENT-SEX></PATIENT-DTL>
        <SERVICE-DTL1><SERVICE-LOC> </SERVICE-LOC><SERVICE-DATE>PatientAServiceDate2020-09-07</SERVICE-DATE><SERVICE-CODE>PatientAServiceCodeABC1</SERVICE-CODE><SERVICE-DESCRIPTION>PatientAServiceDescription-Facelift</SERVICE-DESCRIPTION><SERVICE-AMT>PatientAServiceAmount8.90</SERVICE-AMT></SERVICE-DTL1>
      </PATIENT>
      <PATIENT><PATIENT-DTL><PATIENT-HEALTH-NUMBER>PatientBNumber1235</PATIENT-HEALTH-NUMBER><PATIENT-LAST-NAME>PatientBLastname</PATIENT-LAST-NAME><PATIENT-FIRST-NAME>PatientBFirstname</PATIENT-FIRST-NAME><PATIENT-BIRTHDATE>1955-10-11</PATIENT-BIRTHDATE><PATIENT-SEX>F</PATIENT-SEX></PATIENT-DTL>
        <SERVICE-DTL1><SERVICE-LOC> </SERVICE-LOC><SERVICE-DATE>PatientBServiceDate2020-12-08</SERVICE-DATE><SERVICE-CODE>PatientBServiceCodeABC2</SERVICE-CODE><SERVICE-DESCRIPTION>PatientBServiceDescription-Checkup</SERVICE-DESCRIPTION><SERVICE-AMT>PatientBServiceAmount10.50</SERVICE-AMT></SERVICE-DTL1>
      </PATIENT>
      <PATIENT><PATIENT-DTL><PATIENT-HEALTH-NUMBER>PatientCNumber1236</PATIENT-HEALTH-NUMBER><PATIENT-LAST-NAME>PatientCLastname</PATIENT-LAST-NAME><PATIENT-FIRST-NAME>PatientCFirstname</PATIENT-FIRST-NAME><PATIENT-BIRTHDATE>1965-02-07</PATIENT-BIRTHDATE><PATIENT-SEX>F</PATIENT-SEX></PATIENT-DTL>
        <SERVICE-DTL1><SERVICE-LOC> </SERVICE-LOC><SERVICE-DATE>PatientCServiceDate2020-11-11</SERVICE-DATE><SERVICE-CODE>PatientCServiceCodeABC3</SERVICE-CODE><SERVICE-DESCRIPTION>PatientCServiceDescription-X-Ray</SERVICE-DESCRIPTION><SERVICE-AMT>PatientCServiceAmount18.00</SERVICE-AMT></SERVICE-DTL1>
      </PATIENT>
      <PATIENT><PATIENT-DTL><PATIENT-HEALTH-NUMBER>PatientDNumber1237</PATIENT-HEALTH-NUMBER><PATIENT-LAST-NAME>PatientDLastname</PATIENT-LAST-NAME><PATIENT-FIRST-NAME>PatientDFirstname</PATIENT-FIRST-NAME><PATIENT-BIRTHDATE>1975-07-09</PATIENT-BIRTHDATE><PATIENT-SEX>M</PATIENT-SEX></PATIENT-DTL>
        <SERVICE-DTL1><SERVICE-LOC> </SERVICE-LOC><SERVICE-DATE>PatientDServiceDate2020-01-10</SERVICE-DATE><SERVICE-CODE>PatientDServiceCodeABC4</SERVICE-CODE><SERVICE-DESCRIPTION>PatientDServiceDescription-Nose Cleaning</SERVICE-DESCRIPTION><SERVICE-AMT>PatientDServiceAmount6.00</SERVICE-AMT></SERVICE-DTL1>
      </PATIENT>
    </PROVIDER>
  </GROUP>
</REPORT>


Note the hierarchical structure of the code. All of the PATIENT data "belongs" to a parent PROVIDER node. Each procedure (a SERVICE-DTL1 record) "belongs" to the PATIENT node that contains it. The XML code describes these 4 interrelated tables.


My database software can't import these interrelated tables from an XML file -- my database app can't follow the hierarchy of the code.


I could import the data, however, if it were "flattened" and the interrelated nodes were "unpacked" (and replicated). Here is how I would like my flattened table to look upon import. Yes, it has lots of redundancy -- the first 12 columns of this new table are identical for every record. (That's OK, I can eliminate the redundancies later; at this point, I just want to read in all the data.)


Here is the XML code that generated the flattened table in the image above:

<?xml version="1.0" encoding="ISO-8859-1" ?>

<REPORT>
    <ImportedTable>
<REPORT-ID>PCRP60R1-C</REPORT-ID><REPORT-DATE>2020-10-01</REPORT-DATE><REPORT-NAME>OUTSIDE USE REPORT (PATIENTS WITH SIGNED CONSENT)</REPORT-NAME><REPORT-PERIOD-START>2020-09-01</REPORT-PERIOD-START><REPORT-PERIOD-END>2020-09-30</REPORT-PERIOD-END>
  <GROUP-ID>DoctorAGroup1234</GROUP-ID><GROUP-TYPE>HOSP</GROUP-TYPE><GROUP-NAME>COUNTY HOSP</GROUP-NAME>
<PROVIDER-NUMBER>DoctorAID1234</PROVIDER-NUMBER><PROVIDER-LAST-NAME>DoctorALastname</PROVIDER-LAST-NAME><PROVIDER-FIRST-NAME>DoctorAFirstname</PROVIDER-FIRST-NAME><PROVIDER-MIDDLE-NAME>DoctorAMiddleName</PROVIDER-MIDDLE-NAME>
<PATIENT-HEALTH-NUMBER>PatientANumber1234</PATIENT-HEALTH-NUMBER><PATIENT-LAST-NAME>PatientALastname</PATIENT-LAST-NAME><PATIENT-FIRST-NAME>PatientAFirstname</PATIENT-FIRST-NAME><PATIENT-BIRTHDATE>1941-02-11</PATIENT-BIRTHDATE><PATIENT-SEX>M</PATIENT-SEX>
        <SERVICE-LOC> </SERVICE-LOC><SERVICE-DATE>PatientAServiceDate2020-09-07</SERVICE-DATE><SERVICE-CODE>PatientAServiceCodeABC1</SERVICE-CODE><SERVICE-DESCRIPTION>PatientAServiceDescription-Facelift</SERVICE-DESCRIPTION><SERVICE-AMT>PatientAServiceAmount8.90</SERVICE-AMT>
    </ImportedTable>

    <ImportedTable>
<REPORT-ID>PCRP60R1-C</REPORT-ID><REPORT-DATE>2020-10-01</REPORT-DATE><REPORT-NAME>OUTSIDE USE REPORT (PATIENTS WITH SIGNED CONSENT)</REPORT-NAME><REPORT-PERIOD-START>2020-09-01</REPORT-PERIOD-START><REPORT-PERIOD-END>2020-09-30</REPORT-PERIOD-END>
  <GROUP-ID>DoctorAGroup1234</GROUP-ID><GROUP-TYPE>HOSP</GROUP-TYPE><GROUP-NAME>COUNTY HOSP</GROUP-NAME>
<PROVIDER-NUMBER>DoctorAID1234</PROVIDER-NUMBER><PROVIDER-LAST-NAME>DoctorALastname</PROVIDER-LAST-NAME><PROVIDER-FIRST-NAME>DoctorAFirstname</PROVIDER-FIRST-NAME><PROVIDER-MIDDLE-NAME>DoctorAMiddleName</PROVIDER-MIDDLE-NAME>
<PATIENT-HEALTH-NUMBER>PatientBNumber1235</PATIENT-HEALTH-NUMBER><PATIENT-LAST-NAME>PatientBLastname</PATIENT-LAST-NAME><PATIENT-FIRST-NAME>PatientBFirstname</PATIENT-FIRST-NAME><PATIENT-BIRTHDATE>1955-10-11</PATIENT-BIRTHDATE><PATIENT-SEX>F</PATIENT-SEX>
        <SERVICE-LOC> </SERVICE-LOC><SERVICE-DATE>PatientBServiceDate2020-12-08</SERVICE-DATE><SERVICE-CODE>PatientBServiceCodeABC2</SERVICE-CODE><SERVICE-DESCRIPTION>PatientBServiceDescription-Checkup</SERVICE-DESCRIPTION><SERVICE-AMT>PatientBServiceAmount10.50</SERVICE-AMT>
    </ImportedTable>

    <ImportedTable>
<REPORT-ID>PCRP60R1-C</REPORT-ID><REPORT-DATE>2020-10-01</REPORT-DATE><REPORT-NAME>OUTSIDE USE REPORT (PATIENTS WITH SIGNED CONSENT)</REPORT-NAME><REPORT-PERIOD-START>2020-09-01</REPORT-PERIOD-START><REPORT-PERIOD-END>2020-09-30</REPORT-PERIOD-END>
<GROUP-ID>DoctorAGroup1234</GROUP-ID><GROUP-TYPE>HOSP</GROUP-TYPE><GROUP-NAME>COUNTY HOSP</GROUP-NAME>
    <PROVIDER-NUMBER>DoctorAID1234</PROVIDER-NUMBER><PROVIDER-LAST-NAME>DoctorALastname</PROVIDER-LAST-NAME><PROVIDER-FIRST-NAME>DoctorAFirstname</PROVIDER-FIRST-NAME><PROVIDER-MIDDLE-NAME>DoctorAMiddleName</PROVIDER-MIDDLE-NAME>
      <PATIENT-HEALTH-NUMBER>PatientCNumber1236</PATIENT-HEALTH-NUMBER><PATIENT-LAST-NAME>PatientCLastname</PATIENT-LAST-NAME><PATIENT-FIRST-NAME>PatientCFirstname</PATIENT-FIRST-NAME><PATIENT-BIRTHDATE>1965-02-07</PATIENT-BIRTHDATE><PATIENT-SEX>F</PATIENT-SEX>
        <SERVICE-LOC> </SERVICE-LOC><SERVICE-DATE>PatientCServiceDate2020-11-11</SERVICE-DATE><SERVICE-CODE>PatientCServiceCodeABC3</SERVICE-CODE><SERVICE-DESCRIPTION>PatientCServiceDescription-X-Ray</SERVICE-DESCRIPTION><SERVICE-AMT>PatientCServiceAmount18.00</SERVICE-AMT>
    </ImportedTable>

    <ImportedTable>
<REPORT-ID>PCRP60R1-C</REPORT-ID><REPORT-DATE>2020-10-01</REPORT-DATE><REPORT-NAME>OUTSIDE USE REPORT (PATIENTS WITH SIGNED CONSENT)</REPORT-NAME><REPORT-PERIOD-START>2020-09-01</REPORT-PERIOD-START><REPORT-PERIOD-END>2020-09-30</REPORT-PERIOD-END>
<GROUP-ID>DoctorAGroup1234</GROUP-ID><GROUP-TYPE>HOSP</GROUP-TYPE><GROUP-NAME>COUNTY HOSP</GROUP-NAME>
    <PROVIDER-NUMBER>DoctorAID1234</PROVIDER-NUMBER><PROVIDER-LAST-NAME>DoctorALastname</PROVIDER-LAST-NAME><PROVIDER-FIRST-NAME>DoctorAFirstname</PROVIDER-FIRST-NAME><PROVIDER-MIDDLE-NAME>DoctorAMiddleName</PROVIDER-MIDDLE-NAME>
      <PATIENT-HEALTH-NUMBER>PatientDNumber1237</PATIENT-HEALTH-NUMBER><PATIENT-LAST-NAME>PatientDLastname</PATIENT-LAST-NAME><PATIENT-FIRST-NAME>PatientDFirstname</PATIENT-FIRST-NAME><PATIENT-BIRTHDATE>1975-07-09</PATIENT-BIRTHDATE><PATIENT-SEX>M</PATIENT-SEX>
     <SERVICE-LOC> </SERVICE-LOC><SERVICE-DATE>PatientDServiceDate2020-01-10</SERVICE-DATE><SERVICE-CODE>PatientDServiceCodeABC4</SERVICE-CODE><SERVICE-DESCRIPTION>PatientDServiceDescription-Nose Cleaning</SERVICE-DESCRIPTION><SERVICE-AMT>PatientDServiceAmount6.00</SERVICE-AMT>
    </ImportedTable>
</REPORT>


So, my question is whether XSLT can transform from the top XML file into the bottom one. Note that, in the converted XML file, I am only interested in retaining nodes containing text. (Any non-text nodes from the original file can safely be ignored in this transformation.) Is there code that could make this transformation? (Note: I have been reading a number of threads dealing with XML conversions but this situation is unusual because of the relational structure of this data set. If this question has been answered elsewhere, please let me know!)

Many thanks,

Ron

Recommended answer


Flattening a hierarchical XML is a common task. You simply create an element for each record in the most atomic table and include the data from its parent tables. As it happens, the structure of your example XML input makes this very easy:

XSLT 1.0

<xsl:stylesheet version="1.0" 
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:strip-space elements="*"/>

<xsl:template match="/REPORT">
    <xsl:copy>
        <!-- Capture each level's detail fields once, so they can be repeated in every row -->
        <xsl:variable name="report-data" select="REPORT-DTL/*" />
        <xsl:for-each select="GROUP">
            <xsl:variable name="group-data" select="GROUP-DTL/*" />
            <xsl:for-each select="PROVIDER">
                <xsl:variable name="provider-data" select="PROVIDER-DTL/*" />
                <!-- One ImportedTable row per PATIENT -->
                <xsl:for-each select="PATIENT">
                    <ImportedTable>
                        <xsl:copy-of select="$report-data"/>
                        <xsl:copy-of select="$group-data"/>
                        <xsl:copy-of select="$provider-data"/>
                        <!-- Fields that belong to the current patient and its service record -->
                        <xsl:copy-of select="PATIENT-DTL/*"/>
                        <xsl:copy-of select="SERVICE-DTL1/*"/>
                    </ImportedTable>
                </xsl:for-each>
            </xsl:for-each>
        </xsl:for-each>
    </xsl:copy>
</xsl:template>

</xsl:stylesheet>
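
The stylesheet above emits one row per PATIENT, which matches the sample because every patient there has exactly one SERVICE-DTL1. If a patient could carry several service records (an assumption; the sample does not show this), a variant along the following lines would iterate over SERVICE-DTL1 instead, so that each service gets its own row and the parent data is collected through the ancestor axis. This is only a sketch under that assumption, not part of the original answer:

```xml
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:strip-space elements="*"/>

<xsl:template match="/REPORT">
    <xsl:copy>
        <!-- Assumption: a PATIENT may hold more than one SERVICE-DTL1;
             emit one row per service record rather than one per patient -->
        <xsl:for-each select="GROUP/PROVIDER/PATIENT/SERVICE-DTL1">
            <ImportedTable>
                <!-- Parent data, looked up from the current service record -->
                <xsl:copy-of select="/REPORT/REPORT-DTL/*"/>
                <xsl:copy-of select="ancestor::GROUP/GROUP-DTL/*"/>
                <xsl:copy-of select="ancestor::PROVIDER/PROVIDER-DTL/*"/>
                <xsl:copy-of select="ancestor::PATIENT/PATIENT-DTL/*"/>
                <!-- Fields of the service record itself -->
                <xsl:copy-of select="*"/>
            </ImportedTable>
        </xsl:for-each>
    </xsl:copy>
</xsl:template>

</xsl:stylesheet>
```

For the sample input this should produce the same four rows. Two caveats: a patient with no SERVICE-DTL1 at all would produce no row in this variant, and xsl:strip-space elements="*" removes whitespace-only text, so the single blank in SERVICE-LOC comes out as an empty element. Drop that instruction if the blank has to survive.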
