Repeating elements to new rows ElementTree


Problem description


The code below takes a directory of XML files and parses them into a CSV file. This was only possible thanks to a user in this community, and I have learned a lot.

from xml.etree import ElementTree as ET
from collections import defaultdict
import csv
from pathlib import Path

directory = 'C:/Users/docs/FolderwithXMLs'

with open('output.csv', 'w', newline='') as f:
    writer = csv.writer(f)

    headers = ['id', 'service_code', 'rational', 'qualify', 'description_num', 'description_txt', 'set_data_xin', 'set_data_xax', 'set_data_value', 'set_data_x']

    writer.writerow(headers)

    xml_files_list = list(map(str,Path(directory).glob('**/*.xml')))
    for xml_file in xml_files_list:
        tree = ET.parse(xml_file)
        root = tree.getroot()

        start_nodes = root.findall('.//START')
        for sn in start_nodes:
            row = defaultdict(str)


            for k,v in sn.attrib.items():
                row[k] = v

            for rn in sn.findall('.//Rational'):
                row['rational'] = rn.text

            for qu in sn.findall('.//Qualify'):
                row['qualify'] = qu.text

            for ds in sn.findall('.//Description'):
                row['description_txt'] = ds.text
                row['description_num'] = ds.attrib['num']


            for st in sn.findall('.//SetData'):
                for k,v in st.attrib.items():
                    row['set_data_'+ str(k)] = v
                row_data = [row[i] for i in headers]
                writer.writerow(row_data)
                row = defaultdict(str)

The XML files, on the other hand, have a format like this:

<?xml version="1.0" encoding="utf-8"?>
<ProjectData>
<FINAL>
    <START id="ID0001" service_code="0x5196">
      <Docs Docs_type="START">
        <Rational>225196</Rational>
        <Qualify>6251960000A0DE</Qualify>
      </Docs>
      <Description num="1213f2312">The parameter</Description>
      <DataFile dg="12" dg_id="let">
        <SetData value="32" />
      </DataFile>
    </START>
    <START id="DG0003" service_code="0x517B">
      <Docs Docs_type="START">
        <Rational>23423</Rational>
        <Qualify>342342</Qualify>
      </Docs>
      <Description num="3423423f3423">The third</Description>
      <DataFile dg="55" dg_id="big">
        <SetData x="E1" value="21259" />
        <SetData x="E2" value="02" />
      </DataFile>
    </START>
    <START id="ID0048" service_code="0x5198">
      <RawData rawdata_type="ASDS">
        <Rational>225198</Rational>
        <Qualify>343243324234234</Qualify>
      </RawData>
      <Description num="434234234">The forth</Description>
      <DataFile unit="21" unit_id="FEDS">
        <FileX unit="eg" discrete="false" axis_pts="19" name="Vsome" text_id="bx5" unit_id="GDFSD" />
        <SetData xin="5" xax="233" value="323" />
        <SetData xin="123" xax="77" value="555" />
        <SetData xin="17" xax="65" value="23" />
      </DataFile>
    </START>
</FINAL>
</ProjectData>

The results look like the picture below.

Recently I have been trying to modify the code so that the results look like the picture below. Take id="ID0048" as an example: the code writes id, service_code only once, and when there are several SetData lines it creates a new row for each of them but does not repeat id, service_code and the other fields. I am struggling to achieve something like the picture below.

Solution

Consider XSLT, a special-purpose language for transforming XML, which Python's third-party module lxml can run to turn the XML directly into CSV output. Specifically, have the XSLT match the lowest level, SetData, and retrieve the higher-level information through the ancestor axis.
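To make the "start from SetData and look upward" idea concrete, here is a minimal sketch using only the standard library. xml.etree.ElementTree keeps no parent pointers, so the sketch builds a child-to-parent map first. The names `parent_of` and `nearest_ancestor` and the trimmed sample document are illustrative assumptions, not part of the original answer.

```python
from xml.etree import ElementTree as ET

# Trimmed sample based on the question's XML (assumption: same element names).
sample = """<ProjectData><FINAL>
  <START id="DG0003" service_code="0x517B">
    <DataFile dg="55" dg_id="big">
      <SetData x="E1" value="21259" />
      <SetData x="E2" value="02" />
    </DataFile>
  </START>
</FINAL></ProjectData>"""

root = ET.fromstring(sample)

# ElementTree has no ancestor axis, so map every child to its parent once.
parent_of = {child: parent for parent in root.iter() for child in parent}

def nearest_ancestor(node, tag):
    """Walk upward from node and return the closest ancestor with this tag, or None."""
    node = parent_of.get(node)
    while node is not None and node.tag != tag:
        node = parent_of.get(node)
    return node

# One entry per SetData, each carrying the id of its enclosing START.
pairs = []
for sd in root.iter('SetData'):
    start = nearest_ancestor(sd, 'START')
    pairs.append((start.get('id'), sd.get('value')))

print(pairs)
```

Every SetData yields its own entry, and the START attributes are looked up again for each one, which is what makes the parent values repeat on every row.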

XSLT (save as .xsl file, a special .xml file)

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output indent="yes" method="text"/>
  <xsl:strip-space elements="*"/>

  <xsl:variable name="delim">,</xsl:variable>
  <xsl:template match="/ProjectData">
      <!-- HEADERS: start -->
      <xsl:text>id,service_code,rational,qualify,description_num,description,</xsl:text>
      <xsl:text>data_file_dg,data_file_dg_id,data_file_unit,data_file_unit_id,</xsl:text>
      <xsl:text>set_data_x,set_data_xin,set_data_xax,set_data_value&#xa;</xsl:text>
      <!-- HEADERS: end -->
      <xsl:apply-templates select="descendant::SetData"/>
  </xsl:template>

  <xsl:template match="SetData">
      <xsl:value-of select="concat(ancestor::START/@id, $delim,
                                   ancestor::START/@service_code, $delim,
                                   ancestor::START/*[1]/Rational, $delim,
                                   ancestor::START/*[1]/Qualify, $delim,
                                   ancestor::START/Description/@num, $delim,
                                   ancestor::START/Description, $delim,
                                   ancestor::START/DataFile/@dg, $delim,
                                   ancestor::START/DataFile/@dg_id, $delim,
                                   ancestor::START/DataFile/@unit, $delim,
                                   ancestor::START/DataFile/@unit_id, $delim,
                                   @x, $delim,
                                   @xin, $delim,
                                   @xax, $delim,
                                   @value)"/>
      <xsl:text>&#xa;</xsl:text>
  </xsl:template>

</xsl:stylesheet>

Python (no for loops or if/else logic)

import lxml.etree as et

# LOAD XML AND XSL FILES
xml = et.parse('Input.xml')
xsl = et.parse('Script.xsl')

# INITIALIZE TRANSFORMER
transform = et.XSLT(xsl)

# TRANSFORM INPUT
result = transform(xml)

print(str(result))
# id,service_code,rational,qualify,description_num,description,data_file_dg,data_file_dg_id,data_file_unit,data_file_unit_id,set_data_x,set_data_xin,set_data_xax,set_data_value
# ID0001,0x5196,225196,6251960000A0DE,1213f2312,The parameter,12,let,,,,,,32
# DG0003,0x517B,23423,342342,3423423f3423,The third,55,big,,,E1,,,21259
# DG0003,0x517B,23423,342342,3423423f3423,The third,55,big,,,E2,,,02
# ID0048,0x5198,225198,343243324234234,434234234,The forth,,,21,FEDS,,5,233,323
# ID0048,0x5198,225198,343243324234234,434234234,The forth,,,21,FEDS,,123,77,555
# ID0048,0x5198,225198,343243324234234,434234234,The forth,,,21,FEDS,,17,65,23

# SAVE RESULT TO CSV (text mode, since str(result) is a str, not bytes)
with open('Output.csv', 'w', newline='') as f:
    f.write(str(result))



To process a whole folder of XML files, run the same transformation in a loop. The version below wraps the XML processing in a single function, builds a list of results with a list comprehension, and then writes them to the CSV one by one. NOTE: To get a single header row, write the headers only once in the CSV and remove them from the XSLT (the block marked HEADERS above).

import lxml.etree as et
from pathlib import Path

directory = 'C:/Users/docs/FolderwithXMLs'   # same folder as in the question

# LOAD XSL SCRIPT ONCE (WITH THE HEADER LINES REMOVED FROM THE XSLT)
xsl = et.parse('Script.xsl')
transform = et.XSLT(xsl)       # INITIALIZE TRANSFORMER ONCE

def proc_xml(xml_file):
    xml = et.parse(xml_file)   # LOAD XML FILE
    result = transform(xml)    # TRANSFORM INPUT
    return str(result)

xml_files_list = list(map(str, Path(directory).glob('**/*.xml')))
results = [proc_xml(x) for x in xml_files_list]

with open('Output.csv', 'w', newline='') as f:
    # WRITE THE SINGLE HEADER ROW
    f.write('id,service_code,rational,qualify,description_num,description,'
            'data_file_dg,data_file_dg_id,data_file_unit,data_file_unit_id,'
            'set_data_x,set_data_xin,set_data_xax,set_data_value\n')

    # SAVE EACH TRANSFORMED FILE TO CSV
    for r in results:
        f.write(r)
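For comparison, the repetition asked for in the question can probably also be achieved without XSLT, using only the standard library. The sketch below is an assumption-laden alternative, not the answer's method: it collects the START-level fields once into a base dict and copies that dict for every SetData, instead of resetting the row after each write. The names `HEADERS` and `start_rows`, the in-memory buffer, and the trimmed sample are illustrative.

```python
import csv
import io
from xml.etree import ElementTree as ET

HEADERS = ['id', 'service_code', 'rational', 'qualify',
           'description_num', 'description_txt',
           'set_data_x', 'set_data_xin', 'set_data_xax', 'set_data_value']

def start_rows(root):
    """Yield one dict per SetData, repeating the fields of its parent START."""
    for sn in root.iter('START'):
        base = dict(sn.attrib)                      # id, service_code
        base['rational'] = sn.findtext('.//Rational', default='')
        base['qualify'] = sn.findtext('.//Qualify', default='')
        ds = sn.find('.//Description')
        if ds is not None:
            base['description_txt'] = ds.text or ''
            base['description_num'] = ds.get('num', '')
        for st in sn.iter('SetData'):
            row = dict(base)                        # copy, so START fields repeat
            for k, v in st.attrib.items():
                row['set_data_' + k] = v
            yield row

# Trimmed sample taken from the question's XML.
sample = """<ProjectData><FINAL>
  <START id="ID0048" service_code="0x5198">
    <RawData rawdata_type="ASDS">
      <Rational>225198</Rational>
      <Qualify>343243324234234</Qualify>
    </RawData>
    <Description num="434234234">The forth</Description>
    <DataFile unit="21" unit_id="FEDS">
      <SetData xin="5" xax="233" value="323" />
      <SetData xin="123" xax="77" value="555" />
    </DataFile>
  </START>
</FINAL></ProjectData>"""

# Write to an in-memory buffer here; a real run would open 'output.csv' instead.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=HEADERS, restval='',
                        extrasaction='ignore')
writer.writeheader()
writer.writerows(start_rows(ET.fromstring(sample)))
csv_text = buffer.getvalue()
print(csv_text)
```

Using csv.DictWriter also means that any field containing a comma is quoted automatically, which the plain string concatenation in the XSLT does not do.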
