使用Elementree从另一个子元素读取 [英] Reading from another child element using Elementree

查看:69
本文介绍了使用Elementree从另一个子元素读取的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

下面的代码遍历xml文件并将它们解析为单个csv文件

The code below goes through the xml files and parses them into a single csv file

from xml.etree import ElementTree as ET
from collections import defaultdict
import csv
from pathlib import Path

directory = 'path to a folder with xml files'

with open('output.csv', 'w', newline='') as f:
    writer = csv.writer(f)

    headers = ['id', 'service_code', 'rational', 'qualify', 'description_num', 'description_txt', 'set_data_xin', 'set_data_xax', 'set_data_value', 'set_data_x']
    writer.writerow(headers)

    xml_files_list = list(map(str, Path(directory).glob('**/*.xml')))
    print(xml_files_list)
    for xml_file in xml_files_list:
        tree = ET.parse(xml_file)
        root = tree.getroot()

        start_nodes = root.findall('.//START')
        for sn in start_nodes:
            row = defaultdict(str)

            repeated_values = dict()
            for k,v in sn.attrib.items():
                repeated_values[k] = v

            for rn in sn.findall('.//Rational'):
                repeated_values['rational'] = rn.text

            for qu in sn.findall('.//Qualify'):
                repeated_values['qualify'] = qu.text

            for ds in sn.findall('.//Description'):
                repeated_values['description_txt'] = ds.text
                repeated_values['description_num'] = ds.attrib['num']

            for st in sn.findall('.//SetData'):
                for k,v in st.attrib.items():
                    row['set_data_'+ str(k)] = v
                for key in repeated_values.keys():
                    row[key] = repeated_values[key]
                row_data = [row[i] for i in headers]
                writer.writerow(row_data)
                row = defaultdict(str)

这是

<?xml version="1.0" encoding="utf-8"?>
<ProjectData>
 <Phones>
    <Date />
    <Prog />
    <Box />
    <Feature />
    <IN>MAFWDS</IN>
    <Set>234234</Set>
    <Pr>23423</Pr>
    <Number>afasfhrtv</Number>
    <Simple>dfasd</Simple>
    <Nr />
    <Get>6070106091</Get>
    <Reno>1233</Reno>
  </Phones>
<FINAL>
    <START id="B001" service_code="0x5196">
      <Docs Docs_type="START">
        <Rational>225196</Rational>
        <Qualify>6251960000A0DE</Qualify>
      </Docs>
      <Description num="1213f2312">The parameter</Description>
      <DataFile dg="12" dg_id="let">
        <SetData value="32" />
      </DataFile>
    </START>
    <START id="C003" service_code="0x517B">
      <Docs Docs_type="START">
        <Rational>23423</Rational>
        <Qualify>342342</Qualify>
      </Docs>
      <Description num="3423423f3423">The third</Description>
      <DataFile dg="55" dg_id="big">
        <SetData x="E1" value="21259" />
        <SetData x="E2" value="02" />
      </DataFile>
    </START>
    <START id="Z048" service_code="0x5198">
      <RawData rawdata_type="ASDS">
        <Rational>225198</Rational>
        <Qualify>343243324234234</Qualify>
      </RawData>
      <Description num="434234234">The forth</Description>
      <DataFile unit="21" unit_id="FEDS">
        <FileX unit="eg" discrete="false" axis_pts="19" name="Vsome" text_id="bx5" unit_id="GDFSD" />
        <SetData xin="5" xax="233" value="323" />
        <SetData xin="123" xax="77" value="555" />
        <SetData xin="17" xax="65" value="23" />
      </DataFile>
    </START>
</FINAL>
</ProjectData>

这是输出的样子

当前正在努力修改代码,因此转到电话( Projectdata的另一个子级)从 Set 中获取元素,并且 Get 将它们与 _ 附加在一起,并将其解析到具有标题名称*的第一列中*识别**
图片下面显示了外观。

Currently struggling to modify the code , so it goes to Phones (which is another child of Projectdata) takes elements from Set and Get attaches them together with _ and parses them into the first column that has the header names ** Identify** The picture bellow shows how It should look.

推荐答案

将您的标头行修改为

headers = ['identify', 'id', 'service_code', 'rational', 'qualify', 'description_num', 'description_txt', 'set_data_xin', 'set_data_xax', 'set_data_value', 'set_data_x']



p_get = tree.find('.//Phones/Get').text
p_set = tree.find('.//Phones/Set').text

并将此信息添加到 row_data 之前这行 writer.writerow(row_data)

and add this info to the row_data just before the line writer.writerow(row_data)

像这样:

row_data.insert(0, p_get + '_' + p_set)

更新 >

row_data[0] = p_get + '_' + p_set

这篇关于使用Elementree从另一个子元素读取的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆