BeautifulSoup XML to CSV

Question

The code below takes an XML file and parses it into a CSV file.

import openpyxl    
from bs4 import BeautifulSoup


with open('1last.xml') as f_input:
    soup = BeautifulSoup(f_input, 'lxml')

wb = openpyxl.Workbook()
ws = wb.active
ws.title = "Sheet1"

ws.append(["Description", "num", "text"])

for description in soup.find_all("description"):
    ws.append(["", description['num'], description.text])

ws.append(["SetData", "x", "value", "xin", "xax"])

for setdata in soup.find_all("setdata"):
    ws.append(["", setdata.get('x', ''), setdata.get('value', ''), setdata.get('xin', ''), setdata.get('xax', '')])

wb.save(filename="1last.csv")

This is the output

This is the XML file

<?xml version="1.0" encoding="utf-8"?>
<ProjectData>
<FINAL>
    <START id="ID0001" service_code="0x5196">
      <Docs Docs_type="START">
        <Rational>225196</Rational>
        <Qualify>6251960000A0DE</Qualify>
      </Docs>
      <Description num="1213f2312">The parameter</Description>
      <DataFile dg="12" dg_id="let">
        <SetData value="32" />
      </DataFile>
    </START>
    <START id="DG0003" service_code="0x517B">
      <Docs Docs_type="START">
        <Rational>23423</Rational>
        <Qualify>342342</Qualify>
      </Docs>
      <Description num="3423423f3423">The third</Description>
      <DataFile dg="55" dg_id="big">
        <SetData x="E1" value="21259" />
        <SetData x="E2" value="02" />
      </DataFile>
    </START>
    <START id="ID0048" service_code="0x5198">
      <RawData rawdata_type="START">
        <Rational>225198</Rational>
        <Qualify>343243324234234</Qualify>
      </RawData>
      <Description num="434234234">The forth</Description>
      <DataFile unit="21" unit_id="FEDS">
        <Ycross unit="ce" points="21" name="Thefiles" text_id="54" unit_id="98" />
        <SetData xin="5" xax="233" value="323" />
        <SetData xin="123" xax="77" value="555" />
        <SetData xin="17" xax="65" value="23" />
      </DataFile>
    </START>
</FINAL>
</ProjectData>

Recently I have been trying to modify the code so that it goes through all the children of START and parses them into columns. If a child element has more than one row, each additional row should go on a new line, just as the code above does. Unfortunately, I have not succeeded and am stuck at this point.

This picture shows how the output should look.

Recommended answer

You can try something like this.

I have written the code for only a few of the tags. You can fill in the rest of the required tags in the same way. Hope it helps!

Edited to add the SetData tag values.

from xml.etree import ElementTree as ET
from collections import defaultdict
from io import StringIO
import csv

# `data` is assumed to hold the XML document shown above as a string
tree = ET.parse(StringIO(data))
root = tree.getroot()

with open('output.csv', 'w', newline='') as f:
    writer = csv.writer(f)

    start_nodes = root.findall('.//START')
    headers = ['id', 'service_code', 'rational', 'qualify', 'description_num', 'description_txt', 'set_data_xin', 'set_data_xax', 'set_data_value', 'set_data_x']
    writer.writerow(headers)
    for sn in start_nodes:
        row = defaultdict(str)

        for k,v in sn.attrib.items():
            row[k] = v

        for rn in sn.findall('.//Rational'):
            row['rational'] = rn.text

        for qu in sn.findall('.//Qualify'):
            row['qualify'] = qu.text

        for ds in sn.findall('.//Description'):
            row['description_txt'] = ds.text
            row['description_num'] = ds.attrib['num']

        # All other tags must be parsed before SetData, because each
        # SetData element writes out the row built so far.
        for st in sn.findall('.//SetData'):
            for k,v in st.attrib.items():
                row['set_data_'+ str(k)] = v
            row_data = [row[i] for i in headers]
            writer.writerow(row_data)
            # Start a fresh row so continuation lines hold only SetData values
            row = defaultdict(str)
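
In the snippet above, defaultdict(str) is what leaves a cell blank when an element lacks an attribute. As a minimal sketch of the same idea, csv.DictWriter from the standard library can fill missing columns itself through its restval argument. The shortened header list, the in-memory buffer, and the name csv_text are illustrative assumptions, not part of the original answer.

```python
import csv
import io

# Shortened header list, for illustration only
headers = ['id', 'set_data_x', 'set_data_value']

buffer = io.StringIO()  # in-memory stand-in for output.csv
writer = csv.DictWriter(buffer, fieldnames=headers, restval='',
                        lineterminator='\n')
writer.writeheader()

# First row: START attributes plus the first SetData element
writer.writerow({'id': 'DG0003', 'set_data_x': 'E1', 'set_data_value': '21259'})
# Continuation row: only SetData values; 'id' is filled with restval
writer.writerow({'set_data_x': 'E2', 'set_data_value': '02'})

csv_text = buffer.getvalue()
print(csv_text)
```

With this approach the row can be a plain dict, at the cost of an error if a key appears that is not listed in the headers.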

Update

Add the following before the SetData loop:

        for st in sn.findall('.//DataFile'):
            for k,v in st.attrib.items():
                row['datafile_'+ str(k)] = v 

        for st in sn.findall('.//Ycross'):
            for k,v in st.attrib.items():
                row['ycross_'+ str(k)] = v 

along with the corresponding entries in the headers list (for example datafile_dg or ycross_unit).
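
Putting the update together with the earlier loop might look roughly like the sketch below. It assumes a trimmed copy of the third START element as input; the function name start_rows, the constants SAMPLE and HEADERS, and the reduced header list are hypothetical names chosen for this example.

```python
import csv
import io
from collections import defaultdict
from xml.etree import ElementTree as ET

# Trimmed sample based on the third START element of the question's XML
SAMPLE = """<ProjectData><FINAL>
  <START id="ID0048" service_code="0x5198">
    <Description num="434234234">The forth</Description>
    <DataFile unit="21" unit_id="FEDS">
      <Ycross unit="ce" points="21" />
      <SetData xin="5" xax="233" value="323" />
      <SetData xin="123" xax="77" value="555" />
    </DataFile>
  </START>
</FINAL></ProjectData>"""

# Reduced header list (assumption): extend it for the remaining tags
HEADERS = ['id', 'service_code', 'description_num', 'description_txt',
           'datafile_unit', 'datafile_unit_id', 'ycross_unit', 'ycross_points',
           'set_data_xin', 'set_data_xax', 'set_data_value']


def start_rows(xml_text):
    """Return one list per SetData element, ordered like HEADERS."""
    rows = []
    root = ET.fromstring(xml_text)
    for sn in root.findall('.//START'):
        row = defaultdict(str, sn.attrib)  # START attributes become columns
        for ds in sn.findall('.//Description'):
            row['description_txt'] = ds.text
            row['description_num'] = ds.attrib['num']
        # Prefix the attributes of DataFile and Ycross, as in the update
        for tag, prefix in (('DataFile', 'datafile_'), ('Ycross', 'ycross_')):
            for el in sn.findall('.//' + tag):
                for k, v in el.attrib.items():
                    row[prefix + k] = v
        # SetData comes last: each element writes out the row built so far
        for st in sn.findall('.//SetData'):
            for k, v in st.attrib.items():
                row['set_data_' + k] = v
            rows.append([row[h] for h in HEADERS])
            row = defaultdict(str)  # continuation rows hold only SetData values
    return rows


rows = start_rows(SAMPLE)
buffer = io.StringIO()  # in-memory stand-in for output.csv
writer = csv.writer(buffer, lineterminator='\n')
writer.writerow(HEADERS)
writer.writerows(rows)
print(buffer.getvalue())
```

Note that, as in the original loop, a START element with no SetData children produces no row at all; handling that case would need an extra write after the SetData loop.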
