BeautifulSoup XML to CSV
Problem description
The code below takes an XML file and parses it into a CSV file.
import openpyxl
from bs4 import BeautifulSoup

with open('1last.xml') as f_input:
    # the 'lxml' (HTML) parser lower-cases tag names, hence "description" / "setdata" below
    soup = BeautifulSoup(f_input, 'lxml')

wb = openpyxl.Workbook()
ws = wb.active
ws.title = "Sheet1"

ws.append(["Description", "num", "text"])
for description in soup.find_all("description"):
    ws.append(["", description['num'], description.text])

ws.append(["SetData", "x", "value", "xin", "xax"])
for setdata in soup.find_all("setdata"):
    ws.append(["", setdata.get('x', ''), setdata.get('value', ''), setdata.get('xin', ''), setdata.get('xax', '')])

# note: openpyxl always saves an Excel (XLSX) workbook, even when the name ends in .csv
wb.save(filename="1last.csv")
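Because openpyxl always writes an Excel (XLSX) workbook regardless of the extension passed to wb.save, the file saved above is not a plain-text CSV file. The following is a minimal sketch, not the asker's code, that writes the same two sections as real CSV with the standard csv module. It assumes the standard library's xml.etree.ElementTree in place of BeautifulSoup (so tag names are case-sensitive), and it embeds a trimmed copy of the question's XML under the hypothetical name SAMPLE_XML; write_sections is likewise a hypothetical helper.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical name: a trimmed copy of the question's XML.
SAMPLE_XML = """<ProjectData><FINAL>
<START id="ID0001" service_code="0x5196">
  <Description num="1213f2312">The parameter</Description>
  <DataFile dg="12" dg_id="let"><SetData value="32" /></DataFile>
</START>
<START id="DG0003" service_code="0x517B">
  <Description num="3423423f3423">The third</Description>
  <DataFile dg="55" dg_id="big">
    <SetData x="E1" value="21259" />
    <SetData x="E2" value="02" />
  </DataFile>
</START>
</FINAL></ProjectData>"""


def write_sections(xml_text, out):
    """Write the Description and SetData sections as plain CSV rows to the file-like out."""
    root = ET.fromstring(xml_text)
    writer = csv.writer(out)
    writer.writerow(["Description", "num", "text"])
    # ElementTree is case-sensitive, so the original mixed-case tag names are used
    for description in root.iter("Description"):
        writer.writerow(["", description.get("num", ""), description.text or ""])
    writer.writerow(["SetData", "x", "value", "xin", "xax"])
    for setdata in root.iter("SetData"):
        # missing attributes become empty cells
        writer.writerow(["", *(setdata.get(a, "") for a in ("x", "value", "xin", "xax"))])


buf = io.StringIO()
write_sections(SAMPLE_XML, buf)
print(buf.getvalue())
```

For the real file, the same function can be called with the contents of 1last.xml and a file opened as open('1last.csv', 'w', newline=''), which is the mode the csv module expects.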
[Figure: the output produced by the code above]
This is the XML file:
<?xml version="1.0" encoding="utf-8"?>
<ProjectData>
  <FINAL>
    <START id="ID0001" service_code="0x5196">
      <Docs Docs_type="START">
        <Rational>225196</Rational>
        <Qualify>6251960000A0DE</Qualify>
      </Docs>
      <Description num="1213f2312">The parameter</Description>
      <DataFile dg="12" dg_id="let">
        <SetData value="32" />
      </DataFile>
    </START>
    <START id="DG0003" service_code="0x517B">
      <Docs Docs_type="START">
        <Rational>23423</Rational>
        <Qualify>342342</Qualify>
      </Docs>
      <Description num="3423423f3423">The third</Description>
      <DataFile dg="55" dg_id="big">
        <SetData x="E1" value="21259" />
        <SetData x="E2" value="02" />
      </DataFile>
    </START>
    <START id="ID0048" service_code="0x5198">
      <RawData rawdata_type="START">
        <Rational>225198</Rational>
        <Qualify>343243324234234</Qualify>
      </RawData>
      <Description num="434234234">The forth</Description>
      <DataFile unit="21" unit_id="FEDS">
        <Ycross unit="ce" points="21" name="Thefiles" text_id="54" unit_id="98" />
        <SetData xin="5" xax="233" value="323" />
        <SetData xin="123" xax="77" value="555" />
        <SetData xin="17" xax="65" value="23" />
      </DataFile>
    </START>
  </FINAL>
</ProjectData>
Recently I have been trying to modify the code so that it goes through all the children of START and parses them into columns. If a child element has more than one row (for example several SetData elements), each additional row should go on a new line, just as the code above does. Unfortunately, I have not been successful and am stuck at this point.
This picture shows what the output should look like:
[Figure: desired output layout]
Recommended answer
You can try something like this.
I have written the code for only a few of the tags; you can fill in the rest of the required tags in the same way. Hope it helps!
Edit: added the SetData tag values.
from xml.etree import ElementTree as ET
from collections import defaultdict
import csv

tree = ET.parse('1last.xml')  # the XML file from the question
root = tree.getroot()

with open('output.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    start_nodes = root.findall('.//START')
    headers = ['id', 'service_code', 'rational', 'qualify', 'description_num', 'description_txt', 'set_data_xin', 'set_data_xax', 'set_data_value', 'set_data_x']
    writer.writerow(headers)
    for sn in start_nodes:
        row = defaultdict(str)  # columns that are never set come out as ''
        # attributes of START itself (id, service_code)
        for k, v in sn.attrib.items():
            row[k] = v
        for rn in sn.findall('.//Rational'):
            row['rational'] = rn.text
        for qu in sn.findall('.//Qualify'):
            row['qualify'] = qu.text
        for ds in sn.findall('.//Description'):
            row['description_txt'] = ds.text
            row['description_num'] = ds.attrib['num']
        # all other tags except SetData must be parsed before this.
        for st in sn.findall('.//SetData'):
            for k, v in st.attrib.items():
                row['set_data_' + str(k)] = v
            row_data = [row[i] for i in headers]
            writer.writerow(row_data)
            # start a blank row so the next SetData goes on its own line
            row = defaultdict(str)
Update
Add the following inside the for sn in start_nodes: loop, before the SetData loop:
for st in sn.findall('.//DataFile'):
    for k, v in st.attrib.items():
        row['datafile_' + str(k)] = v
for st in sn.findall('.//Ycross'):
    for k, v in st.attrib.items():
        row['ycross_' + str(k)] = v
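and add the corresponding column names to the headers list (for example datafile_dg, datafile_dg_id, ycross_unit, ycross_points).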
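The prefixing pattern in the update (datafile_, ycross_) can be generalized so that neither the tags nor the headers have to be listed by hand. The sketch below is one possible generalization, not the answerer's code: it walks every descendant of each START, stores element text under the lower-cased tag name and each attribute as tag_attribute, assumes SetData is the only element that repeats within a START (based on the sample data only), and builds the header list from the keys it actually sees before writing with csv.DictWriter. The names flatten_start, to_csv, REPEATING_TAG and SAMPLE_XML are hypothetical.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Assumption: SetData is the only element that can repeat inside one START.
REPEATING_TAG = "SetData"


def flatten_start(start):
    """Return the CSV rows (as dicts) for one START element."""
    shared = {f"start_{k}": v for k, v in start.attrib.items()}
    repeats = []
    for el in start.iter():
        if el is start:
            continue  # iter() yields the START element itself first
        prefix = el.tag.lower()
        attrs = {f"{prefix}_{k}": v for k, v in el.attrib.items()}
        if el.tag == REPEATING_TAG:
            repeats.append(attrs)  # each repeat is kept for its own line
            continue
        shared.update(attrs)
        text = (el.text or "").strip()
        if text:
            shared[prefix] = text  # e.g. rational, qualify, description
    if not repeats:
        return [shared]
    # The first repeat shares a line with the START fields; the rest go on new lines.
    return [{**shared, **repeats[0]}, *repeats[1:]]


def to_csv(xml_text, out):
    """Write one or more lines per START to out and return the generated headers."""
    rows = [row for start in ET.fromstring(xml_text).iter("START")
            for row in flatten_start(start)]
    headers = []
    for row in rows:  # column order = order in which keys are first seen
        for key in row:
            if key not in headers:
                headers.append(key)
    writer = csv.DictWriter(out, fieldnames=headers, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return headers


# Hypothetical name: one START copied from the question's XML.
SAMPLE_XML = """<ProjectData><FINAL>
<START id="DG0003" service_code="0x517B">
  <Docs Docs_type="START"><Rational>23423</Rational><Qualify>342342</Qualify></Docs>
  <Description num="3423423f3423">The third</Description>
  <DataFile dg="55" dg_id="big">
    <SetData x="E1" value="21259" />
    <SetData x="E2" value="02" />
  </DataFile>
</START>
</FINAL></ProjectData>"""

buf = io.StringIO()
cols = to_csv(SAMPLE_XML, buf)
print(buf.getvalue())
```

Because the headers come from the data, elements such as Ycross simply add columns without further code changes; the trade-off is that column order follows first appearance rather than a fixed list, and an element that repeats other than SetData would overwrite earlier values unless it is added as a repeating tag.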