Python: extract data from XML and save it to Excel


Problem description


I would like to extract some data from an XML file and save it in a table format, such as XLS or DBF.

Here is the XML file I have:

<?xml version="1.0" encoding="utf-8"?>
<SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/">
  <SOAP-ENV:Header />
  <SOAP-ENV:Body>
    <ADD_LandIndex_001>
      <CNTROLAREA>
        <BSR>
          <VERB>ADD</VERB>
          <NOUN>LandIndex</NOUN>
          <REVISION>001</REVISION>
        </BSR>
      </CNTROLAREA>
      <DATAAREA>
        <LandIndex>
          <reportId>AMI100031</reportId>
          <requestKey>R3278458</requestKey>
          <SubmittedBy>EN4871</SubmittedBy>
          <submittedOn>2015/01/06 4:20:11 PM</submittedOn>
          <LandIndex>
            <agreementdetail>
              <agreementid>001       4860</agreementid>
              <agreementtype>NATURAL GAS</agreementtype>
              <currentstatus>
                <status>ACTIVE</status>
                <statuseffectivedate>1965/02/18</statuseffectivedate>
                <termdate>1965/02/18</termdate>
              </currentstatus>
              <designatedrepresentative>
              </designatedrepresentative>
            </agreementdetail>
          </LandIndex>
        </LandIndex>
      </DATAAREA>
    </ADD_LandIndex_001>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>

I am interested in the information inside the agreementdetail tag, which is under DATAAREA/LandIndex/LandIndex/.
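If you want to match agreementdetail only at exactly that path, rather than at any depth, ElementTree's findall() accepts a path plus a prefix-to-namespace map. Below is a minimal sketch, assuming the structure shown above: the SOAP-ENV elements live in the http://schemas.xmlsoap.org/soap/envelope/ namespace, so Body needs a prefix in the path, while ADD_LandIndex_001 and everything under it carry no namespace. The prefix name soap and the trimmed SAMPLE string are illustrative only; for the real file you would use ET.parse(xmlfile).getroot() instead of ET.fromstring(SAMPLE).

```python
import xml.etree.ElementTree as ET

# Prefix -> URI map for the SOAP envelope namespace declared on the root.
# The prefix name "soap" is arbitrary; only the URI has to match the file.
NS = {"soap": "http://schemas.xmlsoap.org/soap/envelope/"}

# Explicit path from the Envelope root. Only Body is namespaced; the
# ADD_LandIndex_001 subtree has no namespace, so it needs no prefix.
DETAIL_PATH = ("soap:Body/ADD_LandIndex_001/DATAAREA/"
               "LandIndex/LandIndex/agreementdetail")

# Trimmed copy of the question's file (header and status fields omitted).
SAMPLE = """\
<SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/">
  <SOAP-ENV:Body>
    <ADD_LandIndex_001>
      <DATAAREA>
        <LandIndex>
          <LandIndex>
            <agreementdetail>
              <agreementid>001       4860</agreementid>
              <agreementtype>NATURAL GAS</agreementtype>
            </agreementdetail>
          </LandIndex>
        </LandIndex>
      </DATAAREA>
    </ADD_LandIndex_001>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>"""

# For the real file: root = ET.parse(xmlfile).getroot()
root = ET.fromstring(SAMPLE)
details = root.findall(DETAIL_PATH, NS)

for d in details:
    print("%s | %s" % (d.findtext("agreementid"), d.findtext("agreementtype")))
```

Without the prefix, a path starting with plain Body would match nothing, because the parsed tag name is {http://schemas.xmlsoap.org/soap/envelope/}Body.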

UPDATE:

Thanks to MattDMo, this task has moved off its dead point a bit, so I wrote the script below. It iterates over the file, finds every agreementdetail element, and outputs agreementid and agreementtype for each one.

import xml.etree.ElementTree as ET
import arcpy

xmlfile = 'D:/Working/Test/Test.xml'
element_tree = ET.parse(xmlfile)
root = element_tree.getroot()
agreement = root.findall(".//agreementdetail")
result = []
elements = ('agreementid', 'agreementtype')

for a in agreement:
    obj = {}
    for e in elements:
        obj[e] = a.find(e).text
    result.append(obj)

arcpy.AddMessage(result)
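One fragile spot in the loop above is a.find(e).text: find() returns None when a child tag is missing, so .text raises AttributeError for an incomplete record. A minimal sketch of the same extraction using findtext() with a default, assuming an empty string is an acceptable placeholder for a missing value. The function name extract_rows and the second record in the demo are hypothetical; arcpy is left out so the sketch runs on its own.

```python
import xml.etree.ElementTree as ET

# Child tags to copy out of each agreementdetail element.
ELEMENTS = ("agreementid", "agreementtype")


def extract_rows(root, elements=ELEMENTS):
    """Return one dict per agreementdetail element found anywhere under root.

    findtext() returns the default ("" here) when a child tag is missing,
    whereas a.find(e).text raises AttributeError because find() returns None.
    """
    rows = []
    for detail in root.iter("agreementdetail"):
        rows.append({e: detail.findtext(e, default="") for e in elements})
    return rows


# Small inline document; the second record is hypothetical and deliberately
# lacks an agreementtype child.
demo = ET.fromstring(
    "<LandIndex>"
    "<agreementdetail><agreementid>001 4860</agreementid>"
    "<agreementtype>NATURAL GAS</agreementtype></agreementdetail>"
    "<agreementdetail><agreementid>002 0001</agreementid></agreementdetail>"
    "</LandIndex>"
)
rows = extract_rows(demo)
print(rows)
```

With the file from the question, extract_rows(ET.parse(xmlfile).getroot()) would return the same kind of list that the script stores in result.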

The output I am receiving is a list of dictionaries like this one: {'agreementid': '001 4860', 'agreementtype': 'NATURAL GAS'}

Now I need to convert this output into a table format (.csv, .dbf, .xls etc.) so that agreementid and agreementtype are columns:

agreementid    | agreementtype 
001       4860 | NATURAL GAS
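A list of dictionaries like result maps directly onto csv.DictWriter from the standard library: fieldnames fixes the column order, writeheader() writes the header line, and writerows() writes one line per dictionary. CSV is the simplest of the formats listed and opens directly in Excel; native .xls or .dbf output would need another library, which this sketch does not cover. It is written for Python 3 and writes to an in-memory buffer; the file name agreements.csv in the comment is hypothetical.

```python
import csv
import io

# Column order for the output table.
FIELDNAMES = ["agreementid", "agreementtype"]


def write_rows_csv(rows, stream, fieldnames=FIELDNAMES):
    """Write a header line, then one CSV line per dict in rows."""
    writer = csv.DictWriter(stream, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)


# Same shape as the `result` list built by the script above.
result = [{"agreementid": "001       4860", "agreementtype": "NATURAL GAS"}]

# In-memory demo; to write a real file in Python 3 use:
#     with open("agreements.csv", "w", newline="") as f:
#         write_rows_csv(result, f)
# (on Python 2.7, open the file in "wb" mode instead).
buf = io.StringIO()
write_rows_csv(result, buf)
print(buf.getvalue())
```

The spaces inside agreementid are preserved and not quoted, because the default quoting only kicks in for values containing the delimiter, the quote character, or a line break.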

I would be very grateful if you could guide me on how to accomplish this, or point me to an example.

P.S. The Python version is 2.7.

Solution

The following should work:

import xml.etree.ElementTree as ET
import arcpy

xmlfile = 'D:/Working/Test/Test.xml'
element_tree = ET.parse(xmlfile)
root = element_tree.getroot()
agreement = root.find(".//agreementid").text
arcpy.AddMessage(agreement)

The root.find() call uses an XPath expression (ElementTree supports a limited subset of XPath; the "XPath support" section of the xml.etree.ElementTree documentation is a quick cheat sheet) to find the first tag named agreementid at any depth below the current element. If your file contains several agreementid tags, use root.findall() and iterate over the results. For example, if there are three of them and you know you want the second one, root.findall(".//agreementid")[1] should work.
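A small sketch of the difference between find() and findall() described above, using a hypothetical document with three agreementid tags at different depths. Both calls return matches in document order; find() returns None when nothing matches, so calling .text on its result needs a guard.

```python
import xml.etree.ElementTree as ET

# Hypothetical document with three agreementid tags at different depths.
doc = ET.fromstring(
    "<root>"
    "<agreementid>first</agreementid>"
    "<group><agreementid>second</agreementid></group>"
    "<group><item><agreementid>third</agreementid></item></group>"
    "</root>"
)

# find(): first match in document order, or None if there is none.
first = doc.find(".//agreementid").text
missing = doc.find(".//nosuchtag")

# findall(): a list of every match, also in document order.
all_ids = doc.findall(".//agreementid")
second = all_ids[1].text
texts = [el.text for el in all_ids]

print(first, second, texts, missing)
```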

