Python: convert Excel 2004 XML to CSV (or Excel)

Question

I am trying to convert an Excel XML file into CSV (or Excel) format. Could anyone suggest how to do this in Python? I searched this site for an answer, but all the answers use XSLT, and I am new to Python. Could you please demonstrate it in detail?

Thank you. Here is the file as opened in TextPad:

<?xml version="1.0" encoding="UTF-8"?>
<?mso-application progid="Excel.Sheet"?>
<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
xmlns:o="urn:schemas-microsoft-com:office:office"
xmlns:x="urn:schemas-microsoft-com:office:excel"
xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet"
xmlns:html="http://www.w3.org/TR/REC-html40">
    <OfficeDocumentSettings xmlns="urn:schemas-microsoft-com:office:office">
        <DownloadComponents/>
        <LocationOfComponents HRef="file:///\"/>
    </OfficeDocumentSettings>
    <ExcelWorkbook xmlns="urn:schemas-microsoft-com:office:excel">
        <WindowHeight>12525</WindowHeight>
        <WindowWidth>15195</WindowWidth>
        <WindowTopX>480</WindowTopX>
        <WindowTopY>120</WindowTopY>
        <ActiveSheet>0</ActiveSheet>
        <ProtectStructure>False</ProtectStructure>
        <ProtectWindows>False</ProtectWindows>
    </ExcelWorkbook>
    <Styles>
        <Style ss:ID="Default" ss:Name="Normal">
            <Alignment ss:Vertical="Bottom"/>
            <Borders/>
            <Font/>
            <Interior/>
            <NumberFormat/>
            <Protection/>
        </Style>
        <Style ss:ID="bold">
            <Font ss:Bold="1" />
        </Style>
        <Style ss:ID="percent">
            <NumberFormat ss:Format="Percent" />
        </Style>
        <Style ss:ID="currency">
            <NumberFormat ss:Format="Currency" />
        </Style>

        <Style ss:ID="header">
            <Font ss:Color="#000000" ss:Bold="1" />
<Alignment ss:WrapText="1" ss:Horizontal="Center" ss:Vertical="Center" />

        </Style>
    </Styles>
<Worksheet ss:Name="SABR Data"><Table ss:ExpandedColumnCount="33" ss:ExpandedRowCount="1265" x:FullColumns="1" x:FullRows="1">
<Column ss:Index="1" ss:Width="150" />
<Column ss:Index="2" ss:Width="150" />
<Column ss:Index="3" ss:Width="150" />
<Column ss:Index="4" ss:Width="150" />
<Column ss:Index="5" ss:Width="150" />
<Column ss:Index="6" ss:Width="150" />
<Column ss:Index="7" ss:Width="150" />
<Column ss:Index="8" ss:Width="150" />
<Column ss:Index="9" ss:Width="150" />
<Column ss:Index="10" ss:Width="150" />
<Column ss:Index="11" ss:Width="150" />
<Column ss:Index="12" ss:Width="150" />
<Column ss:Index="13" ss:Width="150" />
<Column ss:Index="14" ss:Width="150" />
<Column ss:Index="15" ss:Width="150" />
<Column ss:Index="16" ss:Width="150" />
<Column ss:Index="17" ss:Width="150" />
<Column ss:Index="18" ss:Width="150" />
<Column ss:Index="19" ss:Width="150" />
<Column ss:Index="20" ss:Width="150" />
<Column ss:Index="21" ss:Width="150" />
<Column ss:Index="22" ss:Width="150" />
<Column ss:Index="23" ss:Width="150" />
<Column ss:Index="24" ss:Width="150" />
<Column ss:Index="25" ss:Width="150" />
<Column ss:Index="26" ss:Width="150" />
<Column ss:Index="27" ss:Width="150" />
<Column ss:Index="28" ss:Width="150" />
<Column ss:Index="29" ss:Width="150" />
<Column ss:Index="30" ss:Width="150" />
<Column ss:Index="31" ss:Width="150" />
<Column ss:Index="32" ss:Width="150" />
<Column ss:Index="33" ss:Width="150" />
<Row>
<Cell ss:StyleID="header"><Data ss:Type="String">Site</Data></Cell>
<Cell ss:StyleID="header"><Data ss:Type="String">Segment</Data></Cell>
<Cell ss:StyleID="header"><Data ss:Type="String">Country</Data></Cell>
<Cell ss:StyleID="header"><Data ss:Type="String">Day</Data></Cell>
<Cell ss:StyleID="header"><Data ss:Type="String">Week</Data></Cell>
<Cell ss:StyleID="header"><Data ss:Type="String">Month</Data></Cell>
<Cell ss:StyleID="header"><Data ss:Type="String">Quarter</Data></Cell>
<Cell ss:StyleID="header"><Data ss:Type="String">Finance Region</Data></Cell>
<Cell ss:StyleID="header"><Data ss:Type="String">Booked Order USD</Data></Cell>
<Cell ss:StyleID="header"><Data ss:Type="String">Sales Calls - Total</Data></Cell>
<Cell ss:StyleID="header"><Data ss:Type="String">Booked Orders</Data></Cell>
<Cell ss:StyleID="header"><Data ss:Type="String">Close Rate</Data></Cell>
<Cell ss:StyleID="header"><Data ss:Type="String">AOV</Data></Cell>
<Cell ss:StyleID="header"><Data ss:Type="String">Revenue Per Call</Data></Cell>
<Cell ss:StyleID="header"><Data ss:Type="String">Price Match Order %</Data></Cell>
<Cell ss:StyleID="header"><Data ss:Type="String">Deal Closer Order %</Data></Cell>
<Cell ss:StyleID="header"><Data ss:Type="String">Price Match USD %</Data></Cell>
<Cell ss:StyleID="header"><Data ss:Type="String">CPU Hero USD</Data></Cell>
<Cell ss:StyleID="header"><Data ss:Type="String">BTB True Attach CPU USD</Data></Cell>
<Cell ss:StyleID="header"><Data ss:Type="String">iPad Hero USD</Data></Cell>
<Cell ss:StyleID="header"><Data ss:Type="String">Sales Calls - Queue</Data></Cell>
<Cell ss:StyleID="header"><Data ss:Type="String">Watch Hero USD</Data></Cell>
<Cell ss:StyleID="header"><Data ss:Type="String">BTB True Attach Watch USD</Data></Cell>
<Cell ss:StyleID="header"><Data ss:Type="String">CPU CTO %</Data></Cell>
<Cell ss:StyleID="header"><Data ss:Type="String">Personalized iPad %</Data></Cell>
<Cell ss:StyleID="header"><Data ss:Type="String">Sales Calls - RETAIL</Data></Cell>
<Cell ss:StyleID="header"><Data ss:Type="String">Close Rate CPU Hero Orders</Data></Cell>
<Cell ss:StyleID="header"><Data ss:Type="String">Close Rate iPad Hero Orders</Data></Cell>
<Cell ss:StyleID="header"><Data ss:Type="String">Close Rate iPhone Hero Orders</Data></Cell>
<Cell ss:StyleID="header"><Data ss:Type="String">Close Rate Watch Hero Orders</Data></Cell>
<Cell ss:StyleID="header"><Data ss:Type="String">BTB True Attach USD</Data></Cell>
<Cell ss:StyleID="header"><Data ss:Type="String">Booked Hero Orders</Data></Cell>
</Row>


</Table>
<WorksheetOptions xmlns="urn:schemas-microsoft-com:office:excel">
        <ProtectObjects>False</ProtectObjects>
        <ProtectScenarios>False</ProtectScenarios>
        </WorksheetOptions>
        </Worksheet>
<Worksheet ss:Name="SABR Settings"><Table ss:ExpandedColumnCount="5" ss:ExpandedRowCount="21" x:FullColumns="1" x:FullRows="1">
<Column ss:Index="1" ss:Width="150" />
<Column ss:Index="2" ss:Width="150" />
<Column ss:Index="3" ss:Width="150" />
<Column ss:Index="4" ss:Width="150" />
<Column ss:Index="5" ss:Width="150" />
<Row>
<Cell ss:StyleID="header"><Data ss:Type="String">Report URL</Data></Cell>
</Row>

</Table>
<WorksheetOptions xmlns="urn:schemas-microsoft-com:office:excel">
        <ProtectObjects>False</ProtectObjects>
        <ProtectScenarios>False</ProtectScenarios>
        </WorksheetOptions>
        </Worksheet>
</Workbook>

Solution

For starters, the Excel 2004 XML format is, well, XML. To parse it, you need to read up on how to process XML in Python, for example with the standard library's xml.etree.ElementTree module.
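
To see why namespaces matter in this format, here is a small sketch. It uses a hypothetical, heavily trimmed sample rather than the real download, and shows how ElementTree reports tag and attribute names that belong to the spreadsheet namespace:

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed sample in the same format as the question's file
SAMPLE = """<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
 xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet">
  <Worksheet ss:Name="SABR Data"/>
</Workbook>"""

root = ET.fromstring(SAMPLE)
worksheet = root[0]  # the first child element, <Worksheet>

# ElementTree reports namespaced names as "{namespace-uri}localname"
print(root.tag)
print(worksheet.attrib)
```

This is why a plain root.iter('Worksheet') finds nothing: the tag is really '{urn:schemas-microsoft-com:office:spreadsheet}Worksheet'.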

To write out to a CSV file, you can use Python's csv module.
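
As a rough illustration of the csv module, the sketch below writes a list of rows with csv.writer. The rows here are made up; in practice they would come from the parsed worksheet. It writes into an in-memory buffer only so the result is easy to inspect:

```python
import csv
import io

# Hypothetical rows; the real ones would be read from the 'SABR Data' worksheet
rows = [
    ["Site", "Segment", "Sales Calls - Total"],
    ["Hypothetical Site", "Retail, Online", "120"],
]

buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerows(rows)  # values containing a comma are quoted automatically

print(buffer.getvalue())
```

When writing to a real file instead, open it with newline='' (for example open('out.csv', 'w', newline='', encoding='utf-8')), as the csv documentation recommends, so that line endings are not doubled on Windows.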

I'm not going to write the whole program for you (Stack Overflow is not that kind of site), but to get you started:

  1. Load the XML data into Python using:

    import xml.etree.ElementTree as ET
    tree = ET.parse('SABR_Download.xls')  # the downloaded file's name
    root = tree.getroot()
    

  2. The data you want is in the worksheet 'SABR Data'. There is another one called 'SABR Settings', which I'm not going to bother with. Note that you need to add the namespace prefix when looking for any node. A dictionary mapping prefixes to namespace URIs makes this easier (see the "Parsing XML with Namespaces" section of the ElementTree documentation). So first, let's get the node for the worksheet we want:

    for node in root.iter('{urn:schemas-microsoft-com:office:spreadsheet}Worksheet'):
        if node.attrib['{urn:schemas-microsoft-com:office:spreadsheet}Name'] == 'SABR Data':
            ws_node = node
            break  # we found the worksheet node, break out of for loop
    

    You can also find it more easily by passing an XPath expression to findall.

  3. The column headers are in the XML nodes <Cell ss:StyleID="header">...</Cell>. You can get each of these with .iter, just as when finding the worksheet, and then loop over the nested Row, Cell and Data nodes. TIP: check each node's tag name, and use the .text attribute of the <Data> elements to get their text.

  4. Then do the same for the actual data rows.

  5. Then write out the headers and data to a CSV file.
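
For reference, here is one minimal sketch of how steps 1 to 5 might fit together. It is not the only way to do it, and it rests on a few assumptions: the cells live in ss:Table/ss:Row/ss:Cell/ss:Data as in the file above, the worksheet is located with a findall-style XPath expression plus a namespace dictionary, and the function names worksheet_rows and convert are made up for illustration. It also pads for the ss:Index attribute, which this format puts on a Cell when empty cells before it were left out:

```python
import csv
import xml.etree.ElementTree as ET

SS = "urn:schemas-microsoft-com:office:spreadsheet"
NS = {"ss": SS}  # prefix -> namespace URI, used by find/iterfind


def worksheet_rows(root, sheet_name):
    """Yield each row of the named worksheet as a list of cell strings."""
    # XPath lookup by ss:Name (assumes the name contains no single quote)
    sheet = root.find(f".//ss:Worksheet[@ss:Name='{sheet_name}']", NS)
    if sheet is None:
        raise ValueError(f"worksheet {sheet_name!r} not found")
    # Note: skipped empty rows (ss:Index on <Row>) are not handled here.
    for row in sheet.iterfind("ss:Table/ss:Row", NS):
        values = []
        for cell in row.iterfind("ss:Cell", NS):
            # A 1-based ss:Index means empty cells before this one were omitted,
            # so pad with blanks to keep the columns aligned.
            index = cell.get(f"{{{SS}}}Index")
            if index is not None:
                values.extend([""] * (int(index) - 1 - len(values)))
            data = cell.find("ss:Data", NS)
            # itertext() also collects text from any nested formatting tags
            values.append("".join(data.itertext()) if data is not None else "")
        yield values


def convert(xml_path, csv_path, sheet_name="SABR Data"):
    """Write one worksheet of an Excel XML file to a CSV file."""
    root = ET.parse(xml_path).getroot()
    # newline="" lets the csv module control line endings itself
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(worksheet_rows(root, sheet_name))
```

With the file name from step 1 and a made-up output name, usage would be convert('SABR_Download.xls', 'SABR_Data.csv'). The header row comes out as the first CSV line because it is just the first Row in the table.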

Also, if you are this new to Python, I highly recommend reading a few tutorials before trying something this complex.
