python-docx: read dropdown lists in a specific cell of a table

Question

I have several tables that contain dropdown lists in a .docx file, and I want to get the values of those lists in some specific cells. Thanks to this thread: python-docx get info from dropdownlist (in table), I was able to get the values of all dropdown lists in the document using this code:

from zipfile import ZipFile
from bs4 import BeautifulSoup

file_name = 'document.docx'

# open the docx file as a zip archive and read the main document part
with ZipFile(file_name) as zip_file:
    xml_data = zip_file.read('word/document.xml')

# parse the xml data with BeautifulSoup (the 'xml' parser requires lxml)
soup = BeautifulSoup(xml_data, 'xml')

# collect the displayed text of every content control (w:sdtContent) in the document
list_of_value = []
for content in soup.find_all('sdtContent'):
    text_node = content.find('t')
    # skip content controls that hold no text at all
    if text_node is not None:
        list_of_value.append(text_node.string)

Now, instead of having a list of all values, I want to get only the values of the dropdown lists contained in specific cells. Since I am a beginner with xml, I don't really know how to handle this problem. Is there any way to do this using python-docx?

Here is the xml I got from a document that contains one table with two cells (one row, two columns). There is a dropdown list in the second cell (second column).

<w:body>
    <w:tbl>
        <w:tblPr>
            <w:tblStyle w:val="TableGrid"/>
            <w:tblW w:w="0" w:type="auto"/>
            <w:tblLook w:val="04A0" w:firstRow="1" w:lastRow="0" w:firstColumn="1" w:lastColumn="0" w:noHBand="0" w:noVBand="1"/>
        </w:tblPr>
        <w:tblGrid>
            <w:gridCol w:w="4530"/>
            <w:gridCol w:w="4531"/>
        </w:tblGrid>
        <w:tr w:rsidR="00D87065" w14:paraId="72A579CB" w14:textId="77777777" w:rsidTr="008A68C3">
            <w:tc>
                <w:tcPr>
                    <w:tcW w:w="4530" w:type="dxa"/>
                </w:tcPr>
                <w:p w14:paraId="6DE8A678" w14:textId="28D4E672" w:rsidR="00D87065" w:rsidRDefault="00D87065" w:rsidP="00D87065">
                    <w:r>
                        <w:t>Normal cell</w:t>
                    </w:r>
                </w:p>
            </w:tc>
            <w:sdt>
                <w:sdtPr>
                    <w:id w:val="834274196"/>
                    <w:placeholder>
                        <w:docPart w:val="38439BE74EB3458EB38183CFE71463D5"/>
                    </w:placeholder>
                    <w:dropDownList>
                        <w:listItem w:displayText="A value in a dropdown list" w:value="A value in a dropdown list"/>
                        <w:listItem w:displayText="Another value in a dropdown list" w:value="Another value in a dropdown list"/>
                    </w:dropDownList>
                </w:sdtPr>
                <w:sdtContent>
                    <w:tc>
                        <w:tcPr>
                            <w:tcW w:w="4531" w:type="dxa"/>
                        </w:tcPr>
                        <w:p w14:paraId="11472D67" w14:textId="43D0742A" w:rsidR="00D87065" w:rsidRDefault="00D87065" w:rsidP="00D87065">
                            <w:r>
                                <w:t>A value in a dropdown list</w:t>
                            </w:r>
                        </w:p>
                    </w:tc>
                </w:sdtContent>
            </w:sdt>
        </w:tr>
    </w:tbl>
    <w:p w14:paraId="336AB02A" w14:textId="276C1AFE" w:rsidR="0047588A" w:rsidRPr="0047588A" w:rsidRDefault="0047588A" w:rsidP="00D87065">
        <w:pPr>
            <w:pStyle w:val="Heading3"/>
        </w:pPr>
    </w:p>
    <w:sectPr w:rsidR="0047588A" w:rsidRPr="0047588A" w:rsidSect="00341D09">
        <w:pgSz w:w="11907" w:h="16840" w:code="9"/>
        <w:pgMar w:top="1418" w:right="1418" w:bottom="1418" w:left="1418" w:header="720" w:footer="720" w:gutter="0"/>
        <w:cols w:space="720"/>
        <w:docGrid w:linePitch="360"/>
    </w:sectPr>
</w:body>

In this case, I want to be able to look up the value of the dropdown list in the second cell, i.e. document.tables[0].cell(0, 1), and get 'A value in a dropdown list' as output. This information is wrapped in the xml element <w:t>A value in a dropdown list</w:t>.

Recommended answer

I'm using the parsel library, so I can use XPath, which makes it easier to get to the value:

from parsel import Selector

# data: the contents of word/document.xml as a string,
# e.g. xml_data.decode('utf-8') from the code in the question
sel = Selector(text=data, type='xml')

# register the WordprocessingML namespace
sel.register_namespace("w", "http://schemas.openxmlformats.org/wordprocessingml/2006/main")

# path to the dropdown list value:
path = "//w:sdtContent//w:t/text()"
outcome = sel.xpath(path).getall()

print(outcome)
# ['A value in a dropdown list']

Note that parsel is built on lxml; it is just easier to use, in my opinion. Also, have a look at XPath, as it can be advantageous if you are going to be working with xml.
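
The XPath in the answer returns the text of every content control in the document, so it does not by itself pick out one cell. Below is a minimal sketch of one way to narrow the lookup to a given table, row, and column. It swaps parsel for the standard library's xml.etree.ElementTree so it runs without extra packages. The names load_document_root, row_cells, and dropdown_value are hypothetical helpers made up for this illustration; they are not part of python-docx or parsel. The sketch assumes the layout shown in the question (a w:sdt either wrapping the w:tc or sitting inside it), counts cells by their position in the row rather than by grid column, and ignores merged cells, nested tables, and content controls that wrap whole rows or tables.

```python
import xml.etree.ElementTree as ET
from zipfile import ZipFile

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W_NS}
TC_TAG = f"{{{W_NS}}}tc"
SDT_TAG = f"{{{W_NS}}}sdt"


def load_document_root(path):
    """Read word/document.xml from a .docx file and return its root element."""
    with ZipFile(path) as archive:
        return ET.fromstring(archive.read("word/document.xml"))


def row_cells(tr):
    """Yield (tc, wrapping_sdt) pairs in document order; wrapping_sdt is None for a plain cell."""
    for child in tr:
        if child.tag == TC_TAG:
            yield child, None
        elif child.tag == SDT_TAG:
            # a cell-level content control keeps its w:tc inside w:sdtContent
            for tc in child.findall("w:sdtContent/w:tc", NS):
                yield tc, child


def dropdown_value(root, table_idx, row_idx, col_idx):
    """Return the displayed text of the first dropdown at that position, or None."""
    table = root.findall("w:body/w:tbl", NS)[table_idx]
    row = table.findall("w:tr", NS)[row_idx]
    tc, wrapping_sdt = list(row_cells(row))[col_idx]

    # check the control wrapping the cell first, then any control inside the cell
    candidates = [wrapping_sdt] if wrapping_sdt is not None else []
    candidates += list(tc.iter(SDT_TAG))
    for sdt in candidates:
        if sdt.find("w:sdtPr/w:dropDownList", NS) is not None:
            return "".join(t.text or "" for t in sdt.findall("w:sdtContent//w:t", NS))
    return None
```

With the sample document from the question, dropdown_value(load_document_root('document.docx'), 0, 0, 1) should return the text of the second cell's dropdown, while the first cell gives None because it holds no dropdown. Checking for w:dropDownList in w:sdtPr is what separates dropdowns from other content controls such as plain text or date pickers.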
