如何使用python-docx将复选框表单插入.docx文件? [英] How can I insert a checkbox form into a .docx file using python-docx?

查看:44
本文介绍了如何使用python-docx将复选框表单插入.docx文件?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我一直在使用python来实现自定义解析器,并使用该解析数据来格式化要在内部分发的word文档.到目前为止,所有格式都非常简单明了,但是我完全对如何在单个表格单元格中插入复选框感到困惑.

I've been using python to implement a custom parser and use that parsed data to format a word document to be distributed internally. All of the formatting has been straightforward and easy so far but I'm completely stumped on how to insert a checkbox into individual table cells.

我尝试在python-docx中使用python对象函数(使用 get_or_add_tcPr()等),这会导致MS Word在尝试打开文件时引发以下错误,无法打开文件xxxx,因为内容有问题详细信息:文件已损坏,无法打开."

I've tried using the python object functions within python-docx (using get_or_add_tcPr(), etc.) which causes MS Word to throw the following error when I try to open the file, "The file xxxx cannot be opened because there are problems with the contents Details: The file is corrupt and cannot be opened".

经过一段时间的努力,我转向了第二种方法,该方法涉及为输出文档操作word/document.xml文件.我已经检索到我认为是保存为 replacementXML 的复选框的正确xml,并将填充文本插入到单元格中,以用作可以被搜索和替换的标签, searchXML .以下内容似乎在Linux(Fedora 25)环境中使用python运行,但是当我尝试打开文档时word文档显示相同的错误,但是这次文档是可恢复的,并恢复为填充文本.我已经能够将其与手工制作的文档一起使用,并使用一个空的表格单元格,因此,我相信这应该是可能的.注意:我在 searchXML 变量中包含了表格单元的整个xml元素,但是我尝试使用正则表达式并缩短字符串.我不仅仅使用精确匹配,因为我知道这可能会因单元而异.

After struggling with this for a while I moved to a second approach involving manipulating the word/document.xml file for the output doc. I've retrieved what I believe to be the correct xml for a checkbox saved as replacementXML and have inserted filler text into the cells to act as a tag that can be searched and replaced, searchXML. The following seems to run using python in a linux (Fedora 25) environment but the word document displays the same errors when I try to open the document, however this time the document is recoverable and reverts back to the filler text. I've been able to get this to work with a manually made document and using an empty table cell, so I believe that this should be possible. NOTE: I've included the whole xml element for the table cell in the searchXML variable, but I've tried using regular expressions and shortening the string. Not just using an exact match as I know this could differ cell to cell.

searchXML = r'<w:tc><w:tcPr><w:tcW w:type="dxa" w:w="4320"/><w:gridSpan w:val="2"/></w:tcPr><w:p><w:pPr><w:jc w:val="right"/></w:pPr><w:r><w:rPr><w:sz w:val="16"/></w:rPr><w:t>IN_CHECKB</w:t></w:r></w:p></w:tc>'

def addCheckboxes(): 
    os.system("mkdir unzipped")
    os.system("unzip tempdoc.docx -d unzipped/")

    with open('unzipped/word/document.xml', encoding="ISO-8859-1") as file:
        filedata = file.read()

    rep_count = 0
    while re.search(searchXML, filedata):
        filedata = replaceXML(filedata, rep_count)
        rep_count += 1

    with open('unzipped/word/document.xml', 'w') as file:
        file.write(filedata)

    os.system("zip -r ../buildcfg/tempdoc.docx unzipped/*")
    os.system("rm -rf unzipped")

def replaceXML(filedata, rep_count):
    replacementXML = r'<w:tc><w:tcPr><w:tcW w:w="4320" w:type="dxa"/><w:gridSpan w:val="2"/></w:tcPr><w:p w:rsidR="00D2569D" w:rsidRDefault="00FD6FDF"><w:pPr><w:jc w:val="right"/></w:pPr><w:r><w:rPr><w:sz w:val="16"/>
                       </w:rPr><w:fldChar w:fldCharType="begin"><w:ffData><w:name w:val="Check1"/><w:enabled/><w:calcOnExit w:val="0"/><w:checkBox><w:sizeAuto/><w:default w:val="0"/></w:checkBox></w:ffData></w:fldChar>
                       </w:r><w:bookmarkStart w:id="' + rep_count + '" w:name="Check' + rep_count + '"/><w:r><w:rPr><w:sz w:val="16"/></w:rPr><w:instrText xml:space="preserve"> FORMCHECKBOX </w:instrText></w:r><w:r>
                       <w:rPr><w:sz w:val="16"/></w:rPr></w:r><w:r><w:rPr><w:sz w:val="16"/></w:rPr><w:fldChar w:fldCharType="end"/></w:r><w:bookmarkEnd w:id="' + rep_count + '"/></w:p></w:tc>'
    filedata = re.sub(searchXML, replacementXML, filedata, 1)

    rerturn filedata

我有一种强烈的感觉,那就是通过python-docx库执行此操作的方法要简单得多(而且是正确的!),但是由于某种原因,我似乎无法正确地做到这一点.

I have a strong feeling that there is a much simpler (and correct!) way of doing this through the python-docx library but for some reason I can't seem to get it right.

有没有一种方法可以轻松地将复选框字段插入MS Word文档中的表格单元格?如果是的话,我该怎么做?如果没有,那么有没有比处理.xml文件更好的方法了?

Is there a way to easily insert checkbox fields into a table cell in an MS Word doc? And if yes, how would I do that? If no, is there a better approach than manipulating the .xml file?

更新:我已经能够使用python-docx将XML成功地注入到文档中,但是没有出现复选框和添加的XML.

UPDATE: I've been able to inject XML into the document succesffuly using python-docx but the checkbox and added XML are not appearing.

我已将以下XML添加到表格单元格中:

I've added the following XML into a table cell:

<w:tc>
  <w:tcPr>
    <w:tcW w:type="dxa" w:w="4320"/>
    <w:gridSpan w:val="2"/>
  </w:tcPr>
  <w:p>
    <w:r>
      <w:bookmarkStart w:id="0" w:name="testName">
        <w:complexType w:name="CT_FFCheckBox">
          <w:sequence>
            <w:choice>
              <w:element w:name="size" w:type="CT_HpsMeasure"/>
              <w:element w:name="sizeAuto" w:type="CT_OnOff"/>
            </w:choice>
            <w:element w:name="default" w:type="CT_OnOff" w:minOccurs="0"/>
            <w:element w:name="checked" w:type="CT_OnOff" w:minOccurs="0"/>
          </w:sequence>
        </w:complexType>
      </w:bookmarkStart>
      <w:bookmarkEnd w:id="0" w:name="testName"/>
    </w:r>
  </w:p>
</w:tc>

通过使用以下python-docx代码:

by using the following python-docx code:

run = p.add_run()
tag = run._r
start = docx.oxml.shared.OxmlElement('w:bookmarkStart')
start.set(docx.oxml.ns.qn('w:id'), '0')
start.set(docx.oxml.ns.qn('w:name'), n)
tag.append(start)

ctype = docx.oxml.OxmlElement('w:complexType')
ctype.set(docx.oxml.ns.qn('w:name'), 'CT_FFCheckBox')
seq = docx.oxml.OxmlElement('w:sequence')
choice = docx.oxml.OxmlElement('w:choice')
el = docx.oxml.OxmlElement('w:element')
el.set(docx.oxml.ns.qn('w:name'), 'size')
el.set(docx.oxml.ns.qn('w:type'), 'CT_HpsMeasure')
el2 = docx.oxml.OxmlElement('w:element')
el2.set(docx.oxml.ns.qn('w:name'), 'sizeAuto')
el2.set(docx.oxml.ns.qn('w:type'), 'CT_OnOff')

choice.append(el)
choice.append(el2)

el3 = docx.oxml.OxmlElement('w:element')
el3.set(docx.oxml.ns.qn('w:name'), 'default')
el3.set(docx.oxml.ns.qn('w:type'), 'CT_OnOff')
el3.set(docx.oxml.ns.qn('w:minOccurs'), '0')
el4 = docx.oxml.OxmlElement('w:element')
el4.set(docx.oxml.ns.qn('w:name'), 'checked')
el4.set(docx.oxml.ns.qn('w:type'), 'CT_OnOff')
el4.set(docx.oxml.ns.qn('w:minOccurs'), '0')

seq.append(choice)
seq.append(el3)
seq.append(el4)

ctype.append(seq)
start.append(ctype)

end = docx.oxml.shared.OxmlElement('w:bookmarkEnd')
end.set(docx.oxml.ns.qn('w:id'), '0')
end.set(docx.oxml.ns.qn('w:name'), n)
tag.append(end)

似乎找不到找到未反映在输出文档中的XML的原因,但会根据我发现的内容进行更新.

Can't seem to find reasoning for the XML not being reflected in the output document but will update with whatever I find.

推荐答案

经过@scanny的大量挖掘和帮助,我终于能够完成此任务.

I've finally been able to accomplish this after lots of digging and help from @scanny.

可以使用以下功能将复选框插入 python-docx 中的任何段落.我要在表格的特定单元格中插入一个复选框.

Checkboxes can be inserted into any paragraph in python-docx using the following function. I am inserting a checkbox into specific cells in a table.

def addCheckbox(para, box_id, name):

  run = para.add_run()
  tag = run._r
  fldchar = docx.oxml.shared.OxmlElement('w:fldChar')
  fldchar.set(docx.oxml.ns.qn('w:fldCharType'), 'begin')

  ffdata = docx.oxml.shared.OxmlElement('w:ffData')
  name = docx.oxml.shared.OxmlElement('w:name')
  name.set(docx.oxml.ns.qn('w:val'), cb_name)
  enabled = docx.oxml.shared.OxmlElement('w:enabled')
  calconexit = docx.oxml.shared.OxmlElement('w:calcOnExit')
  calconexit.set(docx.oxml.ns.qn('w:val'), '0')

  checkbox = docx.oxml.shared.OxmlElement('w:checkBox')
  sizeauto = docx.oxml.shared.OxmlElement('w:sizeAuto')
  default = docx.oxml.shared.OxmlElement('w:default')

  if checked:
    default.set(docx.oxml.ns.qn('w:val'), '1')
  else:
    default.set(docx.oxml.ns.qn('w:val'), '0')

  checkbox.append(sizeauto)
  checkbox.append(default)
  ffdata.append(name)
  ffdata.append(enabled)
  ffdata.append(calconexit)
  ffdata.append(checkbox)
  fldchar.append(ffdata)
  tag.append(fldchar)

  run2 = para.add_run()
  tag2 = run2._r
  start = docx.oxml.shared.OxmlElement('w:bookmarkStart')
  start.set(docx.oxml.ns.qn('w:id'), str(box_id))
  start.set(docx.oxml.ns.qn('w:name'), name)
  tag2.append(start)

  run3 = para.add_run()
  tag3 = run3._r
  instr = docx.oxml.OxmlElement('w:instrText')
  instr.text = 'FORMCHECKBOX'
  tag3.append(instr)

  run4 = para.add_run()
  tag4 = run4._r
  fld2 = docx.oxml.shared.OxmlElement('w:fldChar')
  fld2.set(docx.oxml.ns.qn('w:fldCharType'), 'end')
  tag4.append(fld2)

  run5 = para.add_run()
  tag5 = run5._r
  end = docx.oxml.shared.OxmlElement('w:bookmarkEnd')
  end.set(docx.oxml.ns.qn('w:id'), str(box_id))
  end.set(docx.oxml.ns.qn('w:name'), name)
  tag5.append(end)

  return

fldData.text 对象似乎是随机的,但它是从生成的XML中带一个带有现有复选框的Word文档中提取的.如果不设置此文本,该功能将失败.我还没有确认,但是我听说过一种情况,开发人员可以随意更改字符串,但是一旦保存,它将恢复为原始生成的值.

The fldData.text object seems random but was taken from the generated XML form a word document with an existing checkbox. The function fails without setting this text. I have not confirmed but I have heard of one scenario where a developer was arbitrarily changing the string but once saved it would revert back to the original generated value.

这篇关于如何使用python-docx将复选框表单插入.docx文件?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆