Bold, underlining, and Iterations with python-docx

Question

I am writing a program that takes data from an ASCII file and places it in the appropriate place in a Word document, making only particular words bold and underlined. I am new to Python, but I have extensive experience programming in Matlab. My code is:

#IMPORT ASCII DATA AND MAKE IT USEABLE
#Alternatively Pandas - gives better table display results
import pandas as pd
data = pd.read_csv('203792_M-51_Niles_control_SD_ACSF.txt', sep=",", header=None)
#print data
#data[1][3]  gives value at particular data points within matrix
i=len(data[1])
print 'Number of Points imported =', i
#IMPORT WORD DOCUMENT
import docx  #Opens Python Word document tool
from docx import Document  #Invokes Document command from docx
document = Document('test_iteration.docx')  #Imports Word Document to Modify
t = len(document.paragraphs)  #gives the number of lines in document
print 'Total Number of lines =', t
#for paragraph in document.paragraphs:
   # print(para.text)  #Prints the text in the entire document
font = document.styles['Normal'].font
font.name = 'Arial'
from docx.shared import Pt
font.size = Pt(8)
#font.bold = True
#font.underline = True
for paragraph in document.paragraphs:
    if 'NORTHING:' in paragraph.text:
        #print paragraph.text
        paragraph.text = 'NORTHING: \t',  str(data[1][0])
        print paragraph.text   
    elif 'EASTING:' in paragraph.text:
        #print paragraph.text
        paragraph.text = 'EASTING: \t', str(data[2][0])
        print paragraph.text
    elif 'ELEV:' in paragraph.text:
        #print paragraph.text
        paragraph.text = 'ELEV: \t', str(data[3][0])
        print paragraph.text
    elif 'CSF:' in paragraph.text:
        #print paragraph.text
        paragraph.text = 'CSF: \t', str(data[8][0])
        print paragraph.text
    elif 'STD. DEV.:' in paragraph.text:
        #print paragraph.text
        paragraph.text = 'STD. DEV.: ', 'N: ', str(data[5][0]), '\t E: ', str(data[6][0]), '\t EL: ', str(data[7][0])
    print paragraph.text
#for paragraph in document.paragraphs:
   #print(paragraph.text)  #Prints the text in the entire document
#document.save('test1_save.docx') #Saves as Word Document after Modification

My question is how to make only "NORTHING:" bold and underlined in:

    paragraph.text = 'NORTHING: \t',  str(data[1][0])
    print paragraph.text 

So I wrote a pseudo "find and replace" command that works great if all the values being replaced are exactly the same. However, I need to replace the values in the second paragraph with the values from the second array of the ASCII file, the values in the third paragraph with the values from the third array, and so on. (I have to use find and replace because the formatting of the document is too advanced for me to replicate in a program, unless there is a program that can read the Word file and write it back out as a Python script, i.e. reverse engineer it.)

I am still learning, so the code may seem crude to you. I am just trying to automate this boring process of copying and pasting.

Recommended Answer

Untested, but assuming python-docx is similar to python-pptx (it should be: it's maintained by the same developer, and a cursory review of the documentation suggests that it interfaces with the PPT/DOC files in the same way, uses the same methods, etc.).

In order to manipulate substrings of paragraphs or words, you need to use the run object:

https://python-docx.readthedocs.io/en/latest/api/text.html#run-objects

In practice, this looks something like:

for paragraph in document.paragraphs:
    if 'NORTHING:' in paragraph.text:
        paragraph.clear()
        run = paragraph.add_run()
        run.text = 'NORTHING: \t'
        run.font.bold = True
        run.font.underline = True
        run = paragraph.add_run()
        run.text = str(data[1][0])    

Conceptually, you create a run instance for each part of the paragraph/text that you need to manipulate. So, first we create a run with the bold, underlined font, then we add another run for the value. I think the second run will be neither bold nor underlined, but if it is, just set those properties to False.
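The two-run idea can be wrapped in a small helper. This is only a sketch: the names `set_label_and_value` and `demo` are hypothetical, the file paths are placeholders, and the helper assumes it is given an object that behaves like a python-docx paragraph (with `clear()` and `add_run()`).

```python
def set_label_and_value(paragraph, label, value):
    """Rewrite `paragraph` as a bold, underlined label followed by a plain value.

    `paragraph` is assumed to behave like a python-docx Paragraph
    (it needs `clear()` and `add_run()`); the function name is hypothetical.
    """
    paragraph.clear()                          # drop the existing runs
    label_run = paragraph.add_run(label)       # first run: the part to emphasize
    label_run.bold = True
    label_run.underline = True
    value_run = paragraph.add_run(str(value))  # second run: the data
    value_run.bold = False                     # explicit, so a bold style cannot leak in
    value_run.underline = False
    return label_run, value_run


def demo(source_path, target_path, northing):
    """Apply the helper to every paragraph containing 'NORTHING:'.

    The paths and the `northing` value are supplied by the caller.
    """
    from docx import Document  # imported here so the helper has no hard dependency

    document = Document(source_path)
    for paragraph in document.paragraphs:
        if 'NORTHING:' in paragraph.text:
            set_label_and_value(paragraph, 'NORTHING: \t', northing)
    document.save(target_path)
```

Setting `bold` and `underline` to False on the second run is a deliberate choice here: it switches the formatting off regardless of what the paragraph style says, rather than relying on inheritance.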

Note: it's preferable to put all of your import statements at the top of a module.

This can be optimized a bit by using a mapping object like a dictionary, which you can use to associate the matching strings ("NORTHING:") as keys and the remainder of the paragraph text as values. Also untested:

import pandas as pd
from docx import Document
from docx.shared import Pt

data = pd.read_csv('203792_M-51_Niles_control_SD_ACSF.txt', sep=",", header=None)
i = len(data[1])
print('Number of Points imported =', i)
document = Document('test_iteration.docx')  # Imports Word Document to Modify
t = len(document.paragraphs)  # gives the number of paragraphs in document
print('Total Number of lines =', t)
font = document.styles['Normal'].font
font.name = 'Arial'
font.size = Pt(8)

# This maps the matching strings to the data array values
data_dict = {
    'NORTHING:': data[1][0],
    'EASTING:': data[2][0],
    'ELEV:': data[3][0],
    'CSF:': data[8][0],
    'STD. DEV.:': 'N: {0}\t E: {1}\t EL: {2}'.format(data[5][0], data[6][0], data[7][0])
    }

for paragraph in document.paragraphs:
    for k, v in data_dict.items():
        if k in paragraph.text:
            paragraph.clear()
            # First run: the label, bold and underlined
            run = paragraph.add_run()
            run.text = k + '\t'
            run.font.bold = True
            run.font.underline = True
            # Second run: the value, with default formatting
            run = paragraph.add_run()
            run.text = '{0}'.format(v)
            break  # paragraph rewritten; skip the remaining keys
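
The question also asks for the second block of the document to receive the second set of values, the third block the third set, and so on. One possible way to do that, sketched here under the assumption that the n-th occurrence of a label in the document corresponds to the n-th data row, is to count how often each label has been seen. The function name `fill_repeated_labels` and its parameters are hypothetical and not part of python-docx.

```python
from collections import Counter


def fill_repeated_labels(paragraphs, rows, column_for_label):
    """Fill the n-th occurrence of each label from the n-th data row.

    paragraphs       -- objects with python-docx's Paragraph interface
    rows             -- sequence of rows, each indexable by column number
    column_for_label -- e.g. {'NORTHING:': 1, 'EASTING:': 2}
    Returns the number of paragraphs that were rewritten.
    """
    seen = Counter()   # how many times each label has appeared so far
    changed = 0
    for paragraph in paragraphs:
        for label, column in column_for_label.items():
            if label not in paragraph.text:
                continue
            row_index = seen[label]
            seen[label] += 1
            if row_index >= len(rows):
                break          # more placeholders than data rows: leave as-is
            paragraph.clear()
            label_run = paragraph.add_run(label + '\t')
            label_run.bold = True
            label_run.underline = True
            paragraph.add_run(str(rows[row_index][column]))
            changed += 1
            break              # a paragraph is matched to at most one label
    return changed
```

With the DataFrame from the code above, `rows` could be `data.values.tolist()` and `paragraphs` could be `document.paragraphs`. A combined line such as 'STD. DEV.:' would still need its own formatting step, as in the dictionary version.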
