Can't open Word document after string replacements


Problem description

Hi there,

I have a Word document containing pictures and text. This document
holds several 'ABCDEF' strings which serve as placeholders for names.
Now I want to replace these occurrences with names in a list (members). I
open both input and output file in binary mode and do the
transformation. However, I can't open the resulting file; Word just
tells me that there was an error. Does anybody know what I am doing wrong?

Oh, and is this approach pythonic anyway? (I have a strong Java background.)

Regards,
antoine

import os

members = somelist

os.chdir(somefolder)

doc = file('ttt.doc', 'rb')
docout = file('ttt1.doc', 'wb')

counter = 0

for line in doc:
    while line.find('ABCDEF') > -1:
        try:
            line = line.replace('ABCDEF', members[counter], 1)
            docout.write(line)
            counter += 1
        except:
            docout.write(line.replace('ABCDEF', '', 1))
    else:
        docout.write(line)

doc.close()
docout.close()

Solution

Antoine De Groote wrote:

> I have a Word document containing pictures and text. This document
> holds several 'ABCDEF' strings which serve as placeholders for names.
> Now I want to replace these occurrences with names in a list (members). I
> open both input and output file in binary mode and do the
> transformation. However, I can't open the resulting file; Word just
> tells me that there was an error. Does anybody know what I am doing wrong?

The Word document format probably contains some length information about
paragraphs etc. If you change a string to another one of a different
length, this length information will no longer match the data and the
document structure will be hosed.
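
The length-information point can be illustrated with a small sketch. The record layout below (a 2-byte little-endian length followed by the payload) is an assumption made up for this example and is not the actual .doc layout; the helper names are hypothetical as well. It only shows why a blind byte replacement with a longer string confuses any reader that trusts stored lengths.

```python
import struct


def pack_record(payload):
    """Toy record: 2-byte little-endian length, then the payload bytes."""
    return struct.pack("<H", len(payload)) + payload


def read_records(blob):
    """Walk the blob, trusting each stored length field."""
    records, pos = [], 0
    while pos < len(blob):
        (length,) = struct.unpack_from("<H", blob, pos)
        pos += 2
        records.append(blob[pos:pos + length])
        pos += length
    return records


original = pack_record(b"Dear ABCDEF,") + pack_record(b"welcome!")
# Blind replacement: the payload grows, but the stored length does not.
patched = original.replace(b"ABCDEF", b"Antoine De Groote")

print(read_records(original))  # two clean records
print(read_records(patched))   # first record truncated, the rest misparsed
```

In the patched blob the first record is cut off at the old length, and the bytes that follow are then read as a bogus length field, so everything after the replacement is misinterpreted.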

Possible solutions:
1. Use OLE automation (in the python win32 package) to open the file in
Word and use Word search and replace. Your script could then directly
print the document, which you probably have to do anyway.

2. Export the template document to RTF. This is a text format and can be
more easily manipulated with Python.
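
A minimal sketch of option 2, assuming the template has already been exported to RTF and read into a string. The function name `fill_placeholders` and the sample template are hypothetical, and the sketch is written for Python 3 rather than the Python 2 of the original post. It assumes the names contain only plain ASCII without backslashes or braces (anything else would need RTF escaping), and that Word wrote each placeholder as one contiguous run of text.

```python
import re
from itertools import chain, repeat


def fill_placeholders(rtf_text, names, placeholder="ABCDEF"):
    """Replace the n-th placeholder with the n-th name.

    Placeholders left over once the names run out are replaced by an
    empty string, mirroring the intent of the posted script.
    """
    supply = chain(names, repeat(""))
    return re.sub(re.escape(placeholder), lambda _match: next(supply), rtf_text)


# Hypothetical, heavily simplified RTF content.
template = r"{\rtf1\ansi Dear ABCDEF,\par and dear ABCDEF,\par bye ABCDEF\par}"
print(fill_placeholders(template, ["Alice", "Bob"]))
```

Because RTF is text, changing the length of a string does not invalidate any stored offsets, which is what makes this route safer than patching the binary file.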

> for line in doc:

I don't think that what you get here is actually a line of your document.
Due to the binary nature of the format, it is an arbitrary chunk.
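
To see what such "lines" look like, here is a small Python 3 sketch that iterates over made-up binary data (the bytes are an assumption, not real .doc content). Iterating a binary stream simply splits at every 0x0A byte, wherever that byte happens to occur.

```python
import io

# Hypothetical bytes standing in for binary file content;
# 0x0A is ordinary data here, not an end-of-line marker.
blob = b"\x00\x01\x0aABCDEF\x02\x0a\x0a\xff"

chunks = list(io.BytesIO(blob))
for chunk in chunks:
    print(len(chunk), repr(chunk))
```

The chunks have nothing to do with paragraphs or lines of the visible text; their boundaries depend only on where a byte with value 10 happens to sit in the data.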

Daniel


Antoine De Groote wrote:

> Hi there,
>
> I have a Word document containing pictures and text. This document
> holds several 'ABCDEF' strings which serve as placeholders for names.
> Now I want to replace these occurrences with names in a list (members).

Do you know that MS Word already provides this kind of feature?

> I open both input and output file in binary mode and do the
> transformation. However, I can't open the resulting file; Word just
> tells me that there was an error. Does anybody know what I am doing wrong?

Hand-editing an undocumented binary format may lead to undesirable
results...

> Oh, and is this approach pythonic anyway?

The pythonic approach is usually to start looking for existing
solutions... In this case, using Word's built-in features and Python/COM
integration would be a better choice IMHO.

> (I have a strong Java background.)

Nobody's perfect !-)

> Regards,
> antoine
>
> import os
>
> members = somelist
>
> os.chdir(somefolder)
>
> doc = file('ttt.doc', 'rb')
> docout = file('ttt1.doc', 'wb')
>
> counter = 0
>
> for line in doc:

Since you opened the file as binary, you should use file.read() instead.
Ever wondered what your 'lines' look like ?-)

>     while line.find('ABCDEF') > -1:

.doc is a binary format. You may find such a byte sequence in its
content in places that are *not* text content.

>         try:
>             line = line.replace('ABCDEF', members[counter], 1)
>             docout.write(line)

You're writing back the whole chunk on each iteration. No surprise the
resulting document is corrupted.

>             counter += 1

# enumerate() yields (index, item) pairs, so no manual counter is needed
seq = list("abcd")
for indice, item in enumerate(seq):
    print("%02d : %s" % (indice, item))

>         except:
>             docout.write(line.replace('ABCDEF', '', 1))
>     else:
>         docout.write(line)
>
> doc.close()
> docout.close()
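
The remark about writing back the whole chunk on each iteration can be checked with a small Python 3 sketch. `original_loop` reproduces the control flow of the posted script against an in-memory buffer, and `write_once` is one possible restructuring; both function names and the sample data are assumptions for illustration. Note that `write_once` only removes the duplicated writes. It does not make patching a binary .doc file safe.

```python
import io


def original_loop(lines, members):
    """Same control flow as the posted script, writing to a buffer."""
    out = io.BytesIO()
    counter = 0
    for line in lines:
        while line.find(b"ABCDEF") > -1:
            try:
                line = line.replace(b"ABCDEF", members[counter], 1)
                out.write(line)          # written after every replacement
                counter += 1
            except IndexError:
                out.write(line.replace(b"ABCDEF", b"", 1))
        else:
            out.write(line)              # and once more when the loop ends
    return out.getvalue()


def write_once(lines, members):
    """Do all replacements for a chunk first, then write it a single time."""
    out = io.BytesIO()
    supply = iter(members)
    for line in lines:
        while b"ABCDEF" in line:
            line = line.replace(b"ABCDEF", next(supply, b""), 1)
        out.write(line)
    return out.getvalue()


sample = [b"Dear ABCDEF and ABCDEF\n"]
names = [b"Ann", b"Bob"]
print(original_loop(sample, names))  # the chunk appears three times
print(write_once(sample, names))     # the chunk appears once
```

A chunk with two placeholders ends up in the output three times with the original control flow: once after each replacement and once more from the `else` branch of the `while` loop. Separately, if `members` runs out, the `except` branch never updates `line`, so the `while` loop would not terminate; the sketch sidesteps that by passing enough names.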



--
bruno desthuilliers
python -c "print '@'.join(['.'.join([w[::-1] for w in p.split('.')]) for
p in 'o****@xiludom.gro'.split('@')])"



Antoine De Groote wrote:

> Hi there,
>
> I have a Word document containing pictures and text. This document
> holds several 'ABCDEF' strings which serve as placeholders for names.
> Now I want to replace these occurrences with names in a list (members). I
> open both input and output file in binary mode and do the
> transformation. However, I can't open the resulting file; Word just
> tells me that there was an error. Does anybody know what I am doing wrong?
>
> Oh, and is this approach pythonic anyway? (I have a strong Java background.)
>
> Regards,
> antoine
>
> import os
>
> members = somelist
>
> os.chdir(somefolder)
>
> doc = file('ttt.doc', 'rb')
> docout = file('ttt1.doc', 'wb')
>
> counter = 0
>
> for line in doc:
>     while line.find('ABCDEF') > -1:
>         try:
>             line = line.replace('ABCDEF', members[counter], 1)
>             docout.write(line)
>             counter += 1
>         except:
>             docout.write(line.replace('ABCDEF', '', 1))
>     else:
>         docout.write(line)
>
> doc.close()
> docout.close()

Errr.... I wouldn't even attempt to do this; how do you know each
'line' isn't going to be split arbitrarily, and that 'ABCDEF' doesn't
happen to be part of an image? As you've noted, this is binary data, so
you can't assume anything about it. Doing it this way is a Bad Idea
(tm).

If you want to do something like this, why not use templated HTML, or
possibly templated PDFs? Or heaven forbid, Word's mail-merge facility?
(I think MS Office documents are effectively self-contained file
systems, so there is probably some module out there which can
read/write them).
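
As a rough sketch of the templated-HTML suggestion, the Python 3 snippet below renders one small HTML letter per member with the standard library's `string.Template`. The letter text, the `$name` placeholder, and the function name `render_letters` are assumptions for illustration, not part of the original thread.

```python
import html
from string import Template

# Hypothetical letter template; $name marks where each member's name goes.
letter = Template(
    "<html><body><p>Dear $name,</p>"
    "<p>See you at the meeting.</p></body></html>"
)


def render_letters(members):
    """Return one rendered HTML document per member name."""
    # html.escape keeps characters such as & or < from breaking the markup.
    return [letter.substitute(name=html.escape(member)) for member in members]


for page in render_letters(["Ann", "Bob & Co"]):
    print(page)
```

Since the template is plain text with an explicit placeholder syntax, there is no risk of the placeholder matching bytes inside an embedded image, and names of any length can be substituted.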

Jon.

