Python: How to modify the metadata of Microsoft Office files?

Question

How can I modify a Microsoft Office document's metadata? I found a number of results for JPG, PNG, and PDF files. Can anyone suggest libraries for Office file metadata?

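Solution

The newer formats (such as .docx) are often just zipped XML, so you can unzip them with the standard zipfile module and parse the XML with a library such as lxml. Some code to grab the document creator was previously posted as an answer on Stack Overflow.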

import zipfile
import lxml.etree

# open the document as a zip archive
with zipfile.ZipFile('my_doc.docx') as zf:
    # use lxml to parse the XML part we are interested in
    doc = lxml.etree.fromstring(zf.read('docProps/core.xml'))
# retrieve the creator from the Dublin Core namespace
ns = {'dc': 'http://purl.org/dc/elements/1.1/'}
creator = doc.xpath('//dc:creator', namespaces=ns)[0].text
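
The snippet above only reads the creator, while the question asks about modifying it. A minimal sketch of writing a value back could look like the following. It is an assumption-laden illustration, not part of the original answer: the helper name set_creator and the file names are hypothetical, it swaps lxml for the standard library's xml.etree.ElementTree to avoid a third-party dependency, and it assumes the file's docProps/core.xml uses the usual cp/dc/dcterms prefixes. Because zipfile cannot edit a member in place, the sketch copies every member into a new archive and replaces only the core properties part.

```python
import zipfile
import xml.etree.ElementTree as ET

CORE = 'docProps/core.xml'
# Namespaces commonly found in core.xml (assumed; registered so that
# ElementTree keeps these prefixes when it re-serializes the part).
NS = {
    'cp': 'http://schemas.openxmlformats.org/package/2006/metadata/core-properties',
    'dc': 'http://purl.org/dc/elements/1.1/',
    'dcterms': 'http://purl.org/dc/terms/',
    'dcmitype': 'http://purl.org/dc/dcmitype/',
    'xsi': 'http://www.w3.org/2001/XMLSchema-instance',
}


def set_creator(src_path, dst_path, new_creator):
    """Copy src_path to dst_path, replacing dc:creator in core.xml.

    Hypothetical helper; src_path and dst_path must be different files.
    """
    for prefix, uri in NS.items():
        ET.register_namespace(prefix, uri)
    with zipfile.ZipFile(src_path) as zin, \
            zipfile.ZipFile(dst_path, 'w', zipfile.ZIP_DEFLATED) as zout:
        for item in zin.infolist():
            data = zin.read(item.filename)
            if item.filename == CORE:
                root = ET.fromstring(data)
                node = root.find('dc:creator', NS)
                if node is None:
                    # create the element if the document has no creator yet
                    node = ET.SubElement(root, '{%s}creator' % NS['dc'])
                node.text = new_creator
                data = ET.tostring(root, encoding='UTF-8',
                                   xml_declaration=True)
            # every other member is copied through unchanged
            zout.writestr(item, data)


# Example call (file names are placeholders):
# set_creator('my_doc.docx', 'my_doc_edited.docx', 'New Author')
```

Writing to a separate output file keeps the original intact in case the rewritten archive turns out not to open correctly in Office. The xml_declaration argument to ET.tostring requires Python 3.8 or later.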

For older formats you might want to look at the hachoir-metadata library.
