Python: parsing XML document while preserving entities


Question

Which existing Python 2.x libraries can parse an XML document with a built-in DTD without automatically expanding its entities? (The file in question, for those curious: JMdict.)

lxml seems to have an option for not resolving entities, but the last time I tried it, the entities just ended up as blanks. A quick search also turned up pxdom as an alternative I may try, but since it is pure Python it seems far slower than I'd like.

Are there any others?

Recommended answer

The use case seems rather unusual; not expanding entities goes against the way parsers are generally supposed to work according to the XML spec.

So I think the easiest route is to kludge it. I've manually extracted the tags via re.finditer and built a dictionary of the mappings. From here, it's just a matter of scanning the parsed output and doing the right thing for my app. That is good enough for my use case, I think.
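A minimal sketch of that kludge might look like the following. It is only an illustration under stated assumptions, not the exact code used: the sample document, the regular expression, and the names `SAMPLE`, `ENTITY_RE`, `entity_map`, and `pos_entities` are all hypothetical, and the standard library's `xml.etree.ElementTree` stands in for whichever parser is actually used. It assumes entity values are double-quoted in the internal DTD, that every expansion is unique, and that each element of interest contains exactly one entity.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample shaped loosely like JMdict: entities are declared in
# the internal DTD and used as tag-like values inside elements.
SAMPLE = """<!DOCTYPE JMdict [
<!ELEMENT JMdict (entry*)>
<!ELEMENT entry (pos+)>
<!ELEMENT pos (#PCDATA)>
<!ENTITY n "noun (common) (futsuumeishi)">
<!ENTITY v1 "Ichidan verb">
]>
<JMdict>
<entry><pos>&n;</pos></entry>
<entry><pos>&v1;</pos><pos>&n;</pos></entry>
</JMdict>
"""

# Assumption: declarations look like <!ENTITY name "value"> with double quotes.
ENTITY_RE = re.compile(r'<!ENTITY\s+([^\s%]+)\s+"([^"]*)"\s*>')


def entity_map(xml_text):
    """Map each entity name to its expansion by scanning the raw DTD text."""
    return {m.group(1): m.group(2) for m in ENTITY_RE.finditer(xml_text)}


def pos_entities(xml_text):
    """Parse normally (entities get expanded), then map each expanded
    <pos> text back to the entity name it came from."""
    # Reverse lookup; only unambiguous if no two entities share an expansion.
    expansion_to_name = {v: k for k, v in entity_map(xml_text).items()}
    root = ET.fromstring(xml_text)
    return [
        # Fall back to the raw text when it is not a known expansion.
        [expansion_to_name.get(pos.text, pos.text) for pos in entry.findall("pos")]
        for entry in root.findall("entry")
    ]
```

Calling `pos_entities(SAMPLE)` returns one list of entity names per entry. Text that mixes an entity with other characters would not match the reverse lookup, so a real document may need a more careful scan.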
