Python lxml parsing svg file
Question
I'm trying to parse .svg files from http://kanjivg.tagaini.net/ , but I can't successfully extract the information inside.
(Full file) http://www.filedropper.com/0f9ab
A part of 0f9ab.svg looks like this:
<svg xmlns="http://www.w3.org/2000/svg" width="109" height="109" viewBox="0 0 109 109">
<g id="kvg:StrokePaths_0f9ab" style="fill:none;stroke:#000000;stroke-width:3;stroke-linecap:round;stroke-linejoin:round;">
<g id="kvg:0f9ab" kvg:element="嶺">
<g id="kvg:0f9ab-g1" kvg:element="山" kvg:position="top" kvg:radical="general">
<path id="kvg:0f9ab-s1" kvg:type="㇑a" d="M53.26,9.38c0.99,0.99,1.12,2.09,1.12,3.12c0,0.67,0.06,8.38,0.06,13.01"/>
<path id="kvg:0f9ab-s2" kvg:type="㇄a"
</g>
</g>
</g>
My .py file:
import lxml.etree as ET
svg = ET.parse('0f9ab.svg')
print(svg) # <lxml.etree._ElementTree object at 0x7f3a2f659ec8>
# AttributeError: 'lxml.etree._ElementTree' object has no attribute 'tag'
print(svg.tag)
# TypeError: 'lxml.etree._ElementTree' object is not subscriptable
print(svg[0])
# TypeError: 'lxml.etree._ElementTree' object is not iterable
for child in svg:
    print(child)
# None
print(svg.find("./svg"))
# []
print(svg.findall("//g"))
# []
print(svg.xpath("//g"))
Purpose
I tried all kinds of operations I could think of, but nothing gets me any data from the .svg file.
I want to extract the kanji (Japanese characters) stored in the kvg:element="kanji" attributes, which sit at different depth levels.
- Am I using the wrong package for this?
- If not, how do I extract information from the parsed .svg file?
Other solutions
- I could of course just read the file as a string and search for kvg:element=", but I would like to use the proper way of extracting xml / svg.
- I used xmltodict before, but my code became really messy when extracting kvg:element, because the attributes were at different depth levels.
Recommended answer
.parse() returns an ElementTree, which represents the tree as a whole. To query individual nodes, you need an Element, most likely the root element of the tree.
Replace part of your code with this:
xml = ET.parse('0f9ab.svg')
svg = xml.getroot()
print(svg) # <Element {http://www.w3.org/2000/svg}svg at 0x...>
I think you will have more success with that.
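To illustrate the difference, here is a minimal sketch of the operations that failed on the ElementTree, this time applied to the root Element. The inline SAMPLE document and the ids "a" and "b" are made up for this example, and the import falls back to the standard library's xml.etree.ElementTree (which offers the same calls used here) in case lxml is not installed.

```python
import io

try:
    import lxml.etree as ET
except ImportError:  # fallback: the stdlib module exposes the same calls used below
    import xml.etree.ElementTree as ET

# Hypothetical stand-in for 0f9ab.svg, kept tiny on purpose.
SAMPLE = b'<svg xmlns="http://www.w3.org/2000/svg"><g id="a"><g id="b"/></g></svg>'

tree = ET.parse(io.BytesIO(SAMPLE))  # an ElementTree: the document as a whole
root = tree.getroot()                # an Element: the <svg> node itself

print(root.tag)            # the tag includes the namespace in {braces}
print(root[0].get("id"))   # indexing gives the first child element

# Iterating over an Element yields its direct children only.
child_ids = [child.get("id") for child in root]
print(child_ids)
```

Note that the tag is reported as {http://www.w3.org/2000/svg}svg rather than plain svg, which is why an unqualified search such as find("./svg") comes back empty.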
Note also that .findall() requires a relative path and, in your case, a namespace qualifier:
print(svg.findall(".//{http://www.w3.org/2000/svg}g"))
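The same namespace-qualified naming applies to attributes, so the kvg:element values can be collected at any depth by walking the tree. The sketch below is only an illustration under assumptions: the excerpt in the question does not show where the kvg prefix is declared, so the namespace URI http://kanjivg.tagaini.net and the inline SAMPLE document are assumptions, and kanji_elements is a hypothetical helper name. Check the prefix declaration in your real file and adjust KVG_NS if it differs.

```python
try:
    import lxml.etree as ET
except ImportError:  # fallback: the stdlib module exposes the same calls used below
    import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
KVG_NS = "http://kanjivg.tagaini.net"  # assumption: verify against the real file

# Hypothetical, shortened stand-in for 0f9ab.svg with an explicit kvg declaration.
SAMPLE = (
    '<svg xmlns="http://www.w3.org/2000/svg" xmlns:kvg="http://kanjivg.tagaini.net">'
    '<g id="kvg:StrokePaths_0f9ab">'
    '<g id="kvg:0f9ab" kvg:element="嶺">'
    '<g id="kvg:0f9ab-g1" kvg:element="山" kvg:position="top"/>'
    '</g></g></svg>'
)


def kanji_elements(root):
    """Return the kvg:element value of every <g> that has one, in document order."""
    attr = "{%s}element" % KVG_NS
    found = []
    # iter() visits matching elements at every depth, so nesting level does not matter.
    for g in root.iter("{%s}g" % SVG_NS):
        value = g.get(attr)
        if value is not None:
            found.append(value)
    return found


root = ET.fromstring(SAMPLE)
elements = kanji_elements(root)
print(elements)
```

With a real file, you would replace ET.fromstring(SAMPLE) with ET.parse('0f9ab.svg').getroot() and pass that root to the helper.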