Python lxml parsing svg file


Problem description

I'm trying to parse .svg files from http://kanjivg.tagaini.net/ , but I can't successfully extract the information inside.

(Full file) http://www.filedropper.com/0f9ab

A part of 0f9ab.svg looks like this:

<svg xmlns="http://www.w3.org/2000/svg" width="109" height="109" viewBox="0 0 109 109">
<g id="kvg:StrokePaths_0f9ab" style="fill:none;stroke:#000000;stroke-width:3;stroke-linecap:round;stroke-linejoin:round;">
<g id="kvg:0f9ab" kvg:element="嶺">
    <g id="kvg:0f9ab-g1" kvg:element="山" kvg:position="top" kvg:radical="general">
        <path id="kvg:0f9ab-s1" kvg:type="㇑a" d="M53.26,9.38c0.99,0.99,1.12,2.09,1.12,3.12c0,0.67,0.06,8.38,0.06,13.01"/>
        <path id="kvg:0f9ab-s2" kvg:type="㇄a"
    </g>
</g>
</g>

My .py file:

import lxml.etree as ET

svg = ET.parse('0f9ab.svg')
print(svg)  # <lxml.etree._ElementTree object at 0x7f3a2f659ec8>

# AttributeError: 'lxml.etree._ElementTree' object has no attribute 'tag'
print(svg.tag)

# TypeError: 'lxml.etree._ElementTree' object is not subscriptable
print(svg[0])

# TypeError: 'lxml.etree._ElementTree' object is not iterable
for child in svg:
    print(child)

# None
print(svg.find("./svg"))

# []
print(svg.findall("//g"))

# []
print(svg.xpath("//g"))

Purpose

I tried every operation I could think of, but nothing gets me any data out of the .svg file. I want to extract the kanji (Japanese characters) stored in the kvg:element="kanji" attributes, which occur at different depth levels.

  1. Am I using the wrong package for this?
  2. If not, how do I extract information from the parsed .svg file?

Other solutions

  • I could of course just read the file as a string and search for kvg:element=", but I would like to extract the data the proper XML / SVG way.
  • I used xmltodict before, but my code became really messy when extracting kvg:element, because the attributes sit at different depth levels.
Recommended answer

.parse() returns an ElementTree, which represents the tree as a whole. To query individual nodes, you need an Element, most likely the root element of the tree.

Replace part of your code with this:

xml = ET.parse('0f9ab.svg')
svg = xml.getroot()
print(svg)  # <Element {http://www.w3.org/2000/svg}svg at 0x...>
      

I think you will have success with that.
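To make the ElementTree / Element distinction concrete, here is a minimal sketch. It parses a tiny inline document (a made-up stand-in for 0f9ab.svg, not the real file) and shows that the operations that failed on the tree work on the root element. The fallback import of the standard library's xml.etree.ElementTree is only an assumption to keep the snippet runnable where lxml is not installed; the calls used here behave the same in both.

```python
import io

try:
    import lxml.etree as ET
except ImportError:  # assumption: fall back to the stdlib module with the same API
    import xml.etree.ElementTree as ET

# Hypothetical miniature document standing in for the real .svg file.
SAMPLE = b'<svg xmlns="http://www.w3.org/2000/svg"><g id="outer"><g id="inner"/></g></svg>'

tree = ET.parse(io.BytesIO(SAMPLE))  # ElementTree: the document as a whole
root = tree.getroot()                # Element: the <svg> node itself

print(hasattr(tree, "tag"))  # False: the tree wrapper has no tag
print(root.tag)              # {http://www.w3.org/2000/svg}svg
print(root[0].get("id"))     # outer: elements are subscriptable

# Elements are also iterable; iterating yields the direct children.
child_tags = [child.tag for child in root]
print(child_tags)            # ['{http://www.w3.org/2000/svg}g']
```

Note that the tag is reported with the namespace URI in braces, which matters for the queries that follow.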

Note also that .findall() requires a relative path and, in your case, a namespace qualifier. The <svg> root declares a default namespace, so every tag is stored as {http://www.w3.org/2000/svg}name, which is why "//g" matched nothing:

print(svg.findall(".//{http://www.w3.org/2000/svg}g"))
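Building on that, the following sketch collects the kvg:element values at any depth, which is what the question asks for. Several things here are assumptions rather than facts from the question: the inline sample is a shortened, made-up variant of the excerpt; the kvg prefix is declared explicitly on the root; and the namespace URI http://kanjivg.tagaini.net is assumed. The helper name kvg_elements is hypothetical. If your parsed file reports different attribute keys, print elem.attrib to see what the parser actually produced.

```python
import io

try:
    import lxml.etree as ET
except ImportError:  # assumption: fall back to the stdlib module with the same API
    import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
KVG_NS = "http://kanjivg.tagaini.net"  # assumed URI for the kvg prefix

# Shortened stand-in for 0f9ab.svg with the kvg namespace declared explicitly.
SAMPLE = '''<svg xmlns="http://www.w3.org/2000/svg" xmlns:kvg="http://kanjivg.tagaini.net">
<g id="kvg:StrokePaths_0f9ab">
<g id="kvg:0f9ab" kvg:element="嶺">
    <g id="kvg:0f9ab-g1" kvg:element="山" kvg:position="top">
        <path id="kvg:0f9ab-s1" kvg:type="㇑a" d="M53.26,9.38"/>
    </g>
</g>
</g>
</svg>'''.encode("utf-8")


def kvg_elements(root):
    """Return the kvg:element value of every <g>, in document order."""
    attr = "{%s}element" % KVG_NS
    values = []
    # iter() walks the whole subtree, so nesting depth does not matter.
    for g in root.iter("{%s}g" % SVG_NS):
        value = g.get(attr)
        if value is not None:
            values.append(value)
    return values


root = ET.parse(io.BytesIO(SAMPLE)).getroot()

# The same <g> lookup as above, written with a prefix map instead of braces.
groups = root.findall(".//svg:g", namespaces={"svg": SVG_NS})
print(len(groups))         # 3

print(kvg_elements(root))  # ['嶺', '山']
```

Using iter() rather than hand-written recursion keeps the code flat regardless of how deeply the groups are nested, which avoids the messiness described for the xmltodict approach.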
      
