Python lxml - get index of tag's text

Question

I have an XML file with a format similar to docx, e.g.:

<w:r>
  <w:rPr>
    <w:sz w:val="36"/>
    <w:szCs w:val="36"/>
  </w:rPr>
  <w:t>BIG_TEXT</w:t>
</w:r>

I need to get the index (position) of "BIG_TEXT" in the source XML, like this:

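from lxml import etree

# Parse the raw bytes: lxml refuses a str that carries an
# encoding declaration, as document.xml from a docx does
with open('/devel/tmp/doc2/word/document.xml', 'rb') as f:
    data = f.read()
text = data.decode('utf-8')  # offsets below are str (character) offsets

root = etree.XML(data)

start = 0
for e in root.iter("*"):
    if e.text:
        # Search forward from the end of the previous match. e.text has
        # entities decoded, so text like "a &amp; b" is not found verbatim.
        offset = text.index(e.text, start)
        l = len(e.text)
        print('Text "%s" at offset %s and len=%s' % (e.text, offset, l))
        start = offset + l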

Starting each new search at the current index + len(text) works, but is there another way? An element's text may be a single character, "w" for example. Then text.index finds the next "w" in the file, which may belong to markup (such as the prefix in <w:t>) rather than to the element's text.
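A different route, not taken from the original thread, is to let the parser report positions instead of searching the text. The sketch below assumes Python 3 and uses the standard library's xml.parsers.expat, whose CurrentByteIndex attribute gives the byte offset of the event currently being reported; read inside a character-data handler, it is where that text starts. The function name text_offsets is hypothetical. Offsets are byte positions in the raw file, so they match str offsets only while everything before them is ASCII.

```python
import xml.parsers.expat


def text_offsets(data: bytes):
    """Return (tag, text, byte_offset) tuples, sorted by offset, for every
    element whose leading text (what lxml calls .text) is non-empty.

    byte_offset comes from expat's CurrentByteIndex, so no string
    searching is involved and markup can never be matched by mistake.
    """
    parser = xml.parsers.expat.ParserCreate()  # no namespace processing: tags look like "w:t"
    parser.buffer_text = False  # report each chunk immediately so the index refers to it
    results = []
    # One entry per open element: [tag, text chunks, start offset, child seen?]
    stack = []

    def start_element(tag, attrs):
        if stack:
            stack[-1][3] = True  # later text in the parent is tail text, not .text
        stack.append([tag, [], None, False])

    def end_element(tag):
        name, chunks, offset, _ = stack.pop()
        if chunks:
            results.append((name, "".join(chunks), offset))

    def char_data(chunk):
        if not stack or stack[-1][3]:
            return  # outside the root, or text after a child element
        top = stack[-1]
        if top[2] is None:
            # Byte offset of the first character of this text
            top[2] = parser.CurrentByteIndex
        top[1].append(chunk)

    parser.StartElementHandler = start_element
    parser.EndElementHandler = end_element
    parser.CharacterDataHandler = char_data
    parser.Parse(data, True)
    return sorted(results, key=lambda r: r[2])
```

Calling text_offsets on the bytes of document.xml and keeping the entries whose tag is "w:t" gives each run's text with its own start offset; a one-letter "w" is reported where it actually occurs. Expat may split text at entity references or line breaks, which is why only the offset of the first chunk is kept.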

Answer

I was looking for a similar solution (indexing nodes in a big XML file for fast lookup).
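For the kind of lookup index mentioned here, one hypothetical sketch (not the answerer's code, whose full text is missing from this page) records where every w:t text starts using a regular expression over the raw bytes, so later lookups are dictionary hits. It is only safe under the assumption that w:t elements hold plain text (no child elements, comments or CDATA), which is how docx normally writes run text; a regex is not a general XML parser, and the keys keep entities undecoded (&amp; stays &amp;). The names W_T and build_text_index are illustrative.

```python
import re
from collections import defaultdict

# <w:t> or <w:t ...attributes...>, then its text, then </w:t>.
# Assumption: w:t contains plain text only (no children, comments, CDATA).
W_T = re.compile(rb"<w:t(?:\s[^>]*)?>([^<]*)</w:t>")


def build_text_index(data: bytes):
    """Map each raw w:t text (bytes, entities not decoded) to the list
    of byte offsets in data where that text starts."""
    index = defaultdict(list)
    for m in W_T.finditer(data):
        index[m.group(1)].append(m.start(1))  # start of the text, not the tag
    return dict(index)
```

The pattern requires whitespace or ">" right after "<w:t", so neighbouring tags such as <w:tab/> or <w:tbl> are not mistaken for text runs.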
