Read XML file while it is being written (in Python)

Problem description

I have to monitor an XML file that is being written by a tool running all day. The XML file is only completed and properly closed at the end of the day.

Same constraints as XML stream processing:

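  1. Parse the incomplete XML file on the fly and trigger actions
  2. Keep track of the last position in the file to avoid processing it again from the beginning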

In an answer to "Need to read XML files as a stream using BeautifulSoup in Python", slezica suggests xml.sax, xml.etree.ElementTree and cElementTree. My attempts with xml.etree.ElementTree and cElementTree were unsuccessful. There are also xml.dom, xml.parsers.expat and lxml, but I do not see support for "on-the-fly parsing" in them.

I need more obvious examples...

I am currently using Python 2.7 on Linux, but I will migrate to Python 3.x, so tips on new Python 3.x features are welcome. I also use watchdog to detect XML file modifications, so reusing the watchdog mechanism would be a plus. Windows support is optional as well.

Please provide solutions that are easy to understand and maintain. If it is too complex, I may just use tell()/seek() to move within the file, do a naive text search in the raw XML, and extract the values with basic regexes.
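For comparison, the seek() plus regex fallback mentioned above might look like the following minimal sketch. It is not a real XML parser: it assumes the element names from the sample below (fileobject, filename, filesize), assumes the tags carry no attributes, and the function name read_new_fileobjects is made up for illustration. It works on bytes so that the returned offset is an exact file position.

```python
import re

# Hypothetical patterns, based only on the element names in the sample XML.
FILEOBJECT_RE = re.compile(br"<fileobject>(.*?)</fileobject>", re.DOTALL)
FILENAME_RE = re.compile(br"<filename>(.*?)</filename>")
FILESIZE_RE = re.compile(br"<filesize>(\d+)</filesize>")


def read_new_fileobjects(path, offset=0):
    """Return (records, new_offset) for complete blocks found after offset."""
    with open(path, "rb") as f:
        f.seek(offset)          # skip everything already processed
        data = f.read()         # read only what was appended since then
    records = []
    consumed = 0
    for match in FILEOBJECT_RE.finditer(data):
        body = match.group(1)
        name = FILENAME_RE.search(body)
        size = FILESIZE_RE.search(body)
        records.append((
            name.group(1).decode("utf-8") if name else None,
            int(size.group(1).decode("ascii")) if size else None,
        ))
        consumed = match.end()
    # Advance only past the last complete block, so that a block that is
    # still being written is seen again on the next call.
    return records, offset + consumed
```

The caller keeps the returned offset and passes it back on the next call, which covers the "last position" constraint without keeping the file open.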

XML example:

<dfxml xmloutputversion='1.0'>
   <creator version='1.0'>
     <program>TCPFLOW</program>
     <version>1.4.6</version>
   </creator>
   <configuration>
     <fileobject>
       <filename>file1</filename>
       <filesize>288</filesize>
       <tcpflow packets='12' srcport='1111' dstport='2222' family='2' />
     </fileobject>
     <fileobject>
       <filename>file2</filename>
       <filesize>352</filesize>
       <tcpflow packets='12' srcport='3333' dstport='4444' family='2' />
     </fileobject>
     <fileobject>
       <filename>file3</filename>
       <filesize>456</filesize>
       ...
       ...

First test using SAX failed:

import xml.sax

class StreamHandler(xml.sax.handler.ContentHandler):
    def startElement(self, name, attrs):
        print 'start: name=', name
    def endElement(self, name):
        print 'end:   name=', name
        if name == 'root':
            raise StopIteration

if __name__ == '__main__':
    parser = xml.sax.make_parser()
    parser.setContentHandler(StreamHandler())
    with open('f.xml') as f:
        parser.parse(f)

Shell:

$ while read line; do echo $line; sleep 1; done <i.xml >f.xml &
...
$ ./test-using-sax.py
start: name= dfxml
start: name= creator
start: name= program
end:   name= program
start: name= version
end:   name= version
Traceback (most recent call last):
  File "./test-using-sax.py", line 17, in <module>
    parser.parse(f)
  File "/usr/lib64/python2.7/xml/sax/expatreader.py", line 107, in parse
    xmlreader.IncrementalParser.parse(self, source)
  File "/usr/lib64/python2.7/xml/sax/xmlreader.py", line 125, in parse
    self.close()
  File "/usr/lib64/python2.7/xml/sax/expatreader.py", line 220, in close
    self.feed("", isFinal = 1)
  File "/usr/lib64/python2.7/xml/sax/expatreader.py", line 214, in feed
    self._err_handler.fatalError(exc)
  File "/usr/lib64/python2.7/xml/sax/handler.py", line 38, in fatalError
    raise exception
xml.sax._exceptions.SAXParseException: report.xml:15:0: no element found
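
The traceback shows where this goes wrong: parse() reads to the end of the file and then calls close(), which feeds a final empty chunk (isFinal = 1) and makes the parser treat the unfinished document as complete. A minimal sketch of the alternative, assuming the data is pushed through feed() and close() is simply never called. The handler class NameCollector and the inline chunks are made up for illustration.

```python
import xml.sax


class NameCollector(xml.sax.handler.ContentHandler):
    """Illustrative handler that records the names of opened elements."""

    def __init__(self):
        xml.sax.handler.ContentHandler.__init__(self)
        self.started = []

    def startElement(self, name, attrs):
        self.started.append(name)


handler = NameCollector()
parser = xml.sax.make_parser()
parser.setContentHandler(handler)

# Feed the document piece by piece, as it would arrive from the growing file.
parser.feed("<dfxml xmloutputversion='1.0'><creator version='1.0'>")
parser.feed("<program>TCPFLOW</program>")

# close() is deliberately not called: the root element is still open, and
# closing the parser now would raise the "no element found" error seen above.
print(handler.started)
```

An event is only reported once the complete tag has been fed, so a tag split across two chunks shows up after the second chunk.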

Recommended answer

Three hours after posting my question, I had received no answer, but I finally implemented the simple example I was looking for.

It is inspired by saaj's answer and is based on xml.sax and watchdog.

from __future__ import print_function, division
import time
import watchdog.events
import watchdog.observers
import xml.sax

class XmlStreamHandler(xml.sax.handler.ContentHandler):
  # Called for each opening tag once the parser has seen it completely.
  def startElement(self, tag, attributes):
    print(tag, 'attributes=', attributes.items())
    self.tag = tag
  # Called for text inside the current element (possibly in several pieces).
  def characters(self, content):
    print(self.tag, 'content=', content)

class XmlFileEventHandler(watchdog.events.PatternMatchingEventHandler):
  def __init__(self):
    watchdog.events.PatternMatchingEventHandler.__init__(self, patterns=['*.xml'])
    self.file = None
    self.parser = xml.sax.make_parser()
    self.parser.setContentHandler(XmlStreamHandler())
  def on_modified(self, event):
    # Open the first modified XML file once and keep it open, so that each
    # read() continues where the previous one stopped.
    if not self.file:
      self.file = open(event.src_path)
    # feed() parses incrementally; close() is never called on the parser,
    # so the unfinished document does not raise an error.
    self.parser.feed(self.file.read())

if __name__ == '__main__':
  observer = watchdog.observers.Observer()
  event_handler = XmlFileEventHandler()
  # Watch the current directory for changes to *.xml files.
  observer.schedule(event_handler, path='.')
  try:
    observer.start()
    while True:
      time.sleep(10)
  finally:
    observer.stop()
    observer.join()
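
As a possible Python 3 variant of the same idea, xml.etree.ElementTree.XMLPullParser (available since Python 3.4) accepts data through feed() and hands back finished elements through read_events(). The sketch below is only an illustration: the class name FileObjectTail and the choice to collect filename and filesize are assumptions based on the sample XML in the question, not part of the original answer.

```python
import xml.etree.ElementTree as ET


class FileObjectTail(object):
    """Illustrative helper: parse only the newly appended part of a file."""

    def __init__(self, path):
        self.path = path
        self.offset = 0  # byte position up to which the file has been fed
        self.parser = ET.XMLPullParser(events=("end",))

    def poll(self):
        """Feed new bytes to the parser and return completed fileobjects."""
        with open(self.path, "rb") as f:
            f.seek(self.offset)
            data = f.read()
        self.offset += len(data)
        if data:
            self.parser.feed(data)
        found = []
        for _event, elem in self.parser.read_events():
            if elem.tag == "fileobject":
                found.append((elem.findtext("filename"),
                              elem.findtext("filesize")))
                elem.clear()  # drop children that are no longer needed
        return found
```

poll() could be called from on_modified() in place of the feed() call above. The unfinished tail of the document stays buffered inside the parser, so the file offset can always advance to the end of the data read so far. Recent Python releases also document a flush() method on XMLPullParser, which may be needed when a newer Expat delays the parsing of small chunks.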

While the script is running, do not forget to touch an XML file, or simulate on-the-fly writing with the following command:

while read line; do echo $line; sleep 1; done <in.xml >out.xml &
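
Where no POSIX shell is available (for example on Windows), the same simulation can be sketched in Python. The function name slow_copy and its parameters are made up for illustration. Unlike the echo loop above, it copies each line unchanged, including leading whitespace.

```python
import time


def slow_copy(src_path, dst_path, delay=1.0):
    """Copy src_path to dst_path one line at a time, pausing between lines."""
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        for line in src:
            dst.write(line)
            dst.flush()        # make the line visible to readers right away
            time.sleep(delay)  # simulate a slow writer
```

For example, slow_copy("in.xml", "out.xml") run in a second terminal mirrors the shell command above.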
