Read XML file while it is being written (in Python)
Question
I have to monitor an XML file that is written by a tool running all day. The XML file is properly completed and closed only at the end of the day.
Same constraints as XML stream processing:
- Parse the incomplete XML file on the fly and trigger actions
- Keep track of the last position in the file to avoid processing it again from the beginning
In an answer to "Need to read XML files as a stream using BeautifulSoup in Python", slezica suggests xml.sax, xml.etree.ElementTree and cElementTree. But my attempts with xml.etree.ElementTree and cElementTree were unsuccessful. There are also xml.dom, xml.parsers.expat and lxml, but I do not see any support for "on-the-fly parsing" in them.
I need more obvious examples...
I am currently using Python 2.7 on Linux, but I will migrate to Python 3.x, so please also provide tips on new Python 3.x features. I also use watchdog to detect XML file modifications, so optionally reuse the watchdog mechanism. Optionally, support Windows as well.
Please provide solutions that are easy to understand and maintain. If it is too complex, I may just use tell()/seek() to move within the file, do a naive text search in the raw XML, and finally extract the values with a basic regex.
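The seek()-and-regex fallback mentioned above could look roughly like the sketch below. It is only an illustration under assumptions: the function name read_new_fileobjects and the pattern FILEOBJECT_RE are hypothetical, the pattern is tailored to the sample XML that follows, and the file is read as bytes so that the saved offset is a true byte position.

```python
import re

# Hypothetical pattern: one *complete* <fileobject> block of the sample XML.
# A bytes pattern is used so that match positions are byte offsets.
FILEOBJECT_RE = re.compile(
    br"<fileobject>.*?<filename>([^<]*)</filename>"
    br".*?<filesize>(\d+)</filesize>.*?</fileobject>",
    re.DOTALL,
)


def read_new_fileobjects(path, offset=0):
    """Return (records, new_offset) for complete blocks found after offset."""
    with open(path, "rb") as f:
        f.seek(offset)              # skip what was already processed
        data = f.read()
    records = []
    consumed = 0
    for match in FILEOBJECT_RE.finditer(data):
        name = match.group(1).decode("utf-8")
        size = int(match.group(2))
        records.append((name, size))
        consumed = match.end()      # end of the last complete block
    # A half-written block is not consumed, so it is retried on the next call.
    return records, offset + consumed
```

The caller keeps the returned offset and passes it back on the next call, so a block that is only partly written is picked up once its closing tag arrives. This approach ignores comments, CDATA sections and entity references, which is the price of avoiding a real parser.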
XML example:
<dfxml xmloutputversion='1.0'>
<creator version='1.0'>
<program>TCPFLOW</program>
<version>1.4.6</version>
</creator>
<configuration>
<fileobject>
<filename>file1</filename>
<filesize>288</filesize>
<tcpflow packets='12' srcport='1111' dstport='2222' family='2' />
</fileobject>
<fileobject>
<filename>file2</filename>
<filesize>352</filesize>
<tcpflow packets='12' srcport='3333' dstport='4444' family='2' />
</fileobject>
<fileobject>
<filename>file3</filename>
<filesize>456</filesize>
...
...
My first test, using SAX, failed:
from __future__ import print_function  # same output on Python 2.7 and 3.x
import xml.sax


class StreamHandler(xml.sax.handler.ContentHandler):

    def startElement(self, name, attrs):
        print('start: name=', name)

    def endElement(self, name):
        print('end: name=', name)
        if name == 'root':
            raise StopIteration


if __name__ == '__main__':
    parser = xml.sax.make_parser()
    parser.setContentHandler(StreamHandler())
    with open('f.xml') as f:
        # parse() reads up to end-of-file and then finalizes the document
        parser.parse(f)
Shell:
$ while read line; do echo $line; sleep 1; done <i.xml >f.xml &
...
$ ./test-using-sax.py
start: name= dfxml
start: name= creator
start: name= program
end: name= program
start: name= version
end: name= version
Traceback (most recent call last):
  File "./test-using-sax.py", line 17, in <module>
    parser.parse(f)
  File "/usr/lib64/python2.7/xml/sax/expatreader.py", line 107, in parse
    xmlreader.IncrementalParser.parse(self, source)
  File "/usr/lib64/python2.7/xml/sax/xmlreader.py", line 125, in parse
    self.close()
  File "/usr/lib64/python2.7/xml/sax/expatreader.py", line 220, in close
    self.feed("", isFinal = 1)
  File "/usr/lib64/python2.7/xml/sax/expatreader.py", line 214, in feed
    self._err_handler.fatalError(exc)
  File "/usr/lib64/python2.7/xml/sax/handler.py", line 38, in fatalError
    raise exception
xml.sax._exceptions.SAXParseException: report.xml:15:0: no element found
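The traceback shows that parse() calls close() as soon as it reaches the current end of the file, and close() reports the still-open elements as "no element found". The sketch below illustrates the difference with the incremental feed() method of the same parser: nothing is raised as long as close() is not called. The NameCollector class and the XML fragments are made up for the illustration.

```python
import xml.sax


class NameCollector(xml.sax.handler.ContentHandler):
    """Hypothetical handler that records the names of opened elements."""

    def __init__(self):
        xml.sax.handler.ContentHandler.__init__(self)
        self.started = []

    def startElement(self, name, attrs):
        self.started.append(name)


handler = NameCollector()
parser = xml.sax.make_parser()
parser.setContentHandler(handler)

# feed() accepts the document piece by piece and does not finalize it,
# so unclosed elements are not an error at this point.
parser.feed("<dfxml><creator><program>TCPFLOW</program>")
parser.feed("</creator><fileobject>")

# close() declares the end of the document; with elements still open
# it raises the same kind of error as in the traceback.
try:
    parser.close()
    truncated = False
except xml.sax.SAXParseException:
    truncated = True
```

For a file that keeps growing, this suggests calling feed() with each newly appended chunk and calling close() only once the writer has finished.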
Recommended answer
Three hours after posting my question, I had received no answer. But I have finally implemented the simple example I was looking for.
My inspiration comes from saaj's answer, and the example is based on xml.sax and watchdog.
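from __future__ import print_function, division
import time
import watchdog.events
import watchdog.observers
import xml.sax


class XmlStreamHandler(xml.sax.handler.ContentHandler):
    """Print each element and its text as soon as the parser sees them."""

    def __init__(self):
        xml.sax.handler.ContentHandler.__init__(self)
        self.tag = None  # name of the most recently opened element

    def startElement(self, tag, attributes):
        print(tag, 'attributes=', attributes.items())
        self.tag = tag

    def characters(self, content):
        print(self.tag, 'content=', content)


class XmlFileEventHandler(watchdog.events.PatternMatchingEventHandler):
    """Feed newly written data to an incremental SAX parser on each change."""

    def __init__(self):
        watchdog.events.PatternMatchingEventHandler.__init__(self, patterns=['*.xml'])
        self.file = None
        self.parser = xml.sax.make_parser()
        self.parser.setContentHandler(XmlStreamHandler())

    def on_modified(self, event):
        # The file is opened once and kept open, so each read() returns
        # only the data appended since the previous event.
        if not self.file:
            self.file = open(event.src_path)
        # feed() parses incrementally; close() is never called, so the
        # missing end tags do not raise an error.
        self.parser.feed(self.file.read())


if __name__ == '__main__':
    observer = watchdog.observers.Observer()
    event_handler = XmlFileEventHandler()
    observer.schedule(event_handler, path='.')
    try:
        observer.start()
        while True:
            time.sleep(10)
    finally:
        observer.stop()
        observer.join()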
While the script is running, do not forget to touch an XML file, or simulate the on-the-fly writing with the following command:
while read line; do echo $line; sleep 1; done <in.xml >out.xml &
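As a Python 3 tip, xml.etree.ElementTree provides XMLPullParser (Python 3.4 and later), which also accepts data in chunks and hands back completed elements instead of calling a handler. The sketch below is one possible way to combine it with an explicit file offset. The XmlTail class and its poll() method are hypothetical names, and the fileobject, filename and filesize tags are taken from the sample XML in the question.

```python
import xml.etree.ElementTree as ET


class XmlTail(object):
    """Hypothetical helper: parse only what was appended since the last poll."""

    def __init__(self, path):
        self.path = path
        self.offset = 0  # byte position already handed to the parser
        self.parser = ET.XMLPullParser(events=("end",))

    def poll(self):
        """Return (filename, filesize) pairs of newly completed fileobjects."""
        with open(self.path, "rb") as f:
            f.seek(self.offset)
            chunk = f.read()
            self.offset = f.tell()
        if chunk:
            self.parser.feed(chunk)
        completed = []
        for _event, elem in self.parser.read_events():
            if elem.tag == "fileobject":
                completed.append(
                    (elem.findtext("filename"), elem.findtext("filesize"))
                )
                elem.clear()  # drop the children to limit memory use
        return completed
```

A watchdog on_modified() handler could call poll() on each event instead of feeding a SAX parser. Because the file is reopened and positioned with seek() on every call, no file handle stays open between events.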