'ParseError: not well-formed' error for some special characters in XML data

Problem description

I have the following code to clean up a log file and extract the XML from it (the log file is not well formed and has no root element), and then parse the result and perform other functions. The cleanup works, but the XML parser throws an error for some XML data that contains special characters. My code is below:

import xml.etree.ElementTree as ET

with open(log_file, 'r') as fr, open('XMLinLog2.xml', 'w') as fw:
    fw.write("<document>\n")

    for line in fr:
        if line.strip().startswith('<'):
            fw.write('\t' + line)
    fw.write("\n</document>")

# --- Parsing Log files after cleanup ---

doc = ET.parse('XMLinLog2.xml')

The XML data in the log file that triggers the error is (1) Ops Désactivée 23:59 and (2) [ mono @ 90° >> +1, which after cleanup appear in the output file as Ops D�sactiv�e 23:59 and [ mono @ 90� >> +1 respectively. So I figured out that the � character is causing the problem. Questions:

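  1. How can I handle this error?
  2. If I need to print this data, how can I print it as it is? I don't want to print � instead, because I think it will throw an error whenever the input contains French text such as é.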

Full error here:

Traceback (most recent call last):
  File "C:/Users/PycharmProjects/IMSS_TestHarness/Libraries/try.py", line 23, in <module>
    doc = ET.parse('XMLinLog2.xml')
  File "C:\Users\AppData\Local\Programs\Python\Python38-32\lib\xml\etree\ElementTree.py", line 1202, in parse
    tree.parse(source, parser)
  File "C:\Users\AppData\Local\Programs\Python\Python38-32\lib\xml\etree\ElementTree.py", line 595, in parse
    self._root = parser._parse_whole(source)
xml.etree.ElementTree.ParseError: not well-formed (invalid token): line 3299, column 22

Process finished with exit code 1
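
The line and column in the message can be used to look at the raw bytes the parser rejected. The following is a minimal sketch of that check, not part of the original post: `bytes_near` is a hypothetical helper, and treating the reported column as a byte offset is an assumption.

```python
def bytes_near(path, line, column, context=12):
    """Return the raw bytes around a (line, column) position in a file.

    line is 1-based and column is 0-based, matching the position reported
    by xml.etree.ElementTree.ParseError. Treating the column as a byte
    offset is an assumption of this sketch.
    """
    with open(path, "rb") as f:
        for number, raw in enumerate(f, start=1):
            if number == line:
                start = max(column - context, 0)
                return raw[start:column + context]
    # The file has fewer lines than requested.
    return b""
```

Calling `bytes_near('XMLinLog2.xml', 3299, 22)` and printing the result shows the bytes as escapes. A lone byte such as `\xe9` or `\xb0` in the output would suggest that the file was not written as UTF-8.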

Log file:

1.  2020-08-03 15:59:54.635 (72 ,Effective Commit) Info          Sending:
<U_DisplayCommand>
  <DestinationId>5035</DestinationId>
  <DisplayId>1</DisplayId>
  <LineTextEnglish>
    <Line>Ops Disabled 23:59 N</Line>
  </LineTextEnglish>
  <LineTextFrench>
    **<Line>Ops Désactivée 23:59</Line>**
  </LineTextFrench>
</U_DisplayCommand>

<U_DisplayCommand>
  <DestinationId>5085</DestinationId>
  <DisplayId>1</DisplayId>
  <LineTextEnglish>
    <Line>Vaudreuil-Dori P123A</Line>
    <Line>[ mono @ 90° &gt;&gt; +1</Line>
  </LineTextEnglish>
  <LineTextFrench>
    <Line>Vaudreuil-Dori P123A</Line>
    <Line>[ mono @ 90° &gt;&gt; +1</Line>
  </LineTextFrench>
</U_DisplayCommand>

Thanks in advance.

Recommended answer

Actually, adding an encoding when opening the output file worked for me.

import xml.etree.ElementTree as ET

with open(log_file, 'r') as fr, open('XMLinLog2.xml', 'w', encoding='utf-8') as fw:
    fw.write("<document>\n")

    for line in fr:
        if line.strip().startswith('<'):
            fw.write('\t' + line)
    fw.write("\n</document>")

# --- Parsing Log files after cleanup ---

doc = ET.parse('XMLinLog2.xml')
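
A likely reason this helps: `open()` without an `encoding` argument uses the platform's default encoding, which on Windows is often cp1252, so é is written as the single byte 0xE9. The output file has no XML declaration, so the parser reads it as UTF-8, where a lone 0xE9 is an invalid token. The sketch below goes one step further and also names the encoding used for reading. The function name `extract_xml` and the `log_encoding="cp1252"` default are assumptions for illustration; the real encoding of the log is not stated in the question.

```python
import xml.etree.ElementTree as ET


def extract_xml(log_path, xml_path, log_encoding="cp1252"):
    """Copy the XML-looking lines of a log into one UTF-8 file and parse it.

    log_encoding is an assumption about how the log was written; adjust it
    to match the real file.
    """
    with open(log_path, "r", encoding=log_encoding) as fr, \
            open(xml_path, "w", encoding="utf-8") as fw:
        fw.write("<document>\n")
        for line in fr:
            # Keep only lines that start with a tag; timestamp lines are skipped.
            if line.strip().startswith("<"):
                fw.write("\t" + line)
        fw.write("\n</document>")
    # No XML declaration is written, so the parser reads the file as UTF-8.
    return ET.parse(xml_path)
```

With the file parsed this way, something like `for elem in extract_xml(log_file, 'XMLinLog2.xml').getroot().iter('Line'): print(elem.text)` should yield the text with é and ° intact, provided the console can display those characters.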
