BeautifulSoup doesn't parse XML loaded from local file
Question
My Python script, which uses BeautifulSoup, gets None when it tries to find an element in XML loaded from a local file:
from bs4 import BeautifulSoup

xmlData = None
with open('conf//test2.xml', 'r') as xmlFile:
    xmlData = xmlFile.read()

# this creates a soup object out of xmlData,
# which is properly loaded from file above
xmlSoup = BeautifulSoup(xmlData, "html.parser")

# this resolves to None
subElemX = xmlSoup.root.singleelement.find('subElementX', recursive=False)
The file:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<root>
    <singleElement>
        <subElementX>XYZ</subElementX>
    </singleElement>
    <repeatingElement id="1"/>
    <repeatingElement id="2"/>
</root>
I also have a REST GET service that returns the same XML, but when I read it using requests.get, it is parsed fine:
resp = requests.get(serviceURL, headers=headers)
respXML = resp.content.decode("utf-8")
restSoup = BeautifulSoup(respXML, "html.parser")
Why does it work with the REST response and not with the data read out of a local file?
UPDATE: While I understand that Python is case sensitive and singleelement != singleElement, the case is disregarded when parsing the web service response.
Recommended answer
Two things to make it work:
- change the features from html.parser to xml (you are parsing XML data, XML != HTML)
- change singleelement to singleElement
Changes applied (works for me):
xmlSoup = BeautifulSoup(xmlData, "xml")
subElemX = xmlSoup.root.singleElement.find('subElementX', recursive=False)
print(subElemX) # prints <subElementX>XYZ</subElementX>
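To see why the choice of parser matters for case, here is a minimal sketch that uses only the standard library rather than BeautifulSoup itself. As far as I can tell, BeautifulSoup's "html.parser" feature is built on Python's html.parser.HTMLParser, which reports tag names in lower case, while a real XML parser keeps them as written. The TagCollector class and the XML_DATA string are hypothetical names made up for this illustration, and xml.etree.ElementTree stands in for the "xml" feature only to show case-preserving behavior.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Same structure as the file in the question (XML declaration omitted for brevity).
XML_DATA = """<root>
  <singleElement>
    <subElementX>XYZ</subElementX>
  </singleElement>
  <repeatingElement id="1"/>
  <repeatingElement id="2"/>
</root>"""


class TagCollector(HTMLParser):
    """Hypothetical helper: records every start-tag name the HTML parser reports."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes the tag name already converted to lower case.
        self.tags.append(tag)


# Parse the XML text as if it were HTML.
collector = TagCollector()
collector.feed(XML_DATA)
collector.close()
html_tags = collector.tags

# Parse the same text with an actual XML parser, which is case sensitive.
xml_root = ET.fromstring(XML_DATA)
xml_tags = [elem.tag for elem in xml_root.iter()]

print(html_tags)  # lower-cased names, e.g. 'subelementx'
print(xml_tags)   # original names, e.g. 'subElementX'
```

This would explain the None in the question: once the tree holds subelementx, a case-sensitive search for 'subElementX' finds nothing, so an XML-aware parser is the safer choice for XML input.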