AttributeError: 'xml.etree.ElementTree.Element' object has no attribute 'encode'
Problem Description
I'm trying to make a desktop notifier, and for that I'm scraping news from a site. When I run the program, I get the following error.
news[child.tag] = child.encode('utf8')
AttributeError: 'xml.etree.ElementTree.Element' object has no attribute 'encode'
How do I resolve it? I'm completely new to this. I tried searching for solutions, but none of them worked for me.
Here is my code:
import requests
import xml.etree.ElementTree as ET

# url of news rss feed
RSS_FEED_URL = "http://www.hindustantimes.com/rss/topnews/rssfeed.xml"

def loadRSS():
    '''
    utility function to load RSS feed
    '''
    # create HTTP request response object
    resp = requests.get(RSS_FEED_URL)
    # return response content
    return resp.content

def parseXML(rss):
    '''
    utility function to parse XML format rss feed
    '''
    # create element tree root object
    root = ET.fromstring(rss)
    # create empty list for news items
    newsitems = []
    # iterate news items
    for item in root.findall('./channel/item'):
        news = {}
        # iterate child elements of item
        for child in item:
            # special checking for namespace object content:media
            if child.tag == '{http://search.yahoo.com/mrss/}content':
                news['media'] = child.attrib['url']
            else:
                news[child.tag] = child.encode('utf8')
        newsitems.append(news)
    # return news items list
    return newsitems

def topStories():
    '''
    main function to generate and return news items
    '''
    # load rss feed
    rss = loadRSS()
    # parse XML
    newsitems = parseXML(rss)
    return newsitems
You're trying to convert a str to bytes, and then store those bytes in a dictionary. The problem is that the object you're doing this to is an xml.etree.ElementTree.Element, not a str.
You probably meant to get the text from within or around that element, and then encode() that. The docs suggest using the itertext() method:
''.join(child.itertext())
This will evaluate to a str, which you can then encode().
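As a minimal sketch of what itertext() yields, the snippet below parses a small hand-written item and collects the text of each child. The sample XML and the element_text helper are made up for illustration and are not taken from the feed.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample, loosely shaped like one RSS <item>.
SAMPLE_ITEM = (
    "<item>"
    "<title>Top <b>story</b> today</title>"
    "<link>http://example.com/a</link>"
    "</item>"
)


def element_text(element):
    """Return all text inside an element, nested children included, as a str."""
    return ''.join(element.itertext())


item = ET.fromstring(SAMPLE_ITEM)
news = {child.tag: element_text(child) for child in item}
print(news)
```

Because itertext() walks nested elements too, the text inside the inner b tag and the text after it both end up in the title value.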
Note that the text and tail attributes might not contain text (emphasis added):
Their values are usually strings but may be any application-specific object.
If you want to use those attributes, you'll have to handle None or non-string values:
head = '' if child.text is None else str(child.text)
tail = '' if child.tail is None else str(child.tail)
# Do something with head and tail...
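Even this is not really enough. If text or tail contain bytes objects, str() will not decode them in Python 3; it returns their repr (something like "b'...'"). They have to be decoded explicitly, and if they are in some unexpected (or plain wrong) encoding, that will raise a UnicodeDecodeError.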
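A helper along these lines can cover None, bytes, and other non-string values in one place. This is only a sketch: the name to_text is hypothetical, treating bytes as UTF-8 is an assumption, and replacing undecodable bytes is one possible policy among several.

```python
import xml.etree.ElementTree as ET


def to_text(value, encoding='utf-8'):
    """Coerce an Element's text or tail value to str.

    None becomes '', bytes are decoded (UTF-8 is an assumption here),
    and anything else is passed through str().
    """
    if value is None:
        return ''
    if isinstance(value, bytes):
        # errors='replace' avoids a UnicodeDecodeError on badly encoded input.
        return value.decode(encoding, errors='replace')
    return str(value)


# Hypothetical sample: <a/> has no text of its own, but it does have a tail.
root = ET.fromstring("<p>before<a/>after</p>")
child = root[0]

head = to_text(child.text)  # child.text is None for the empty <a/> element
tail = to_text(child.tail)  # child.tail is the text that follows <a/>
print(repr(head), repr(tail))
```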
Strings versus Bytes
I suggest leaving the text as a str, and not encoding it at all. Encoding text to a bytes object is intended as the last step before writing it to a binary file, a network socket, or some other hardware.
For more on the difference between bytes and characters, see Ned Batchelder's "Pragmatic Unicode, or, How Do I Stop the Pain?" (36-minute video from PyCon US 2012). He covers both Python 2 and 3.
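To illustrate encoding as the last step, this sketch keeps the values as str while the program works with them and encodes only when the data is written out. The sample dictionary and the file name are made up, and a temporary directory stands in for wherever the notifier would really write.

```python
import os
import tempfile

# Hypothetical news item; values stay as str inside the program.
news = {'title': 'Café opens in Delhi', 'link': 'http://example.com/cafe'}

line = '{title} <{link}>\n'.format(**news)

# Encode only at the boundary, when the bytes leave the program.
path = os.path.join(tempfile.mkdtemp(), 'news.txt')
with open(path, 'wb') as f:
    f.write(line.encode('utf-8'))

# Opening the file in text mode with an explicit encoding does the same job
# without calling encode() by hand.
with open(path, 'a', encoding='utf-8') as f:
    f.write(line)
```

Either way, the dictionary itself never holds bytes, so the rest of the code can compare, slice, and print the values as ordinary text.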
Example Output
Using the child.itertext() method, and not encoding the strings, I got a reasonable-looking list-of-dictionaries from topStories():
[
 ...,
 {'description': 'Ayushmann Khurrana says his five-year Bollywood journey has '
                 'been "a fun ride"; adds success is a lousy teacher while '
                 'failure is "your friend, philosopher and guide".',
  'guid': 'http://www.hindustantimes.com/bollywood/i-am-a-hardcore-realist-and-that-s-why-i-feel-my-journey-has-been-a-joyride-ayushmann-khurrana/story-KQDR7gMuvhD9AeQTA7tbmI.html',
  'link': 'http://www.hindustantimes.com/bollywood/i-am-a-hardcore-realist-and-that-s-why-i-feel-my-journey-has-been-a-joyride-ayushmann-khurrana/story-KQDR7gMuvhD9AeQTA7tbmI.html',
  'media': 'http://www.hindustantimes.com/rf/image_size_630x354/HT/p2/2017/06/26/Pictures/actor-ayushman-khurana_24f064ae-5a5d-11e7-9d38-39c470df081e.JPG',
  'pubDate': 'Mon, 26 Jun 2017 10:50:26 GMT ',
  'title': "I am a hardcore realist, and that's why I feel my journey "
           'has been a joyride: Ayushmann...'},
]
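For reference, here is a sketch of parseXML() with the itertext() change applied. It assumes the feed layout from the question and runs on a small inline sample instead of the live URL; the sample XML and the MEDIA_CONTENT constant name are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Namespaced tag of <media:content>, as checked in the question's code.
MEDIA_CONTENT = '{http://search.yahoo.com/mrss/}content'

# Hypothetical stand-in for the bytes returned by loadRSS().
SAMPLE_RSS = b"""<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:media="http://search.yahoo.com/mrss/" version="2.0">
  <channel>
    <item>
      <title>Sample headline</title>
      <link>http://example.com/story.html</link>
      <media:content url="http://example.com/image.jpg"/>
    </item>
  </channel>
</rss>"""


def parseXML(rss):
    """Parse RSS bytes into a list of dicts, keeping every value as str."""
    root = ET.fromstring(rss)
    newsitems = []
    for item in root.findall('./channel/item'):
        news = {}
        for child in item:
            if child.tag == MEDIA_CONTENT:
                # .get() returns None instead of raising if 'url' is missing.
                news['media'] = child.get('url')
            else:
                # Collect the element's text as a str; no encode() call.
                news[child.tag] = ''.join(child.itertext())
        newsitems.append(news)
    return newsitems


print(parseXML(SAMPLE_RSS))
```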