AttributeError: 'xml.etree.ElementTree.Element' object has no attribute 'encode'


Problem description

I'm trying to make a desktop notifier, and for that I'm scraping news from a site. When I run the program, I get the following error.

news[child.tag] = child.encode('utf8')
AttributeError: 'xml.etree.ElementTree.Element' object has no attribute 'encode'

How do I resolve it? I'm completely new to this. I tried searching for solutions, but none of them worked for me.

Here is my code:

import requests
import xml.etree.ElementTree as ET


# url of news rss feed
RSS_FEED_URL = "http://www.hindustantimes.com/rss/topnews/rssfeed.xml"


def loadRSS():
    '''
    utility function to load RSS feed
    '''
    # create HTTP request response object
    resp = requests.get(RSS_FEED_URL)
    # return response content
    return resp.content


def parseXML(rss):
    '''
    utility function to parse XML format rss feed
    '''
    # create element tree root object
    root = ET.fromstring(rss)
    # create empty list for news items
    newsitems = []
    # iterate news items
    for item in root.findall('./channel/item'):
        news = {}
        # iterate child elements of item
        for child in item:
            # special checking for namespace object content:media
            if child.tag == '{http://search.yahoo.com/mrss/}content':
                news['media'] = child.attrib['url']
            else:
                news[child.tag] = child.encode('utf8')
        newsitems.append(news)
    # return news items list
    return newsitems


def topStories():
    '''
    main function to generate and return news items
    '''
    # load rss feed
    rss = loadRSS()
    # parse XML
    newsitems = parseXML(rss)
    return newsitems

Solution

You're trying to convert a str to bytes, and then store those bytes in a dictionary. The problem is that the object you're doing this to is an xml.etree.ElementTree.Element, not a str.

You probably meant to get the text from within or around that element, and then encode() that. The docs suggest using the itertext() method:

''.join(child.itertext())

This will evaluate to a str, which you can then encode().
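
To make this concrete, here is a minimal sketch of itertext() on a made-up item. The SAMPLE string and its URL are hypothetical, not taken from the real feed; the nested <b> element is only there to show that itertext() also collects text from inner elements and their tails.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample item, not taken from the real feed.
SAMPLE = (
    "<item>"
    "<title>Hello <b>big</b> world</title>"
    "<link>http://example.com/a</link>"
    "</item>"
)

item = ET.fromstring(SAMPLE)

# An Element is not a str, so it has no encode() method.
print(hasattr(item, 'encode'))

# itertext() yields every piece of text inside the element, in document
# order; joining the pieces gives one plain str per child.
texts = {child.tag: ''.join(child.itertext()) for child in item}
print(texts)
```

Each value in texts is an ordinary str, so calling encode() on it would work if bytes were really needed.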

Note that the text and tail attributes might not contain text (emphasis added):

Their values are usually strings but may be any application-specific object.

If you want to use those attributes, you'll have to handle None or non-string values:

# text is the content before the first subelement; tail follows the end tag
head = '' if child.text is None else str(child.text)
tail = '' if child.tail is None else str(child.tail)
# Do something with head and tail...

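Even this is not really enough. str() does not decode a bytes object: in Python 3 it returns the repr (such as "b'abc'"), and in Python 2, text in some unexpected (or plain wrong) encoding can raise a UnicodeEncodeError or UnicodeDecodeError once it is converted or encoded.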
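
As a small illustration of the None case, here is a sketch using a made-up fragment. The as_text() helper is a hypothetical name introduced for this example, not part of ElementTree.

```python
import xml.etree.ElementTree as ET


def as_text(value):
    """Return '' for None, otherwise str(value). Hypothetical helper."""
    return '' if value is None else str(value)


# Made-up fragment: an empty <br/> element with text on both sides of it.
root = ET.fromstring("<p>before<br/>after</p>")
br = root[0]

head = as_text(br.text)  # <br/> has no inner text, so br.text is None
tail = as_text(br.tail)  # text that follows <br/> inside its parent

print(repr(head), repr(tail), repr(as_text(root.text)))
```

Here "before" belongs to the parent's text attribute and "after" to the tail of <br/>, which is why itertext() on the parent is usually the simpler way to get everything.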

Strings versus Bytes

I suggest leaving the text as a str, and not encoding it at all. Encoding text to a bytes object is intended as the last step before writing it to a binary file, a network socket, or some other hardware.
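
A rough sketch of "encode at the boundary", assuming UTF-8 is the encoding wanted on the wire. The title value is made up, and io.BytesIO only stands in for a real binary file or socket.

```python
import io

# Made-up title containing a thin space (U+2009); it stays a str while
# the program works with it.
title = "Ayushmann\u2009Khurrana"

# io.BytesIO stands in for a binary file or a network socket here.
buffer = io.BytesIO()

# Encode only at the moment the text leaves the program.
buffer.write(title.encode('utf-8'))

data = buffer.getvalue()
round_trip = data.decode('utf-8')  # decode again when reading back in
print(type(data).__name__, round_trip == title)
```

Everything between reading and writing can then treat the text as characters, without caring how many bytes each one takes.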

For more on the difference between bytes and characters, see Ned Batchelder's "Pragmatic Unicode, or, How Do I Stop the Pain?" (a 36-minute video from PyCon US 2012). He covers both Python 2 and 3.

Example Output

Using the child.itertext() method, and not encoding the strings, I got a reasonable-looking list-of-dictionaries from topStories():

[
  ...,
  {'description': 'Ayushmann Khurrana says his five-year Bollywood journey has '
                  'been "a fun ride"; adds success is a lousy teacher while '
                  'failure is "your friend, philosopher and guide".',
    'guid': 'http://www.hindustantimes.com/bollywood/i-am-a-hardcore-realist-and-that-s-why-i-feel-my-journey-has-been-a-joyride-ayushmann-khurrana/story-KQDR7gMuvhD9AeQTA7tbmI.html',
    'link': 'http://www.hindustantimes.com/bollywood/i-am-a-hardcore-realist-and-that-s-why-i-feel-my-journey-has-been-a-joyride-ayushmann-khurrana/story-KQDR7gMuvhD9AeQTA7tbmI.html',
    'media': 'http://www.hindustantimes.com/rf/image_size_630x354/HT/p2/2017/06/26/Pictures/actor-ayushman-khurana_24f064ae-5a5d-11e7-9d38-39c470df081e.JPG',
    'pubDate': 'Mon, 26 Jun 2017 10:50:26 GMT ',
    'title': "I am a hardcore realist, and that's why I&thinsp;feel my journey "
             'has been a joyride: Ayushmann...'},
]
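
For reference, a sketch of what the parsing function might look like with that change applied. It is renamed parse_xml here, and SAMPLE_FEED is a made-up stand-in for the real feed so the example runs without network access; both names and the example.com URLs are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Namespaced tag used by the feed for media content.
MEDIA_CONTENT = '{http://search.yahoo.com/mrss/}content'


def parse_xml(rss):
    """Parse an RSS document (str or bytes) into a list of dicts of str."""
    root = ET.fromstring(rss)
    newsitems = []
    for item in root.findall('./channel/item'):
        news = {}
        for child in item:
            if child.tag == MEDIA_CONTENT:
                news['media'] = child.attrib['url']
            else:
                # Take the element's text instead of calling encode()
                # on the Element itself.
                news[child.tag] = ''.join(child.itertext())
        newsitems.append(news)
    return newsitems


# Made-up stand-in for the real feed.
SAMPLE_FEED = b"""<rss xmlns:media="http://search.yahoo.com/mrss/" version="2.0">
  <channel>
    <item>
      <title>Sample headline</title>
      <link>http://example.com/story.html</link>
      <media:content url="http://example.com/picture.jpg"/>
    </item>
  </channel>
</rss>"""

items = parse_xml(SAMPLE_FEED)
print(items)
```

With the real feed, the bytes returned by loadRSS() can be passed to parse_xml() in the same way.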
