Basic fetching of a URL's HTML body with Python 3.x

Views: 33 Published: 2021/9/14 20:39:53 Tags: python url urllib2

Problem description

I'm a Python newbie. I have been a little confused by the differences between the old urllib and urllib2 in Python 2.x and the new urllib in Python 3, and among other things I'm not sure when data needs to be encoded before being sent to urlopen.
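
On the encoding point, a minimal sketch of what the Python 3 standard library expects may help: `urllib.parse.urlencode` returns a `str`, while the POST body passed to `urllib.request.urlopen` or `urllib.request.Request` must be bytes, so the string is encoded first. The form fields and the URL below are placeholders chosen for illustration, and no network request is made.

```python
import urllib.parse
import urllib.request

# Illustrative form fields; any dict of strings works the same way.
values = {"LANG": "en", "CEL": "C"}

# urlencode() builds a str in application/x-www-form-urlencoded form.
query = urllib.parse.urlencode(values)

# The POST body must be bytes. The urlencoded string is pure ASCII,
# so encoding it as ASCII (or UTF-8) gives the same result.
data = query.encode("ascii")

# Building a Request does not touch the network. Supplying data
# makes it a POST. The URL is a placeholder.
request = urllib.request.Request("http://example.com/form", data=data)
method = request.get_method()

print(query)
print(type(data).__name__)
print(method)
```

The response side is the mirror image: `response.read()` returns bytes, which need decoding only if they really are text.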

I have been trying to fetch the HTML body of a URL, using a POST so that I can send parameters. The webpage displays sunshine data for a country over a particular hour of a given day. Without any encoding/decoding, the printout is a string of bytes with b at the beginning. The code I then tried was:

import urllib.request, urllib.parse, urllib.error

def scrape(someurl):
    try:
        values = {'LANG': 'en',
                  'DATE': '1303160400',
                  'CONT': 'euro',
                  'LAND': 'UK',
                  'KEY': 'UK',
                  'SORT': '2',
                  'INT': '06',
                  'TYPE': 'sonnestd',
                  'ART': 'karte',
                  'RUBRIK': 'akt',
                  'R': '310',
                  'CEL': 'C'}

        data = urllib.parse.urlencode(values)
        data = data.encode("utf-8")
        response = urllib.request.urlopen(someurl, data)
        html = response.read().decode("utf-8")
        print(html)
    except urllib.error.HTTPError as e:
        print(e.code)
        print(e.read())

myscrape = scrape("http://www.weatheronline.co.uk/weather/maps/current")

The error is:

Traceback (most recent call last):
  File "/Users/Me/Desktop/weather.py", line 57, in <module>
    myscrape = scrape("http://www.weatheronline.co.uk/weather/maps/current")
  File "/Users/Me/Desktop/weather.py", line 37, in scrape
    html = response.read().decode("utf-8")
UnicodeDecodeError: 'utf8' codec can't decode byte 0x80 in position 10: invalid start byte

Without encoding/decoding I get a suspiciously short string of bytes anyway, so I wonder whether the request is failing in some other way:

b'GIF89a\x01\x00\x01\x00\x80\x00\x00\x00\x00\x00\x00\x00\x00!\xf9\x04\x01\x00\x00\x00\x00,\x00\x00\x00\x00\x01\x00\x01\x00\x00\x02\x02D\x01\x00;'
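
The bytes above begin with `GIF89a`, the signature of a GIF image, which suggests the server returned a tiny image rather than HTML. That would also account for the traceback: byte 0x80 sits at position 10 of this payload and cannot start a UTF-8 sequence. A minimal sketch of checking the payload before decoding follows. The helper name `describe_body` is hypothetical, and falling back to UTF-8 when no charset is known is an assumption, not something the question specifies.

```python
def describe_body(body, charset=None):
    """Return decoded text, or a short note if the body looks like a GIF."""
    # GIF files start with one of these two ASCII signatures.
    if body.startswith((b"GIF87a", b"GIF89a")):
        return "not HTML: GIF image, {} bytes".format(len(body))
    # Assumption: fall back to UTF-8 and replace undecodable bytes
    # instead of raising UnicodeDecodeError.
    return body.decode(charset or "utf-8", errors="replace")

# The exact bytes printed in the question.
body = (b'GIF89a\x01\x00\x01\x00\x80\x00\x00\x00\x00\x00\x00\x00\x00!'
        b'\xf9\x04\x01\x00\x00\x00\x00,\x00\x00\x00\x00\x01\x00\x01\x00'
        b'\x00\x02\x02D\x01\x00;')

result = describe_body(body)
print(result)

# An ordinary HTML payload is decoded as usual.
html_result = describe_body(b"<html><body>ok</body></html>")
print(html_result)
```

With a live response, the charset could be taken from `response.headers.get_content_charset()` and passed as the second argument. If the body is an image, the request itself succeeded and the URL or form parameters are the more likely place to look.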
